Analyzer Module
The Analyzer module provides utilities for measuring optimization impact and tracking token savings.
TokenTracker
Context manager for tracking token usage before and after refinement operations.
prompt_refiner.analyzer.TokenTracker
Context manager for tracking token usage in refiners/pipelines.
Wraps any Refiner (operation, pipeline, or strategy) and tracks token counts before and after processing. Users provide their own token counting function for maximum flexibility.
Example
```python
>>> def count_tokens(text: str) -> int:
...     # User's custom counter (tiktoken, character-based, etc.)
...     return len(text) // 4
>>> refiner = StripHTML() | NormalizeWhitespace()
>>> with TokenTracker(refiner, count_tokens) as tracker:
...     result = tracker.process("<p>Hello World</p>")
>>> print(tracker.stats)
{'original_tokens': 4, 'refined_tokens': 2, 'saved_tokens': 2, 'saving_percent': '50.0%'}
```
Initialize token tracker.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `refiner` | `Refiner` | Any Refiner (operation or pipeline) to track | *required* |
| `token_counter` | `Callable[[str], int]` | Function that counts tokens in text. Should accept a string and return an integer token count. | *required* |
Attributes
stats
property
Get token statistics.
Returns:

| Type | Description |
|---|---|
| `dict` | Dictionary with `original_tokens`, `refined_tokens`, `saved_tokens`, and `saving_percent` (e.g. `'50.0%'`). Returns empty dict if `process()` hasn't been called yet. |
Example
```python
>>> with TokenTracker(StripHTML(), lambda t: len(t)//4) as tracker:
...     result = tracker.process("<span>Test</span>")
...     stats = tracker.stats
...     print(f"Saved {stats['saved_tokens']} tokens")
Saved 3 tokens
```
original_text
property
Get the original input text.
Returns:

| Type | Description |
|---|---|
| `Optional[str]` | The text passed to `process()`, or `None` if `process()` hasn't been called |
result
property
Get the processed result text.
Returns:

| Type | Description |
|---|---|
| `Optional[str]` | The refined text returned by `process()`, or `None` if `process()` hasn't been called |
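For example, both properties can be read back after processing (a short sketch; the printed outputs assume `StripHTML` removes the tags, as shown elsewhere in these docs):

```python
from prompt_refiner import TokenTracker, StripHTML, character_based_counter

with TokenTracker(StripHTML(), character_based_counter) as tracker:
    tracker.process("<b>Hi</b>")

print(tracker.original_text)  # <b>Hi</b>
print(tracker.result)         # Hi
```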
Functions
`__enter__`

`__exit__`
process
Process text through refiner and track tokens.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Input text to process | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Processed text from the refiner |
Example
```python
>>> with TokenTracker(StripHTML(), lambda t: len(t)//4) as tracker:
...     result = tracker.process("<div>Hello</div>")
>>> print(tracker.stats["saved_tokens"])
3
```
Basic Usage
```python
from prompt_refiner import TokenTracker, StripHTML, character_based_counter

refiner = StripHTML()

with TokenTracker(refiner, character_based_counter) as tracker:
    result = tracker.process("<div>Hello World</div>")

print(tracker.stats)
# {'original_tokens': 6, 'refined_tokens': 3, 'saved_tokens': 3, 'saving_percent': '50.0%'}
```
Pipeline Tracking
```python
from prompt_refiner import (
    TokenTracker,
    StripHTML,
    NormalizeWhitespace,
    character_based_counter,
)

# Track the entire pipeline
pipeline = StripHTML() | NormalizeWhitespace()

with TokenTracker(pipeline, character_based_counter) as tracker:
    result = tracker.process("<p>Hello   World   </p>")

stats = tracker.stats
print(f"Saved {stats['saved_tokens']} tokens ({stats['saving_percent']})")
# Saved 3 tokens (50.0%)
```
Strategy Tracking
```python
from prompt_refiner import (
    TokenTracker,
    StandardStrategy,
    character_based_counter,
)

# Track preset strategies
strategy = StandardStrategy()

with TokenTracker(strategy, character_based_counter) as tracker:
    result = tracker.process("<div>Messy input text</div>")

print(tracker.stats)
```
Token Counter Functions
Built-in token counting functions for different use cases.
character_based_counter
prompt_refiner.analyzer.character_based_counter
Estimate tokens using character-based approximation.
Uses conservative estimate: 1 token ≈ 4 characters. Fast but less accurate than model-specific counters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Input text to count tokens for | *required* |

Returns:

| Type | Description |
|---|---|
| `int` | Estimated token count |
Example
```python
>>> character_based_counter("Hello World")
3
>>> character_based_counter("A longer sentence with many words")
9
```
Fast approximation using ~1 token ≈ 4 characters. Good for general use.
```python
from prompt_refiner import character_based_counter

tokens = character_based_counter("Hello World")
print(tokens)  # 3
```
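Under the hood this is roughly equivalent to dividing the character count by four and rounding up, which matches the documented outputs above. A minimal sketch of that approximation (the helper name is ours, not the library's):

```python
import math

def approx_character_counter(text: str) -> int:
    # ~1 token per 4 characters, rounded up (conservative estimate)
    return math.ceil(len(text) / 4)

approx_character_counter("Hello World")                        # 3 (11 chars)
approx_character_counter("A longer sentence with many words")  # 9 (33 chars)
```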
word_based_counter
prompt_refiner.analyzer.word_based_counter
Estimate tokens using word count approximation.
Uses estimate: 1 token ≈ 1 word. Reasonable for English text.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Input text to count tokens for | *required* |

Returns:

| Type | Description |
|---|---|
| `int` | Estimated token count based on word splits |
Example
```python
>>> word_based_counter("Hello World")
2
>>> word_based_counter("A longer sentence with many words")
6
```
Simple approximation using ~1 token ≈ 1 word. Reasonable for English text.
```python
from prompt_refiner import word_based_counter

tokens = word_based_counter("Hello World")
print(tokens)  # 2
```
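Since the docstring describes a plain word-split estimate, the behavior is roughly the following (a sketch consistent with the documented examples, not the library's exact source):

```python
def approx_word_counter(text: str) -> int:
    # ~1 token per whitespace-delimited word
    return len(text.split())

approx_word_counter("A longer sentence with many words")  # 6
```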
create_tiktoken_counter
prompt_refiner.analyzer.create_tiktoken_counter
Create a tiktoken-based counter for precise token counting.
Requires tiktoken to be installed. Use this for accurate token counts when working with specific models.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `str` | Model name for tokenizer selection (e.g., "gpt-4", "gpt-3.5-turbo") | `'gpt-4'` |

Returns:

| Type | Description |
|---|---|
| `Callable[[str], int]` | Token counting function that uses tiktoken |

Raises:

| Type | Description |
|---|---|
| `ImportError` | If tiktoken is not installed |
Example
```python
>>> counter = create_tiktoken_counter(model="gpt-4")
>>> counter("Hello World")
2
```
If tiktoken not installed:
```python
>>> try:
...     counter = create_tiktoken_counter()
... except ImportError as e:
...     print("Install tiktoken: pip install llm-prompt-refiner[token]")
```
Precise token counting using OpenAI's tiktoken. Requires optional dependency.
```python
from prompt_refiner import create_tiktoken_counter

# Requires: pip install llm-prompt-refiner[token]
counter = create_tiktoken_counter(model="gpt-4")
tokens = counter("Hello World")
print(tokens)  # Exact token count for GPT-4
```
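Such a factory is straightforward to build on tiktoken's public API. A minimal sketch of the idea (the function name is ours, not the library's actual implementation):

```python
import tiktoken

def make_tiktoken_counter(model: str = "gpt-4"):
    # Look up the tokenizer matching the given model name
    encoding = tiktoken.encoding_for_model(model)
    # Return a Callable[[str], int] compatible with TokenTracker
    return lambda text: len(encoding.encode(text))
```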
Optional Dependency
create_tiktoken_counter requires tiktoken to be installed:
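```bash
pip install llm-prompt-refiner[token]
```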
If tiktoken is not available, use character_based_counter or word_based_counter instead.
Common Use Cases
ROI Demonstration
Track token savings to demonstrate optimization value:
```python
from prompt_refiner import (
    TokenTracker,
    StandardStrategy,
    character_based_counter,
)

# Your messy input
original = "<div>Lots of HTML and extra whitespace</div>"

# Track optimization
strategy = StandardStrategy()

with TokenTracker(strategy, character_based_counter) as tracker:
    result = tracker.process(original)

# Show ROI
stats = tracker.stats
print(f"Original: {stats['original_tokens']} tokens")
print(f"Refined: {stats['refined_tokens']} tokens")
print(f"Saved: {stats['saved_tokens']} tokens ({stats['saving_percent']})")

# Calculate cost savings (example: $0.03 per 1K tokens)
cost_per_token = 0.03 / 1000
savings = stats['saved_tokens'] * cost_per_token
print(f"Cost savings: ${savings:.4f} per request")
```
A/B Testing Different Strategies
Compare multiple optimization approaches:
```python
from prompt_refiner import (
    TokenTracker,
    MinimalStrategy,
    StandardStrategy,
    AggressiveStrategy,
    character_based_counter,
)

original = "Your test text here..."

# Test different strategies
strategies = {
    "Minimal": MinimalStrategy(),
    "Standard": StandardStrategy(),
    "Aggressive": AggressiveStrategy(),
}

for name, strategy in strategies.items():
    with TokenTracker(strategy, character_based_counter) as tracker:
        result = tracker.process(original)

    stats = tracker.stats
    print(f"{name}: {stats['saved_tokens']} tokens saved ({stats['saving_percent']})")
```
Monitoring and Logging
Track optimization in production:
```python
import logging

from prompt_refiner import (
    TokenTracker,
    StandardStrategy,
    character_based_counter,
)

logger = logging.getLogger(__name__)

def process_user_input(text: str) -> str:
    strategy = StandardStrategy()
    with TokenTracker(strategy, character_based_counter) as tracker:
        result = tracker.process(text)

    stats = tracker.stats
    logger.info(
        f"Processed input: "
        f"original={stats['original_tokens']} tokens, "
        f"refined={stats['refined_tokens']} tokens, "
        f"saved={stats['saved_tokens']} tokens ({stats['saving_percent']})"
    )
    return result
```
Packer Token Tracking
Packers have built-in token tracking support:
```python
from prompt_refiner import MessagesPacker, character_based_counter

packer = MessagesPacker(
    track_tokens=True,
    token_counter=character_based_counter,
    system="<div>You are helpful.</div>",
    context=["<p>Doc 1</p>", "<p>Doc 2</p>"],
    query="<span>What's the weather?</span>",
)

messages = packer.pack()

# Get token savings from automatic cleaning
stats = packer.token_stats
print(f"Saved {stats['saved_tokens']} tokens through automatic refinement")
```
Choosing a Token Counter
Which Counter Should I Use?
For development and testing:

- Use `character_based_counter` - fast, with no dependencies

For production cost estimation:

- Use `create_tiktoken_counter(model="gpt-4")` for precise costs
- Requires: `pip install llm-prompt-refiner[token]`

For simple approximation:

- Use `word_based_counter` for English text
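A common pattern is to prefer the precise counter when tiktoken is available and fall back to the character-based estimate otherwise. A sketch of that fallback (the pattern is ours, not a library feature; it relies on the documented `ImportError`):

```python
from prompt_refiner import character_based_counter, create_tiktoken_counter

try:
    # Precise counts when tiktoken is installed
    counter = create_tiktoken_counter(model="gpt-4")
except ImportError:
    # Fall back to the fast, dependency-free approximation
    counter = character_based_counter
```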
Tips
Context Manager Best Practice
Always use TokenTracker as a context manager via a `with` statement:
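For example (a minimal sketch using the counters shown earlier):

```python
from prompt_refiner import TokenTracker, StripHTML, character_based_counter

# The context manager scopes setup and teardown around tracking
with TokenTracker(StripHTML(), character_based_counter) as tracker:
    result = tracker.process("<div>Hello</div>")

print(tracker.stats["saved_tokens"])
```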
Custom Token Counters
You can provide any callable that takes a string and returns an int:
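For instance, a custom heuristic counter (the heuristic below is hypothetical, purely for illustration):

```python
from prompt_refiner import TokenTracker, StripHTML

def my_counter(text: str) -> int:
    # Hypothetical heuristic: one token per 5 characters, at least 1
    return max(1, len(text) // 5)

with TokenTracker(StripHTML(), my_counter) as tracker:
    result = tracker.process("<p>Custom counting</p>")

print(tracker.stats)
```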