Analyzer Module

The Analyzer module provides utilities for measuring optimization impact and tracking token savings.

TokenTracker

Context manager for tracking token usage before and after refinement operations.

prompt_refiner.analyzer.TokenTracker

TokenTracker(refiner, token_counter)

Context manager for tracking token usage in refiners/pipelines.

Wraps any Refiner (operation, pipeline, or strategy) and tracks token counts before and after processing. Users provide their own token counting function for maximum flexibility.

Example

>>> def count_tokens(text: str) -> int:
...     # User's custom counter (tiktoken, character-based, etc.)
...     return len(text) // 4
>>> refiner = StripHTML() | NormalizeWhitespace()
>>> with TokenTracker(refiner, count_tokens) as tracker:
...     result = tracker.process("<p>Hello World</p>")
>>> print(tracker.stats)

Initialize token tracker.

Parameters:

  refiner (Refiner, required): Any Refiner (operation or pipeline) to track.
  token_counter (Callable[[str], int], required): Function that counts tokens in text. Should accept a string and return an integer token count.
Source code in src/prompt_refiner/analyzer/token_tracker.py
def __init__(
    self,
    refiner: Refiner,
    token_counter: Callable[[str], int],
):
    """
    Initialize token tracker.

    Args:
        refiner: Any Refiner (operation or pipeline) to track
        token_counter: Function that counts tokens in text.
            Should accept a string and return an integer token count.
    """
    self._refiner = refiner
    self._counter = token_counter
    self._original_tokens: Optional[int] = None
    self._refined_tokens: Optional[int] = None
    self._original_text: Optional[str] = None
    self._result: Optional[str] = None

Attributes

stats property
stats

Get token statistics.

Returns:

  dict: Dictionary with:
    • original_tokens: Tokens before processing
    • refined_tokens: Tokens after processing
    • saved_tokens: Tokens saved (original - refined)
    • saving_percent: Percentage saved as formatted string (e.g., "12.5%")

  Returns empty dict if process() hasn't been called yet.

Example

>>> with TokenTracker(StripHTML(), lambda t: len(t)//4) as tracker:
...     tracker.process("<p>Test</p>")
...     stats = tracker.stats
...     print(f"Saved {stats['saved_tokens']} tokens")
Saved 3 tokens

original_text property
original_text

Get the original input text.

Returns:

  Optional[str]: The text passed to process(), or None if process() hasn't been called.

result property
result

Get the processed result text.

Returns:

  Optional[str]: The refined text returned by process(), or None if process() hasn't been called.

Functions

__enter__
__enter__()

Enter context - returns self for method access.

Source code in src/prompt_refiner/analyzer/token_tracker.py
def __enter__(self) -> "TokenTracker":
    """Enter context - returns self for method access."""
    return self
__exit__
__exit__(exc_type, exc_val, exc_tb)

Exit context - cleanup if needed.

Source code in src/prompt_refiner/analyzer/token_tracker.py
def __exit__(self, exc_type, exc_val, exc_tb):
    """Exit context - cleanup if needed."""
    # No cleanup needed, but required for context manager protocol
    pass
process
process(text)

Process text through refiner and track tokens.

Parameters:

  text (str, required): Input text to process.

Returns:

  str: Processed text from the refiner.

Example

>>> with TokenTracker(StripHTML(), lambda t: len(t)//4) as tracker:
...     result = tracker.process("<p>Hello</p>")
...     print(tracker.stats["saved_tokens"])
3

Source code in src/prompt_refiner/analyzer/token_tracker.py
def process(self, text: str) -> str:
    """
    Process text through refiner and track tokens.

    Args:
        text: Input text to process

    Returns:
        Processed text from the refiner

    Example:
        >>> with TokenTracker(StripHTML(), lambda t: len(t)//4) as tracker:
        ...     result = tracker.process("<p>Hello</p>")
        ...     print(tracker.stats["saved_tokens"])
        3
    """
    # Track original
    self._original_text = text
    self._original_tokens = self._counter(text)

    # Process through refiner
    self._result = self._refiner.process(text)

    # Track refined
    self._refined_tokens = self._counter(self._result)

    return self._result

Basic Usage

from prompt_refiner import TokenTracker, StripHTML, character_based_counter

refiner = StripHTML()

with TokenTracker(refiner, character_based_counter) as tracker:
    result = tracker.process("<div>Hello World</div>")

print(tracker.stats)
# {'original_tokens': 6, 'refined_tokens': 3, 'saved_tokens': 3, 'saving_percent': '50.0%'}

Pipeline Tracking

from prompt_refiner import (
    TokenTracker,
    StripHTML,
    NormalizeWhitespace,
    character_based_counter,
)

# Track entire pipeline
pipeline = StripHTML() | NormalizeWhitespace()

with TokenTracker(pipeline, character_based_counter) as tracker:
    result = tracker.process("<p>Hello    World   </p>")

stats = tracker.stats
print(f"Saved {stats['saved_tokens']} tokens ({stats['saving_percent']})")
# Saved 3 tokens (50.0%)

Strategy Tracking

from prompt_refiner import (
    TokenTracker,
    StandardStrategy,
    character_based_counter,
)

# Track preset strategies
strategy = StandardStrategy()

with TokenTracker(strategy, character_based_counter) as tracker:
    result = tracker.process("<div>Messy    input   text</div>")

print(tracker.stats)

Token Counter Functions

Built-in token counting functions for different use cases.

character_based_counter

prompt_refiner.analyzer.character_based_counter

character_based_counter(text)

Estimate tokens using character-based approximation.

Uses conservative estimate: 1 token ≈ 4 characters. Fast but less accurate than model-specific counters.

Parameters:

  text (str, required): Input text to count tokens for.

Returns:

  int: Estimated token count.

Example

character_based_counter("Hello World") 3 character_based_counter("A longer sentence with many words") 9

Source code in src/prompt_refiner/analyzer/token_counters.py
def character_based_counter(text: str) -> int:
    """
    Estimate tokens using character-based approximation.

    Uses conservative estimate: 1 token ≈ 4 characters.
    Fast but less accurate than model-specific counters.

    Args:
        text: Input text to count tokens for

    Returns:
        Estimated token count

    Example:
        >>> character_based_counter("Hello World")
        3
        >>> character_based_counter("A longer sentence with many words")
        9
    """
    if not text:
        return 0
    return math.ceil(len(text) / 4)

Fast approximation using ~1 token ≈ 4 characters. Good for general use.

from prompt_refiner import character_based_counter

tokens = character_based_counter("Hello World")
print(tokens)  # 3

word_based_counter

prompt_refiner.analyzer.word_based_counter

word_based_counter(text)

Estimate tokens using word count approximation.

Uses estimate: 1 token ≈ 1 word. Reasonable for English text.

Parameters:

  text (str, required): Input text to count tokens for.

Returns:

  int: Estimated token count based on word splits.

Example

word_based_counter("Hello World") 2 word_based_counter("A longer sentence with many words") 6

Source code in src/prompt_refiner/analyzer/token_counters.py
def word_based_counter(text: str) -> int:
    """
    Estimate tokens using word count approximation.

    Uses estimate: 1 token ≈ 1 word.
    Reasonable for English text.

    Args:
        text: Input text to count tokens for

    Returns:
        Estimated token count based on word splits

    Example:
        >>> word_based_counter("Hello World")
        2
        >>> word_based_counter("A longer sentence with many words")
        6
    """
    if not text:
        return 0
    return len(text.split())

Simple approximation using ~1 token ≈ 1 word. Reasonable for English text.

from prompt_refiner import word_based_counter

tokens = word_based_counter("Hello World")
print(tokens)  # 2

create_tiktoken_counter

prompt_refiner.analyzer.create_tiktoken_counter

create_tiktoken_counter(model='gpt-4')

Create a tiktoken-based counter for precise token counting.

Requires tiktoken to be installed. Use this for accurate token counts when working with specific models.

Parameters:

  model (str, default 'gpt-4'): Model name for tokenizer selection (e.g., "gpt-4", "gpt-3.5-turbo").

Returns:

  Callable[[str], int]: Token counting function that uses tiktoken.

Raises:

  ImportError: If tiktoken is not installed.

Example

>>> counter = create_tiktoken_counter(model="gpt-4")
>>> counter("Hello World")
2

If tiktoken is not installed:

>>> try:
...     counter = create_tiktoken_counter()
... except ImportError as e:
...     print("Install tiktoken: pip install llm-prompt-refiner[token]")

Source code in src/prompt_refiner/analyzer/token_counters.py
def create_tiktoken_counter(model: str = "gpt-4") -> Callable[[str], int]:
    """
    Create a tiktoken-based counter for precise token counting.

    Requires tiktoken to be installed. Use this for accurate token counts
    when working with specific models.

    Args:
        model: Model name for tokenizer selection (e.g., "gpt-4", "gpt-3.5-turbo")

    Returns:
        Token counting function that uses tiktoken

    Raises:
        ImportError: If tiktoken is not installed

    Example:
        >>> counter = create_tiktoken_counter(model="gpt-4")
        >>> counter("Hello World")
        2

        >>> # If tiktoken not installed:
        >>> try:
        ...     counter = create_tiktoken_counter()
        ... except ImportError as e:
        ...     print("Install tiktoken: pip install llm-prompt-refiner[token]")
    """
    try:
        import tiktoken
    except ImportError:
        raise ImportError(
            "tiktoken is required for precise token counting. "
            "Install with: pip install llm-prompt-refiner[token]"
        )

    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Fallback to cl100k_base (used by gpt-4, gpt-3.5-turbo)
        encoding = tiktoken.get_encoding("cl100k_base")

    def counter(text: str) -> int:
        """Count tokens using tiktoken encoding."""
        if not text:
            return 0
        return len(encoding.encode(text))

    return counter

Precise token counting using OpenAI's tiktoken. Requires optional dependency.

from prompt_refiner import create_tiktoken_counter

# Requires: pip install llm-prompt-refiner[token]
counter = create_tiktoken_counter(model="gpt-4")

tokens = counter("Hello World")
print(tokens)  # Exact token count for GPT-4

Optional Dependency

create_tiktoken_counter requires tiktoken to be installed:

pip install llm-prompt-refiner[token]

If tiktoken is not available, use character_based_counter or word_based_counter instead.
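A minimal fallback sketch: try the precise counter first and fall back to the character-based approximation when tiktoken is missing (the try/except wiring here is an illustration, not a built-in helper):

from prompt_refiner import character_based_counter, create_tiktoken_counter

try:
    # Precise counting; raises ImportError when tiktoken isn't installed
    counter = create_tiktoken_counter(model="gpt-4")
except ImportError:
    # Approximate counting with no extra dependencies
    counter = character_based_counter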

Common Use Cases

ROI Demonstration

Track token savings to demonstrate optimization value:

from prompt_refiner import (
    TokenTracker,
    StandardStrategy,
    character_based_counter,
)

# Your messy input
original = "<div>Lots of HTML and   extra   whitespace</div>"

# Track optimization
strategy = StandardStrategy()
with TokenTracker(strategy, character_based_counter) as tracker:
    result = tracker.process(original)

# Show ROI
stats = tracker.stats
print(f"Original: {stats['original_tokens']} tokens")
print(f"Refined: {stats['refined_tokens']} tokens")
print(f"Saved: {stats['saved_tokens']} tokens ({stats['saving_percent']})")

# Calculate cost savings (example: $0.03 per 1K tokens)
cost_per_token = 0.03 / 1000
savings = stats['saved_tokens'] * cost_per_token
print(f"Cost savings: ${savings:.4f} per request")

A/B Testing Different Strategies

Compare multiple optimization approaches:

from prompt_refiner import (
    TokenTracker,
    MinimalStrategy,
    StandardStrategy,
    AggressiveStrategy,
    character_based_counter,
)

original = "Your test text here..."

# Test different strategies
strategies = {
    "Minimal": MinimalStrategy(),
    "Standard": StandardStrategy(),
    "Aggressive": AggressiveStrategy(),
}

for name, strategy in strategies.items():
    with TokenTracker(strategy, character_based_counter) as tracker:
        result = tracker.process(original)

    stats = tracker.stats
    print(f"{name}: {stats['saved_tokens']} tokens saved ({stats['saving_percent']})")

Monitoring and Logging

Track optimization in production:

import logging
from prompt_refiner import (
    TokenTracker,
    StandardStrategy,
    character_based_counter,
)

logger = logging.getLogger(__name__)

def process_user_input(text: str) -> str:
    strategy = StandardStrategy()

    with TokenTracker(strategy, character_based_counter) as tracker:
        result = tracker.process(text)

    stats = tracker.stats
    logger.info(
        f"Processed input: "
        f"original={stats['original_tokens']} tokens, "
        f"refined={stats['refined_tokens']} tokens, "
        f"saved={stats['saved_tokens']} tokens ({stats['saving_percent']})"
    )

    return result

Packer Token Tracking

Packers have built-in token tracking support:

from prompt_refiner import MessagesPacker, character_based_counter

packer = MessagesPacker(
    track_tokens=True,
    token_counter=character_based_counter,
    system="<div>You are helpful.</div>",
    context=["<p>Doc 1</p>", "<p>Doc 2</p>"],
    query="<span>What's the weather?</span>",
)

messages = packer.pack()

# Get token savings from automatic cleaning
stats = packer.token_stats
print(f"Saved {stats['saved_tokens']} tokens through automatic refinement")

Choosing a Token Counter

Which Counter Should I Use?

For development and testing:
  • Use character_based_counter: fast, with no dependencies.

For production cost estimation:
  • Use create_tiktoken_counter(model="gpt-4") for precise costs.
  • Requires: pip install llm-prompt-refiner[token]

For simple approximation:
  • Use word_based_counter for English text.
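To see how the approximations compare on the same input, a short sketch (the expected values come from the examples above; the tiktoken line assumes the optional dependency is installed):

from prompt_refiner import (
    character_based_counter,
    word_based_counter,
    create_tiktoken_counter,
)

text = "Hello World"

print(character_based_counter(text))  # 3 (ceil of 11 characters / 4)
print(word_based_counter(text))       # 2 (word count)

tiktoken_counter = create_tiktoken_counter(model="gpt-4")
print(tiktoken_counter(text))         # 2 (exact GPT-4 token count)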

Tips

Context Manager Best Practice

Always use TokenTracker as a context manager with a with statement:

with TokenTracker(refiner, counter) as tracker:
    result = tracker.process(text)
# Stats available after processing
stats = tracker.stats

Custom Token Counters

You can provide any callable that takes a string and returns an int:

def my_custom_counter(text: str) -> int:
    # Your custom logic here
    return len(text) // 3  # Example: 1 token ≈ 3 chars

with TokenTracker(refiner, my_custom_counter) as tracker:
    result = tracker.process(text)
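Any third-party tokenizer can be wrapped the same way. A sketch assuming the Hugging Face transformers package is installed (not a prompt_refiner dependency; the model name is an arbitrary example):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

def hf_counter(text: str) -> int:
    # Count tokens with the model's own tokenizer
    return len(tokenizer.encode(text))

with TokenTracker(refiner, hf_counter) as tracker:
    result = tracker.process(text)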

Access to Original and Result

TokenTracker provides properties to access the original and refined text:

with TokenTracker(refiner, counter) as tracker:
    result = tracker.process(text)

print(tracker.original_text)  # Original input
print(tracker.result)         # Refined output
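These properties are handy for spot-checking what a refiner actually changed. A small sketch, assuming refiner, counter, and text are defined as above:

with TokenTracker(refiner, counter) as tracker:
    result = tracker.process(text)

# When savings are unexpectedly low, compare input and output directly
if tracker.stats["saved_tokens"] == 0:
    print("No tokens saved; the input may already be clean.")
    print(repr(tracker.original_text))
    print(repr(tracker.result))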