Analyzer Module

The Analyzer module provides utilities for measuring optimization impact and tracking token savings.

TokenTracker

Context manager for tracking token usage before and after refinement operations.

prompt_refiner.analyzer.TokenTracker

TokenTracker(refiner, token_counter)

Context manager for tracking token usage in refiners/pipelines.

Wraps any Refiner (operation, pipeline, or strategy) and tracks token counts before and after processing. Users provide their own token counting function for maximum flexibility.

Example

>>> def count_tokens(text: str) -> int:
...     # User's custom counter (tiktoken, character-based, etc.)
...     return len(text) // 4
>>> refiner = StripHTML() | NormalizeWhitespace()
>>> with TokenTracker(refiner, count_tokens) as tracker:
...     result = tracker.process("<p>Hello World</p>")
>>> print(tracker.stats)

Initialize token tracker.

Parameters:

  refiner (Refiner, required): Any Refiner (operation or pipeline) to track.
  token_counter (Callable[[str], int], required): Function that counts tokens in text. Should accept a string and return an integer token count.
Source code in src/prompt_refiner/analyzer/token_tracker.py
def __init__(
    self,
    refiner: Refiner,
    token_counter: Callable[[str], int],
):
    """
    Initialize token tracker.

    Args:
        refiner: Any Refiner (operation or pipeline) to track
        token_counter: Function that counts tokens in text.
            Should accept a string and return an integer token count.
    """
    self._refiner = refiner
    self._counter = token_counter
    self._original_tokens: Optional[int] = None
    self._refined_tokens: Optional[int] = None
    self._original_text: Optional[str] = None
    self._result: Optional[str] = None

Attributes

stats property
stats

Get token statistics.

Returns:

  dict: Dictionary with:
    • original_tokens: Tokens before processing
    • refined_tokens: Tokens after processing
    • saved_tokens: Tokens saved (original - refined)
    • saving_percent: Percentage saved as formatted string (e.g., "12.5%")

  Returns empty dict if process() hasn't been called yet.

Example

>>> with TokenTracker(StripHTML(), lambda t: len(t)//4) as tracker:
...     tracker.process("<p>Test</p>")
...     stats = tracker.stats
...     print(f"Saved {stats['saved_tokens']} tokens")
Saved 3 tokens

original_text property
original_text

Get the original input text.

Returns:

  Optional[str]: The text passed to process(), or None if process() hasn't been called.

result property
result

Get the processed result text.

Returns:

  Optional[str]: The refined text returned by process(), or None if process() hasn't been called.

Functions

__enter__
__enter__()

Enter context - returns self for method access.

Source code in src/prompt_refiner/analyzer/token_tracker.py
def __enter__(self) -> "TokenTracker":
    """Enter context - returns self for method access."""
    return self
__exit__
__exit__(exc_type, exc_val, exc_tb)

Exit context - cleanup if needed.

Source code in src/prompt_refiner/analyzer/token_tracker.py
def __exit__(self, exc_type, exc_val, exc_tb):
    """Exit context - cleanup if needed."""
    # No cleanup needed, but required for context manager protocol
    pass
process
process(text)

Process text through refiner and track tokens.

Parameters:

  text (str, required): Input text to process.

Returns:

  str: Processed text from the refiner.

Example

>>> with TokenTracker(StripHTML(), lambda t: len(t)//4) as tracker:
...     result = tracker.process("<p>Hello</p>")
...     print(tracker.stats["saved_tokens"])
3

Source code in src/prompt_refiner/analyzer/token_tracker.py
def process(self, text: str) -> str:
    """
    Process text through refiner and track tokens.

    Args:
        text: Input text to process

    Returns:
        Processed text from the refiner

    Example:
        >>> with TokenTracker(StripHTML(), lambda t: len(t)//4) as tracker:
        ...     result = tracker.process("<p>Hello</p>")
        ...     print(tracker.stats["saved_tokens"])
        3
    """
    # Track original
    self._original_text = text
    self._original_tokens = self._counter(text)

    # Process through refiner
    self._result = self._refiner.process(text)

    # Track refined
    self._refined_tokens = self._counter(self._result)

    return self._result

Basic Usage

from prompt_refiner import TokenTracker, StripHTML, character_based_counter

refiner = StripHTML()

with TokenTracker(refiner, character_based_counter) as tracker:
    result = tracker.process("<div>Hello World</div>")

print(tracker.stats)
# {'original_tokens': 6, 'refined_tokens': 3, 'saved_tokens': 3, 'saving_percent': '50.0%'}

Pipeline Tracking

from prompt_refiner import (
    TokenTracker,
    StripHTML,
    NormalizeWhitespace,
    character_based_counter,
)

# Track entire pipeline
pipeline = StripHTML() | NormalizeWhitespace()

with TokenTracker(pipeline, character_based_counter) as tracker:
    result = tracker.process("<p>Hello    World   </p>")

stats = tracker.stats
print(f"Saved {stats['saved_tokens']} tokens ({stats['saving_percent']})")
# Saved 3 tokens (50.0%)

Strategy Tracking

from prompt_refiner import (
    TokenTracker,
    StandardStrategy,
    character_based_counter,
)

# Track preset strategies
strategy = StandardStrategy()

with TokenTracker(strategy, character_based_counter) as tracker:
    result = tracker.process("<div>Messy    input   text</div>")

print(tracker.stats)

Token Counter Functions

Built-in token counting functions for different use cases.

character_based_counter

prompt_refiner.analyzer.character_based_counter

character_based_counter(text)

Estimate tokens using character-based approximation.

Uses conservative estimate: 1 token ≈ 4 characters. Fast but less accurate than model-specific counters.

Parameters:

  text (str, required): Input text to count tokens for.

Returns:

  int: Estimated token count.

Example

character_based_counter("Hello World") 3 character_based_counter("A longer sentence with many words") 9

Source code in src/prompt_refiner/analyzer/token_counters.py
def character_based_counter(text: str) -> int:
    """
    Estimate tokens using character-based approximation.

    Uses conservative estimate: 1 token ≈ 4 characters.
    Fast but less accurate than model-specific counters.

    Args:
        text: Input text to count tokens for

    Returns:
        Estimated token count

    Example:
        >>> character_based_counter("Hello World")
        3
        >>> character_based_counter("A longer sentence with many words")
        9
    """
    if not text:
        return 0
    return math.ceil(len(text) / 4)

Fast approximation using ~1 token ≈ 4 characters. Good for general use.

from prompt_refiner import character_based_counter

tokens = character_based_counter("Hello World")
print(tokens)  # 3

word_based_counter

prompt_refiner.analyzer.word_based_counter

word_based_counter(text)

Estimate tokens using word count approximation.

Uses estimate: 1 token ≈ 1 word. Reasonable for English text.

Parameters:

  text (str, required): Input text to count tokens for.

Returns:

  int: Estimated token count based on word splits.

Example

word_based_counter("Hello World") 2 word_based_counter("A longer sentence with many words") 6

Source code in src/prompt_refiner/analyzer/token_counters.py
def word_based_counter(text: str) -> int:
    """
    Estimate tokens using word count approximation.

    Uses estimate: 1 token ≈ 1 word.
    Reasonable for English text.

    Args:
        text: Input text to count tokens for

    Returns:
        Estimated token count based on word splits

    Example:
        >>> word_based_counter("Hello World")
        2
        >>> word_based_counter("A longer sentence with many words")
        6
    """
    if not text:
        return 0
    return len(text.split())

Simple approximation using ~1 token ≈ 1 word. Reasonable for English text.

from prompt_refiner import word_based_counter

tokens = word_based_counter("Hello World")
print(tokens)  # 2

create_tiktoken_counter

prompt_refiner.analyzer.create_tiktoken_counter

create_tiktoken_counter(model='gpt-4')

Create a tiktoken-based counter for precise token counting.

Requires tiktoken to be installed. Use this for accurate token counts when working with specific models.

Parameters:

  model (str, default 'gpt-4'): Model name for tokenizer selection (e.g., "gpt-4", "gpt-3.5-turbo").

Returns:

  Callable[[str], int]: Token counting function that uses tiktoken.

Raises:

  ImportError: If tiktoken is not installed.

Example

>>> counter = create_tiktoken_counter(model="gpt-4")
>>> counter("Hello World")
2

If tiktoken is not installed:

>>> try:
...     counter = create_tiktoken_counter()
... except ImportError as e:
...     print("Install tiktoken: pip install llm-prompt-refiner[token]")

Source code in src/prompt_refiner/analyzer/token_counters.py
def create_tiktoken_counter(model: str = "gpt-4") -> Callable[[str], int]:
    """
    Create a tiktoken-based counter for precise token counting.

    Requires tiktoken to be installed. Use this for accurate token counts
    when working with specific models.

    Args:
        model: Model name for tokenizer selection (e.g., "gpt-4", "gpt-3.5-turbo")

    Returns:
        Token counting function that uses tiktoken

    Raises:
        ImportError: If tiktoken is not installed

    Example:
        >>> counter = create_tiktoken_counter(model="gpt-4")
        >>> counter("Hello World")
        2

        >>> # If tiktoken not installed:
        >>> try:
        ...     counter = create_tiktoken_counter()
        ... except ImportError as e:
        ...     print("Install tiktoken: pip install llm-prompt-refiner[token]")
    """
    try:
        import tiktoken
    except ImportError:
        raise ImportError(
            "tiktoken is required for precise token counting. "
            "Install with: pip install llm-prompt-refiner[token]"
        )

    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Fallback to cl100k_base (used by gpt-4, gpt-3.5-turbo)
        encoding = tiktoken.get_encoding("cl100k_base")

    def counter(text: str) -> int:
        """Count tokens using tiktoken encoding."""
        if not text:
            return 0
        return len(encoding.encode(text))

    return counter

Precise token counting using OpenAI's tiktoken. Requires optional dependency.

from prompt_refiner import create_tiktoken_counter

# Requires: pip install llm-prompt-refiner[token]
counter = create_tiktoken_counter(model="gpt-4")

tokens = counter("Hello World")
print(tokens)  # Exact token count for GPT-4

Optional Dependency

create_tiktoken_counter requires tiktoken to be installed:

pip install llm-prompt-refiner[token]

If tiktoken is not available, use character_based_counter or word_based_counter instead.
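A minimal fallback sketch: try the precise counter first and fall back to the character-based approximation when tiktoken is missing (the try/except wiring here is an illustration, not a built-in helper):

from prompt_refiner import character_based_counter, create_tiktoken_counter

try:
    # Precise counting; raises ImportError when tiktoken isn't installed
    counter = create_tiktoken_counter(model="gpt-4")
except ImportError:
    # Approximate counting with no extra dependencies
    counter = character_based_counter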

Common Use Cases

ROI Demonstration

Track token savings to demonstrate optimization value:

from prompt_refiner import (
    TokenTracker,
    StandardStrategy,
    character_based_counter,
)

# Your messy input
original = "<div>Lots of HTML and   extra   whitespace</div>"

# Track optimization
strategy = StandardStrategy()
with TokenTracker(strategy, character_based_counter) as tracker:
    result = tracker.process(original)

# Show ROI
stats = tracker.stats
print(f"Original: {stats['original_tokens']} tokens")
print(f"Refined: {stats['refined_tokens']} tokens")
print(f"Saved: {stats['saved_tokens']} tokens ({stats['saving_percent']})")

# Calculate cost savings (example: $0.03 per 1K tokens)
cost_per_token = 0.03 / 1000
savings = stats['saved_tokens'] * cost_per_token
print(f"Cost savings: ${savings:.4f} per request")

A/B Testing Different Strategies

Compare multiple optimization approaches:

from prompt_refiner import (
    TokenTracker,
    MinimalStrategy,
    StandardStrategy,
    AggressiveStrategy,
    character_based_counter,
)

original = "Your test text here..."

# Test different strategies
strategies = {
    "Minimal": MinimalStrategy(),
    "Standard": StandardStrategy(),
    "Aggressive": AggressiveStrategy(),
}

for name, strategy in strategies.items():
    with TokenTracker(strategy, character_based_counter) as tracker:
        result = tracker.process(original)

    stats = tracker.stats
    print(f"{name}: {stats['saved_tokens']} tokens saved ({stats['saving_percent']})")

Monitoring and Logging

Track optimization in production:

import logging
from prompt_refiner import (
    TokenTracker,
    StandardStrategy,
    character_based_counter,
)

logger = logging.getLogger(__name__)

def process_user_input(text: str) -> str:
    strategy = StandardStrategy()

    with TokenTracker(strategy, character_based_counter) as tracker:
        result = tracker.process(text)

    stats = tracker.stats
    logger.info(
        f"Processed input: "
        f"original={stats['original_tokens']} tokens, "
        f"refined={stats['refined_tokens']} tokens, "
        f"saved={stats['saved_tokens']} tokens ({stats['saving_percent']})"
    )

    return result

Packer Token Tracking

Packers have built-in token tracking support:

from prompt_refiner import MessagesPacker, character_based_counter

packer = MessagesPacker(
    track_tokens=True,
    token_counter=character_based_counter,
    system="<div>You are helpful.</div>",
    context=["<p>Doc 1</p>", "<p>Doc 2</p>"],
    query="<span>What's the weather?</span>",
)

messages = packer.pack()

# Get token savings from automatic cleaning
stats = packer.token_stats
print(f"Saved {stats['saved_tokens']} tokens through automatic refinement")

Choosing a Token Counter

Which Counter Should I Use?

For development and testing:
  • Use character_based_counter: fast, with no dependencies.

For production cost estimation:
  • Use create_tiktoken_counter(model="gpt-4") for precise costs.
  • Requires: pip install llm-prompt-refiner[token]

For simple approximation:
  • Use word_based_counter for English text.
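To see how the approximations compare on the same input, a short sketch (the expected values come from the examples above; the tiktoken line assumes the optional dependency is installed):

from prompt_refiner import (
    character_based_counter,
    word_based_counter,
    create_tiktoken_counter,
)

text = "Hello World"

print(character_based_counter(text))  # 3 (ceil of 11 characters / 4)
print(word_based_counter(text))       # 2 (word count)

tiktoken_counter = create_tiktoken_counter(model="gpt-4")
print(tiktoken_counter(text))         # 2 (exact GPT-4 token count)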

Tips

Context Manager Best Practice

Always use TokenTracker as a context manager with a with statement:

with TokenTracker(refiner, counter) as tracker:
    result = tracker.process(text)
# Stats available after processing
stats = tracker.stats

Custom Token Counters

You can provide any callable that takes a string and returns an int:

def my_custom_counter(text: str) -> int:
    # Your custom logic here
    return len(text) // 3  # Example: 1 token ≈ 3 chars

with TokenTracker(refiner, my_custom_counter) as tracker:
    result = tracker.process(text)
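Any third-party tokenizer can be wrapped the same way. A sketch assuming the Hugging Face transformers package is installed (not a prompt_refiner dependency; the model name is an arbitrary example):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

def hf_counter(text: str) -> int:
    # Count tokens with the model's own tokenizer
    return len(tokenizer.encode(text))

with TokenTracker(refiner, hf_counter) as tracker:
    result = tracker.process(text)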

Access to Original and Result

TokenTracker provides properties to access the original and refined text:

with TokenTracker(refiner, counter) as tracker:
    result = tracker.process(text)

print(tracker.original_text)  # Original input
print(tracker.result)         # Refined output
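These properties are handy for spot-checking what a refiner actually changed. A small sketch, assuming refiner, counter, and text are defined as above:

with TokenTracker(refiner, counter) as tracker:
    result = tracker.process(text)

# When savings are unexpectedly low, compare input and output directly
if tracker.stats["saved_tokens"] == 0:
    print("No tokens saved; the input may already be clean.")
    print(repr(tracker.original_text))
    print(repr(tracker.result))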