Strategy Module API Reference

The Strategy module provides benchmark-tested preset strategies for token optimization. Use these when you want quick savings without manually configuring individual operations.

Overview

Version 0.1.5+ introduces three preset strategies optimized for different use cases. Version 0.2.0 refactored strategies to inherit directly from Pipeline for a simpler API.

Strategy	Token Reduction	Quality	Use Case
Minimal	4.3%	98.7%	Maximum quality, minimal risk
Standard	4.8%	98.4%	RAG contexts with duplicates
Aggressive	15%	96.4%	Cost optimization, long contexts
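To get a feel for what these percentages mean in practice, the following back-of-the-envelope sketch converts a reduction rate into tokens and cost saved. The reduction figures are copied from the table above; the helper names and the price per million tokens are hypothetical, and real savings depend on how closely your content resembles the benchmark data.

```python
# Benchmark reduction rates copied from the overview table.
REDUCTION = {"minimal": 0.043, "standard": 0.048, "aggressive": 0.15}


def tokens_saved(input_tokens: int, strategy: str) -> int:
    """Estimate tokens removed, assuming the benchmark rate carries over."""
    return round(input_tokens * REDUCTION[strategy])


def cost_saved(input_tokens: int, strategy: str, usd_per_million: float) -> float:
    """Estimate cost saved for a given (hypothetical) input price."""
    return tokens_saved(input_tokens, strategy) * usd_per_million / 1_000_000


# Example: one million input tokens at an assumed 3.00 USD per million tokens.
for name in REDUCTION:
    saved = tokens_saved(1_000_000, name)
    usd = cost_saved(1_000_000, name, usd_per_million=3.0)
    print(f"{name}: {saved} tokens, {usd:.3f} USD")
```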

Strategies now inherit from Pipeline, so you can use them directly without calling .create_refiner(). They're fully extensible with .pipe().
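The inheritance pattern can be illustrated with a small stand-in that does not import prompt_refiner. SketchPipeline and SketchStrategy are hypothetical names, and returning a new pipeline from pipe() is an assumption; this reference does not say whether the real .pipe() copies or mutates.

```python
from typing import Callable, List

Operation = Callable[[str], str]


class SketchPipeline:
    """Hypothetical stand-in for Pipeline: applies operations in order."""

    def __init__(self, operations: List[Operation]):
        self.operations = list(operations)

    def pipe(self, operation: Operation) -> "SketchPipeline":
        # Assumption: return a new pipeline and leave the original unchanged.
        return SketchPipeline(self.operations + [operation])

    def run(self, text: str) -> str:
        for operation in self.operations:
            text = operation(text)
        return text


class SketchStrategy(SketchPipeline):
    """A preset is a pipeline whose constructor chooses the operations."""

    def __init__(self, lowercase: bool = False):
        operations: List[Operation] = [str.strip]
        if lowercase:
            operations.append(str.lower)
        super().__init__(operations)


base = SketchStrategy()
extended = base.pipe(lambda s: s.replace("world", "there"))
print(base.run("  Hello world  "))      # Hello world
print(extended.run("  Hello world  "))  # Hello there
```

Because the preset is itself a pipeline, no separate factory call is needed before run(), which is the simplification the v0.2.0 refactor describes.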

MinimalStrategy

Basic cleaning with minimal token reduction, prioritizing quality preservation.

prompt_refiner.strategy.MinimalStrategy

MinimalStrategy(
    strip_html=True, strip_html_to_markdown=False
)

Bases: Pipeline

Minimal strategy: Basic cleaning with minimal token reduction.

This strategy is itself a Pipeline, so you can use it directly or extend it.

Refiners:
- StripHTML: Remove HTML tags (optional)
- NormalizeWhitespace: Collapse excessive whitespace

Characteristics:
- Token reduction: ~4.3%
- Quality: 98.7% (cosine similarity)
- Use case: When quality is paramount, minimal risk
- Latency: 0.05ms per 1k tokens

Example

Use with defaults

strategy = MinimalStrategy()
cleaned = strategy.run(text)

Customize operators

strategy = MinimalStrategy(
    strip_html_to_markdown=True
)
cleaned = strategy.run(text)

Extend with additional operators

extended = MinimalStrategy().pipe(RedactPII())
cleaned = extended.run(text)

Initialize minimal strategy with configured operators.

Parameters:

Name	Type	Description	Default
`strip_html`	`bool`	Whether to include StripHTML operator (default: True)	`True`
`strip_html_to_markdown`	`bool`	Convert HTML to Markdown instead of stripping (default: False)	`False`

Source code in src/prompt_refiner/strategy/minimal.py

def __init__(
    self,
    strip_html: bool = True,
    strip_html_to_markdown: bool = False,
):
    """
    Initialize minimal strategy with configured operators.

    Args:
        strip_html: Whether to include StripHTML operator (default: True)
        strip_html_to_markdown: Convert HTML to Markdown instead of stripping (default: False)
    """
    operations = []

    if strip_html:
        operations.append(StripHTML(to_markdown=strip_html_to_markdown))

    operations.append(NormalizeWhitespace())

    # Initialize Pipeline with the configured operators
    super().__init__(operations)

Operations

StripHTML() - Remove HTML tags
NormalizeWhitespace() - Collapse excessive whitespace
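As a rough picture of what these two operations do to a string, here is a standard-library sketch. It is not the library's implementation: the function names are hypothetical, and the regex-based tag removal is a simplification that a real HTML parser would handle more robustly.

```python
import re


def strip_html_sketch(text: str) -> str:
    # Naive tag removal; assumes well-formed tags without '>' inside attributes.
    return re.sub(r"<[^>]+>", "", text)


def normalize_whitespace_sketch(text: str) -> str:
    # Collapse any run of whitespace (spaces, tabs, newlines) to one space.
    return re.sub(r"\s+", " ", text).strip()


raw = "<div>  Your   HTML\n content  </div>"
cleaned = normalize_whitespace_sketch(strip_html_sketch(raw))
print(cleaned)  # Your HTML content
```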

Example

from prompt_refiner.strategy import MinimalStrategy

# Use strategy directly (v0.2.0+)
strategy = MinimalStrategy()
cleaned = strategy.run("<div>  Your HTML content  </div>")
# Output: "Your HTML content"

# With Markdown conversion
strategy = MinimalStrategy(strip_html_to_markdown=True)
cleaned = strategy.run("<strong>bold</strong> text")
# Output: "**bold** text"

# Extend with additional operations
from prompt_refiner import RedactPII
extended = MinimalStrategy().pipe(RedactPII(redact_types={"email"}))
cleaned = extended.run(text)

StandardStrategy

Enhanced cleaning with deduplication for RAG contexts with potential duplicates.

prompt_refiner.strategy.StandardStrategy

StandardStrategy(
    strip_html=True,
    strip_html_to_markdown=False,
    deduplicate_method="jaccard",
    deduplicate_similarity_threshold=0.8,
    deduplicate_granularity="sentence",
)

Bases: Pipeline

Standard strategy: Cleaning plus deduplication.

This strategy is itself a Pipeline, so you can use it directly or extend it.

Refiners:
- StripHTML: Remove HTML tags (optional)
- NormalizeWhitespace: Collapse excessive whitespace
- Deduplicate: Remove similar content

Characteristics:
- Token reduction: ~4.8%
- Quality: 98.4% (cosine similarity)
- Use case: RAG contexts with potential duplicates
- Latency: 0.25ms per 1k tokens

Example

Use with defaults

strategy = StandardStrategy()
cleaned = strategy.run(text)

Customize operator parameters

strategy = StandardStrategy(
    strip_html_to_markdown=True,
    deduplicate_method="levenshtein",
    deduplicate_similarity_threshold=0.9,
    deduplicate_granularity="paragraph"
)
cleaned = strategy.run(text)

Extend with additional operators

extended = StandardStrategy().pipe(TruncateTokens(max_tokens=500))
cleaned = extended.run(text)

Initialize standard strategy with configured operators.

Parameters:

Name	Type	Description	Default
`strip_html`	`bool`	Whether to include StripHTML operator (default: True)	`True`
`strip_html_to_markdown`	`bool`	Convert HTML to Markdown instead of stripping (default: False)	`False`
`deduplicate_method`	`Literal['jaccard', 'levenshtein']`	Deduplication method (default: "jaccard")	`'jaccard'`
`deduplicate_similarity_threshold`	`float`	Similarity threshold (default: 0.8)	`0.8`
`deduplicate_granularity`	`Literal['sentence', 'paragraph']`	Deduplication granularity (default: "sentence")	`'sentence'`
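The deduplicate_method parameter selects between two similarity measures. To illustrate what a Levenshtein-based score can look like, the sketch below computes edit distance and scales it to the 0.0-1.0 range that the threshold uses. The function names and the normalization by the longer string's length are assumptions; the library may normalize differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row in memory."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein_similarity("Hello world.", "Hello, world.") >= 0.8)  # True
```

Character-level distance is sensitive to small spelling changes, while word-set overlap such as Jaccard ignores word order, which is one reason to expose both methods.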

Source code in src/prompt_refiner/strategy/standard.py

def __init__(
    self,
    # Parameters to configure StripHTML operator
    strip_html: bool = True,
    strip_html_to_markdown: bool = False,
    # Parameters to configure Deduplicate operator
    deduplicate_method: Literal["jaccard", "levenshtein"] = "jaccard",
    deduplicate_similarity_threshold: float = 0.8,
    deduplicate_granularity: Literal["sentence", "paragraph"] = "sentence",
):
    """
    Initialize standard strategy with configured operators.

    Args:
        strip_html: Whether to include StripHTML operator (default: True)
        strip_html_to_markdown: Convert HTML to Markdown instead of stripping (default: False)
        deduplicate_method: Deduplication method (default: "jaccard")
        deduplicate_similarity_threshold: Similarity threshold (default: 0.8)
        deduplicate_granularity: Deduplication granularity (default: "sentence")
    """
    operations = []

    if strip_html:
        operations.append(StripHTML(to_markdown=strip_html_to_markdown))

    operations.append(NormalizeWhitespace())

    operations.append(
        Deduplicate(
            method=deduplicate_method,
            similarity_threshold=deduplicate_similarity_threshold,
            granularity=deduplicate_granularity,
        )
    )

    # Initialize Pipeline with the configured operators
    super().__init__(operations)

Operations

StripHTML() - Remove HTML tags
NormalizeWhitespace() - Collapse excessive whitespace
Deduplicate() - Remove similar content (sentence-level, 0.8 threshold)
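Sentence-level Jaccard deduplication can be sketched with the standard library as follows. This is an illustration under assumptions, not the library's code: the sentence splitter, the word tokenization, and the "greater than or equal to the threshold" comparison are all choices made here for the example.

```python
import re
from typing import List, Set


def word_set(sentence: str) -> Set[str]:
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe_sentences(text: str, threshold: float = 0.8) -> str:
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    kept: List[str] = []
    kept_sets: List[Set[str]] = []
    for sentence in sentences:
        words = word_set(sentence)
        # Drop the sentence if it is too similar to one already kept.
        if any(jaccard(words, seen) >= threshold for seen in kept_sets):
            continue
        kept.append(sentence)
        kept_sets.append(words)
    return " ".join(kept)


result = dedupe_sentences("Hello world. Hello world. Goodbye world.")
print(result)  # Hello world. Goodbye world.
```

Here "Goodbye world." survives because it shares only one of three distinct words with "Hello world.", a score of about 0.33, well under the 0.8 threshold.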

Example

from prompt_refiner.strategy import StandardStrategy

# Use strategy directly (v0.2.0+)
strategy = StandardStrategy()
text = "<div>Hello world. Hello world. Goodbye world.</div>"
cleaned = strategy.run(text)
# Output: "Hello world. Goodbye world."  (duplicate removed)

# Custom similarity threshold
strategy = StandardStrategy(deduplicate_similarity_threshold=0.7)

# Alternative deduplication method
strategy = StandardStrategy(deduplicate_method="levenshtein")

AggressiveStrategy

Maximum token reduction through aggressive deduplication, for cost optimization.

prompt_refiner.strategy.AggressiveStrategy

AggressiveStrategy(
    strip_html=True,
    strip_html_to_markdown=False,
    deduplicate_method="jaccard",
    deduplicate_similarity_threshold=0.7,
    deduplicate_granularity="sentence",
)

Bases: Pipeline

Aggressive strategy: Maximum token reduction through aggressive deduplication.

This strategy is itself a Pipeline, so you can use it directly or extend it.

Refiners:
- StripHTML: Remove HTML tags (optional)
- NormalizeWhitespace: Collapse excessive whitespace
- Deduplicate: Aggressively remove similar content (threshold: 0.7)

Characteristics:
- Token reduction: ~5-10% (higher with duplicate content)
- Quality: 96-98% (cosine similarity)
- Use case: Cost optimization with duplicate/redundant content
- Latency: 0.25ms per 1k tokens

Note: For token budget control, use Packer's max_tokens parameter instead.

Example

Use with defaults

strategy = AggressiveStrategy()
cleaned = strategy.run(text)

Customize operator parameters

strategy = AggressiveStrategy(
    strip_html_to_markdown=True,
    deduplicate_method="levenshtein",
    deduplicate_similarity_threshold=0.6,
    deduplicate_granularity="paragraph"
)
cleaned = strategy.run(text)

Extend with additional operators

extended = AggressiveStrategy().pipe(RedactPII())
cleaned = extended.run(text)

Initialize aggressive strategy with configured operators.

Parameters:

Name	Type	Description	Default
`strip_html`	`bool`	Whether to include StripHTML operator (default: True)	`True`
`strip_html_to_markdown`	`bool`	Convert HTML to Markdown instead of stripping (default: False)	`False`
`deduplicate_method`	`Literal['jaccard', 'levenshtein']`	Deduplication method (default: "jaccard")	`'jaccard'`
`deduplicate_similarity_threshold`	`float`	Similarity threshold for aggressive deduplication (default: 0.7)	`0.7`
`deduplicate_granularity`	`Literal['sentence', 'paragraph']`	Deduplication granularity (default: "sentence")	`'sentence'`
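The only default that differs from StandardStrategy is the similarity threshold (0.7 instead of 0.8). The sketch below shows a sentence pair whose word-set Jaccard score falls between the two values, so it would count as a duplicate only under the lower threshold. The tokenization and the "score >= threshold" comparison are assumptions made for illustration.

```python
import re


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


first = "The cat sat on the mat today."
second = "The cat sat on the mat yesterday."
score = jaccard(first, second)  # 5 shared words out of 7 distinct words

for threshold in (0.8, 0.7):
    is_duplicate = score >= threshold
    print(f"threshold={threshold}: score={score:.3f}, duplicate={is_duplicate}")
```

Lowering the threshold therefore trades more token savings for a higher chance of dropping a sentence that carried a small but meaningful difference.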

Source code in src/prompt_refiner/strategy/aggressive.py

def __init__(
    self,
    # Parameters to configure StripHTML operator
    strip_html: bool = True,
    strip_html_to_markdown: bool = False,
    # Parameters to configure Deduplicate operator
    deduplicate_method: Literal["jaccard", "levenshtein"] = "jaccard",
    deduplicate_similarity_threshold: float = 0.7,
    deduplicate_granularity: Literal["sentence", "paragraph"] = "sentence",
):
    """
    Initialize aggressive strategy with configured operators.

    Args:
        strip_html: Whether to include StripHTML operator (default: True)
        strip_html_to_markdown: Convert HTML to Markdown instead of stripping (default: False)
        deduplicate_method: Deduplication method (default: "jaccard")
        deduplicate_similarity_threshold: Similarity threshold for aggressive deduplication
            (default: 0.7)
        deduplicate_granularity: Deduplication granularity (default: "sentence")
    """
    operations = []

    if strip_html:
        operations.append(StripHTML(to_markdown=strip_html_to_markdown))

    operations.append(NormalizeWhitespace())

    operations.append(
        Deduplicate(
            method=deduplicate_method,
            similarity_threshold=deduplicate_similarity_threshold,
            granularity=deduplicate_granularity,
        )
    )

    # Initialize Pipeline with the configured operators
    super().__init__(operations)

Operations

StripHTML() - Remove HTML tags
NormalizeWhitespace() - Collapse excessive whitespace
Deduplicate() - Remove similar content (sentence-level, 0.7 threshold)

Example

from prompt_refiner.strategy import AggressiveStrategy

# Use strategy directly (v0.2.0+); deduplication defaults to a 0.7 similarity threshold
strategy = AggressiveStrategy()
cleaned = strategy.run(long_context)

# More aggressive deduplication
strategy = AggressiveStrategy(
    deduplicate_similarity_threshold=0.6  # Lower threshold flags more content as duplicate
)

# Paragraph-level deduplication with the alternative method
strategy = AggressiveStrategy(
    deduplicate_method="levenshtein",
    deduplicate_granularity="paragraph",
)

# The constructor shown above takes no truncation arguments;
# for a token budget, use Packer's max_tokens parameter

Creating Custom Strategies

Custom strategies can be created by inheriting from Pipeline:

from prompt_refiner import Pipeline, StripHTML, NormalizeWhitespace, RedactPII

class CustomStrategy(Pipeline):
    def __init__(self, redact_pii: bool = True):
        operations = [StripHTML(), NormalizeWhitespace()]
        if redact_pii:
            operations.append(RedactPII(redact_types={"email", "phone"}))
        super().__init__(operations)

# Use custom strategy directly
strategy = CustomStrategy(redact_pii=True)
cleaned = strategy.run(text)

Usage Patterns

Basic Usage (v0.2.0+)

from prompt_refiner.strategy import MinimalStrategy, StandardStrategy, AggressiveStrategy

# Quick start with minimal - use directly
strategy = MinimalStrategy()
cleaned = strategy.run(text)

# Standard for RAG with duplicates
strategy = StandardStrategy()
cleaned = strategy.run(rag_context)

# Aggressive for cost optimization
strategy = AggressiveStrategy()
cleaned = strategy.run(long_context)

Composition with Additional Operations

Strategies inherit from Pipeline, so you can extend them with .pipe():

from prompt_refiner.strategy import MinimalStrategy
from prompt_refiner import RedactPII, Deduplicate

# Start with minimal, add PII redaction
extended = MinimalStrategy().pipe(RedactPII(redact_types={"email"}))
cleaned = extended.run(text)

# Start with standard, add more aggressive deduplication
from prompt_refiner.strategy import StandardStrategy
extended = StandardStrategy().pipe(Deduplicate(similarity_threshold=0.6))
cleaned = extended.run(text)

Using .process() Method

Strategies also support the .process() method from the Refiner interface:

from prompt_refiner.strategy import MinimalStrategy

strategy = MinimalStrategy()
cleaned = strategy.process(text)  # Equivalent to strategy.run(text)

Choosing a Strategy

Minimal Strategy

✅ Use when:
- Quality is paramount
- Minimal risk tolerance
- Processing structured content
- First time optimizing prompts

❌ Avoid when:
- Budget constraints are tight
- Dealing with very long contexts
- Content has significant duplication

Standard Strategy

✅ Use when:
- RAG contexts with potential duplicates
- Balanced quality and savings needed
- Processing web-scraped content
- General-purpose optimization

❌ Avoid when:
- Context is already clean and unique
- Maximum quality preservation required
- Very tight token budgets

Aggressive Strategy

✅ Use when:
- Cost optimization is the priority
- Token budgets are tight
- Processing very long contexts
- Quality tolerance is lenient

❌ Avoid when:
- Quality cannot be compromised
- Context is already short
- Removing near-duplicate sentences would drop critical info

Configuration Reference (v0.2.0+)

MinimalStrategy Parameters

Parameter	Type	Default	Description
`strip_html`	`bool`	`True`	Whether to strip HTML tags
`strip_html_to_markdown`	`bool`	`False`	Convert HTML to Markdown instead of stripping

StandardStrategy Parameters

Parameter	Type	Default	Description
`strip_html`	`bool`	`True`	Whether to strip HTML tags
`strip_html_to_markdown`	`bool`	`False`	Convert HTML to Markdown instead of stripping
`deduplicate_method`	`Literal["jaccard", "levenshtein"]`	`"jaccard"`	Deduplication algorithm
`deduplicate_similarity_threshold`	`float`	`0.8`	Threshold for deduplication (0.0-1.0)
`deduplicate_granularity`	`Literal["sentence", "paragraph"]`	`"sentence"`	Deduplication granularity

AggressiveStrategy Parameters

Parameter	Type	Default	Description
`strip_html`	`bool`	`True`	Whether to strip HTML tags
`strip_html_to_markdown`	`bool`	`False`	Convert HTML to Markdown instead of stripping
`deduplicate_method`	`Literal["jaccard", "levenshtein"]`	`"jaccard"`	Deduplication algorithm
`deduplicate_similarity_threshold`	`float`	`0.7`	Threshold for deduplication (0.0-1.0)
`deduplicate_granularity`	`Literal["sentence", "paragraph"]`	`"sentence"`	Deduplication granularity
