Strategy Module API Reference

The Strategy module provides benchmark-tested preset strategies for token optimization. Use these when you want quick savings without manually configuring individual operations.

Overview

Version 0.1.5 introduced three preset strategies optimized for different use cases. Version 0.2.0 refactored them to inherit directly from Pipeline, which simplifies the API.

| Strategy | Token Reduction | Quality | Use Case |
|---|---|---|---|
| Minimal | 4.3% | 98.7% | Maximum quality, minimal risk |
| Standard | 4.8% | 98.4% | RAG contexts with duplicates |
| Aggressive | 15% | 96.4% | Cost optimization, long contexts |
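
The reduction figures above come from the library's own benchmarks. To estimate the reduction on your own data, a rough sketch like the following may help. It counts whitespace-separated words as a crude stand-in for tokens (an assumption; a real tokenizer will give different numbers), and `token_reduction` is a hypothetical helper, not part of prompt_refiner.

```python
def token_reduction(original: str, cleaned: str) -> float:
    """Percentage of tokens removed, using whitespace-separated words as a token proxy."""
    before = len(original.split())
    after = len(cleaned.split())
    if before == 0:
        return 0.0  # Nothing to reduce in an empty input
    return 100.0 * (before - after) / before


original = "alpha beta gamma delta alpha beta gamma delta epsilon zeta"
cleaned = "alpha beta gamma delta epsilon zeta"
print(f"{token_reduction(original, cleaned):.1f}%")  # 40.0%
```

In practice, `cleaned` would be the result of `strategy.run(original)`.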

Strategies now inherit from Pipeline, so you can use them directly without calling .create_refiner(). They're fully extensible with .pipe().

MinimalStrategy

Basic cleaning with minimal token reduction, prioritizing quality preservation.

prompt_refiner.strategy.MinimalStrategy

MinimalStrategy(
    strip_html=True, strip_html_to_markdown=False
)

Bases: Pipeline

Minimal strategy: Basic cleaning with minimal token reduction.

This strategy is itself a Pipeline, so you can use it directly or extend it.

Refiners:

  • StripHTML: Remove HTML tags (optional)
  • NormalizeWhitespace: Collapse excessive whitespace

Characteristics:

  • Token reduction: ~4.3%
  • Quality: 98.7% (cosine similarity)
  • Use case: When quality is paramount, minimal risk
  • Latency: 0.05ms per 1k tokens

Example

Use with defaults

strategy = MinimalStrategy()
cleaned = strategy.run(text)

Customize operators

strategy = MinimalStrategy(
    strip_html_to_markdown=True
)
cleaned = strategy.run(text)

Extend with additional operators

extended = MinimalStrategy().pipe(RedactPII())
cleaned = extended.run(text)

Initialize minimal strategy with configured operators.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| strip_html | bool | Whether to include StripHTML operator | True |
| strip_html_to_markdown | bool | Convert HTML to Markdown instead of stripping | False |

Source code in src/prompt_refiner/strategy/minimal.py
def __init__(
    self,
    strip_html: bool = True,
    strip_html_to_markdown: bool = False,
):
    """
    Initialize minimal strategy with configured operators.

    Args:
        strip_html: Whether to include StripHTML operator (default: True)
        strip_html_to_markdown: Convert HTML to Markdown instead of stripping (default: False)
    """
    operations = []

    if strip_html:
        operations.append(StripHTML(to_markdown=strip_html_to_markdown))

    operations.append(NormalizeWhitespace())

    # Initialize Pipeline with the configured operators
    super().__init__(operations)

Operations

  • StripHTML() - Remove HTML tags
  • NormalizeWhitespace() - Collapse excessive whitespace
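
To make the two operations concrete, here is a minimal standard-library sketch of the same idea. The functions `strip_html`, `normalize_whitespace`, and `minimal_clean` are hypothetical stand-ins written for illustration; they are not the library's implementation, and the regex-based tag removal is an assumption that only suits simple markup.

```python
import html
import re


def strip_html(text: str) -> str:
    """Stand-in for StripHTML: drop anything that looks like a tag, then unescape entities."""
    without_tags = re.sub(r"<[^>]+>", "", text)
    return html.unescape(without_tags)


def normalize_whitespace(text: str) -> str:
    """Stand-in for NormalizeWhitespace: collapse whitespace runs and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def minimal_clean(text: str, strip: bool = True) -> str:
    """Apply the optional HTML step first, then always normalize whitespace."""
    if strip:
        text = strip_html(text)
    return normalize_whitespace(text)


print(minimal_clean("<div>  Your   HTML &amp; content  </div>"))
# Your HTML & content
```

The ordering mirrors the constructor shown above: the HTML step is optional, while whitespace normalization always runs.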

Example

from prompt_refiner.strategy import MinimalStrategy

# Use strategy directly (v0.2.0+)
strategy = MinimalStrategy()
cleaned = strategy.run("<div>  Your HTML content  </div>")
# Output: "Your HTML content"

# With Markdown conversion
strategy = MinimalStrategy(strip_html_to_markdown=True)
cleaned = strategy.run("<strong>bold</strong> text")
# Output: "**bold** text"

# Extend with additional operations
from prompt_refiner import RedactPII
extended = MinimalStrategy().pipe(RedactPII(redact_types={"email"}))
cleaned = extended.run(text)

StandardStrategy

Enhanced cleaning with deduplication for RAG contexts with potential duplicates.

prompt_refiner.strategy.StandardStrategy

StandardStrategy(
    strip_html=True,
    strip_html_to_markdown=False,
    deduplicate_method="jaccard",
    deduplicate_similarity_threshold=0.8,
    deduplicate_granularity="sentence",
)

Bases: Pipeline

Standard strategy: Cleaning plus deduplication.

This strategy is itself a Pipeline, so you can use it directly or extend it.

Refiners:

  • StripHTML: Remove HTML tags (optional)
  • NormalizeWhitespace: Collapse excessive whitespace
  • Deduplicate: Remove similar content

Characteristics:

  • Token reduction: ~4.8%
  • Quality: 98.4% (cosine similarity)
  • Use case: RAG contexts with potential duplicates
  • Latency: 0.25ms per 1k tokens

Example

Use with defaults

strategy = StandardStrategy()
cleaned = strategy.run(text)

Customize operator parameters

strategy = StandardStrategy(
    strip_html_to_markdown=True,
    deduplicate_method="levenshtein",
    deduplicate_similarity_threshold=0.9,
    deduplicate_granularity="paragraph"
)
cleaned = strategy.run(text)

Extend with additional operators

extended = StandardStrategy().pipe(TruncateTokens(max_tokens=500))
cleaned = extended.run(text)

Initialize standard strategy with configured operators.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| strip_html | bool | Whether to include StripHTML operator | True |
| strip_html_to_markdown | bool | Convert HTML to Markdown instead of stripping | False |
| deduplicate_method | Literal['jaccard', 'levenshtein'] | Deduplication method | 'jaccard' |
| deduplicate_similarity_threshold | float | Similarity threshold | 0.8 |
| deduplicate_granularity | Literal['sentence', 'paragraph'] | Deduplication granularity | 'sentence' |

Source code in src/prompt_refiner/strategy/standard.py
def __init__(
    self,
    # Parameters to configure StripHTML operator
    strip_html: bool = True,
    strip_html_to_markdown: bool = False,
    # Parameters to configure Deduplicate operator
    deduplicate_method: Literal["jaccard", "levenshtein"] = "jaccard",
    deduplicate_similarity_threshold: float = 0.8,
    deduplicate_granularity: Literal["sentence", "paragraph"] = "sentence",
):
    """
    Initialize standard strategy with configured operators.

    Args:
        strip_html: Whether to include StripHTML operator (default: True)
        strip_html_to_markdown: Convert HTML to Markdown instead of stripping (default: False)
        deduplicate_method: Deduplication method (default: "jaccard")
        deduplicate_similarity_threshold: Similarity threshold (default: 0.8)
        deduplicate_granularity: Deduplication granularity (default: "sentence")
    """
    operations = []

    if strip_html:
        operations.append(StripHTML(to_markdown=strip_html_to_markdown))

    operations.append(NormalizeWhitespace())

    operations.append(
        Deduplicate(
            method=deduplicate_method,
            similarity_threshold=deduplicate_similarity_threshold,
            granularity=deduplicate_granularity,
        )
    )

    # Initialize Pipeline with the configured operators
    super().__init__(operations)

Operations

  • StripHTML() - Remove HTML tags
  • NormalizeWhitespace() - Collapse excessive whitespace
  • Deduplicate() - Remove similar content (sentence-level, 0.8 threshold)
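
Sentence-level Jaccard deduplication can be sketched with the standard library as follows. This is an illustrative approximation, not the library's code: the sentence splitter, the word tokenization, and the choice to treat a score at or above the threshold as a duplicate are all assumptions, and `jaccard` and `dedupe_sentences` are hypothetical names.

```python
import re


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two word sets: |A ∩ B| / |A ∪ B|."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0


def dedupe_sentences(text: str, threshold: float = 0.8) -> str:
    """Keep a sentence only if it is not too similar to any sentence already kept."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    kept = []
    for sentence in sentences:
        if all(jaccard(sentence, seen) < threshold for seen in kept):
            kept.append(sentence)
    return " ".join(kept)


print(dedupe_sentences("Hello world. Hello world. Goodbye world."))
# Hello world. Goodbye world.
```

The first occurrence of each sentence wins, so the relative order of the surviving content is preserved.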

Example

from prompt_refiner.strategy import StandardStrategy

# Use strategy directly (v0.2.0+)
strategy = StandardStrategy()
text = "<div>Hello world. Hello world. Goodbye world.</div>"
cleaned = strategy.run(text)
# Output: "Hello world. Goodbye world."  (duplicate removed)

# Custom similarity threshold
strategy = StandardStrategy(deduplicate_similarity_threshold=0.7)

# Alternative deduplication method
strategy = StandardStrategy(deduplicate_method="levenshtein")
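
For intuition about the "levenshtein" option, the sketch below computes a character-level edit distance and turns it into a similarity score between 0 and 1. Whether the library normalizes the distance this way is an assumption; `levenshtein` and `levenshtein_similarity` are hypothetical helpers written for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # Keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalize the distance by the longer length so identical strings score 1.0."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
```

Unlike word-set Jaccard, an edit-distance score is sensitive to word order and small spelling changes, which may matter when near-duplicates differ only by typos.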

AggressiveStrategy

Maximum token reduction through aggressive deduplication, aimed at cost optimization.

prompt_refiner.strategy.AggressiveStrategy

AggressiveStrategy(
    strip_html=True,
    strip_html_to_markdown=False,
    deduplicate_method="jaccard",
    deduplicate_similarity_threshold=0.7,
    deduplicate_granularity="sentence",
)

Bases: Pipeline

Aggressive strategy: Maximum token reduction through aggressive deduplication.

This strategy is itself a Pipeline, so you can use it directly or extend it.

Refiners:

  • StripHTML: Remove HTML tags (optional)
  • NormalizeWhitespace: Collapse excessive whitespace
  • Deduplicate: Aggressively remove similar content (threshold: 0.7)

Characteristics:

  • Token reduction: ~5-10% (higher with duplicate content)
  • Quality: 96-98% (cosine similarity)
  • Use case: Cost optimization with duplicate/redundant content
  • Latency: 0.25ms per 1k tokens

Note: For token budget control, use Packer's max_tokens parameter instead.
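
The practical difference between the 0.8 and 0.7 thresholds shows up on sentence pairs whose similarity falls between the two values. The sketch below uses a hypothetical word-set Jaccard helper (an assumption about tokenization, and about treating scores at or above the threshold as duplicates) to show one such pair.

```python
import re


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two word sets: |A ∩ B| / |A ∪ B|."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0


def is_duplicate(a: str, b: str, threshold: float) -> bool:
    """Treat a pair as duplicates when its similarity reaches the threshold."""
    return jaccard(a, b) >= threshold


first = "Restart the service now."
second = "Restart the service."
score = jaccard(first, second)  # 3 shared words out of 4 distinct words

print(score)                             # 0.75
print(is_duplicate(first, second, 0.8))  # False
print(is_duplicate(first, second, 0.7))  # True
```

Under these assumptions, the lower threshold would drop the second sentence even though it is not identical to the first, which is the quality trade-off the aggressive preset accepts.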

Example

Use with defaults

strategy = AggressiveStrategy()
cleaned = strategy.run(text)

Customize operator parameters

strategy = AggressiveStrategy(
    strip_html_to_markdown=True,
    deduplicate_method="levenshtein",
    deduplicate_similarity_threshold=0.6,
    deduplicate_granularity="paragraph"
)
cleaned = strategy.run(text)

Extend with additional operators

extended = AggressiveStrategy().pipe(RedactPII())
cleaned = extended.run(text)

Initialize aggressive strategy with configured operators.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| strip_html | bool | Whether to include StripHTML operator | True |
| strip_html_to_markdown | bool | Convert HTML to Markdown instead of stripping | False |
| deduplicate_method | Literal['jaccard', 'levenshtein'] | Deduplication method | 'jaccard' |
| deduplicate_similarity_threshold | float | Similarity threshold for aggressive deduplication | 0.7 |
| deduplicate_granularity | Literal['sentence', 'paragraph'] | Deduplication granularity | 'sentence' |

Source code in src/prompt_refiner/strategy/aggressive.py
def __init__(
    self,
    # Parameters to configure StripHTML operator
    strip_html: bool = True,
    strip_html_to_markdown: bool = False,
    # Parameters to configure Deduplicate operator
    deduplicate_method: Literal["jaccard", "levenshtein"] = "jaccard",
    deduplicate_similarity_threshold: float = 0.7,
    deduplicate_granularity: Literal["sentence", "paragraph"] = "sentence",
):
    """
    Initialize aggressive strategy with configured operators.

    Args:
        strip_html: Whether to include StripHTML operator (default: True)
        strip_html_to_markdown: Convert HTML to Markdown instead of stripping (default: False)
        deduplicate_method: Deduplication method (default: "jaccard")
        deduplicate_similarity_threshold: Similarity threshold for aggressive deduplication
            (default: 0.7)
        deduplicate_granularity: Deduplication granularity (default: "sentence")
    """
    operations = []

    if strip_html:
        operations.append(StripHTML(to_markdown=strip_html_to_markdown))

    operations.append(NormalizeWhitespace())

    operations.append(
        Deduplicate(
            method=deduplicate_method,
            similarity_threshold=deduplicate_similarity_threshold,
            granularity=deduplicate_granularity,
        )
    )

    # Initialize Pipeline with the configured operators
    super().__init__(operations)

Operations

  • StripHTML() - Remove HTML tags
  • NormalizeWhitespace() - Collapse excessive whitespace
  • Deduplicate() - Remove similar content (sentence-level, 0.7 threshold)

Example

from prompt_refiner.strategy import AggressiveStrategy

# Use strategy directly (v0.2.0+); the default similarity threshold is 0.7
strategy = AggressiveStrategy()
text = "<div>Hello world. Hello world. Goodbye world.</div>"
cleaned = strategy.run(text)

# Paragraph-level deduplication with Levenshtein similarity
strategy = AggressiveStrategy(
    deduplicate_method="levenshtein",
    deduplicate_granularity="paragraph"
)

# More aggressive deduplication
strategy = AggressiveStrategy(
    deduplicate_similarity_threshold=0.6  # More aggressive duplicate detection
)

Creating Custom Strategies

Custom strategies can be created by inheriting from Pipeline:

from prompt_refiner import Pipeline, StripHTML, NormalizeWhitespace, RedactPII

class CustomStrategy(Pipeline):
    def __init__(self, redact_pii: bool = True):
        operations = [StripHTML(), NormalizeWhitespace()]
        if redact_pii:
            operations.append(RedactPII(redact_types={"email", "phone"}))
        super().__init__(operations)

# Use custom strategy directly
strategy = CustomStrategy(redact_pii=True)
cleaned = strategy.run(text)
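
The subclassing pattern above can be tried without installing the library by using a small stand-in. `MiniPipeline` and `TrimAndLower` below are hypothetical classes written for illustration; in particular, having `pipe()` return a new pipeline rather than modify the existing one is an assumption, not a statement about how Pipeline behaves.

```python
from typing import Callable, Iterable, List

Step = Callable[[str], str]


class MiniPipeline:
    """Stand-in for Pipeline: applies each step to the text in order."""

    def __init__(self, steps: Iterable[Step] = ()):
        self.steps: List[Step] = list(steps)

    def pipe(self, step: Step) -> "MiniPipeline":
        # Build a new pipeline so the original preset stays unchanged (assumed behavior)
        return MiniPipeline(self.steps + [step])

    def run(self, text: str) -> str:
        for step in self.steps:
            text = step(text)
        return text


class TrimAndLower(MiniPipeline):
    """A preset: the constructor decides which steps to include."""

    def __init__(self, lowercase: bool = True):
        steps: List[Step] = [str.strip]
        if lowercase:
            steps.append(str.lower)
        super().__init__(steps)


preset = TrimAndLower()
extended = preset.pipe(lambda s: s.replace("world", "there"))

print(preset.run("  Hello World  "))    # hello world
print(extended.run("  Hello World  "))  # hello there
```

The design point is that a preset is only a constructor that assembles a step list, so extending it needs no special support beyond what the base class already offers.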

Usage Patterns

Basic Usage (v0.2.0+)

from prompt_refiner.strategy import MinimalStrategy, StandardStrategy, AggressiveStrategy

# Quick start with minimal - use directly
strategy = MinimalStrategy()
cleaned = strategy.run(text)

# Standard for RAG with duplicates
strategy = StandardStrategy()
cleaned = strategy.run(rag_context)

# Aggressive for cost optimization
strategy = AggressiveStrategy()
cleaned = strategy.run(long_context)

Composition with Additional Operations

Strategies inherit from Pipeline, so you can extend them with .pipe():

from prompt_refiner.strategy import MinimalStrategy
from prompt_refiner import RedactPII, Deduplicate

# Start with minimal, add PII redaction
extended = MinimalStrategy().pipe(RedactPII(redact_types={"email"}))
cleaned = extended.run(text)

# Start with standard, add more aggressive deduplication
from prompt_refiner.strategy import StandardStrategy
extended = StandardStrategy().pipe(Deduplicate(similarity_threshold=0.6))
cleaned = extended.run(text)

Using .process() Method

Strategies also support the .process() method from the Refiner interface:

from prompt_refiner.strategy import MinimalStrategy

strategy = MinimalStrategy()
cleaned = strategy.process(text)  # Equivalent to strategy.run(text)
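
One reason a pipeline might expose both methods is that `process()` lets a whole pipeline act as a single step inside another one. The sketch below illustrates that idea with hypothetical stand-in classes (`Refiner`, `Strip`, `Upper`, `Chain`); it is an assumption about the design, not the library's source.

```python
class Refiner:
    """Stand-in for the Refiner interface: a single process() method."""

    def process(self, text: str) -> str:
        raise NotImplementedError


class Strip(Refiner):
    def process(self, text: str) -> str:
        return text.strip()


class Upper(Refiner):
    def process(self, text: str) -> str:
        return text.upper()


class Chain(Refiner):
    """Stand-in pipeline: run() applies each step, process() delegates to run()."""

    def __init__(self, steps):
        self.steps = list(steps)

    def run(self, text: str) -> str:
        for step in self.steps:
            text = step.process(text)
        return text

    def process(self, text: str) -> str:
        return self.run(text)


inner = Chain([Strip()])
outer = Chain([inner, Upper()])  # A chain used as a step of another chain

print(outer.run("  hello  "))  # HELLO
```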

Choosing a Strategy

Minimal Strategy

Use when:

  • Quality is paramount
  • Minimal risk tolerance
  • Processing structured content
  • First time optimizing prompts

Avoid when:

  • Budget constraints are tight
  • Dealing with very long contexts
  • Content has significant duplication

Standard Strategy

Use when:

  • RAG contexts with potential duplicates
  • Balanced quality and savings needed
  • Processing web-scraped content
  • General-purpose optimization

Avoid when:

  • Context is already clean and unique
  • Maximum quality preservation required
  • Very tight token budgets

Aggressive Strategy

Use when:

  • Cost optimization is the priority
  • Token budgets are tight
  • Processing very long contexts
  • Quality tolerance is lenient

Avoid when:

  • Quality cannot be compromised
  • Context is already short
  • Near-duplicate sentences carry distinct, critical info

Configuration Reference (v0.2.0+)

MinimalStrategy Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| strip_html | bool | True | Whether to strip HTML tags |
| strip_html_to_markdown | bool | False | Convert HTML to Markdown instead of stripping |

StandardStrategy Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| strip_html | bool | True | Whether to strip HTML tags |
| strip_html_to_markdown | bool | False | Convert HTML to Markdown instead of stripping |
| deduplicate_method | Literal["jaccard", "levenshtein"] | "jaccard" | Deduplication algorithm |
| deduplicate_similarity_threshold | float | 0.8 | Threshold for deduplication (0.0-1.0) |
| deduplicate_granularity | Literal["sentence", "paragraph"] | "sentence" | Deduplication granularity |

AggressiveStrategy Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| strip_html | bool | True | Whether to strip HTML tags |
| strip_html_to_markdown | bool | False | Convert HTML to Markdown instead of stripping |
| deduplicate_method | Literal["jaccard", "levenshtein"] | "jaccard" | Deduplication algorithm |
| deduplicate_similarity_threshold | float | 0.7 | Threshold for deduplication (0.0-1.0) |
| deduplicate_granularity | Literal["sentence", "paragraph"] | "sentence" | Deduplication granularity |

See Also