Prompt Refiner
A lightweight Python library for building production LLM applications. Save 5-70% on API costs - from function calling optimization to RAG context management.
🎮 Try Interactive Demo
Launch the live demo on Hugging Face Spaces →
Experiment with text optimization and function calling compression in your browser - no installation required!
Overview
Prompt Refiner solves three core problems for production LLM applications:
- Function Calling Optimization - Compress tool schemas by 57% on average with 100% lossless compression
- Token Optimization - Clean dirty inputs (HTML, whitespace, PII) to reduce API costs by 5-15%
- Context Management - Pack system prompts, RAG docs, and chat history with smart priority-based selection
Perfect for AI agents, RAG applications, chatbots, and any production system that uses function calling or needs to manage LLM context windows efficiently.
Proven Effectiveness
Function Calling: Tested on 20 real-world API schemas (Stripe, Salesforce, HubSpot, Slack), achieving 56.9% average token reduction with 100% protocol field preservation; all 20/20 compressed schemas validated as callable with OpenAI function calling. Enterprise APIs see 70%+ reduction. A medium-sized agent (10 tools, 500 calls/day) saves $541/month on GPT-4.
RAG & Text: Benchmarked on 30 real-world test cases, achieving 5-15% token reduction while maintaining 96-99% quality.
Performance: Processing overhead is < 0.5ms per 1k tokens - negligible compared to network and LLM latency.
Quick Start
Option 1: Preset Strategies (Easiest)
New in v0.1.5: Use benchmark-tested preset strategies for instant token optimization:
from prompt_refiner.strategy import MinimalStrategy, AggressiveStrategy
# Minimal: 4.3% reduction, 98.7% quality
refiner = MinimalStrategy().create_refiner()
cleaned = refiner.run("<div>Your HTML content</div>")
# Aggressive: 15% reduction, 96.4% quality
refiner = AggressiveStrategy(max_tokens=150).create_refiner()
cleaned = refiner.run(long_context)
Learn more about strategies →
Option 2: Custom Pipelines (Flexible)
Build custom cleaning pipelines with the pipe operator:
from prompt_refiner import StripHTML, NormalizeWhitespace, TruncateTokens
# Define a cleaning pipeline
pipeline = (
    StripHTML()
    | NormalizeWhitespace()
    | TruncateTokens(max_tokens=1000, strategy="middle_out")
)
raw_input = "<div> User input with <b>lots</b> of spaces... </div>"
clean_prompt = pipeline.run(raw_input)
# Output: "User input with lots of spaces..."
Alternative: Fluent API
Prefer method chaining? Use Refiner().pipe():
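A minimal sketch, assuming Refiner is importable from the package root and that pipe() returns the refiner so calls can be chained:
from prompt_refiner import Refiner, StripHTML, NormalizeWhitespace
# Build the same pipeline via method chaining instead of the | operator
refiner = (
    Refiner()
    .pipe(StripHTML())
    .pipe(NormalizeWhitespace())
)
clean_prompt = refiner.run("<div> User input with extra   spaces </div>")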
6 Core Modules
Prompt Refiner is organized into 6 specialized modules:
Text Processing Operations
1. Cleaner - Clean Dirty Data
- StripHTML() - Remove HTML tags, convert to Markdown
- NormalizeWhitespace() - Collapse excessive whitespace
- FixUnicode() - Remove zero-width spaces and problematic Unicode
- JsonCleaner() - Strip nulls/empties from JSON, minify
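For example, a quick sketch chaining three cleaners with the pipe operator (output is illustrative):
from prompt_refiner import StripHTML, NormalizeWhitespace, FixUnicode
# Strip tags, collapse whitespace, then drop zero-width characters
cleaner = StripHTML() | NormalizeWhitespace() | FixUnicode()
print(cleaner.run("<p>Hello\u200b   world</p>"))  # expected: "Hello world"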
2. Compressor - Reduce Size
- TruncateTokens() - Smart truncation with sentence boundaries
  - Strategies: "head", "tail", "middle_out"
- Deduplicate() - Remove similar content (great for RAG)
Learn more about Compressor →
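For example, a sketch combining both operations (threshold and limit values are illustrative):
from prompt_refiner import Deduplicate, TruncateTokens
rag_context = "Doc A says X. Doc A says X. Doc B says Y."
compressor = (
    Deduplicate(similarity_threshold=0.85)  # drop near-duplicate passages
    | TruncateTokens(max_tokens=200, strategy="middle_out")  # keep head and tail
)
shortened = compressor.run(rag_context)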
3. Scrubber - Security & Privacy
- RedactPII() - Automatically redact emails, phones, IPs, credit cards, URLs, SSNs
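For instance (the exact placeholder text used for redactions is assumed):
from prompt_refiner import RedactPII
scrubber = RedactPII(redact_types={"email", "phone"})
safe = scrubber.run("Contact jane@example.com or 555-0100")
# Emails and phone numbers are replaced with redaction placeholders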
AI Agent & Function Calling
4. Tools - Function Calling Optimization (v0.1.6+)
Dramatically reduce token costs for AI agents by compressing tool schemas and responses:
- SchemaCompressor() - Compress tool/function schemas by 57% on average
  - 100% lossless - all protocol fields preserved
  - Works with OpenAI and Anthropic function calling
  - Enterprise APIs: 70%+ reduction
- ResponseCompressor() - Compress verbose API responses by 30-70%
  - Removes debug/trace/logs fields
  - Truncates long strings and lists
  - Preserves essential data structure
from prompt_refiner import SchemaCompressor, ResponseCompressor
from pydantic import BaseModel
from openai import pydantic_function_tool
# Compress tool schema (saves tokens on every request)
class SearchInput(BaseModel):
    query: str
    max_results: int = 10
tool_schema = pydantic_function_tool(SearchInput, name="search")
compressed = SchemaCompressor().process(tool_schema)
# Use compressed schema in OpenAI/Anthropic function calling
# Compress tool responses (saves tokens on responses)
verbose_response = {"results": [...], "debug_info": {...}}
compact = ResponseCompressor().process(verbose_response)
Context Budget Management
5. Packer - Intelligent Context Packing (v0.1.3+)
For RAG applications and chatbots, the Packer module manages context budgets with priority-based selection:
- MessagesPacker() - For chat completion APIs (OpenAI, Anthropic). Returns List[Dict]
- TextPacker() - For text completion APIs (Llama Base, GPT-3). Returns str
Key Features:
- Smart priority-based selection (auto-prioritizes: system > query > context > history)
- JIT refinement with refine_with parameter
- Automatic format overhead calculation
- Semantic roles for clear intent
from prompt_refiner import MessagesPacker, StripHTML
packer = MessagesPacker(max_tokens=1000)
packer.add("You are helpful.", role="system")
# Clean RAG documents on-the-fly
packer.add(
    "<div>RAG doc...</div>",
    role="context",
    refine_with=StripHTML()
)
packer.add("User question?", role="query")
messages = packer.pack() # Returns List[Dict] ready for chat APIs
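TextPacker is assumed to mirror this interface for completion-style models, returning a single prompt string (a sketch, not a confirmed API):
from prompt_refiner import TextPacker
packer = TextPacker(max_tokens=800)  # constructor assumed to mirror MessagesPacker
packer.add("You are helpful.", role="system")
packer.add("Background document...", role="context")
packer.add("User question?", role="query")
prompt = packer.pack()  # Returns a single str for text completion APIs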
6. Strategy - Preset Strategies (v0.1.5+)
For quick setup, use benchmark-tested preset strategies:
- MinimalStrategy - 4.3% reduction, 98.7% quality (HTML + Whitespace)
- StandardStrategy - 4.8% reduction, 98.4% quality (+ Deduplication)
- AggressiveStrategy - 15% reduction, 96.4% quality (+ Truncation)
from prompt_refiner import RedactPII
from prompt_refiner.strategy import StandardStrategy
# Quick setup with preset
refiner = StandardStrategy().create_refiner()
cleaned = refiner.run("<div>Your HTML content</div>")
# Extend with additional operations
refiner.pipe(RedactPII(redact_types={"email"}))
Measurement & Analysis
Track optimization impact without transforming prompts:
- CountTokens() - Calculate token savings and ROI
  - Estimation mode (default): character-based approximation
  - Precise mode (with tiktoken): exact token counts
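A minimal sketch of tracking savings in estimation mode (the exact format_stats() output is assumed):
from prompt_refiner import CountTokens, StripHTML
raw = "<div>Some <b>HTML</b> input</div>"
counter = CountTokens(original_text=raw)
# Pipe the counter at the end of any pipeline to record savings
(StripHTML() | counter).run(raw)
print(counter.format_stats())  # e.g. before/after token counts and % saved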
Complete Example
from prompt_refiner import (
    # Core Modules
    StripHTML, NormalizeWhitespace, FixUnicode, JsonCleaner,  # Cleaner
    Deduplicate, TruncateTokens,  # Compressor
    RedactPII,  # Scrubber
    # Measurement
    CountTokens
)
original_text = """Your messy input here..."""
counter = CountTokens(original_text=original_text)
pipeline = (
    # Clean
    StripHTML(to_markdown=True)
    | NormalizeWhitespace()
    | FixUnicode()
    # Compress
    | Deduplicate(similarity_threshold=0.85)
    | TruncateTokens(max_tokens=500, strategy="head")
    # Secure
    | RedactPII(redact_types={"email", "phone"})
    # Analyze
    | counter
)
result = pipeline.run(original_text)
print(counter.format_stats()) # Shows token savings
Next Steps
- Get Started - Install Prompt Refiner and build your first pipeline in minutes
- API Reference - Complete API documentation for all operations and modules
- Examples - Browse practical examples for each module
- Contributing - Learn how to contribute to the project