Tools Module
The Tools module optimizes AI agent function calling by compressing tool schemas and responses. It achieves a 57% average token reduction while remaining 100% lossless.
SchemaCompressor
Compress tool/function schemas (OpenAI, Anthropic format) while preserving 100% of protocol fields.
prompt_refiner.tools.SchemaCompressor
Bases: Refiner
Compress tool schemas to save tokens while preserving functionality.
This operation compresses tool schema definitions (e.g., OpenAI function calling schemas) by removing documentation overhead while keeping all protocol-level fields intact.
What is modified:
- description fields (truncated and cleaned)
- title fields (removed if configured)
- examples fields (removed if configured)
- markdown formatting (removed if configured)
- excessive whitespace

What is never modified:
- name
- type
- properties
- required
- enum
- any other protocol-level fields
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| drop_examples | bool | Remove examples fields | True |
| drop_titles | bool | Remove title fields | True |
| drop_markdown_formatting | bool | Remove markdown formatting | True |
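As a minimal sketch of non-default settings, using only the flags documented in the table above (for instance, keeping examples fields when they double as few-shot hints):

```python
from prompt_refiner import SchemaCompressor

# Keep examples fields but still drop titles and markdown formatting
compressor = SchemaCompressor(
    drop_examples=False,
    drop_titles=True,
    drop_markdown_formatting=True,
)
```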
Example
from prompt_refiner import SchemaCompressor

tool = {
    "type": "function",
    "function": {
        "name": "search_flights",
        "description": (
            "Search for available flights between two airports. "
            "This is a very long description with examples..."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {
                    "type": "string",
                    "description": "Origin airport IATA code, like LAX"
                }
            }
        }
    }
}
compressor = SchemaCompressor(drop_markdown_formatting=True)
compressed = compressor.process(tool)
# Markdown removed, tokens saved!
Use Cases
- Function Calling: Reduce token usage in OpenAI/Anthropic function schemas
- Agent Systems: Optimize tool definitions in agent prompts
- Cost Reduction: Save 20-60% tokens on verbose tool schemas
- Context Management: Fit more tools within token budget
Initialize schema compressor. Parameters are as listed in the table above.
Source code in src/prompt_refiner/tools/schema_compressor.py
Functions
process
Process a single tool schema and return compressed JSON.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tool | JSON | Tool schema dictionary (e.g., OpenAI function calling schema) | required |

Returns:

| Type | Description |
|---|---|
| JSON | Compressed tool schema dictionary |
Example
tool = {
    "type": "function",
    "function": {
        "name": "search",
        "description": "Search for items...",
        "parameters": {...}
    }
}
compressor = SchemaCompressor()
compressed = compressor.process(tool)
Source code in src/prompt_refiner/tools/schema_compressor.py
Key Features
- 57% average reduction across 20 real-world API schemas
- 100% lossless - all protocol fields preserved (name, type, required, enum)
- 100% callable (20/20 validated) - all compressed schemas work correctly with OpenAI function calling
- 70%+ reduction on enterprise APIs (HubSpot, Salesforce, OpenAI)
- Works with OpenAI and Anthropic function calling format
Examples
from prompt_refiner import SchemaCompressor
# Basic usage
tool_schema = {
"type": "function",
"function": {
"name": "search_products",
"description": "Search for products in the e-commerce catalog...",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query..."}
},
"required": ["query"]
}
}
}
compressor = SchemaCompressor()
compressed = compressor.process(tool_schema)
# Result: 30-70% smaller, functionally identical
# With Pydantic
from pydantic import BaseModel, Field
from openai import OpenAI, pydantic_function_tool
from prompt_refiner import SchemaCompressor
class SearchInput(BaseModel):
query: str = Field(description="The search query...")
category: str | None = Field(default=None, description="Filter by category...")
# Generate and compress schema
tool_schema = pydantic_function_tool(SearchInput, name="search")
compressed = SchemaCompressor().process(tool_schema)
# Use with OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4",
messages=[...],
tools=[compressed] # Compressed but functionally identical
)
# Batch compression
tools = [search_schema, create_schema, update_schema, delete_schema]
compressor = SchemaCompressor()
compressed_tools = [compressor.process(tool) for tool in tools]
# Use all compressed tools
response = client.chat.completions.create(
model="gpt-4",
messages=[...],
tools=compressed_tools
)
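The Key Features above state the compressor also works with Anthropic's function calling format, where tools put name, description, and input_schema at the top level. A hedged sketch, assuming process() accepts the Anthropic-style dict directly:

```python
from prompt_refiner import SchemaCompressor

# Anthropic tool format: name / description / input_schema at the top level
anthropic_tool = {
    "name": "search_products",
    "description": "Search for products in the e-commerce catalog...",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query..."}
        },
        "required": ["query"],
    },
}

compressed = SchemaCompressor().process(anthropic_tool)
# name, the input_schema types, and required stay intact; descriptions shrink
```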
What Gets Compressed
✅ Optimized (Documentation):
- description fields (main source of verbosity)
- Redundant explanations and examples
- Marketing language and filler words
- Overly detailed parameter descriptions
❌ Never Modified (Protocol):
- Function name
- Parameter names
- Parameter type (string, number, boolean, etc.)
- required fields list
- enum values
- default values
- JSON structure
100% Lossless
SchemaCompressor never modifies protocol fields. The compressed schema is functionally identical to the original - LLMs will call the function with the same arguments.
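To make the guarantee concrete, an illustrative before/after sketch. The exact shortened wording is up to the library; the point is which fields change and which do not:

```python
original = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the **current** weather. Returns rich data...",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "title": "City",
                    "description": "**Name** of the city",
                    "examples": ["Paris", "Tokyo"],
                }
            },
            "required": ["city"],
        },
    },
}

# After compression (illustrative): title and examples are gone, markdown
# is stripped, and the description is shorter -- while name, type,
# properties, and required are identical. The "city" property becomes:
#   {"type": "string", "description": "Name of the city"}
```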
ResponseCompressor
Compress verbose API/tool responses before sending back to the LLM.
prompt_refiner.tools.ResponseCompressor
ResponseCompressor(
drop_keys=None,
drop_null_fields=True,
drop_empty_fields=True,
max_depth=8,
add_truncation_marker=True,
truncation_suffix="… (truncated)",
)
Bases: Refiner
Compress tool responses to reduce token usage before sending to LLM.
This operation compresses JSON-like tool responses by removing verbose content while preserving essential information. Perfect for agent systems that need to fit tool outputs within LLM context windows.
What is modified:
- Long strings (truncated to 512 chars)
- Long lists (truncated to 16 items)
- Debug/trace/log fields (removed if in drop_keys)
- Null values (removed if drop_null_fields=True)
- Empty containers (removed if drop_empty_fields=True)
- Deep nesting (truncated beyond max_depth)

What is preserved:
- Overall structure (dict keys, list order)
- Essential data fields
- Numbers and booleans (never modified)
- Type information
IMPORTANT: Use this ONLY for LLM-facing payloads. Do NOT use compressed output for business logic or APIs that expect complete data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| drop_keys | Set[str] \| None | Field names to remove (default: debug, trace, logs, etc.) | None |
| drop_null_fields | bool | Remove fields with None values | True |
| drop_empty_fields | bool | Remove empty strings/lists/dicts | True |
| max_depth | int | Maximum nesting depth before truncation | 8 |
| add_truncation_marker | bool | Add markers when truncating | True |
| truncation_suffix | str | Suffix for truncated content | '… (truncated)' |
Example
from prompt_refiner import ResponseCompressor
# Compress API response before sending to LLM
compressor = ResponseCompressor()
response = {
    "results": ["item1", "item2"] * 100,  # 200 items
    "debug": {"trace": "..."},
    "data": "x" * 1000
}
compressed = compressor.process(response)
# Result: results limited to 16 items, debug removed, data truncated to 512 chars
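As the IMPORTANT note above warns, the compressed copy is for the LLM only. A minimal sketch of the pattern, keeping the full response for business logic while the model sees the compressed view (flight_api, booking_service, and messages are hypothetical names for illustration):

```python
import json
from prompt_refiner import ResponseCompressor

full_response = flight_api.search("LAX", "JFK")  # hypothetical API call
llm_view = ResponseCompressor().process(full_response)

# Business logic keeps the complete payload...
booking_service.record(full_response)  # hypothetical downstream consumer
# ...while the LLM only ever sees the compressed copy.
messages.append({"role": "tool", "content": json.dumps(llm_view)})
```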
Use Cases
- Agent Systems: Compress verbose tool outputs before sending to LLM
- API Integration: Reduce token usage from third-party API responses
- Cost Optimization: Save 30-70% tokens on verbose tool responses
- Context Management: Fit more tool results within token budget
Initialize ResponseCompressor with compression settings.
Source code in src/prompt_refiner/tools/response_compressor.py
Functions
process
Compress tool response data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| response | JSON | Tool response as dict | required |

Returns:

| Type | Description |
|---|---|
| JSON | Compressed response as dict |
Example
response = {
    "results": [{"data": "x" * 1000}] * 100,
    "debug": {"trace": "..."}
}
compressor = ResponseCompressor()
compressed = compressor.process(response)
# Result: debug removed, results truncated, data shortened
Source code in src/prompt_refiner/tools/response_compressor.py
Key Features
- 25.8% average reduction on 20 real-world API responses (range: 14-53%)
- Removes debug/trace/logs fields automatically
- Truncates long strings (> 512 chars) and lists (> 16 items)
- Preserves essential data structure
- 52.7% reduction on verbose responses like Stripe Payment API
Examples
from prompt_refiner import ResponseCompressor
# Basic usage
api_response = {
"results": [
{"id": 1, "name": "Product A", "price": 29.99},
{"id": 2, "name": "Product B", "price": 39.99},
# ... 100 more results
],
"debug_info": {
"query_time_ms": 45,
"cache_hit": True,
"server": "api-01"
},
"trace_id": "abc123...",
"logs": ["Started query", "Fetched from DB", ...]
}
compressor = ResponseCompressor()
compact = compressor.process(api_response)
# Result: Essential data kept, debug/trace/logs removed, long lists truncated
# In agent workflow
from prompt_refiner import SchemaCompressor, ResponseCompressor
import openai
import json

client = openai.OpenAI()
# 1. Compress tool schema
tool_schema = {...}
compressed_schema = SchemaCompressor().process(tool_schema)
# 2. Call LLM with compressed schema
response = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": "Search for Python books"}],
tools=[compressed_schema]
)
# 3. Execute tool
tool_call = response.choices[0].message.tool_calls[0]
function_args = json.loads(tool_call.function.arguments)
tool_response = search_books(**function_args) # Verbose response
# 4. Compress response before sending to LLM
compact_response = ResponseCompressor().process(tool_response)
# 5. Continue conversation with compressed response
final_response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "user", "content": "Search for Python books"},
response.choices[0].message,
{
"role": "tool",
"tool_call_id": tool_call.id,
"content": json.dumps(compact_response) # Compressed
}
]
)
What Gets Compressed
Removed:
- Debug fields (debug_info, trace_id, _debug)
- Log fields (logs, log, trace, _trace)
- Excessive metadata
Truncated:
- Long strings (> 512 chars)
- Long lists (> 16 items)
- Deep nesting (beyond max_depth, default 8 levels)

Preserved:
- Essential data fields
- JSON structure
- Data types
Configuration
ResponseCompressor ships with sensible defaults:
- String limit: 512 characters
- List limit: 16 items
- Max depth: 8 levels (adjustable via max_depth)
- Drop nulls: True (adjustable via drop_null_fields)
- Drop empty containers: True (adjustable via drop_empty_fields)
No Configuration Needed
The defaults work well for most API responses; for typical use, no configuration is required.
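When the defaults don't fit, the constructor parameters documented above can be adjusted. A minimal sketch (the field name internal_metadata is a hypothetical example):

```python
from prompt_refiner import ResponseCompressor

compressor = ResponseCompressor(
    drop_keys={"debug", "trace", "logs", "internal_metadata"},  # fields to strip outright
    drop_null_fields=True,
    drop_empty_fields=False,  # keep empty strings/lists/dicts
    max_depth=4,              # truncate nesting earlier than the default 8
    add_truncation_marker=True,
    truncation_suffix="…",
)
compact = compressor.process(api_response)
```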
Cost Savings
Typical savings for different agent sizes using GPT-4 ($0.03/1k input tokens):
| Agent Size | Tools | Calls/Day | Monthly Savings | Annual Savings |
|---|---|---|---|---|
| Small | 5 | 100 | $44 | $528 |
| Medium | 10 | 500 | $541 | $6,492 |
| Large | 20 | 1,000 | $3,249 | $38,988 |
| Enterprise | 50 | 5,000 | $40,664 | $487,968 |
Based on a 56.9% average schema reduction.
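A back-of-the-envelope sketch of how such figures are derived. The tokens-saved-per-call value below is an assumption back-solved from the Medium row; measure your own with the tiktoken snippet under "Monitor Token Savings" below:

```python
# Rough monthly-savings estimate for compressed schemas.
PRICE_PER_1K_INPUT = 0.03  # GPT-4 input price used in the table above

def monthly_savings(num_tools: int, calls_per_day: int,
                    tokens_saved_per_call: int = 120) -> float:
    """Estimated dollars saved per month from schema compression."""
    tokens_saved = num_tools * calls_per_day * 30 * tokens_saved_per_call
    return tokens_saved * PRICE_PER_1K_INPUT / 1000

print(f"${monthly_savings(10, 500):,.0f}/month")  # ≈ $540, close to the Medium row
```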
Best Practices
1. Compress Schemas Once, Reuse
# At application startup
compressor = SchemaCompressor()
COMPRESSED_TOOLS = [
compressor.process(search_schema),
compressor.process(create_schema),
compressor.process(update_schema)
]
# In agent loop - reuse compressed schemas
response = client.chat.completions.create(
model="gpt-4",
messages=messages,
tools=COMPRESSED_TOOLS # Reuse
)
2. Always Compress Responses
# Always compress before sending to LLM
tool_response = api.search(query)
compact = ResponseCompressor().process(tool_response)
messages.append({"role": "tool", "content": json.dumps(compact)})
3. Monitor Token Savings
import tiktoken
encoder = tiktoken.encoding_for_model("gpt-4")
original_tokens = len(encoder.encode(json.dumps(original_schema)))
compressed_tokens = len(encoder.encode(json.dumps(compressed_schema)))
print(f"Saved {original_tokens - compressed_tokens} tokens")
print(f"Reduction: {(1 - compressed_tokens/original_tokens)*100:.1f}%")
Benchmark Results
See comprehensive benchmark results for detailed performance on 20 real-world API schemas.