# Packer Examples

Advanced examples for managing context budgets with `MessagesPacker` and `TextPacker`.
> **Documentation Update in Progress**
>
> These examples are being updated for v0.2.2, which removed the `model` and `max_tokens` parameters. Packers now include all items without token-budget constraints and leave the final token limit to the LLM API. See the API Reference for the current v0.2.2 syntax.
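In rough terms, the change amounts to dropping the budget arguments from the constructor. A minimal sketch, assuming the parameters are simply removed (consult the API Reference for the authoritative v0.2.2 signature):

```python
from prompt_refiner import MessagesPacker

# Pre-v0.2.2 (as used in the examples on this page): explicit token budget
packer_old_style = MessagesPacker(max_tokens=500)

# v0.2.2 (assumed): no budget parameters; the LLM API enforces the final limit
packer_new_style = MessagesPacker()
```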
## When to Use Packer

Packer is ideal for:
- RAG Applications: Pack multiple retrieved documents within token budget
- Chatbots: Manage conversation history with priorities
- Context Window Management: Fit critical information within model limits
- Multi-source Data: Combine system prompts, user input, and documents
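All of these use cases follow the same basic pattern, sketched below with the pre-v0.2.2 constructor used throughout this page: create a packer, add items with roles and priorities, then call `pack()`.

```python
from prompt_refiner import MessagesPacker, PRIORITY_SYSTEM, PRIORITY_USER

# Minimal sketch of the common pattern (values are illustrative)
packer = MessagesPacker(max_tokens=300)
packer.add("You are a helpful assistant.", role="system", priority=PRIORITY_SYSTEM)
packer.add("What does Packer do?", role="user", priority=PRIORITY_USER)
messages = packer.pack()  # list of {"role": ..., "content": ...} dicts
```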
## Example 1: Basic RAG with `MessagesPacker`

Pack RAG documents for chat APIs with priority-based selection:
```python
from prompt_refiner import (
    MessagesPacker,
    PRIORITY_SYSTEM,
    PRIORITY_USER,
    PRIORITY_HIGH,
    PRIORITY_MEDIUM,
)

# Create packer with token budget
packer = MessagesPacker(max_tokens=500)

# System prompt (must include)
packer.add(
    "Answer based on the provided context only.",
    role="system",
    priority=PRIORITY_SYSTEM
)

# RAG documents with different priorities
packer.add(
    "Product X costs $99 and includes 1-year warranty.",
    role="system",
    priority=PRIORITY_HIGH  # Most relevant document
)
packer.add(
    "We offer free shipping on orders over $50.",
    role="system",
    priority=PRIORITY_MEDIUM  # Less relevant document
)
packer.add(
    "Customer reviews rate Product X 4.5/5 stars.",
    role="system",
    priority=PRIORITY_MEDIUM
)

# User query (must include)
packer.add(
    "What is the price of Product X?",
    role="user",
    priority=PRIORITY_USER
)

# Pack into messages format
messages = packer.pack()

# Use directly with OpenAI
# response = client.chat.completions.create(
#     model="gpt-4",
#     messages=messages
# )

print(f"Packed {len(messages)} messages")
for msg in messages:
    print(f"{msg['role']}: {msg['content'][:50]}...")
```
## Example 2: RAG with Dirty HTML (JIT Refinement)

Clean web-scraped RAG documents on-the-fly:
```python
from prompt_refiner import (
    MessagesPacker,
    PRIORITY_SYSTEM,
    PRIORITY_USER,
    PRIORITY_HIGH,
    StripHTML,
    NormalizeWhitespace,
)

packer = MessagesPacker(max_tokens=800)

# System prompt
packer.add(
    "You are a helpful product assistant.",
    role="system",
    priority=PRIORITY_SYSTEM
)

# RAG documents from web scraping (with HTML)
docs = [
    "<div class='product'><h2>Product Features</h2><p> Waterproof design </p></div>",
    "<html><body> <p>Available in <b>5 colors</b>: red, blue...</p> </body></html>",
    "<article> Battery life: <strong> 48 hours </strong> continuous use </article>"
]

# Clean each document before adding (JIT refinement)
for doc in docs:
    packer.add(
        doc,
        role="system",
        priority=PRIORITY_HIGH,
        refine_with=[StripHTML(), NormalizeWhitespace()]  # Clean on-the-fly!
    )

# User query
packer.add(
    "What are the key features?",
    role="user",
    priority=PRIORITY_USER
)

messages = packer.pack()

# All HTML is automatically cleaned before packing
print("Cleaned messages:")
for msg in messages:
    if msg['role'] == 'system' and 'Waterproof' in msg['content']:
        print(f"Before: {docs[0][:50]}...")
        print(f"After: {msg['content'][:50]}...")
```
## Example 3: Chatbot with Conversation History

Manage conversation history with priorities - old messages can be dropped:
```python
from prompt_refiner import (
    MessagesPacker,
    PRIORITY_SYSTEM,
    PRIORITY_USER,
    PRIORITY_HIGH,
    PRIORITY_LOW,
)

packer = MessagesPacker(max_tokens=1000)

# System prompt (highest priority)
packer.add(
    "You are a helpful customer support agent.",
    role="system",
    priority=PRIORITY_SYSTEM
)

# RAG: Current relevant documentation
packer.add(
    "Return policy: 30-day money-back guarantee for all products.",
    role="system",
    priority=PRIORITY_HIGH
)

# Old conversation history (can be dropped if budget is tight)
old_conversation = [
    {"role": "user", "content": "What are your business hours?"},
    {"role": "assistant", "content": "We're open 9 AM - 5 PM EST, Monday-Friday."},
    {"role": "user", "content": "Do you ship internationally?"},
    {"role": "assistant", "content": "Yes, we ship to over 50 countries worldwide."}
]
packer.add_messages(old_conversation, priority=PRIORITY_LOW)

# Current user query (highest priority)
packer.add(
    "What is your return policy?",
    role="user",
    priority=PRIORITY_USER
)

messages = packer.pack()

# If budget is tight, old history is dropped, but system prompt + current query are kept
print(f"Packed {len(messages)} messages (old history may be dropped)")
```
## Example 4: `TextPacker` for Base Models

Use `TextPacker` with Llama or GPT-3 base models:
```python
from prompt_refiner import (
    TextPacker,
    TextFormat,
    PRIORITY_SYSTEM,
    PRIORITY_HIGH,
    PRIORITY_USER,
)

# Use MARKDOWN format for better structure
packer = TextPacker(
    max_tokens=600,
    text_format=TextFormat.MARKDOWN
)

# System instructions
packer.add(
    "You are a QA assistant. Answer based on the context provided.",
    role="system",
    priority=PRIORITY_SYSTEM
)

# RAG documents (no role = treated as context)
packer.add(
    "Prompt-refiner is a Python library for optimizing LLM inputs.",
    priority=PRIORITY_HIGH
)
packer.add(
    "It reduces token usage by 4-15% through cleaning and compression.",
    priority=PRIORITY_HIGH
)
packer.add(
    "The library has zero dependencies by default.",
    priority=PRIORITY_HIGH
)

# User query
packer.add(
    "What is prompt-refiner?",
    role="user",
    priority=PRIORITY_USER
)

# Pack into formatted text
prompt = packer.pack()
print(prompt)
# Output:
# ### INSTRUCTIONS:
# You are a QA assistant. Answer based on the context provided.
#
# ### CONTEXT:
# - Prompt-refiner is a Python library for optimizing LLM inputs.
# - It reduces token usage by 4-15% through cleaning and compression.
# - The library has zero dependencies by default.
#
# ### INPUT:
# What is prompt-refiner?

# Use with completion API
# response = client.completions.create(
#     model="llama-2-70b",
#     prompt=prompt
# )
```
## Example 5: `TextPacker` with XML Format

Use XML format (Anthropic best practice for Claude base models):
```python
from prompt_refiner import (
    TextPacker,
    TextFormat,
    PRIORITY_SYSTEM,
    PRIORITY_HIGH,
    PRIORITY_USER,
)

packer = TextPacker(
    max_tokens=500,
    text_format=TextFormat.XML
)

packer.add(
    "You are a code review assistant.",
    role="system",
    priority=PRIORITY_SYSTEM
)
packer.add(
    "Code snippet: def hello(): return 'world'",
    priority=PRIORITY_HIGH
)
packer.add(
    "Please review this code for best practices.",
    role="user",
    priority=PRIORITY_USER
)

prompt = packer.pack()
print(prompt)
# Output:
# <system>
# You are a code review assistant.
# </system>
#
# <context>
# Code snippet: def hello(): return 'world'
# </context>
#
# <user>
# Please review this code for best practices.
# </user>
```
## Example 6: Precise Mode for Maximum Token Utilization

Use precise mode with `tiktoken` for 100% token-budget utilization:
```python
from prompt_refiner import MessagesPacker, PRIORITY_SYSTEM, PRIORITY_USER

# Install tiktoken: pip install llm-prompt-refiner[token]

# Estimation mode (default): 10% safety buffer
packer_estimate = MessagesPacker(max_tokens=1000)
print(f"Estimation mode: {packer_estimate.effective_max_tokens} effective tokens")
# Output: Estimation mode: 900 effective tokens

# Precise mode: 100% budget utilization (no safety buffer)
packer_precise = MessagesPacker(max_tokens=1000, model="gpt-4")
print(f"Precise mode: {packer_precise.effective_max_tokens} effective tokens")
# Output: Precise mode: 997 effective tokens (1000 - 3 request overhead)

# Use precise mode for production to maximize token capacity
packer_precise.add("System prompt", role="system", priority=PRIORITY_SYSTEM)
packer_precise.add("User query", role="user", priority=PRIORITY_USER)
messages = packer_precise.pack()
```
## Example 7: Inspection and Debugging

Inspect items before packing to understand token distribution:
```python
from prompt_refiner import MessagesPacker, PRIORITY_SYSTEM, PRIORITY_HIGH, PRIORITY_USER

packer = MessagesPacker(max_tokens=500)
packer.add("System prompt here", role="system", priority=PRIORITY_SYSTEM)
packer.add("Document 1" * 50, role="system", priority=PRIORITY_HIGH)
packer.add("Document 2" * 50, role="system", priority=PRIORITY_HIGH)
packer.add("User query", role="user", priority=PRIORITY_USER)

# Inspect items before packing
items = packer.get_items()
print("Items before packing:")
for i, item in enumerate(items):
    print(f"{i+1}. Priority: {item['priority']}, Tokens: {item['tokens']}, Role: {item['role']}")

# Pack and see which items fit
messages = packer.pack()
print(f"\nPacked {len(messages)}/{len(items)} items")
```
## Key Takeaways

- **Choose the Right Packer**:
    - `MessagesPacker` for chat APIs (OpenAI, Anthropic)
    - `TextPacker` for completion APIs (Llama Base, GPT-3)
- **Set Priorities Correctly**:
    - `PRIORITY_SYSTEM` (0): System prompts, absolute must-have
    - `PRIORITY_USER` (10): User queries, critical
    - `PRIORITY_HIGH` (20): Core RAG documents
    - `PRIORITY_MEDIUM` (30): Supporting context
    - `PRIORITY_LOW` (40): Old conversation history
- **Use JIT Refinement**:
    - Clean dirty documents with the `refine_with` parameter
    - Chain multiple operations: `refine_with=[StripHTML(), NormalizeWhitespace()]`
- **Optimize for Production**:
    - Use precise mode with the `model` parameter for 100% token utilization
    - Choose an appropriate text format for base models (MARKDOWN recommended)
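The priority constants are plain integers, and lower values are kept first when space runs out. A minimal sketch, assuming the constants carry the numeric values listed above:

```python
from prompt_refiner import (
    PRIORITY_SYSTEM,
    PRIORITY_USER,
    PRIORITY_HIGH,
    PRIORITY_MEDIUM,
    PRIORITY_LOW,
)

# Lower numeric value = higher priority = kept first when the budget is tight
assert PRIORITY_SYSTEM < PRIORITY_USER < PRIORITY_HIGH < PRIORITY_MEDIUM < PRIORITY_LOW
```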