RAG Optimization
This guide covers how to use ragit’s optimization engine to find the best hyperparameters for your RAG pipeline.
Why Optimize?
RAG quality is highly sensitive to hyperparameters:
Chunk size: Too small loses context, too large dilutes relevance
Chunk overlap: Affects information continuity at boundaries
Top-k retrieval: More chunks = more context but also more noise
Model selection: Different models have different strengths
Manual tuning is time-consuming. ragit automates this process.
Setting Up an Experiment
Prepare Documents
First, prepare your documents:
from ragit import Document, load_directory
# Option 1: Create documents manually
documents = [
Document(
id="intro",
content="ragit is a RAG optimization library...",
metadata={"source": "intro.txt"}
),
Document(
id="install",
content="Install ragit using pip install ragit...",
metadata={"source": "install.txt"}
),
]
# Option 2: Load from files
from ragit import load_directory
docs = load_directory("docs/", pattern="*.txt")
documents = [
Document(id=doc.id, content=doc.content, metadata=doc.metadata)
for doc in docs
]
Create Benchmark Questions
Create questions with expected answers:
from ragit import BenchmarkQuestion
benchmark = [
BenchmarkQuestion(
question="How do I install ragit?",
ground_truth="Install using pip install ragit",
context="Installation documentation"
),
BenchmarkQuestion(
question="What is the default chunk size?",
ground_truth="512 characters",
context="Configuration section"
),
BenchmarkQuestion(
question="Is RAGAssistant thread-safe?",
ground_truth="No, RAGAssistant is not thread-safe",
context="Thread safety documentation"
),
BenchmarkQuestion(
question="What embedding models are supported?",
ground_truth="mxbai-embed-large, nomic-embed-text, all-minilm",
context="Model documentation"
),
BenchmarkQuestion(
question="How do I configure Ollama URL?",
ground_truth="Set the OLLAMA_BASE_URL environment variable",
context="Configuration section"
),
]
Guidelines for good benchmark questions:
Use 5-20 questions for meaningful results
Cover different aspects of your documents
Make ground truth answers clear and specific
Include both simple lookups and complex reasoning questions
Running the Experiment
Basic Experiment
Run with default search space:
from ragit import RagitExperiment, Document, BenchmarkQuestion
experiment = RagitExperiment(documents, benchmark)
results = experiment.run()
# Results are sorted by score (best first)
print(f"Tested {len(results)} configurations")
best = results[0]
print(f"\nBest configuration: {best.pattern_name}")
print(f"Final score: {best.final_score:.3f}")
print(f"Execution time: {best.execution_time:.1f}s")
Custom Search Space
Define your own hyperparameter ranges:
from ragit import RagitExperiment
experiment = RagitExperiment(documents, benchmark)
# Define custom search space
configs = experiment.define_search_space(
chunk_sizes=[256, 512, 1024, 2048],
chunk_overlaps=[0, 25, 50, 100],
num_chunks=[3, 5, 7, 10],
llm_models=["llama3", "mistral"],
embedding_models=["mxbai-embed-large"],
max_configs=50 # Limit total configurations
)
print(f"Search space: {len(configs)} configurations")
# Run with custom configs
results = experiment.run(configs=configs)
Limiting Configurations
For faster iteration, limit the search:
# Quick test with fewer configs
configs = experiment.define_search_space(
chunk_sizes=[512],
chunk_overlaps=[50],
num_chunks=[3, 5],
max_configs=10
)
results = experiment.run(configs=configs)
Understanding Results
Examining Results
from ragit import RagitExperiment
experiment = RagitExperiment(documents, benchmark)
results = experiment.run()
# Top 5 configurations
print("Top 5 Configurations:")
print("-" * 60)
for i, result in enumerate(results[:5], 1):
print(f"\n{i}. {result.pattern_name}")
print(f" Score: {result.final_score:.3f}")
print(f" Time: {result.execution_time:.1f}s")
print(f" Indexing: {result.indexing_params}")
print(f" Inference: {result.inference_params}")
# Detailed scores
for metric, values in result.scores.items():
print(f" {metric}: {values['mean']:.3f}")
Result Attributes
Each EvaluationResult contains:
result = results[0]
# Configuration name
result.pattern_name # "Pattern_1"
# Indexing hyperparameters
result.indexing_params # {"chunk_size": 512, "chunk_overlap": 50}
# Inference hyperparameters
result.inference_params # {"num_chunks": 5, "llm_model": "llama3"}
# Evaluation scores
result.scores # {"answer_correctness": {"mean": 0.85}, ...}
# Combined score
result.final_score # 0.82
# Time taken
result.execution_time # 45.3 seconds
Evaluation Metrics
Each configuration is scored on three metrics:
Answer Correctness: Semantic similarity between generated and expected answers
Context Relevance: How relevant the retrieved chunks are to the question
Faithfulness: Whether the answer is supported by the retrieved context
result = results[0]
print("Detailed Scores:")
print(f" Answer Correctness: {result.scores['answer_correctness']['mean']:.3f}")
print(f" Context Relevance: {result.scores['context_relevance']['mean']:.3f}")
print(f" Faithfulness: {result.scores['faithfulness']['mean']:.3f}")
Applying Optimal Settings
Use the best configuration in your application:
from ragit import RAGAssistant, RagitExperiment
# Run experiment
experiment = RagitExperiment(documents, benchmark)
results = experiment.run()
best = results[0]
# Extract optimal parameters
chunk_size = best.indexing_params["chunk_size"]
chunk_overlap = best.indexing_params["chunk_overlap"]
llm_model = best.inference_params.get("llm_model", "llama3")
# Create optimized assistant
assistant = RAGAssistant(
"docs/",
chunk_size=chunk_size,
chunk_overlap=chunk_overlap,
llm_model=llm_model
)
# Use the optimized assistant
answer = assistant.ask("Your question here")
Saving and Loading Results
Export results for analysis:
import json
from ragit import RagitExperiment
experiment = RagitExperiment(documents, benchmark)
results = experiment.run()
# Export to JSON
results_data = [result.to_dict() for result in results]
with open("experiment_results.json", "w") as f:
json.dump(results_data, f, indent=2)
# Export best config
best = results[0]
config = {
"chunk_size": best.indexing_params["chunk_size"],
"chunk_overlap": best.indexing_params["chunk_overlap"],
"num_chunks": best.inference_params.get("num_chunks", 5),
"llm_model": best.inference_params.get("llm_model"),
"final_score": best.final_score
}
with open("best_config.json", "w") as f:
json.dump(config, f, indent=2)
Advanced Optimization
Custom Provider
Use a custom provider for the experiment:
from ragit import RagitExperiment
from ragit.providers import OllamaProvider
# Custom provider with different settings
provider = OllamaProvider(
base_url="http://gpu-server:11434",
timeout=300
)
experiment = RagitExperiment(
documents,
benchmark,
provider=provider
)
results = experiment.run()
Progress Tracking
The experiment shows progress using tqdm:
Evaluating configurations: 100%|██████████| 24/24 [05:32<00:00, 13.83s/it]
For programmatic progress tracking:
from ragit import RagitExperiment
experiment = RagitExperiment(documents, benchmark)
# Access results incrementally
configs = experiment.define_search_space(max_configs=10)
for i, config in enumerate(configs):
result = experiment.evaluate_config(config)
print(f"Config {i+1}/10: {result.final_score:.3f}")
Optimization Tips
Start Broad, Then Narrow
# Phase 1: Broad search
configs = experiment.define_search_space(
chunk_sizes=[256, 512, 1024],
chunk_overlaps=[25, 50, 100],
num_chunks=[3, 5, 7],
max_configs=30
)
results = experiment.run(configs=configs)
# Find best chunk_size from Phase 1
best_chunk_size = results[0].indexing_params["chunk_size"]
# Phase 2: Fine-tune around best
fine_configs = experiment.define_search_space(
chunk_sizes=[best_chunk_size - 128, best_chunk_size, best_chunk_size + 128],
chunk_overlaps=[25, 50, 75, 100],
num_chunks=[4, 5, 6],
max_configs=20
)
final_results = experiment.run(configs=fine_configs)
Quality vs Speed Trade-offs
Smaller chunks: Faster embedding, more precise retrieval
Fewer num_chunks: Faster generation, less context
Smaller LLM: Faster responses, potentially lower quality
# Optimize for speed
fast_configs = experiment.define_search_space(
chunk_sizes=[256, 512],
num_chunks=[2, 3],
llm_models=["mistral"] # Fast model
)
# Optimize for quality
quality_configs = experiment.define_search_space(
chunk_sizes=[512, 1024],
num_chunks=[5, 7, 10],
llm_models=["llama3:70b"] # Large model
)
Representative Benchmark
Your benchmark should reflect real usage:
Include questions users actually ask
Cover different difficulty levels
Test edge cases and corner cases
Update benchmark as usage patterns change