Experiment API
The experiment module provides RAG hyperparameter optimization.
RagitExperiment
- class ragit.RagitExperiment(documents: list[Document], benchmark: list[BenchmarkQuestion], embed_fn: Callable[[str], list[float]] | None = None, generate_fn: Callable[[...], str] | None = None, provider: BaseEmbeddingProvider | BaseLLMProvider | None = None)[source]
Bases: object
Ragit Experiment - Automatic RAG Hyperparameter Optimization.
This class orchestrates the optimization of RAG pipeline hyperparameters by systematically evaluating different configurations.
- Parameters:
documents (list[Document]) – Documents to use as the knowledge base.
benchmark (list[BenchmarkQuestion]) – Benchmark questions for evaluation.
embed_fn (Callable[[str], list[float]], optional) – Function that takes text and returns an embedding vector.
generate_fn (Callable, optional) – Function for text generation.
provider (BaseEmbeddingProvider | BaseLLMProvider, optional) – Provider for embeddings and text generation. If embed_fn is also provided, embed_fn is used for embeddings and the provider can still be used for generation.
- Raises:
ValueError – If neither embed_fn nor provider is provided.
Examples
>>> # With custom functions
>>> experiment = RagitExperiment(docs, benchmark, embed_fn=my_embed, generate_fn=my_llm)
>>>
>>> # With explicit provider
>>> from ragit.providers import OllamaProvider
>>> experiment = RagitExperiment(docs, benchmark, provider=OllamaProvider())
>>>
>>> results = experiment.run()
>>> print(results[0].config)  # Best configuration
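The callable shapes expected for embed_fn and generate_fn can be illustrated with toy stand-ins. The following is a minimal sketch, not part of ragit: toy_embed and toy_generate are hypothetical names, the hashed bag-of-words scheme is only a placeholder for a real embedding model, and because generate_fn is documented only as Callable[[...], str], the stub accepts any arguments.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical embed_fn: map text to a fixed-size unit vector.

    A hashed bag-of-words placeholder, not a real embedding model.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # A stable hash keeps the same token in the same bucket across runs.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Empty input yields the zero vector instead of dividing by zero.
    return [v / norm for v in vec] if norm else vec


def toy_generate(*args, **kwargs) -> str:
    """Hypothetical generate_fn: returns a canned answer.

    The real argument list is not documented here, so anything is accepted.
    """
    return "stub answer"


vector = toy_embed("install the library with pip")
print(len(vector))
```

Callables of this shape would be passed the same way as my_embed and my_llm in the example above.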
- __init__(documents: list[Document], benchmark: list[BenchmarkQuestion], embed_fn: Callable[[str], list[float]] | None = None, generate_fn: Callable[[...], str] | None = None, provider: BaseEmbeddingProvider | BaseLLMProvider | None = None)[source]
- property provider: BaseEmbeddingProvider
Return the embedding provider (for backwards compatibility).
- define_search_space(chunk_sizes: list[int] | None = None, chunk_overlaps: list[int] | None = None, num_chunks_options: list[int] | None = None, embedding_models: list[str] | None = None, llm_models: list[str] | None = None) list[RAGConfig][source]
Define the hyperparameter search space.
- Parameters:
chunk_sizes (list[int], optional) – Chunk sizes to test. Default: [256, 512]
chunk_overlaps (list[int], optional) – Chunk overlaps to test. Default: [50, 100]
num_chunks_options (list[int], optional) – Number of chunks to retrieve. Default: [2, 3]
embedding_models (list[str], optional) – Embedding models to test. Default: ["default"]
llm_models (list[str], optional) – LLM models to test. Default: ["default"]
- Returns:
List of configurations to evaluate.
- Return type:
list[RAGConfig]
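To give a feel for how the option lists grow into a list of configurations, here is a sketch that assumes the search space is a plain Cartesian product of the five lists. SketchConfig and sketch_search_space are hypothetical stand-ins, not the library's RAGConfig or define_search_space; the real method may also filter combinations (for example, an overlap larger than the chunk size), which this sketch does not attempt.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for RAGConfig (not the library class)."""

    chunk_size: int
    chunk_overlap: int
    num_chunks: int
    embedding_model: str
    llm_model: str


def sketch_search_space(
    chunk_sizes=None,
    chunk_overlaps=None,
    num_chunks_options=None,
    embedding_models=None,
    llm_models=None,
):
    """Expand the option lists into every combination (Cartesian product)."""
    # Fall back to the defaults documented for define_search_space.
    axes = [
        chunk_sizes or [256, 512],
        chunk_overlaps or [50, 100],
        num_chunks_options or [2, 3],
        embedding_models or ["default"],
        llm_models or ["default"],
    ]
    # Axis order matches the field order of SketchConfig.
    return [SketchConfig(*combo) for combo in product(*axes)]


configs = sketch_search_space()
print(len(configs))  # 2 * 2 * 2 * 1 * 1 = 8
```

Under this assumption the number of configurations is the product of the list lengths, so adding one more option to any list multiplies the evaluation cost.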
- evaluate_config(config: RAGConfig, verbose: bool = False) EvaluationResult[source]
Evaluate a single RAG configuration.
- Parameters:
config (RAGConfig) – Configuration to evaluate.
verbose (bool) – Print progress information.
- Returns:
Evaluation results for this configuration.
- Return type:
EvaluationResult
- run(configs: list[RAGConfig] | None = None, max_configs: int | None = None, verbose: bool = True) list[EvaluationResult][source]
Run the RAG optimization experiment.
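- Parameters:
configs (list[RAGConfig], optional) – Configurations to evaluate. If None, the default search space is used.
max_configs (int, optional) – Maximum number of configurations to evaluate.
verbose (bool) – Print progress information. Default: True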
- Returns:
Results sorted by combined score (best first).
- Return type:
list[EvaluationResult]
- get_best_config() EvaluationResult | None[source]
Get the best configuration from results.
Data Classes
Document
- class ragit.Document(id: str, content: str, metadata: dict[str, ~typing.Any]=<factory>)[source]
A document in the knowledge base.
from ragit import Document

doc = Document(
    id="readme",
    content="Your document content here...",
    metadata={"source": "README.md", "version": "1.0"},
)
BenchmarkQuestion
- class ragit.BenchmarkQuestion(question: str, ground_truth: str, relevant_doc_ids: list[str] = <factory>)[source]
A benchmark question for evaluation.
from ragit import BenchmarkQuestion

question = BenchmarkQuestion(
    question="How do I install the library?",
    ground_truth="Use pip install ragit",
    relevant_doc_ids=["readme"],  # Document ids, per the class signature
)
EvaluationResult
- class ragit.core.experiment.results.EvaluationResult(pattern_name: str, indexing_params: dict[str, Any], inference_params: dict[str, Any], scores: dict[str, dict[str, float]], execution_time: float, final_score: float)[source]
Result from evaluating a single RAG configuration.
- Parameters:
pattern_name (str) – Name of the RAG pattern (e.g., “Pattern_1”).
indexing_params (dict[str, Any]) – Hyperparameters used during indexing (chunk_size, overlap, etc.).
inference_params (dict[str, Any]) – Hyperparameters used during inference (num_chunks, llm_model, etc.).
scores (dict[str, dict]) – Evaluation scores (answer_correctness, context_relevance, faithfulness).
execution_time (float) – Time taken for evaluation in seconds.
final_score (float) – Combined score for optimization ranking.
# Accessing result attributes
result = results[0]
print(result.pattern_name) # "Pattern_1"
print(result.final_score) # 0.85
print(result.execution_time) # 45.3
print(result.indexing_params) # {"chunk_size": 512, ...}
print(result.inference_params) # {"num_chunks": 5, ...}
print(result.scores) # {"answer_correctness": {...}, ...}
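How final_score is derived from scores is not specified here. As a purely illustrative sketch, the snippet below assumes an unweighted average of each metric's "mean" entry; sketch_final_score is a hypothetical helper, the numbers are made up, and the library may weight the metrics differently.

```python
from statistics import fmean

# Illustrative values only, shaped like the documented scores mapping.
scores = {
    "answer_correctness": {"mean": 0.9},
    "context_relevance": {"mean": 0.8},
    "faithfulness": {"mean": 0.7},
}


def sketch_final_score(scores: dict) -> float:
    """Hypothetical combination rule: unweighted mean of the per-metric means."""
    return fmean(metric["mean"] for metric in scores.values())


print(f"{sketch_final_score(scores):.3f}")
```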
ExperimentResults
- class ragit.core.experiment.results.ExperimentResults(evaluations: list[EvaluationResult] = <factory>)[source]
Collection of evaluation results from an optimization experiment.
- evaluations
All evaluation results.
- Type:
list[EvaluationResult]
- evaluations: list[EvaluationResult]
- add(result: EvaluationResult) None[source]
Add an evaluation result.
- is_cached(indexing_params: dict[str, Any], inference_params: dict[str, Any]) float | None[source]
Check if this configuration was already evaluated.
- Returns:
Final score if cached, None otherwise.
- Return type:
float or None
- sorted(reverse: bool = True) list[EvaluationResult][source]
Get results sorted by final score.
- Parameters:
reverse (bool) – If True (default), best scores first.
- Returns:
Sorted results.
- Return type:
list[EvaluationResult]
- get_best(k: int = 1) list[EvaluationResult][source]
Get k best results.
- Parameters:
k (int) – Number of results to return.
- Returns:
Top k results by score.
- Return type:
list[EvaluationResult]
- __init__(evaluations: list[EvaluationResult] = <factory>) None
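from ragit import RagitExperiment
from ragit.core.experiment.results import ExperimentResults
from ragit.providers import OllamaProvider

experiment = RagitExperiment(documents, benchmark, provider=OllamaProvider())

# run() is documented to return a list, so wrap it to use the collection helpers
results = ExperimentResults(evaluations=experiment.run())

# Iterate over results
for result in results.evaluations:
    print(result)

# Get best results
top_5 = results.get_best(k=5)

# Get sorted results
sorted_results = results.sorted(reverse=True)

# Check if configuration was tested (returns the final score, or None)
cached_score = results.is_cached(
    indexing_params={"chunk_size": 512},
    inference_params={"num_chunks": 5},
)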
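The behaviour documented for add, is_cached, sorted, and get_best can be sketched with stand-in classes. This is an assumption-laden illustration, not the library's code: SketchResult and SketchResults are hypothetical names, and matching cached configurations by exact dictionary equality is a guess about how is_cached compares parameters.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchResult:
    """Hypothetical, trimmed-down stand-in for EvaluationResult."""

    indexing_params: dict[str, Any]
    inference_params: dict[str, Any]
    final_score: float


@dataclass
class SketchResults:
    """Hypothetical stand-in for ExperimentResults."""

    evaluations: list[SketchResult] = field(default_factory=list)

    def add(self, result: SketchResult) -> None:
        self.evaluations.append(result)

    def is_cached(self, indexing_params, inference_params) -> float | None:
        # Assumption: a configuration counts as cached on exact dict equality.
        for ev in self.evaluations:
            if (ev.indexing_params == indexing_params
                    and ev.inference_params == inference_params):
                return ev.final_score
        return None

    def sorted(self, reverse: bool = True) -> list[SketchResult]:
        # Inside the method body, sorted() still refers to the builtin.
        return sorted(self.evaluations, key=lambda ev: ev.final_score,
                      reverse=reverse)

    def get_best(self, k: int = 1) -> list[SketchResult]:
        return self.sorted(reverse=True)[:k]


results = SketchResults()
results.add(SketchResult({"chunk_size": 256}, {"num_chunks": 2}, 0.61))
results.add(SketchResult({"chunk_size": 512}, {"num_chunks": 5}, 0.85))
results.add(SketchResult({"chunk_size": 512}, {"num_chunks": 3}, 0.74))

print([ev.final_score for ev in results.get_best(k=2)])
print(results.is_cached({"chunk_size": 512}, {"num_chunks": 5}))
```

Returning the cached score rather than a boolean lets a caller skip re-evaluation and reuse the earlier score in one step.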
Quick Reference
Creating an Experiment
from ragit import RagitExperiment, Document, BenchmarkQuestion
from ragit.providers import OllamaProvider

# Prepare documents
documents = [
    Document(id="doc1", content="...", metadata={}),
    Document(id="doc2", content="...", metadata={}),
]

# Create benchmark
benchmark = [
    BenchmarkQuestion(
        question="Question 1?",
        ground_truth="Expected answer 1",
        relevant_doc_ids=["doc1"],
    ),
    BenchmarkQuestion(
        question="Question 2?",
        ground_truth="Expected answer 2",
        relevant_doc_ids=["doc2"],
    ),
]

# Create experiment (embed_fn or provider is required, otherwise ValueError is raised)
experiment = RagitExperiment(documents, benchmark, provider=OllamaProvider())
Running with Default Settings
# Run with default search space
results = experiment.run()
# Get best configuration
best = results[0]
print(f"Best score: {best.final_score:.3f}")
Custom Search Space
# Define custom search space
configs = experiment.define_search_space(
    chunk_sizes=[256, 512, 1024],
    chunk_overlaps=[25, 50, 100],
    num_chunks_options=[3, 5, 7],
    llm_models=["llama3", "mistral"],
    embedding_models=["mxbai-embed-large"],
)

# Run with custom configs; max_configs is a parameter of run()
results = experiment.run(configs=configs, max_configs=50)
Evaluating Single Configuration
from ragit.core.experiment.experiment import RAGConfig

# Create a specific configuration
config = RAGConfig(
    chunk_size=512,
    chunk_overlap=50,
    num_chunks=5,
    llm_model="llama3",
    embedding_model="mxbai-embed-large",
)

# Evaluate single config
result = experiment.evaluate_config(config)
print(f"Score: {result.final_score:.3f}")
Analyzing Results
results = experiment.run()

# Summary statistics (run() returns a list, so collect the scores explicitly)
scores = [result.final_score for result in results]
print(f"Min score: {min(scores):.3f}")
print(f"Max score: {max(scores):.3f}")
print(f"Mean score: {sum(scores)/len(scores):.3f}")

# Per-metric breakdown for the top five results
for result in results[:5]:
    print(f"\n{result.pattern_name}:")
    print(f"  Final: {result.final_score:.3f}")
    print(f"  Correctness: {result.scores['answer_correctness']['mean']:.3f}")
    print(f"  Relevance: {result.scores['context_relevance']['mean']:.3f}")
    print(f"  Faithfulness: {result.scores['faithfulness']['mean']:.3f}")
Exporting Results
import json

results = experiment.run()

# Export all results
export_data = []
for result in results:
    export_data.append(result.to_dict())
with open("experiment_results.json", "w") as f:
    json.dump(export_data, f, indent=2)

# Export best config
best = results[0]
best_config = {
    "chunk_size": best.indexing_params["chunk_size"],
    "chunk_overlap": best.indexing_params["chunk_overlap"],
    "num_chunks": best.inference_params.get("num_chunks"),
    "score": best.final_score,
}
with open("best_config.json", "w") as f:
    json.dump(best_config, f, indent=2)
SimpleVectorStore
Internal vector store used by the experiment engine.
Warning
SimpleVectorStore is NOT thread-safe.
- class ragit.core.experiment.experiment.SimpleVectorStore[source]
Simple in-memory vector store with pre-normalized embeddings for fast search.
Note: This class is NOT thread-safe.
from ragit.core.experiment.experiment import SimpleVectorStore, Chunk

# Create vector store
store = SimpleVectorStore()

# Add chunks with embeddings
chunk = Chunk(
    content="Some text content",
    doc_id="doc1",
    chunk_index=0,
    embedding=(0.1, 0.2, 0.3, ...)  # Pre-computed embedding
)
store.add(chunk)

# Search
query_embedding = (0.15, 0.25, 0.35, ...)
results = store.search(query_embedding, top_k=5)
for chunk, score in results:
    print(f"{chunk.doc_id}: {score:.3f}")

# Clear store
store.clear()
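The phrase "pre-normalized embeddings for fast search" can be illustrated with a small sketch: if every vector is scaled to unit length when it is added, cosine similarity at query time reduces to a dot product. SketchStore below is a hypothetical stand-in written in pure Python, not SimpleVectorStore's implementation, and it stores plain labels instead of Chunk objects.

```python
import math


def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return tuple(v / norm for v in vec) if norm else tuple(vec)


class SketchStore:
    """Hypothetical in-memory store; like the original, not thread-safe."""

    def __init__(self):
        self._items = []  # list of (label, unit_vector) pairs

    def add(self, label, embedding):
        # Normalize once at insert time so search never recomputes norms.
        self._items.append((label, normalize(embedding)))

    def search(self, query, top_k=5):
        q = normalize(query)
        # With unit vectors, the dot product equals cosine similarity.
        scored = [
            (label, sum(a * b for a, b in zip(q, vec)))
            for label, vec in self._items
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

    def clear(self):
        self._items.clear()


store = SketchStore()
store.add("a", (1.0, 0.0))
store.add("b", (0.0, 1.0))
store.add("c", (1.0, 1.0))
for label, score in store.search((2.0, 0.0), top_k=2):
    print(f"{label}: {score:.3f}")
```

Because add and search both touch the same list without any locking, concurrent use of a store like this would need external synchronization, which is consistent with the thread-safety warning above.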