RAGAssistant API
The RAGAssistant class provides a high-level interface for RAG operations.
Warning
RAGAssistant is NOT thread-safe. Each thread should have its own instance.
See Platform Integration for thread-safe patterns.
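One common way to follow the "one instance per thread" rule is thread-local storage. The sketch below is only an illustration under stated assumptions: `StubAssistant` is a hypothetical stand-in so the example runs without ragit installed, and `get_assistant` and `run_demo` are illustrative helpers, not part of the ragit API. In real code the factory would construct a `RAGAssistant`.

```python
import threading


class StubAssistant:
    """Hypothetical stand-in for RAGAssistant (keeps the sketch self-contained)."""

    def __init__(self, documents):
        self.documents = documents


# Each thread sees its own attributes on this object.
_local = threading.local()


def get_assistant(factory):
    """Return the calling thread's own instance, creating it on first use."""
    if not hasattr(_local, "assistant"):
        _local.assistant = factory()
    return _local.assistant


def _worker(seen, lock):
    assistant = get_assistant(lambda: StubAssistant("docs/"))
    # A second call from the same thread reuses the same object.
    assert assistant is get_assistant(lambda: StubAssistant("docs/"))
    with lock:
        seen.append(assistant)


def run_demo(n_threads):
    """Start n_threads workers and collect the instance each one used."""
    seen = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=_worker, args=(seen, lock))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Every worker ends up with a distinct instance, so no assistant state is shared across threads.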
Class Reference
- class ragit.RAGAssistant(documents: list[Document] | str | Path, embed_fn: Callable[[str], list[float]] | None = None, generate_fn: Callable[[...], str] | None = None, provider: BaseEmbeddingProvider | BaseLLMProvider | None = None, embedding_model: str | None = None, llm_model: str | None = None, chunk_size: int = 512, chunk_overlap: int = 50)[source]
Bases: object
High-level RAG assistant for document Q&A and generation.
Handles document indexing, retrieval, and LLM generation in one simple API.
- Parameters:
documents (list[Document] or str or Path) – Documents to index. Can be:
- List of Document objects
- Path to a single file
- Path to a directory (will load all .txt, .md, .rst files)
embed_fn (Callable[[str], list[float]], optional) – Function that takes text and returns an embedding vector. If provided, creates a FunctionProvider internally.
generate_fn (Callable, optional) – Function for text generation. Supports (prompt) or (prompt, system_prompt). Requires embed_fn: when generate_fn is provided, embed_fn must be provided as well.
provider (BaseEmbeddingProvider, optional) – Provider for embeddings (and optionally LLM). If embed_fn is provided, this is ignored for embeddings.
embedding_model (str, optional) – Embedding model name (used with provider).
llm_model (str, optional) – LLM model name (used with provider).
chunk_size (int, optional) – Chunk size for splitting documents (default: 512).
chunk_overlap (int, optional) – Overlap between chunks (default: 50).
- Raises:
ValueError – If neither embed_fn nor provider is provided.
Note
This class is NOT thread-safe. Each thread should have its own instance.
Examples
>>> # With custom embedding function (retrieval-only)
>>> assistant = RAGAssistant(docs, embed_fn=my_embed)
>>> results = assistant.retrieve("query")
>>>
>>> # With custom embedding and LLM functions (full RAG)
>>> assistant = RAGAssistant(docs, embed_fn=my_embed, generate_fn=my_llm)
>>> answer = assistant.ask("What is X?")
>>>
>>> # With Ollama provider (supports nomic-embed-text)
>>> from ragit.providers import OllamaProvider
>>> assistant = RAGAssistant(docs, provider=OllamaProvider())
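The examples above assume that `my_embed` and `my_llm` already exist. As a rough sketch of the callable shapes the parameter list describes, the toy functions below match `Callable[[str], list[float]]` and the `(prompt)` / `(prompt, system_prompt)` forms. The hash-based embedding and the echo "LLM" are placeholders with no semantic meaning, and `EMBED_DIM` is a made-up constant; a real setup would call an embedding model and an LLM instead.

```python
import hashlib
from typing import Optional

EMBED_DIM = 8  # hypothetical dimension, kept tiny for illustration


def my_embed(text: str) -> list[float]:
    """Toy embedding: deterministic, but carries no semantic meaning."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:EMBED_DIM]]


def my_llm(prompt: str, system_prompt: Optional[str] = None) -> str:
    """Toy generator: echoes the prompt, so it works as (prompt) or (prompt, system_prompt)."""
    prefix = f"[{system_prompt}] " if system_prompt else ""
    return f"{prefix}echo: {prompt}"
```

Functions of this shape can be handy in unit tests, since they need no network access or model download.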
- __init__(documents: list[Document] | str | Path, embed_fn: Callable[[str], list[float]] | None = None, generate_fn: Callable[[...], str] | None = None, provider: BaseEmbeddingProvider | BaseLLMProvider | None = None, embedding_model: str | None = None, llm_model: str | None = None, chunk_size: int = 512, chunk_overlap: int = 50)[source]
- add_documents(documents: list[Document] | str | Path) int[source]
Add documents to the existing index incrementally.
- Parameters:
documents – Documents to add.
- Returns:
Number of chunks added.
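The number of chunks a document produces depends on `chunk_size` and `chunk_overlap`. The sketch below shows one plausible fixed-size splitter. It assumes sizes are measured in characters and that chunks are cut at fixed offsets; the actual splitter in ragit may differ, and `split_into_chunks` is an illustrative name only.

```python
def split_into_chunks(text, chunk_size=512, chunk_overlap=50):
    """Split text into fixed-size character chunks that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reached the end of the text
        start += step
    return chunks
```

With the defaults, each new chunk starts 462 characters after the previous one, so the last 50 characters of one chunk reappear at the start of the next.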
- remove_documents(source_path_pattern: str) int[source]
Remove documents matching a source path pattern.
- Parameters:
source_path_pattern – Glob pattern to match ‘source’ metadata.
- Returns:
Number of chunks removed.
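To illustrate what matching a glob pattern against 'source' metadata can look like, here is a minimal sketch using the standard-library `fnmatch` module. Chunks are modeled as plain dicts with a `"source"` key, which is an assumption made for this example; whether ragit uses `fnmatch` internally is not stated in this reference.

```python
from fnmatch import fnmatchcase


def remove_by_source(chunks, source_path_pattern):
    """Return (kept_chunks, removed_count) for a glob pattern on chunk['source']."""
    kept = [
        chunk for chunk in chunks
        if not fnmatchcase(chunk["source"], source_path_pattern)
    ]
    return kept, len(chunks) - len(kept)
```

Note that in `fnmatch`-style globs `*` also matches path separators, so `docs/*` matches files in nested directories as well.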
- update_documents(documents: list[Document] | str | Path) int[source]
Update existing documents (remove old, add new).
Uses document source path to identify what to remove.
- Parameters:
documents – New versions of documents.
- Returns:
Number of chunks added.
- retrieve(query: str, top_k: int = 3) list[tuple[Chunk, float]][source]
Retrieve relevant chunks for a query.
Uses vectorized cosine similarity for fast search over all chunks.
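- Parameters:
query (str) – Query text.
top_k (int) – Number of chunks to return (default: 3).
- Returns:
List of (chunk, similarity_score) tuples, sorted by relevance.
- Return type:
list[tuple[Chunk, float]]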
Examples
>>> results = assistant.retrieve("how to create a route")
>>> for chunk, score in results:
...     print(f"{score:.2f}: {chunk.content[:100]}...")
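For readers unfamiliar with cosine-similarity ranking, the following sketch shows the idea using only the standard library. It scores vectors one at a time, whereas the method above is described as vectorized, so treat this as an explanation of the ranking rather than of the implementation. `cosine_similarity` and `top_k_chunks` are illustrative names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, top_k=3):
    """Return (chunk_index, score) pairs for the top_k most similar chunk vectors."""
    scored = [
        (index, cosine_similarity(query_vec, vec))
        for index, vec in enumerate(chunk_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A vectorized version would typically normalize the rows of the embedding matrix once and compute all scores with a single matrix-vector product.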
- retrieve_with_context(query: str, top_k: int = 3, window_size: int = 1, min_score: float = 0.0) list[tuple[Chunk, float]][source]
Retrieve chunks with adjacent context expansion (window search).
For each retrieved chunk, also includes adjacent chunks from the same document to provide more context. This is useful when relevant information spans multiple chunks.
Pattern inspired by ai4rag window_search.
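- Parameters:
query (str) – Query text.
top_k (int) – Number of chunks to retrieve before expansion (default: 3).
window_size (int) – Number of adjacent chunks to include on each side (default: 1).
min_score (float) – Minimum similarity score for a chunk to be included (default: 0.0).
- Returns:
List of (chunk, similarity_score) tuples, sorted by relevance. Adjacent chunks have slightly lower scores.
- Return type:
list[tuple[Chunk, float]]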
Examples
>>> # Get chunks with 1 adjacent chunk on each side
>>> results = assistant.retrieve_with_context("query", window_size=1)
>>> for chunk, score in results:
...     print(f"{score:.2f}: {chunk.content[:50]}...")
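The window expansion described above can be sketched roughly as follows. This is not the library's code: it assumes chunks are stored in document order and identified by index, and the `decay` factor used to give neighbors "slightly lower scores" is a made-up value chosen for illustration.

```python
def expand_with_window(hits, doc_ids, window_size=1, decay=0.95):
    """Expand (chunk_index, score) hits with neighbors from the same document.

    hits:    list of (chunk_index, score) pairs from a plain similarity search
    doc_ids: document id for every chunk, in storage order
    decay:   hypothetical per-step score penalty for neighboring chunks
    """
    best = {}
    for index, score in hits:
        for offset in range(-window_size, window_size + 1):
            neighbor = index + offset
            if neighbor < 0 or neighbor >= len(doc_ids):
                continue  # outside the index
            if doc_ids[neighbor] != doc_ids[index]:
                continue  # never cross a document boundary
            candidate = score * (decay ** abs(offset))
            # Keep the highest score if a chunk is reached from several hits.
            if candidate > best.get(neighbor, float("-inf")):
                best[neighbor] = candidate
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)
```

For a hit on the last chunk of a document, only the preceding neighbor is added, because the following chunk belongs to a different document.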
- get_context_with_window(query: str, top_k: int = 3, window_size: int = 1, min_score: float = 0.0) str[source]
Get formatted context with adjacent chunk expansion.
Merges overlapping text from adjacent chunks intelligently.
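When adjacent chunks were created with an overlap, concatenating them naively repeats the shared text. One simple way to merge them, shown here as a sketch and not necessarily the approach ragit takes, is to find the longest suffix of the left chunk that is also a prefix of the right chunk and keep it only once. `merge_overlapping` is an illustrative name.

```python
def merge_overlapping(left, right):
    """Join two strings, keeping any suffix/prefix overlap only once."""
    max_overlap = min(len(left), len(right))
    # Try the longest possible overlap first, then shrink.
    for size in range(max_overlap, 0, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return left + right  # no overlap found
```

For example, merging "the quick brown" with "brown fox" yields "the quick brown fox".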
- get_context(query: str, top_k: int = 3) str[source]
Get formatted context string from retrieved chunks.
- generate(prompt: str, system_prompt: str | None = None, temperature: float = 0.7) str[source]
Generate text using the LLM (without retrieval).
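- Parameters:
prompt (str) – Prompt text sent to the LLM.
system_prompt (str, optional) – System prompt (default: None).
temperature (float) – Sampling temperature (default: 0.7).
- Returns:
Generated text.
- Return type:
str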
- Raises:
NotImplementedError – If no LLM is configured.
- ask(question: str, system_prompt: str | None = None, top_k: int = 3, temperature: float = 0.7) str[source]
Ask a question using RAG (retrieve + generate).
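- Parameters:
question (str) – Question to answer.
system_prompt (str, optional) – System prompt (default: None).
top_k (int) – Number of chunks to retrieve as context (default: 3).
temperature (float) – Sampling temperature (default: 0.7).
- Returns:
Generated answer.
- Return type:
str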
- Raises:
NotImplementedError – If no LLM is configured.
Examples
>>> answer = assistant.ask("How do I create a REST API?")
>>> print(answer)
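Conceptually, "retrieve + generate" means fetching the most relevant chunks, placing them in a prompt, and passing that prompt to the LLM. The sketch below illustrates that flow with stand-in functions. The prompt template, `ask_sketch`, `fake_retrieve`, and `fake_generate` are all assumptions for illustration; the template ragit actually uses is not documented here.

```python
def ask_sketch(question, retrieve, generate, top_k=3):
    """Retrieve context for a question, build a prompt, and generate an answer."""
    results = retrieve(question, top_k)  # list of (text, score) pairs
    context = "\n\n".join(text for text, _score in results)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)


def fake_retrieve(query, top_k):
    """Stand-in retriever returning canned (text, score) pairs."""
    canned = [("Install with pip.", 0.9), ("Run the CLI.", 0.5)]
    return canned[:top_k]


def fake_generate(prompt):
    """Stand-in LLM that returns the prompt so the flow can be inspected."""
    return prompt
```

Calling `ask_sketch("How do I install?", fake_retrieve, fake_generate, top_k=1)` returns the assembled prompt, which makes it easy to see what context would reach the model.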
- generate_code(request: str, language: str = 'python', top_k: int = 3, temperature: float = 0.7) str[source]
Generate code based on documentation context.
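- Parameters:
request (str) – Description of the code to generate.
language (str) – Target programming language (default: 'python').
top_k (int) – Number of chunks to retrieve as context (default: 3).
temperature (float) – Sampling temperature (default: 0.7).
- Returns:
Generated code (cleaned, without markdown).
- Return type:
str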
- Raises:
NotImplementedError – If no LLM is configured.
Examples
>>> code = assistant.generate_code("create a REST API with user endpoints")
>>> print(code)
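LLMs often wrap code in markdown fences, so "cleaned, without markdown" presumably involves stripping them. The sketch below shows one way to do that with a regular expression; it is an illustration only, and `strip_markdown_fences` is not a ragit function. The fence string is built with `"`" * 3` purely to avoid writing a literal fence inside this example.

```python
import re

FENCE = "`" * 3  # three backticks, built indirectly to keep this example readable

# Match an opening fence with an optional language tag, then capture lazily
# up to the next closing fence.
_FENCED_BLOCK = re.compile(
    re.escape(FENCE) + r"[\w+-]*[ \t]*\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def strip_markdown_fences(text):
    """Return the body of the first fenced block, or the stripped text if none."""
    match = _FENCED_BLOCK.search(text)
    if match:
        return match.group(1).strip()
    return text.strip()
```

Text without any fence passes through unchanged apart from surrounding whitespace.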
Quick Reference
Constructor
from ragit import RAGAssistant
from ragit.providers import OllamaProvider
assistant = RAGAssistant(
    source,                               # str, Path, or list[Document]
    provider=OllamaProvider(),            # Required unless embed_fn is given
    chunk_size=512,                       # Characters per chunk
    chunk_overlap=50,                     # Overlap between chunks
    llm_model="llama3",                   # LLM model name
    embedding_model="mxbai-embed-large",  # Embedding model name
)
Methods
ask()
Ask a question and get an answer using RAG.
answer = assistant.ask("What is the installation process?")
print(answer)
retrieve()
Retrieve relevant chunks without generating an answer.
results = assistant.retrieve("database configuration", top_k=5)
for chunk, score in results:
    print(f"Score: {score:.3f}")
    print(f"Source: {chunk.doc_id}")
    print(f"Content: {chunk.content[:100]}...")
generate()
Direct LLM generation without retrieval.
response = assistant.generate("Explain what RAG means")
print(response)
generate_code()
Generate code with context from documents.
code = assistant.generate_code("Write a function to parse JSON")
print(code)
Examples
Basic Usage
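from ragit import RAGAssistant
from ragit.providers import OllamaProvider
# Create assistant (a provider or embed_fn is required)
assistant = RAGAssistant("docs/", provider=OllamaProvider())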
# Ask questions
answer = assistant.ask("How do I get started?")
print(answer)
Custom Configuration
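from ragit import RAGAssistant
from ragit.providers import OllamaProvider
assistant = RAGAssistant(
    "docs/",
    provider=OllamaProvider(),          # Model names below are used with the provider
    chunk_size=1024,                    # Larger chunks
    chunk_overlap=100,                  # More overlap
    llm_model="llama3:70b",             # Larger model
    embedding_model="nomic-embed-text"  # Different embeddings
)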
Multiple Document Sources
from ragit import RAGAssistant
# Load from multiple sources
assistant = RAGAssistant([
"README.md", # Single file
"docs/", # Directory
"examples/tutorial/" # Another directory
])
Recursive Loading
from ragit import RAGAssistant
# Load all markdown files recursively
assistant = RAGAssistant(
"project/",
pattern="**/*.md",
recursive=True
)
Getting Sources with Answers
from ragit import RAGAssistant
from ragit.providers import OllamaProvider
assistant = RAGAssistant("docs/", provider=OllamaProvider())
question = "How do I configure logging?"
# Get sources
sources = assistant.retrieve(question, top_k=3)
# Get answer
answer = assistant.ask(question)
print(f"Answer: {answer}\n")
print("Based on:")
for chunk, score in sources:
    print(f" - {chunk.doc_id} (relevance: {score:.2f})")
Internal Attributes
These attributes are available but considered internal:
# Access loaded chunks (immutable tuple)
chunks = assistant._chunks
print(f"Total chunks: {len(chunks)}")
# Access embedding matrix (numpy array)
matrix = assistant._embedding_matrix
print(f"Matrix shape: {matrix.shape}")
# Access provider
provider = assistant.provider
print(f"Provider: {provider.provider_name}")