Providers API
Providers are responsible for LLM and embedding operations. ragit includes an Ollama provider and abstract base classes for creating custom providers.
OllamaProvider
The primary provider for both LLM and embedding operations.
- class ragit.providers.OllamaProvider(base_url: str | None = None, embedding_url: str | None = None, api_key: str | None = None, timeout: int | None = None, timeouts: dict[str, int] | None = None, use_cache: bool = True, use_resilience: bool = True)[source]
Bases: BaseLLMProvider, BaseEmbeddingProvider
Ollama provider for both LLM and embedding operations.
Performance features:
- Connection pooling via requests.Session() for faster sequential requests
- Native batch embedding via the /api/embed endpoint (single API call)
- LRU cache for repeated embedding queries (2048 entries)
- Parameters:
base_url (str, optional) – Ollama server URL (default: from OLLAMA_BASE_URL env var)
api_key (str, optional) – API key for authentication (default: from OLLAMA_API_KEY env var)
timeout (int, optional) – Request timeout in seconds (default: from OLLAMA_TIMEOUT env var)
use_cache (bool, optional) – Enable embedding cache (default: True)
Examples
>>> provider = OllamaProvider()
>>> response = provider.generate("What is RAG?", model="llama3")
>>> print(response.text)
>>> # Batch embedding (single API call)
>>> embeddings = provider.embed_batch(texts, "mxbai-embed-large")
- EMBEDDING_DIMENSIONS: dict[str, int] = {'all-minilm': 384, 'mxbai-embed-large': 1024, 'nomic-embed-text': 768, 'nomic-embed-text:latest': 768, 'qwen3-embedding': 4096, 'qwen3-embedding:0.6b': 1024, 'qwen3-embedding:4b': 2560, 'qwen3-embedding:8b': 4096, 'snowflake-arctic-embed': 1024}
- MAX_EMBED_CHARS = 2000
- DEFAULT_TIMEOUTS: dict[str, int] = {'chat': 300, 'embed': 30, 'embed_batch': 120, 'generate': 300, 'health': 5, 'list_models': 10}
- __init__(base_url: str | None = None, embedding_url: str | None = None, api_key: str | None = None, timeout: int | None = None, timeouts: dict[str, int] | None = None, use_cache: bool = True, use_resilience: bool = True) None[source]
- property session: Session
Lazy-initialized session for connection pooling.
Note: API key is NOT stored in session headers to prevent potential exposure in logs or error messages. Authentication is handled per-request via _get_headers().
- generate(prompt: str, model: str, system_prompt: str | None = None, temperature: float = 0.7, max_tokens: int | None = None) LLMResponse[source]
Generate text using Ollama with optional resilience (retry + circuit breaker).
- embed(text: str, model: str) EmbeddingResponse[source]
Generate embedding using Ollama with optional caching and resilience.
- embed_batch(texts: list[str], model: str) list[EmbeddingResponse][source]
Generate embeddings for multiple texts in a single API call with resilience.
The /api/embed endpoint supports batch inputs natively.
- async embed_batch_async(texts: list[str], model: str, max_concurrent: int = 10) list[EmbeddingResponse][source]
Generate embeddings for multiple texts asynchronously.
The /api/embed endpoint supports batch inputs natively, so this makes a single async HTTP request for all texts.
- Parameters:
- Returns:
Embeddings in the same order as input texts.
- Return type:
list[EmbeddingResponse]
Examples
>>> import asyncio
>>> embeddings = asyncio.run(provider.embed_batch_async(texts, "mxbai-embed-large"))
- chat(messages: list[dict[str, str]], model: str, temperature: float = 0.7, max_tokens: int | None = None) LLMResponse[source]
Chat completion using Ollama with optional resilience.
- property generate_circuit_status: str
Get generate circuit breaker status (CLOSED, OPEN, HALF_OPEN, or ‘disabled’).
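The CLOSED, OPEN, and HALF_OPEN states reported above follow the usual circuit breaker pattern. The sketch below is a minimal illustration of that pattern and is not ragit's implementation: the class name CircuitBreaker and the parameters failure_threshold, reset_after, and clock are hypothetical names chosen for this example.

```python
import time


class CircuitBreaker:
    """Toy circuit breaker for illustration; not ragit's implementation."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_after = reset_after              # seconds before a trial call
        self._clock = clock                         # injectable for testing
        self._failures = 0
        self._opened_at = None

    @property
    def status(self):
        if self._opened_at is None:
            return "CLOSED"
        if self._clock() - self._opened_at >= self.reset_after:
            return "HALF_OPEN"  # cool-down elapsed: one trial call is allowed
        return "OPEN"

    def call(self, func, *args, **kwargs):
        state = self.status
        if state == "OPEN":
            raise RuntimeError("circuit open; call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if state == "HALF_OPEN" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Under this sketch, calls are rejected immediately while the circuit is OPEN, which keeps a struggling server from being hit with more requests until the cool-down has passed.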
Quick Reference
from ragit.providers import OllamaProvider
# Create with defaults (uses environment variables)
provider = OllamaProvider()
# Create with custom settings
provider = OllamaProvider(
    base_url="http://localhost:11434",
    embedding_url="http://localhost:11434",  # Can be different
    api_key=None,  # For cloud providers
    timeout=120,  # Request timeout in seconds
)
Checking Availability
from ragit.providers import OllamaProvider
provider = OllamaProvider()
if provider.is_available():
    print("Ollama is running")
    print(f"URL: {provider.base_url}")
else:
    print("Ollama is not available")
    print("Start with: ollama serve")
Text Generation
from ragit.providers import OllamaProvider
provider = OllamaProvider()
# Basic generation
response = provider.generate(
    prompt="Explain quantum computing",
    model="llama3",
)
print(response.text)
# With parameters
response = provider.generate(
    prompt="Write a haiku about programming",
    model="llama3",
    temperature=0.9,  # Higher = more creative
    max_tokens=100,  # Limit response length
    system_prompt="You are a poet.",
)
print(response.text)
print(f"Model: {response.model}")
print(f"Provider: {response.provider}")
Creating Embeddings
from ragit.providers import OllamaProvider
provider = OllamaProvider()
# Single embedding
response = provider.embed(
    text="This is a sample sentence",
    model="mxbai-embed-large",
)
print(f"Dimensions: {response.dimensions}")
print(f"Embedding type: {type(response.embedding)}")  # tuple
print(f"First 5 values: {response.embedding[:5]}")
# Batch embeddings (more efficient)
texts = [
    "First document",
    "Second document",
    "Third document",
]
responses = provider.embed_batch(texts, model="mxbai-embed-large")
print(f"Created {len(responses)} embeddings")
for i, resp in enumerate(responses):
    print(f"  {i}: {resp.dimensions} dimensions")
Resource Management
OllamaProvider supports the context manager protocol for automatic cleanup:
from ragit.providers import OllamaProvider
# Recommended: Use context manager for automatic cleanup
texts = ["First document", "Second document"]
with OllamaProvider() as provider:
    response = provider.generate("Hello", model="llama3")
    embeddings = provider.embed_batch(texts, model="mxbai-embed-large")
# Session automatically closed on exit
# Alternative: Manual cleanup
provider = OllamaProvider()
try:
    response = provider.generate("Hello", model="llama3")
finally:
    provider.close()
Performance Features
Connection Pooling
OllamaProvider uses requests.Session() for HTTP connection pooling:
from ragit.providers import OllamaProvider
texts = ["First document", "Second document", "Third document"]
with OllamaProvider() as provider:
    # All requests reuse the same TCP connection
    for text in texts:
        provider.embed(text, model="mxbai-embed-large")
Async Parallel Embedding
For large batches, use embed_batch_async() with asyncio:
import asyncio
from ragit.providers import OllamaProvider
provider = OllamaProvider()
async def embed_many():
    texts = ["doc1...", "doc2...", "doc3..."]
    return await provider.embed_batch_async(
        texts,
        model="mxbai-embed-large",
    )
results = asyncio.run(embed_many())
Embedding Cache
Embeddings are cached automatically using an LRU cache (2048 entries):
from ragit.providers import OllamaProvider
provider = OllamaProvider(use_cache=True) # Default
# First call hits API
provider.embed("Hello", model="mxbai-embed-large")
# Second call returns cached result
provider.embed("Hello", model="mxbai-embed-large")
# View cache statistics
info = OllamaProvider.embedding_cache_info()
print(info) # {'hits': 1, 'misses': 1, 'maxsize': 2048, 'currsize': 1}
# Clear cache
OllamaProvider.clear_embedding_cache()
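The statistics shown above have the same shape as those reported by Python's functools.lru_cache. The following sketch reproduces the hit/miss behaviour with a stand-in embedding function, assuming a cache keyed on (text, model). The function cached_embed and the calls list are hypothetical and only illustrate the caching pattern, not how ragit builds its cache.

```python
from functools import lru_cache

calls = []  # records every simulated API request


@lru_cache(maxsize=2048)
def cached_embed(text: str, model: str) -> tuple[float, ...]:
    calls.append((text, model))
    # Stand-in for a real embedding request: a tiny deterministic vector.
    return (float(len(text)), float(len(model)))


cached_embed("Hello", "mxbai-embed-large")  # miss: computes and stores
cached_embed("Hello", "mxbai-embed-large")  # hit: served from the cache
info = cached_embed.cache_info()._asdict()
print(info)  # {'hits': 1, 'misses': 1, 'maxsize': 2048, 'currsize': 1}
```

Returning a tuple rather than a list means a cached value cannot be mutated in place by one caller and then observed by another.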
Base Classes
Abstract base classes for creating custom providers.
BaseLLMProvider
- class ragit.providers.base.BaseLLMProvider[source]
Bases: ABC
Abstract base class for LLM providers.
Implement this to add support for new LLM providers like Gemini, Claude, etc.
- abstract property provider_name: str
Return the provider name (e.g., ‘ollama’, ‘gemini’, ‘claude’).
- abstractmethod generate(prompt: str, model: str, system_prompt: str | None = None, temperature: float = 0.7, max_tokens: int | None = None) LLMResponse[source]
Generate text from the LLM.
- Parameters:
prompt (str) – The user prompt/query.
model (str) – Model identifier (e.g., ‘llama3’, ‘qwen3-vl:235b-instruct-cloud’).
system_prompt (str, optional) – System prompt for context/instructions.
temperature (float) – Sampling temperature (0.0 to 1.0).
max_tokens (int, optional) – Maximum tokens to generate.
- Returns:
The generated response.
- Return type:
LLMResponse
BaseEmbeddingProvider
- class ragit.providers.base.BaseEmbeddingProvider[source]
Bases: ABC
Abstract base class for embedding providers.
Implement this to add support for new embedding providers.
- abstractmethod embed(text: str, model: str) EmbeddingResponse[source]
Generate embedding for text.
- Parameters:
- Returns:
The embedding response.
- Return type:
EmbeddingResponse
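For offline tests it can help to have an embedding provider that needs no server. The sketch below shows one way to write such a provider against the embed() contract documented above. To stay runnable without ragit installed, it defines local stand-ins for BaseEmbeddingProvider and EmbeddingResponse. In real code these would be imported from ragit.providers.base, and the real base class may require more members than embed(). HashEmbeddingProvider is a hypothetical name.

```python
import hashlib
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingResponse:  # local stand-in mirroring the documented fields
    embedding: tuple[float, ...]
    model: str
    provider: str
    dimensions: int


class BaseEmbeddingProvider(ABC):  # local stand-in, not ragit's class
    @abstractmethod
    def embed(self, text: str, model: str) -> EmbeddingResponse: ...


class HashEmbeddingProvider(BaseEmbeddingProvider):
    """Hypothetical offline provider: maps text to a fixed pseudo-embedding."""

    def __init__(self, dimensions: int = 8):
        # SHA-256 yields 32 bytes, so at most 32 dimensions are available.
        self._dimensions = min(dimensions, 32)

    def embed(self, text: str, model: str) -> EmbeddingResponse:
        digest = hashlib.sha256(f"{model}:{text}".encode("utf-8")).digest()
        # Scale the first `dimensions` bytes into the range [0.0, 1.0].
        values = tuple(b / 255 for b in digest[: self._dimensions])
        return EmbeddingResponse(values, model, "hash", len(values))


provider = HashEmbeddingProvider()
first = provider.embed("Sample text", model="fake-model")
second = provider.embed("Sample text", model="fake-model")
print(first == second)  # True: the same input always gives the same vector
```

These vectors carry no semantic meaning. They are only useful for exercising code paths that need an embedding of the right shape.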
Response Classes
LLMResponse
- class ragit.providers.base.LLMResponse(text: str, model: str, provider: str, usage: dict[str, int] | None = None)[source]
Response from an LLM call.
from ragit.providers import OllamaProvider
provider = OllamaProvider()
response = provider.generate("Hello", model="llama3")
# Access response attributes
print(response.text) # Generated text
print(response.model) # "llama3"
print(response.provider) # "ollama"
EmbeddingResponse
- class ragit.providers.base.EmbeddingResponse(embedding: tuple[float, ...], model: str, provider: str, dimensions: int)[source]
Response from an embedding call (immutable).
from ragit.providers import OllamaProvider
provider = OllamaProvider()
response = provider.embed("Sample text", model="mxbai-embed-large")
# Access response attributes
print(response.embedding) # tuple[float, ...]
print(response.dimensions) # 1024 (for mxbai-embed-large)
print(response.model) # "mxbai-embed-large"
print(response.provider) # "ollama"
Note
EmbeddingResponse is immutable (frozen dataclass) with tuple embeddings
to prevent accidental modification and ensure thread-safety of the data.
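The effect of a frozen dataclass with a tuple field can be seen with a small stand-in class. FrozenEmbedding below is a hypothetical class that mirrors the documented fields of EmbeddingResponse so the example runs without ragit. It is not the library's own definition.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FrozenEmbedding:
    """Hypothetical stand-in with the same fields as EmbeddingResponse."""

    embedding: tuple[float, ...]
    model: str
    provider: str
    dimensions: int


resp = FrozenEmbedding((0.1, 0.2, 0.3), "mxbai-embed-large", "ollama", 3)

# Rebinding an attribute on a frozen dataclass raises FrozenInstanceError.
try:
    resp.model = "other"
except FrozenInstanceError:
    rebind_blocked = True
else:
    rebind_blocked = False

# Tuples do not support item assignment, so the vector itself is read-only.
try:
    resp.embedding[0] = 9.9
except TypeError:
    item_write_blocked = True
else:
    item_write_blocked = False

print(rebind_blocked, item_write_blocked)  # True True
```

Because both the instance and its vector are read-only, a response can be shared between threads or stored in a cache without defensive copying.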
Creating Custom Providers
To add support for a new LLM service, inherit from the base classes:
from openai import OpenAI  # assumes the openai>=1.0 client library
from ragit.providers.base import (
    BaseLLMProvider,
    BaseEmbeddingProvider,
    LLMResponse,
    EmbeddingResponse,
)

class OpenAIProvider(BaseLLMProvider, BaseEmbeddingProvider):
    """Provider for OpenAI API."""

    def __init__(self, api_key: str):
        self.api_key = api_key
        # One client per provider, authenticated with the supplied key
        self._client = OpenAI(api_key=api_key)
        self._dimensions = 1536  # text-embedding-ada-002

    @property
    def provider_name(self) -> str:
        return "openai"

    @property
    def dimensions(self) -> int:
        return self._dimensions

    def generate(
        self,
        prompt: str,
        model: str = "gpt-4",
        system_prompt: str | None = None,
        temperature: float = 0.7,
        max_tokens: int | None = None,
    ) -> LLMResponse:
        # Parameter order matches BaseLLMProvider.generate
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": prompt})
        response = self._client.chat.completions.create(
            model=model,
            messages=messages,
            temperature=temperature,
            max_tokens=max_tokens,
        )
        return LLMResponse(
            text=response.choices[0].message.content,
            model=model,
            provider=self.provider_name,
        )

    def embed(self, text: str, model: str = "text-embedding-ada-002") -> EmbeddingResponse:
        response = self._client.embeddings.create(input=text, model=model)
        embedding = response.data[0].embedding
        return EmbeddingResponse(
            embedding=tuple(embedding),  # Must be tuple
            model=model,
            provider=self.provider_name,
            dimensions=len(embedding),
        )

    def embed_batch(
        self,
        texts: list[str],
        model: str = "text-embedding-ada-002",
    ) -> list[EmbeddingResponse]:
        # A list input returns one embedding per text in a single request
        response = self._client.embeddings.create(input=texts, model=model)
        return [
            EmbeddingResponse(
                embedding=tuple(item.embedding),
                model=model,
                provider=self.provider_name,
                dimensions=len(item.embedding),
            )
            for item in response.data
        ]

    def is_available(self) -> bool:
        # Listing models is a cheap way to check connectivity and credentials
        try:
            self._client.models.list()
            return True
        except Exception:
            return False
Using Custom Providers
from ragit import RagitExperiment
# Create custom provider
provider = OpenAIProvider(api_key="sk-...")
# Use with experiment
experiment = RagitExperiment(
    documents,
    benchmark,
    provider=provider,
)
results = experiment.run()
Embedding Model Dimensions
ragit includes a mapping of known embedding model dimensions:
from ragit.providers.ollama import EMBEDDING_DIMENSIONS
print(EMBEDDING_DIMENSIONS)
# {
# "mxbai-embed-large": 1024,
# "nomic-embed-text": 768,
# "all-minilm": 384,
# ...
# }
When using a model not in this mapping, ragit queries the model to determine dimensions automatically.
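The lookup-then-probe behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not ragit's code: resolve_dimensions, KNOWN_DIMENSIONS, and fake_probe are hypothetical names, and the probe callable stands in for a real embedding request.

```python
from typing import Callable

# Subset of the mapping documented above.
KNOWN_DIMENSIONS = {
    "mxbai-embed-large": 1024,
    "nomic-embed-text": 768,
    "all-minilm": 384,
}


def resolve_dimensions(
    model: str, probe: Callable[[str, str], tuple[float, ...]]
) -> int:
    """Return the known dimension, or measure one probe embedding."""
    if model in KNOWN_DIMENSIONS:
        return KNOWN_DIMENSIONS[model]
    # Unknown model: embed a short text once and count the values.
    return len(probe("dimension probe", model))


def fake_probe(text: str, model: str) -> tuple[float, ...]:
    # Stand-in for a real embedding call; pretends the model emits 5 floats.
    return (0.0,) * 5


print(resolve_dimensions("all-minilm", fake_probe))    # 384
print(resolve_dimensions("custom-model", fake_probe))  # 5
```

The mapping avoids a network round trip for common models, and the probe only costs one extra request for models that are not listed.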