Providers API

Providers are responsible for LLM and embedding operations. ragit includes an Ollama provider and abstract base classes for creating custom providers.

OllamaProvider

The primary provider for both LLM and embedding operations.

class ragit.providers.OllamaProvider(base_url: str | None = None, embedding_url: str | None = None, api_key: str | None = None, timeout: int | None = None, timeouts: dict[str, int] | None = None, use_cache: bool = True, use_resilience: bool = True)

Bases: BaseLLMProvider, BaseEmbeddingProvider

Ollama provider for both LLM and Embedding operations.

Performance features:
  • Connection pooling via requests.Session() for faster sequential requests

  • Native batch embedding via the /api/embed endpoint (one API call per batch)

  • LRU cache for repeated embedding queries (2048 entries)

Parameters:
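  • base_url (str, optional) – Ollama server URL (default: from OLLAMA_BASE_URL env var)

  • embedding_url (str, optional) – Server URL used for embedding requests; may differ from base_url

  • api_key (str, optional) – API key for authentication (default: from OLLAMA_API_KEY env var)

  • timeout (int, optional) – Request timeout in seconds (default: from OLLAMA_TIMEOUT env var)

  • timeouts (dict[str, int], optional) – Per-operation timeouts in seconds, keyed like DEFAULT_TIMEOUTS

  • use_cache (bool, optional) – Enable embedding cache (default: True)

  • use_resilience (bool, optional) – Enable retry and circuit breaker for requests (default: True)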

Examples

>>> provider = OllamaProvider()
>>> response = provider.generate("What is RAG?", model="llama3")
>>> print(response.text)
>>> # Batch embedding (single API call)
>>> texts = ["First document", "Second document"]
>>> embeddings = provider.embed_batch(texts, "mxbai-embed-large")
EMBEDDING_DIMENSIONS: dict[str, int] = {'all-minilm': 384, 'mxbai-embed-large': 1024, 'nomic-embed-text': 768, 'nomic-embed-text:latest': 768, 'qwen3-embedding': 4096, 'qwen3-embedding:0.6b': 1024, 'qwen3-embedding:4b': 2560, 'qwen3-embedding:8b': 4096, 'snowflake-arctic-embed': 1024}
MAX_EMBED_CHARS = 2000
DEFAULT_TIMEOUTS: dict[str, int] = {'chat': 300, 'embed': 30, 'embed_batch': 120, 'generate': 300, 'health': 5, 'list_models': 10}
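DEFAULT_TIMEOUTS assigns one timeout, in seconds, to each operation, and the constructor also accepts a `timeouts` dict. The sketch below assumes that entries in `timeouts` override the matching defaults key by key while the other operations keep their defaults. `resolve_timeouts` is a hypothetical helper written for illustration, not a ragit function, and its rejection of unknown keys is a design choice of this sketch only.

```python
from __future__ import annotations

# Copied from OllamaProvider.DEFAULT_TIMEOUTS above (seconds per operation).
DEFAULT_TIMEOUTS: dict[str, int] = {
    "chat": 300, "embed": 30, "embed_batch": 120,
    "generate": 300, "health": 5, "list_models": 10,
}


def resolve_timeouts(overrides: dict[str, int] | None = None) -> dict[str, int]:
    """Hypothetical helper: merge per-operation overrides onto the defaults.

    Assumption: a key in `overrides` replaces the default for that operation.
    This sketch rejects unknown keys so a typo does not silently fall back.
    """
    merged = dict(DEFAULT_TIMEOUTS)  # copy so the defaults are never mutated
    for op, seconds in (overrides or {}).items():
        if op not in merged:
            raise KeyError(f"unknown operation: {op!r}")
        merged[op] = seconds
    return merged


timeouts = resolve_timeouts({"generate": 600})
print(timeouts["generate"], timeouts["embed"])
```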
__init__(base_url: str | None = None, embedding_url: str | None = None, api_key: str | None = None, timeout: int | None = None, timeouts: dict[str, int] | None = None, use_cache: bool = True, use_resilience: bool = True) -> None
property session: Session

Lazy-initialized session for connection pooling.

Note: API key is NOT stored in session headers to prevent potential exposure in logs or error messages. Authentication is handled per-request via _get_headers().
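The per-request approach described in the note can be sketched as a small function that builds a fresh header dict for each call. `build_headers` is a hypothetical stand-in for the private `_get_headers()`, and the `Authorization: Bearer` scheme is an assumption; check what your Ollama deployment or proxy actually expects.

```python
from __future__ import annotations


def build_headers(api_key: str | None) -> dict[str, str]:
    """Hypothetical stand-in for a per-request header builder.

    A new dict is returned on every call, so the key never sits on a
    long-lived session object whose headers might end up in logs.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: Bearer token auth; adjust to your server's scheme.
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


# Each request would then pass its own headers, for example:
# session.post(url, json=payload, headers=build_headers(api_key))
```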

close() -> None

Close the session and release resources.

property provider_name: str

Return the provider name (e.g., ‘ollama’, ‘gemini’, ‘claude’).

property dimensions: int

Return the embedding dimensions for the current model.

is_available() -> bool

Check if Ollama server is reachable.

list_models() -> list[dict[str, Any]]

List available models on the Ollama server.

generate(prompt: str, model: str, system_prompt: str | None = None, temperature: float = 0.7, max_tokens: int | None = None) -> LLMResponse

Generate text using Ollama with optional resilience (retry + circuit breaker).
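The retry half of "retry + circuit breaker" is commonly implemented as exponential backoff. The wrapper below is a generic sketch of that technique, not ragit's code; the attempt count, base delay, and the set of retried exceptions are illustrative assumptions. The `sleep` parameter exists only so the example can run instantly.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying transient failures with exponential backoff.

    Delays grow as base_delay * 2**n (0.5 s, 1 s, 2 s, ... with the defaults).
    The last failure is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Usage: a call that fails twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("server busy")
    return "ok"


delays: list[float] = []
result = retry_with_backoff(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
```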

embed(text: str, model: str) -> EmbeddingResponse

Generate embedding using Ollama with optional caching and resilience.

embed_batch(texts: list[str], model: str) -> list[EmbeddingResponse]

Generate embeddings for multiple texts in a single API call with resilience.

The /api/embed endpoint supports batch inputs natively.
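For reference, Ollama's /api/embed endpoint accepts a JSON body with a `model` name and an `input` that may be a single string or a list of strings, and it returns an `embeddings` array with one vector per input, in input order. The sketch below builds that payload and converts a response into tuples using only the standard library. The sample response values are made up, and how ragit wraps the vectors into EmbeddingResponse objects is not shown.

```python
from __future__ import annotations

import json


def build_embed_payload(texts: list[str], model: str) -> str:
    """JSON body for POST {base_url}/api/embed with a batch of inputs."""
    return json.dumps({"model": model, "input": texts})


def parse_embed_response(body: str) -> list[tuple[float, ...]]:
    """Turn the 'embeddings' array into immutable tuples, one per input text."""
    data = json.loads(body)
    return [tuple(vector) for vector in data["embeddings"]]


payload = build_embed_payload(["First document", "Second document"], "all-minilm")
# Made-up, truncated response (real all-minilm vectors have 384 values).
sample = '{"model": "all-minilm", "embeddings": [[0.1, 0.2], [0.3, 0.4]]}'
vectors = parse_embed_response(sample)
print(len(vectors), vectors[0])
```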

async embed_batch_async(texts: list[str], model: str, max_concurrent: int = 10) -> list[EmbeddingResponse]

Generate embeddings for multiple texts asynchronously.

The /api/embed endpoint supports batch inputs natively, so this makes a single async HTTP request for all texts.

Parameters:
  • texts (list[str]) – Texts to embed.

  • model (str) – Embedding model name.

  • max_concurrent (int) – Deprecated, kept for API compatibility. No longer used since the API now supports native batching.

Returns:

Embeddings in the same order as input texts.

Return type:

list[EmbeddingResponse]

Examples

>>> import asyncio
>>> embeddings = asyncio.run(provider.embed_batch_async(texts, "mxbai-embed-large"))
chat(messages: list[dict[str, str]], model: str, temperature: float = 0.7, max_tokens: int | None = None) -> LLMResponse

Chat completion using Ollama with optional resilience.

Parameters:
  • messages (list[dict]) – List of messages with ‘role’ and ‘content’ keys.

  • model (str) – Model identifier.

  • temperature (float) – Sampling temperature.

  • max_tokens (int, optional) – Maximum tokens to generate.

Returns:

The generated response.

Return type:

LLMResponse
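chat() takes the whole conversation as a list of role/content dicts. A sketch of assembling a multi-turn history follows; `build_messages` is a hypothetical helper, and the roles shown (system, user, assistant) are the ones Ollama's chat API uses.

```python
from __future__ import annotations


def build_messages(
    user_message: str,
    history: list[tuple[str, str]] | None = None,
    system_prompt: str | None = None,
) -> list[dict[str, str]]:
    """Hypothetical helper: build the messages list passed to chat().

    history holds earlier (user, assistant) turns, oldest first.
    """
    messages: list[dict[str, str]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    for user_text, assistant_text in history or []:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_message})
    return messages


messages = build_messages(
    "And how does it use embeddings?",
    history=[("What is RAG?", "Retrieval-augmented generation ...")],
    system_prompt="Answer briefly.",
)
# response = provider.chat(messages, model="llama3")
```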

property generate_circuit_status: str

Get generate circuit breaker status (CLOSED, OPEN, HALF_OPEN, or ‘disabled’).

property embed_circuit_status: str

Get embed circuit breaker status (CLOSED, OPEN, HALF_OPEN, or ‘disabled’).
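The three states reported above follow the usual circuit-breaker pattern: CLOSED passes calls through, OPEN fails fast after repeated errors, and HALF_OPEN lets one trial call through after a cool-down. The class below is a generic, minimal sketch of that state machine; the class names, failure threshold, and recovery timeout are illustrative assumptions, not ragit's implementation. The injectable `clock` lets the example run without waiting.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is OPEN."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, recovery_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self.state = "CLOSED"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit is OPEN; failing fast")
            self.state = "HALF_OPEN"  # cool-down over: allow one trial call
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        # A success closes the circuit and resets the failure count.
        self._failures = 0
        self.state = "CLOSED"
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # A failed trial call, or too many consecutive failures, opens the circuit.
        if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
            self.state = "OPEN"
            self._opened_at = self._clock()


now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, recovery_timeout=10.0, clock=lambda: now[0])


def failing() -> None:
    raise ConnectionError("down")


for _ in range(2):
    try:
        breaker.call(failing)
    except ConnectionError:
        pass
print(breaker.state)  # OPEN after two consecutive failures

now[0] = 5.0  # still inside the cool-down window
try:
    breaker.call(lambda: "ok")
except CircuitOpenError:
    print("rejected without calling the server")
```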

static clear_embedding_cache() -> None

Clear the embedding cache.

static embedding_cache_info() -> dict[str, int]

Get embedding cache statistics.

Quick Reference

from ragit.providers import OllamaProvider

# Create with defaults (uses environment variables)
provider = OllamaProvider()

# Create with custom settings
provider = OllamaProvider(
    base_url="http://localhost:11434",
    embedding_url="http://localhost:11434",  # Can be different
    api_key=None,                            # For cloud providers
    timeout=120                              # Request timeout in seconds
)

Checking Availability

from ragit.providers import OllamaProvider

provider = OllamaProvider()

if provider.is_available():
    print("Ollama is running")
    print(f"URL: {provider.base_url}")
else:
    print("Ollama is not available")
    print("Start with: ollama serve")
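If you need a reachability check outside ragit (for example, in a deployment health probe), the same idea can be expressed with the standard library. This sketch assumes Ollama's `GET /api/tags` endpoint (which lists local models) as the probe and a 5-second limit mirroring `DEFAULT_TIMEOUTS["health"]`; which endpoint is_available() itself calls is not documented here.

```python
import urllib.request


def ollama_reachable(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/api/tags answers with HTTP 200."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError, refused connections and timeouts;
        # ValueError covers malformed URLs.
        return False


# Example: ollama_reachable() or ollama_reachable("http://gpu-box:11434")
```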

Text Generation

from ragit.providers import OllamaProvider

provider = OllamaProvider()

# Basic generation
response = provider.generate(
    prompt="Explain quantum computing",
    model="llama3"
)
print(response.text)

# With parameters
response = provider.generate(
    prompt="Write a haiku about programming",
    model="llama3",
    temperature=0.9,      # Higher = more creative
    max_tokens=100,       # Limit response length
    system_prompt="You are a poet."
)
print(response.text)
print(f"Model: {response.model}")
print(f"Provider: {response.provider}")

Creating Embeddings

from ragit.providers import OllamaProvider

provider = OllamaProvider()

# Single embedding
response = provider.embed(
    text="This is a sample sentence",
    model="mxbai-embed-large"
)

print(f"Dimensions: {response.dimensions}")
print(f"Embedding type: {type(response.embedding)}")  # tuple
print(f"First 5 values: {response.embedding[:5]}")

# Batch embeddings (more efficient)
texts = [
    "First document",
    "Second document",
    "Third document"
]

responses = provider.embed_batch(texts, model="mxbai-embed-large")
print(f"Created {len(responses)} embeddings")

for i, resp in enumerate(responses):
    print(f"  {i}: {resp.dimensions} dimensions")

Resource Management

OllamaProvider supports the context manager protocol for automatic cleanup:

from ragit.providers import OllamaProvider

texts = ["First document", "Second document"]

# Recommended: Use context manager for automatic cleanup
with OllamaProvider() as provider:
    response = provider.generate("Hello", model="llama3")
    embeddings = provider.embed_batch(texts, model="mxbai-embed-large")
# Session automatically closed on exit

# Alternative: Manual cleanup
provider = OllamaProvider()
try:
    response = provider.generate("Hello", model="llama3")
finally:
    provider.close()
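The with-statement form works because the provider implements `__enter__` and `__exit__`. The toy class below sketches that pattern under the assumption that `__exit__` simply delegates to close(); `PooledClient` is hypothetical and stands in for OllamaProvider so the example runs without a server.

```python
from __future__ import annotations

from types import TracebackType


class PooledClient:
    """Hypothetical stand-in showing the context-manager cleanup pattern."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        # The real provider would close its requests.Session() here.
        self.closed = True

    def __enter__(self) -> PooledClient:
        return self

    def __exit__(
        self,
        exc_type: type[BaseException] | None,
        exc: BaseException | None,
        tb: TracebackType | None,
    ) -> None:
        # Runs on normal exit and when an exception propagates; returning
        # None means the exception (if any) is not suppressed.
        self.close()


with PooledClient() as client:
    pass  # use the client here
print(client.closed)
```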

Performance Features

Connection Pooling

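OllamaProvider uses requests.Session() internally, so sequential requests reuse pooled HTTP connections instead of opening a new TCP connection for every call:

from ragit.providers import OllamaProvider

texts = ["First document", "Second document", "Third document"]

with OllamaProvider() as provider:
    # Each call reuses a pooled connection from the session
    for text in texts:
        provider.embed(text, model="mxbai-embed-large")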

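Async Embedding

To embed a batch from async code without blocking the event loop, use embed_batch_async(). It sends all texts to /api/embed in a single async request rather than issuing parallel per-text requests (max_concurrent is deprecated and ignored):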

import asyncio
from ragit.providers import OllamaProvider

provider = OllamaProvider()

async def embed_many():
    texts = ["doc1...", "doc2...", "doc3..."]
    return await provider.embed_batch_async(
        texts,
        model="mxbai-embed-large"
    )

results = asyncio.run(embed_many())
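Because embed_batch_async() sends all texts in one request, a very large corpus may be gentler on the server if you split it into chunks and cap how many chunk requests are in flight. The sketch below does that with asyncio.Semaphore and asyncio.gather; `fake_embed_batch_async` is a hypothetical stand-in for provider.embed_batch_async, and the chunk size and concurrency limit are arbitrary.

```python
from __future__ import annotations

import asyncio


async def fake_embed_batch_async(texts: list[str], model: str) -> list[tuple[float, ...]]:
    """Stand-in for provider.embed_batch_async; returns dummy 2-d vectors."""
    await asyncio.sleep(0.01)  # pretend network latency
    return [(float(len(t)), 0.0) for t in texts]


async def embed_in_chunks(texts: list[str], model: str, chunk_size: int = 64,
                          max_in_flight: int = 4) -> list[tuple[float, ...]]:
    semaphore = asyncio.Semaphore(max_in_flight)

    async def run(chunk: list[str]) -> list[tuple[float, ...]]:
        async with semaphore:  # at most max_in_flight requests at once
            return await fake_embed_batch_async(chunk, model)

    chunks = [texts[i:i + chunk_size] for i in range(0, len(texts), chunk_size)]
    # gather() returns results in the order the coroutines were passed,
    # so flattening keeps embeddings aligned with the input texts.
    results = await asyncio.gather(*(run(c) for c in chunks))
    return [vec for chunk_result in results for vec in chunk_result]


texts = [f"doc {i}" for i in range(150)]
embeddings = asyncio.run(embed_in_chunks(texts, model="mxbai-embed-large"))
print(len(embeddings))
```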

Embedding Cache

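When use_cache=True (the default), embeddings are cached automatically in an LRU cache of 2048 entries. The cache helpers are static methods, so they are called on the class rather than on an instance: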

from ragit.providers import OllamaProvider

provider = OllamaProvider(use_cache=True)  # Default

# First call hits API
provider.embed("Hello", model="mxbai-embed-large")

# Second call returns cached result
provider.embed("Hello", model="mxbai-embed-large")

# View cache statistics
info = OllamaProvider.embedding_cache_info()
print(info)  # e.g. on a fresh cache: {'hits': 1, 'misses': 1, 'maxsize': 2048, 'currsize': 1}

# Clear cache
OllamaProvider.clear_embedding_cache()

Base Classes

Abstract base classes for creating custom providers.

BaseLLMProvider

class ragit.providers.base.BaseLLMProvider

Bases: ABC

Abstract base class for LLM providers.

Implement this to add support for new LLM providers like Gemini, Claude, etc.

abstract property provider_name: str

Return the provider name (e.g., ‘ollama’, ‘gemini’, ‘claude’).

abstractmethod generate(prompt: str, model: str, system_prompt: str | None = None, temperature: float = 0.7, max_tokens: int | None = None) -> LLMResponse

Generate text from the LLM.

Parameters:
  • prompt (str) – The user prompt/query.

  • model (str) – Model identifier (e.g., ‘llama3’, ‘qwen3-vl:235b-instruct-cloud’).

  • system_prompt (str, optional) – System prompt for context/instructions.

  • temperature (float) – Sampling temperature (0.0 to 1.0).

  • max_tokens (int, optional) – Maximum tokens to generate.

Returns:

The generated response.

Return type:

LLMResponse

abstractmethod is_available() -> bool

Check if the provider is available and configured.

BaseEmbeddingProvider

class ragit.providers.base.BaseEmbeddingProvider

Bases: ABC

Abstract base class for embedding providers.

Implement this to add support for new embedding providers.

abstract property provider_name: str

Return the provider name.

abstract property dimensions: int

Return the embedding dimensions for the current model.

abstractmethod embed(text: str, model: str) -> EmbeddingResponse

Generate embedding for text.

Parameters:
  • text (str) – Text to embed.

  • model (str) – Model identifier (e.g., ‘nomic-embed-text’).

Returns:

The embedding response.

Return type:

EmbeddingResponse

abstractmethod embed_batch(texts: list[str], model: str) -> list[EmbeddingResponse]

Generate embeddings for multiple texts.

Parameters:
  • texts (list[str]) – Texts to embed.

  • model (str) – Model identifier.

Returns:

List of embedding responses.

Return type:

list[EmbeddingResponse]

abstractmethod is_available() -> bool

Check if the provider is available and configured.
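A common use of these base classes is a deterministic fake embedding provider for unit tests, so retrieval code can run without a server. The sketch below mirrors the documented BaseEmbeddingProvider interface with small local stand-ins (the ABC and the dataclass are redeclared so the snippet runs on its own); in real code you would import BaseEmbeddingProvider and EmbeddingResponse from ragit.providers.base instead. `HashEmbeddingProvider` and its SHA-256 scheme are illustrative choices.

```python
from __future__ import annotations

import hashlib
from abc import ABC, abstractmethod
from dataclasses import dataclass


# Local stand-ins mirroring the documented ragit.providers.base interface.
@dataclass(frozen=True)
class EmbeddingResponse:
    embedding: tuple[float, ...]
    model: str
    provider: str
    dimensions: int


class BaseEmbeddingProvider(ABC):
    @property
    @abstractmethod
    def provider_name(self) -> str: ...

    @property
    @abstractmethod
    def dimensions(self) -> int: ...

    @abstractmethod
    def embed(self, text: str, model: str) -> EmbeddingResponse: ...

    @abstractmethod
    def embed_batch(self, texts: list[str], model: str) -> list[EmbeddingResponse]: ...

    @abstractmethod
    def is_available(self) -> bool: ...


class HashEmbeddingProvider(BaseEmbeddingProvider):
    """Deterministic fake: the same (model, text) always maps to the same vector."""

    def __init__(self, dims: int = 8) -> None:
        if not 1 <= dims <= 32:  # limited by the 32-byte SHA-256 digest
            raise ValueError("dims must be between 1 and 32")
        self._dims = dims

    @property
    def provider_name(self) -> str:
        return "fake"

    @property
    def dimensions(self) -> int:
        return self._dims

    def embed(self, text: str, model: str) -> EmbeddingResponse:
        digest = hashlib.sha256(f"{model}:{text}".encode()).digest()
        # Map the first `dims` bytes to floats in [0, 1].
        vector = tuple(b / 255 for b in digest[: self._dims])
        return EmbeddingResponse(vector, model, self.provider_name, len(vector))

    def embed_batch(self, texts: list[str], model: str) -> list[EmbeddingResponse]:
        return [self.embed(t, model) for t in texts]

    def is_available(self) -> bool:
        return True


fake = HashEmbeddingProvider(dims=4)
a = fake.embed("hello", "test-model")
b = fake.embed_batch(["hello", "world"], "test-model")
print(a.dimensions, a == b[0])
```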

Response Classes

LLMResponse

class ragit.providers.base.LLMResponse(text: str, model: str, provider: str, usage: dict[str, int] | None = None)

Response from an LLM call.

text: str
model: str
provider: str
usage: dict[str, int] | None = None
__init__(text: str, model: str, provider: str, usage: dict[str, int] | None = None) -> None

from ragit.providers import OllamaProvider

provider = OllamaProvider()
response = provider.generate("Hello", model="llama3")

# Access response attributes
print(response.text)       # Generated text
print(response.model)      # "llama3"
print(response.provider)   # "ollama"

EmbeddingResponse

class ragit.providers.base.EmbeddingResponse(embedding: tuple[float, ...], model: str, provider: str, dimensions: int)

Response from an embedding call (immutable).

embedding: tuple[float, ...]
model: str
provider: str
dimensions: int
__init__(embedding: tuple[float, ...], model: str, provider: str, dimensions: int) -> None

from ragit.providers import OllamaProvider

provider = OllamaProvider()
response = provider.embed("Sample text", model="mxbai-embed-large")

# Access response attributes
print(response.embedding)   # tuple[float, ...]
print(response.dimensions)  # 1024 (for mxbai-embed-large)
print(response.model)       # "mxbai-embed-large"
print(response.provider)    # "ollama"

Note

EmbeddingResponse is immutable (frozen dataclass) with tuple embeddings to prevent accidental modification and ensure thread-safety of the data.
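The effect of that design can be seen with a stand-in dataclass declared the same way (frozen=True, tuple field); it mirrors the documented fields but is redeclared locally so it runs on its own. Assigning to a field raises dataclasses.FrozenInstanceError, and because a frozen dataclass with hashable fields is itself hashable, responses can also be used as dict keys or set members.

```python
from __future__ import annotations

import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingResponse:  # local stand-in mirroring ragit.providers.base.EmbeddingResponse
    embedding: tuple[float, ...]
    model: str
    provider: str
    dimensions: int


resp = EmbeddingResponse((0.1, 0.2, 0.3), "all-minilm", "ollama", 3)

try:
    resp.dimensions = 4  # any attribute assignment is rejected
except dataclasses.FrozenInstanceError as err:
    print("rejected:", err)

# Frozen + hashable fields: equal responses deduplicate in a set.
unique = {resp, EmbeddingResponse((0.1, 0.2, 0.3), "all-minilm", "ollama", 3)}
print(len(unique))
```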

Creating Custom Providers

To add support for a new LLM service, inherit from the base classes. This example targets version 1.0 or later of the openai Python SDK, which replaced the module-level openai.ChatCompletion and openai.Embedding calls with a client object:

from openai import OpenAI

from ragit.providers.base import (
    BaseLLMProvider,
    BaseEmbeddingProvider,
    LLMResponse,
    EmbeddingResponse
)

class OpenAIProvider(BaseLLMProvider, BaseEmbeddingProvider):
    """Provider for the OpenAI API (openai SDK >= 1.0)."""

    def __init__(self, api_key: str):
        # The client carries the API key, so it is used for every request
        self._client = OpenAI(api_key=api_key)
        self._dimensions = 1536  # text-embedding-ada-002; differs for other models

    @property
    def provider_name(self) -> str:
        return "openai"

    @property
    def dimensions(self) -> int:
        return self._dimensions

    def generate(
        self,
        prompt: str,
        model: str = "gpt-4",
        system_prompt: str | None = None,
        temperature: float = 0.7,
        max_tokens: int | None = None,
    ) -> LLMResponse:
        # Parameter order matches BaseLLMProvider.generate, so positional calls behave the same
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": prompt})

        # Only send max_tokens when it is set
        extra = {"max_tokens": max_tokens} if max_tokens is not None else {}
        response = self._client.chat.completions.create(
            model=model,
            messages=messages,
            temperature=temperature,
            **extra,
        )

        usage = None
        if response.usage is not None:
            usage = {
                "prompt_tokens": response.usage.prompt_tokens,
                "completion_tokens": response.usage.completion_tokens,
                "total_tokens": response.usage.total_tokens,
            }

        return LLMResponse(
            text=response.choices[0].message.content or "",  # content can be None
            model=model,
            provider=self.provider_name,
            usage=usage,
        )

    def embed(self, text: str, model: str = "text-embedding-ada-002") -> EmbeddingResponse:
        response = self._client.embeddings.create(input=text, model=model)
        embedding = response.data[0].embedding

        return EmbeddingResponse(
            embedding=tuple(embedding),  # Must be tuple
            model=model,
            provider=self.provider_name,
            dimensions=len(embedding)
        )

    def embed_batch(
        self,
        texts: list[str],
        model: str = "text-embedding-ada-002"
    ) -> list[EmbeddingResponse]:
        response = self._client.embeddings.create(input=texts, model=model)

        # Sort by index so results line up with the input texts
        items = sorted(response.data, key=lambda item: item.index)
        return [
            EmbeddingResponse(
                embedding=tuple(item.embedding),
                model=model,
                provider=self.provider_name,
                dimensions=len(item.embedding)
            )
            for item in items
        ]

    def is_available(self) -> bool:
        try:
            self._client.models.list()
            return True
        except Exception:
            return False

Using Custom Providers

from ragit import RagitExperiment

# Create custom provider
provider = OpenAIProvider(api_key="sk-...")

# Use with experiment
experiment = RagitExperiment(
    documents,
    benchmark,
    provider=provider
)
results = experiment.run()

Embedding Model Dimensions

ragit includes a mapping of known embedding model dimensions:

from ragit.providers import OllamaProvider

print(OllamaProvider.EMBEDDING_DIMENSIONS)
# {
#     "mxbai-embed-large": 1024,
#     "nomic-embed-text": 768,
#     "all-minilm": 384,
#     ...
# }

When using a model not in this mapping, ragit queries the model to determine dimensions automatically.
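The lookup-then-probe behavior can be sketched as follows. `probe` is a hypothetical callable standing in for a real embedding call that embeds a short string and returns the vector. The exact matching rules ragit applies (for example, how tags like ":latest" are handled) are not specified here, so this sketch uses exact-match lookup only, which is consistent with the mapping listing both "nomic-embed-text" and "nomic-embed-text:latest". Probed sizes are memoized so each unknown model is probed once.

```python
from __future__ import annotations

from typing import Callable, Sequence

# Subset of OllamaProvider.EMBEDDING_DIMENSIONS, copied from the class attribute above.
EMBEDDING_DIMENSIONS: dict[str, int] = {
    "all-minilm": 384,
    "mxbai-embed-large": 1024,
    "nomic-embed-text": 768,
}


def resolve_dimensions(
    model: str,
    probe: Callable[[str, str], Sequence[float]],
    cache: dict[str, int],
) -> int:
    """Return the embedding size for `model`, probing unknown models once."""
    if model in EMBEDDING_DIMENSIONS:
        return EMBEDDING_DIMENSIONS[model]
    if model not in cache:
        cache[model] = len(probe("dimension probe", model))
    return cache[model]


calls: list[str] = []


def fake_probe(text: str, model: str) -> list[float]:
    calls.append(model)
    return [0.0] * 512  # pretend this unknown model returns 512-d vectors


seen: dict[str, int] = {}
print(resolve_dimensions("mxbai-embed-large", fake_probe, seen))  # from the table
print(resolve_dimensions("my-custom-embed", fake_probe, seen))    # probed
print(resolve_dimensions("my-custom-embed", fake_probe, seen))    # memoized
```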