Models
Buddy AI supports 30+ LLM providers through a unified Model interface.
OpenAI
from buddy.models.openai import OpenAIChat
model = OpenAIChat(id="gpt-4o")
model = OpenAIChat(id="gpt-4o-mini", temperature=0.7, max_tokens=2048)
Anthropic
pip install "buddy-ai[anthropic]"
from buddy.models.anthropic import Claude
model = Claude(id="claude-opus-4-5")
Google Gemini
pip install "buddy-ai[google]"
from buddy.models.google import Gemini
model = Gemini(id="gemini-1.5-pro")
AWS Bedrock
pip install "buddy-ai[aws]"
from buddy.models.aws import AwsBedrock
model = AwsBedrock(id="anthropic.claude-3-sonnet-20240229-v1:0")
Ollama (Local)
from buddy.models.ollama import Ollama
model = Ollama(id="llama3.2") # Requires ollama server running
All Supported Providers
| Provider | Import | Extra |
|---|---|---|
| OpenAI | buddy.models.openai |
core |
| Anthropic | buddy.models.anthropic |
[anthropic] |
| Google Gemini | buddy.models.google |
[google] |
| AWS Bedrock | buddy.models.aws |
[aws] |
| Azure OpenAI | buddy.models.azure |
core |
| Cohere | buddy.models.cohere |
[cohere] |
| Ollama | buddy.models.ollama |
core |
| Groq | buddy.models.groq |
core |
| Mistral | buddy.models.mistral |
core |
| HuggingFace | buddy.models.huggingface |
core |
| DeepSeek | buddy.models.deepseek |
core |
| xAI (Grok) | buddy.models.xai |
core |
| Perplexity | buddy.models.perplexity |
core |
| Together AI | buddy.models.together |
core |
| Fireworks | buddy.models.fireworks |
core |
| LiteLLM | buddy.models.litellm |
core |
See Model Providers Overview for the full list.
Prompt Caching
All models support the cache_prompt flag. Each provider implements caching in
the most efficient way for its API:
from buddy.agent import Agent
from buddy.models.anthropic import Claude
from buddy.models.openai import OpenAIChat
# Anthropic — explicit cache_control breakpoints on system, tools, history
agent = Agent(model=Claude(id="claude-opus-4-5"), cache_prompt=True)
# OpenAI — server-side automatic caching (>=1024 token prefix)
agent = Agent(model=OpenAIChat(id="gpt-4o"), cache_prompt=True)
Cache hit/miss token counts appear automatically in RunResponse.metrics.
See the full Prompt Caching guide for
PromptCacheConfig, 1-hour TTL, and per-provider details.