# Model Providers Overview
Buddy AI supports 25+ language model providers through a unified interface. This allows you to switch between providers seamlessly and take advantage of different models' strengths.
## Supported Providers
### Major Cloud Providers
| Provider | Models | Features |
|---|---|---|
| OpenAI | GPT-4, GPT-4 Turbo, GPT-3.5 | Function calling, vision, embeddings |
| Anthropic | Claude 3.5 Sonnet, Claude 3 | Large context, advanced reasoning |
| Google | Gemini Pro, Gemini Ultra | Multi-modal, code generation |
| AWS Bedrock | Claude, Llama, Titan | Enterprise security, compliance |
| Azure OpenAI | GPT-4, GPT-3.5 | Enterprise integration |
### Open Source & Self-Hosted
| Provider | Models | Features |
|---|---|---|
| Ollama | Llama, Mistral, CodeLlama | Local deployment, privacy |
| Hugging Face | Thousands of models | Open source, customizable |
| vLLM | Optimized inference | High performance, batching |
| LM Studio | Local models | Desktop deployment |
### Specialized Providers
| Provider | Specialty | Use Cases |
|---|---|---|
| Cohere | Enterprise NLP | Classification, embeddings |
| Fireworks | Fast inference | Real-time applications |
| Together AI | Open source models | Cost-effective inference |
| Groq | Ultra-fast inference | Low-latency applications |
## Unified Interface

All providers use the same interface:

```python
from buddy import Agent
from buddy.models.openai import OpenAIChat
from buddy.models.anthropic import AnthropicChat
from buddy.models.google import GoogleChat

# Same agent interface, different providers
openai_agent = Agent(model=OpenAIChat())
anthropic_agent = Agent(model=AnthropicChat())
google_agent = Agent(model=GoogleChat())

# Identical usage
response1 = openai_agent.run("Hello!")
response2 = anthropic_agent.run("Hello!")
response3 = google_agent.run("Hello!")
```
## Model Configuration

### Basic Configuration

```python
from buddy.models.openai import OpenAIChat

model = OpenAIChat(
    model="gpt-4",
    temperature=0.7,
    max_tokens=1000,
    top_p=1.0,
)
```
### Advanced Configuration

```python
from buddy.models.openai import OpenAIChat

model = OpenAIChat(
    model="gpt-4",
    temperature=0.7,
    max_tokens=4000,
    top_p=0.9,
    frequency_penalty=0.1,
    presence_penalty=0.1,
    stop=["\n\n"],            # stop generating at a blank line
    seed=42,
    timeout=30,
    max_retries=3,
    organization="org-xxx",
    base_url="https://custom-endpoint.com/v1",
)
```
## Model Selection Guide

### For Different Use Cases

**General Conversation**
- OpenAI GPT-4 Turbo
- Anthropic Claude 3.5 Sonnet
- Google Gemini Pro

**Code Generation**
- OpenAI GPT-4
- Google Gemini Pro
- Anthropic Claude 3 Sonnet

**Large Context Tasks**
- Anthropic Claude 3 (200K tokens)
- Google Gemini Pro (128K tokens)
- OpenAI GPT-4 Turbo (128K tokens)

**Cost Optimization**
- OpenAI GPT-3.5 Turbo
- Together AI Llama models
- Ollama (self-hosted)

**Privacy & Compliance**
- Ollama (local deployment)
- AWS Bedrock
- Azure OpenAI

**Speed & Performance**
- Groq Llama
- Fireworks AI
- Together AI
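As a rough starting point, the recommendations above can be captured in a small lookup table. This is an illustrative sketch, not part of the Buddy API: the use-case keys and model identifiers are assumptions you would adapt to your own naming.

```python
# Hypothetical mapping of use cases to a default model, mirroring the guide above.
DEFAULT_MODELS = {
    "general": "gpt-4-turbo",
    "code": "gpt-4",
    "large_context": "claude-3",
    "cost_optimized": "gpt-3.5-turbo",
    "privacy": "ollama",
    "speed": "groq",
}

def pick_model(use_case: str) -> str:
    """Return the first-choice model for a use case, defaulting to general chat."""
    return DEFAULT_MODELS.get(use_case, DEFAULT_MODELS["general"])
```

A table like this keeps model choices in one place, so re-benchmarking a use case means editing one entry rather than hunting through agent definitions.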
## Model Switching

### Runtime Switching

```python
from buddy import Agent
from buddy.models.openai import OpenAIChat
from buddy.models.anthropic import AnthropicChat

agent = Agent(model=OpenAIChat())

# Switch model at runtime
agent.model = AnthropicChat()
response = agent.run("Hello with new model!")
```
### Fallback Models

```python
from buddy import Agent
from buddy.models.openai import OpenAIChat
from buddy.models.anthropic import AnthropicChat

class FallbackModel:
    def __init__(self):
        self.primary = OpenAIChat()
        self.fallback = AnthropicChat()

    def run(self, *args, **kwargs):
        # Try the primary provider first; fall back on any failure
        try:
            return self.primary.run(*args, **kwargs)
        except Exception:
            return self.fallback.run(*args, **kwargs)

agent = Agent(model=FallbackModel())
```
## Performance Optimization

### Token Management

```python
from buddy.models.openai import OpenAIChat

# Efficient token usage
model = OpenAIChat(
    model="gpt-4",
    max_tokens=500,   # limit output tokens
    temperature=0.3,  # more deterministic output, fewer retries
)
```
### Caching

```python
from buddy.models.openai import OpenAIChat

model = OpenAIChat(
    model="gpt-4",
    enable_caching=True,  # enable response caching
    cache_ttl=3600,       # cache for 1 hour
)
```
### Batch Processing

```python
# Process multiple requests one at a time
messages = ["Hello", "Goodbye", "How are you?"]
responses = []
for message in messages:
    response = agent.run(message)
    responses.append(response)
```
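The loop above issues requests sequentially. When your provider permits concurrent requests, a thread pool can overlap the network waits. This is a sketch under one assumption: `fake_run` stands in for your agent's `run` method, since the pattern is independent of the provider.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_run(message: str) -> str:
    # Stand-in for agent.run; replace with your agent's run method.
    return f"echo: {message}"

messages = ["Hello", "Goodbye", "How are you?"]

# map() preserves input order while the pool overlaps the underlying calls
with ThreadPoolExecutor(max_workers=3) as pool:
    responses = list(pool.map(fake_run, messages))
```

Keep `max_workers` modest: most providers enforce per-minute rate limits, and too much concurrency just converts waiting into 429 errors.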
## Model Comparison

### Feature Matrix

| Feature | OpenAI | Anthropic | Google | AWS | Azure |
|---|---|---|---|---|---|
| Function Calling | ✅ | ✅ | ✅ | Varies | ✅ |
| Vision | ✅ | ✅ | ✅ | ✅ | ✅ |
| Code Generation | ✅ | ✅ | ✅ | ✅ | ✅ |
| Long Context | 128K | 200K | 128K | Varies | 128K |
| Streaming | ✅ | ✅ | ✅ | ✅ | ✅ |
| Enterprise | ✅ | ✅ | ✅ | ✅ | ✅ |
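One way to use the matrix programmatically is a lookup keyed on the capability you need, such as context length. The numbers below merely restate the table; treat them as illustrative, since real limits vary by model and change over time.

```python
# Context-window limits from the matrix above, in tokens.
# AWS Bedrock is omitted because its limit varies by model.
MAX_CONTEXT = {
    "openai": 128_000,
    "anthropic": 200_000,
    "google": 128_000,
    "azure": 128_000,
}

def providers_with_context(min_tokens: int) -> list:
    """Return providers whose context window is at least min_tokens."""
    return sorted(p for p, limit in MAX_CONTEXT.items() if limit >= min_tokens)
```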
### Cost Comparison (Approximate)
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4 Turbo | $10 | $30 |
| Claude 3.5 Sonnet | $3 | $15 |
| Gemini Pro | $0.50 | $1.50 |
| GPT-3.5 Turbo | $0.50 | $1.50 |
*Prices are approximate and may change.*
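The per-million-token prices in the table translate directly into a per-request estimate. This is a back-of-the-envelope sketch using the table's approximate figures, not a billing tool; the function and price keys are hypothetical.

```python
# Approximate (input, output) prices in dollars per 1M tokens, from the table above.
PRICES = {
    "gpt-4-turbo": (10.00, 30.00),
    "claude-3.5-sonnet": (3.00, 15.00),
    "gemini-pro": (0.50, 1.50),
    "gpt-3.5-turbo": (0.50, 1.50),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the approximate dollar cost of one request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

For example, a GPT-4 Turbo call with 1,000 input tokens and 500 output tokens comes to about $0.025, which is why output-heavy workloads dominate most bills.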
## Best Practices
### Model Selection
- Start with GPT-4 Turbo for general use
- Use Claude 3 for large context tasks
- Try Gemini Pro for code generation
- Consider cost for high-volume applications
- Test multiple models for your specific use case
### Error Handling

```python
from buddy.exceptions import ModelProviderError

try:
    response = agent.run("Hello!")
except ModelProviderError as e:
    print(f"Model error: {e}")
    # Switch to a fallback model and retry
    agent.model = fallback_model
    response = agent.run("Hello!")
```
### Monitoring

```python
# Track model usage
response = agent.run("Hello!")
print(f"Model: {response.model}")
print(f"Tokens used: {response.metrics.total_tokens}")
print(f"Cost: ${response.metrics.cost}")
```