# Model Providers Overview
Buddy AI supports 25+ language model providers through a unified interface. This allows you to switch between providers seamlessly and take advantage of different models' strengths.
## Supported Providers
### Major Cloud Providers
| Provider | Models | Features |
|---|---|---|
| OpenAI | GPT-4, GPT-4 Turbo, GPT-3.5 | Function calling, vision, embeddings |
| Anthropic | Claude 3.5 Sonnet, Claude 3 | Large context, advanced reasoning |
| Google | Gemini Pro, Gemini Ultra | Multi-modal, code generation |
| AWS Bedrock | Claude, Llama, Titan | Enterprise security, compliance |
| Azure OpenAI | GPT-4, GPT-3.5 | Enterprise integration |
### Open Source & Self-Hosted
| Provider | Models | Features |
|---|---|---|
| Ollama | Llama, Mistral, CodeLlama | Local deployment, privacy |
| Hugging Face | Thousands of models | Open source, customizable |
| vLLM | Optimized inference | High performance, batching |
| LM Studio | Local models | Desktop deployment |
### Specialized Providers
| Provider | Specialty | Use Cases |
|---|---|---|
| Cohere | Enterprise NLP | Classification, embeddings |
| Fireworks | Fast inference | Real-time applications |
| Together AI | Open source models | Cost-effective inference |
| Groq | Ultra-fast inference | Low-latency applications |
## Unified Interface

All providers use the same interface:

```python
from buddy import Agent
from buddy.models.openai import OpenAIChat
from buddy.models.anthropic import AnthropicChat
from buddy.models.google import GoogleChat

# Same agent interface, different providers
openai_agent = Agent(model=OpenAIChat())
anthropic_agent = Agent(model=AnthropicChat())
google_agent = Agent(model=GoogleChat())

# Identical usage
response1 = openai_agent.run("Hello!")
response2 = anthropic_agent.run("Hello!")
response3 = google_agent.run("Hello!")
```
## Model Configuration

### Basic Configuration

```python
from buddy.models.openai import OpenAIChat

model = OpenAIChat(
    model="gpt-4",
    temperature=0.7,
    max_tokens=1000,
    top_p=1.0,
)
```
### Advanced Configuration

```python
from buddy.models.openai import OpenAIChat

model = OpenAIChat(
    model="gpt-4",
    temperature=0.7,
    max_tokens=4000,
    top_p=0.9,
    frequency_penalty=0.1,
    presence_penalty=0.1,
    stop=["\n\n"],            # stop generating at a blank line
    seed=42,
    timeout=30,
    max_retries=3,
    organization="org-xxx",
    base_url="https://custom-endpoint.com/v1",
)
```
## Model Selection Guide

### For Different Use Cases

**General Conversation**
- OpenAI GPT-4 Turbo
- Anthropic Claude 3.5 Sonnet
- Google Gemini Pro

**Code Generation**
- OpenAI GPT-4
- Google Gemini Pro
- Anthropic Claude 3 Sonnet

**Large Context Tasks**
- Anthropic Claude 3 (200K tokens)
- Google Gemini Pro (128K tokens)
- OpenAI GPT-4 Turbo (128K tokens)

**Cost Optimization**
- OpenAI GPT-3.5 Turbo
- Together AI Llama models
- Ollama (self-hosted)

**Privacy & Compliance**
- Ollama (local deployment)
- AWS Bedrock
- Azure OpenAI

**Speed & Performance**
- Groq Llama
- Fireworks AI
- Together AI
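As a rough starting point, the recommendations above can be captured in a small lookup table. This is an illustrative sketch, not part of the Buddy API: the use-case keys and model identifiers are assumptions you would adapt to your own naming.

```python
# Hypothetical mapping of use cases to a default model, mirroring the guide above.
DEFAULT_MODELS = {
    "general": "gpt-4-turbo",
    "code": "gpt-4",
    "large_context": "claude-3",
    "cost_optimized": "gpt-3.5-turbo",
    "privacy": "ollama",
    "speed": "groq",
}

def pick_model(use_case: str) -> str:
    """Return the first-choice model for a use case, defaulting to general chat."""
    return DEFAULT_MODELS.get(use_case, DEFAULT_MODELS["general"])
```

A table like this keeps model choices in one place, so re-benchmarking a use case means editing one entry rather than hunting through agent definitions.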
## Model Switching

### Runtime Switching

```python
from buddy import Agent
from buddy.models.openai import OpenAIChat
from buddy.models.anthropic import AnthropicChat

agent = Agent(model=OpenAIChat())

# Switch model at runtime
agent.model = AnthropicChat()
response = agent.run("Hello with new model!")
```
### Fallback Models

```python
from buddy import Agent
from buddy.models.openai import OpenAIChat
from buddy.models.anthropic import AnthropicChat

class FallbackModel:
    def __init__(self):
        self.primary = OpenAIChat()
        self.fallback = AnthropicChat()

    def run(self, *args, **kwargs):
        # Try the primary provider first; fall back on any failure
        try:
            return self.primary.run(*args, **kwargs)
        except Exception:
            return self.fallback.run(*args, **kwargs)

agent = Agent(model=FallbackModel())
```
## Performance Optimization

### Token Management

```python
from buddy.models.openai import OpenAIChat

# Efficient token usage
model = OpenAIChat(
    model="gpt-4",
    max_tokens=500,   # limit output tokens
    temperature=0.3,  # more deterministic output, fewer retries
)
```
### Caching

```python
from buddy.models.openai import OpenAIChat

model = OpenAIChat(
    model="gpt-4",
    enable_caching=True,  # enable response caching
    cache_ttl=3600,       # cache for 1 hour
)
```
### Batch Processing

```python
# Process multiple requests one at a time
messages = ["Hello", "Goodbye", "How are you?"]
responses = []
for message in messages:
    response = agent.run(message)
    responses.append(response)
```
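The loop above issues requests sequentially. When your provider permits concurrent requests, a thread pool can overlap the network waits. This is a sketch under one assumption: `fake_run` stands in for your agent's `run` method, since the pattern is independent of the provider.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_run(message: str) -> str:
    # Stand-in for agent.run; replace with your agent's run method.
    return f"echo: {message}"

messages = ["Hello", "Goodbye", "How are you?"]

# map() preserves input order while the pool overlaps the underlying calls
with ThreadPoolExecutor(max_workers=3) as pool:
    responses = list(pool.map(fake_run, messages))
```

Keep `max_workers` modest: most providers enforce per-minute rate limits, and too much concurrency just converts waiting into 429 errors.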
## Model Comparison

### Feature Matrix

| Feature | OpenAI | Anthropic | Google | AWS | Azure |
|---|---|---|---|---|---|
| Function Calling | ✅ | ✅ | ✅ | Varies | ✅ |
| Vision | ✅ | ✅ | ✅ | ✅ | ✅ |
| Code Generation | ✅ | ✅ | ✅ | ✅ | ✅ |
| Long Context | 128K | 200K | 128K | Varies | 128K |
| Streaming | ✅ | ✅ | ✅ | ✅ | ✅ |
| Enterprise | ✅ | ✅ | ✅ | ✅ | ✅ |
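One way to use the matrix programmatically is a lookup keyed on the capability you need, such as context length. The numbers below merely restate the table; treat them as illustrative, since real limits vary by model and change over time.

```python
# Context-window limits from the matrix above, in tokens.
# AWS Bedrock is omitted because its limit varies by model.
MAX_CONTEXT = {
    "openai": 128_000,
    "anthropic": 200_000,
    "google": 128_000,
    "azure": 128_000,
}

def providers_with_context(min_tokens: int) -> list:
    """Return providers whose context window is at least min_tokens."""
    return sorted(p for p, limit in MAX_CONTEXT.items() if limit >= min_tokens)
```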
### Cost Comparison (Approximate)
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4 Turbo | $10 | $30 |
| Claude 3.5 Sonnet | $3 | $15 |
| Gemini Pro | $0.50 | $1.50 |
| GPT-3.5 Turbo | $0.50 | $1.50 |
*Prices are approximate and may change.*
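The per-million-token prices in the table translate directly into a per-request estimate. This is a back-of-the-envelope sketch using the table's approximate figures, not a billing tool; the function and price keys are hypothetical.

```python
# Approximate (input, output) prices in dollars per 1M tokens, from the table above.
PRICES = {
    "gpt-4-turbo": (10.00, 30.00),
    "claude-3.5-sonnet": (3.00, 15.00),
    "gemini-pro": (0.50, 1.50),
    "gpt-3.5-turbo": (0.50, 1.50),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the approximate dollar cost of one request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

For example, a GPT-4 Turbo call with 1,000 input tokens and 500 output tokens comes to about $0.025, which is why output-heavy workloads dominate most bills.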
## Best Practices
### Model Selection
- Start with GPT-4 Turbo for general use
- Use Claude 3 for large context tasks
- Try Gemini Pro for code generation
- Consider cost for high-volume applications
- Test multiple models for your specific use case
### Error Handling

```python
from buddy.exceptions import ModelProviderError

try:
    response = agent.run("Hello!")
except ModelProviderError as e:
    print(f"Model error: {e}")
    # Switch to a fallback model and retry
    agent.model = fallback_model
    response = agent.run("Hello!")
```
### Monitoring

```python
# Track model usage
response = agent.run("Hello!")
print(f"Model: {response.model}")
print(f"Tokens used: {response.metrics.total_tokens}")
print(f"Cost: ${response.metrics.cost}")
```