RAG Implementation

Retrieval-Augmented Generation grounds an agent's answers in your own documents. buddy-ai supports two retrieval styles, both driven by parameters on Agent (and Team).

Two retrieval modes

Mode	Parameter	When retrieval happens
Agentic RAG	`search_knowledge=True` (default)	The model calls a search tool on demand during its reasoning
Traditional RAG	`add_references=True`	Retrieval runs before the model, injecting context into the prompt

from buddy import Agent
from buddy.models.openai import OpenAIChat

# Agentic: the model decides when to search
agent = Agent(model=OpenAIChat(id="gpt-4o"), knowledge=knowledge, search_knowledge=True)

# Traditional: always prepend retrieved references
agent = Agent(model=OpenAIChat(id="gpt-4o"), knowledge=knowledge, add_references=True)

The relevant Agent parameters (from buddy/agent/agent.py) are:

knowledge — the AgentKnowledge instance.
search_knowledge (default True) — register the knowledge-search tool.
add_references (default False) — inject retrieved passages into the prompt.
knowledge_filters — metadata filters applied to every search.
enable_agentic_knowledge_filters — let the model choose filters itself.
retriever — a custom callable that replaces the default search.
references_format — "json" (default) or "yaml".

Retrieval flow

A query is produced — either the model's tool call (agentic) or the user message (traditional).
AgentKnowledge.search(query, num_documents, filters) runs.
That calls vector_db.search(query, limit, filters), returning the top num_documents (default 5) Documents.
The passages are returned to the model — as a tool result or as references in the prompt, formatted per references_format.

Knowledge filters

Filters narrow retrieval to documents whose metadata matches. Pass them once on the agent, or per call to run().

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    knowledge=knowledge,
    knowledge_filters={"department": "engineering"},
)

# Or per-run:
agent.print_response("What is the on-call policy?",
                     knowledge_filters={"team": "platform"})

Metadata keys become valid filters as documents are loaded; AgentKnowledge tracks them and validates filter keys via validate_filters().

Custom retriever

Supply a retriever callable to bypass the built-in search entirely — useful for hybrid search, reranking, or an external service.

def my_retriever(agent, query, num_documents=5, **kwargs):
    docs = my_search_service(query, k=num_documents)
    return [{"content": d.text, "meta_data": d.meta} for d in docs]

agent = Agent(model=OpenAIChat(id="gpt-4o"), knowledge=knowledge, retriever=my_retriever)

iRAG: the built-in lightweight RAG

iRAG (buddy.knowledge.irag) is a self-contained knowledge base that needs no external vector database. It stores documents in SQLite and retrieves with a blend of TF-IDF cosine similarity, NLP ontology matching (via spaCy, if installed), and basic/fuzzy text search.

from buddy.knowledge import irag

kb = irag(file_path="support_docs.txt", strict_kb_mode=True)
kb.load()

results = kb.search("login error timeout")

What iRAG is good for — and its trade-offs

iRAG is convenient for local files and logs because it has no infrastructure dependency and ingests directories directly (dir_path=...). It is not a dense-vector semantic engine: ranking is lexical/TF-IDF based, and its defaults favor recall (broad results) over precision. spaCy ontology features require en_core_web_sm. The class is exported lowercase as irag, and it also offers helpers like search_comprehensive(), create_agent() and get_database_info().

For semantic retrieval at scale, prefer a standard AgentKnowledge subclass with a vector database.