How to Design Your Knowledge Base for RAG
A practical framework for designing knowledge bases that power RAG systems. Covers when to use BM25, vector databases, and knowledge graphs.
January 20, 2026 • 9 min read
TL;DR
Your knowledge base design should match your data, not the hype. Start with BM25 if queries are predictable. Use vector search for unstructured content. Add a knowledge graph only when entity relationships matter for answering questions.
I've designed knowledge bases that ranged from simple BM25 search to full graph+vector hybrids. The biggest lesson? Most teams overcomplicate this. They jump straight to vector databases when a well-tuned keyword search would work better for their data.
Here's the framework I use to decide how to design a knowledge base.
The Three-Level RAG Hierarchy
Level 1: Statistical and Rule-Based Retrieval
When to use: Predictable queries, structured data, known access patterns
When the domain is well understood, this is often the highest-accuracy approach, and it requires no vector database at all.
| Algorithm | Best For |
|---|---|
| BM25 / TF-IDF | Keyword-heavy domains (technical documentation) |
| Recency Weighting | News, logs, time-sensitive content |
| Frequency / Popularity | FAQs, common support queries |
| Rule-Based Routing | When you know which document answers which query type |
| Metadata Filtering | Date ranges, categories, authors, document types |
Many production systems use BM25 + metadata filters and outperform naive vector search because they're precise when the domain is well-understood.
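As a rough illustration, here's a minimal Level 1 sketch using the rank_bm25 package plus a simple metadata filter. The corpus, field names, and cutoff are invented for the example; any keyword index (Elasticsearch, OpenSearch, SQLite FTS) works the same way conceptually.

```python
from rank_bm25 import BM25Okapi

# Toy corpus with metadata (fields are invented for this example)
docs = [
    {"text": "How to reset your password in the admin console", "category": "support", "year": 2024},
    {"text": "Quarterly revenue reporting guidelines", "category": "finance", "year": 2023},
    {"text": "Password policy and rotation requirements", "category": "security", "year": 2024},
]

def search(query, category=None, top_k=2):
    # 1. Metadata filter first: cheap and precise
    candidates = [d for d in docs if category is None or d["category"] == category]
    # 2. BM25 keyword ranking over the filtered subset
    tokenized = [d["text"].lower().split() for d in candidates]
    bm25 = BM25Okapi(tokenized)
    scores = bm25.get_scores(query.lower().split())
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [d for d, s in ranked[:top_k] if s > 0]

print(search("reset password", category="support"))
```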
Level 2: Vector Database Retrieval
When to use: Data is substantial but not interrelated: books, images, and documents whose content is self-contained.
This is the right choice when semantic similarity matters and you need to find conceptually related content rather than exact keyword matches.
Chunking Methods
| Method | How It Works | Best For |
|---|---|---|
| Fixed Size | Split every N tokens with overlap | General purpose, simple baseline |
| Sentence Based | Split on sentence boundaries | Conversational content, Q&A |
| Paragraph / Section | Respect document structure | Well-formatted docs (markdown, HTML) |
| Semantic Chunking | Split when embedding similarity drops | Varied content, topic shifts |
| Recursive / Hierarchical | Try large chunks, split if too big | Mixed document types |
| Document-Specific | Code: by function; Legal: by clause | Domain-specific corpora |
| Agentic Chunking | LLM decides chunk boundaries | High-value, complex documents |
Chunk Size Guidelines
- Smaller chunks (128-256 tokens): Higher precision, better for specific facts
- Larger chunks (512-1024 tokens): More context, but noisier retrieval
Chunk size matters as much as chunking method. Start with 256-512 tokens and adjust based on retrieval quality.
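For reference, a minimal fixed-size chunker with overlap. Sizes here are counted in whitespace-separated words as a rough proxy for tokens; a real pipeline would use the embedding model's tokenizer.

```python
def chunk_fixed(text, chunk_size=256, overlap=32):
    """Split text into fixed-size chunks with overlap.

    Word counts stand in for tokens; swap in a real tokenizer
    (e.g. the one used by your embedding model) in production.
    """
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break
    return chunks

# Example: a 1,000-word document yields 5 overlapping chunks
chunks = chunk_fixed("lorem ipsum " * 500, chunk_size=256, overlap=32)
print(len(chunks), len(chunks[0].split()))
```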
Add-Ons to Enhance Level 2
These techniques layer on top of basic vector retrieval to improve accuracy and handle edge cases.
Hybrid Search (Sparse + Dense)
Combine BM25 (keyword) and vector search with weighted fusion. Catches both exact keyword matches and semantic similarities.
```
Query → [BM25 Top-K] ─┐
                      ├→ RRF Fusion → Results
Query → [Dense Top-K] ┘
```
When to add: Almost always. This is becoming the default in production systems.
Example scenarios:
- Query: "Error code 0x8007045D". BM25 catches exact error code match, Vector catches related I/O error docs
- Query: "Python async await tutorial". BM25 catches exact "async await" keywords, Vector catches conceptually similar concurrency docs
- Query: "GDPR Article 17 compliance". BM25 catches exact "Article 17" reference, Vector catches related "right to erasure" content
Reranking Layer
Retrieve a broad candidate set (top 50) with fast vector search, then rerank with a cross-encoder model to return the top N.
```
Query → Vector Search (Top 50) → Cross-Encoder Rerank → Top 5
```
When to add: When precision matters more than raw speed. A cross-encoder pass substantially improves result quality for a modest latency cost.
| Query | Vector retrieves (noisy) | Reranker promotes |
|---|---|---|
| "Refund policy for damaged items?" | General refund docs, shipping docs | Specific damaged goods policy |
| "Terminate employee in California?" | HR docs, general termination, CA laws | CA-specific termination procedures |
| "Ibuprofen and alcohol side effects?" | Ibuprofen info, alcohol info | Specific interaction warnings |
HyDE (Hypothetical Document Embedding)
Generate a hypothetical answer first, embed that, then search for similar real documents.
1Query: "How do I handle database migrations in production?"2 │3 ▼4LLM generates hypothetical answer:5"To handle database migrations in production, you should use a6migration tool like Flyway or Liquibase. Always backup your7database first, run migrations during low-traffic periods..."8 │9 ▼10Embed the hypothetical answer11 │12 ▼13Search vector DB for real documents similar to this answer
Why it works: Queries are short and may not match document language. A hypothetical answer is longer, uses domain terminology, and is semantically closer to actual documents that answer the question.
When to add: Vague queries, abstract questions, or when query-document vocabulary mismatch is high.
| Query | Hypothetical Answer | Finds Documents About |
|---|---|---|
| "Why is my app slow?" | "N+1 queries, memory leaks, unoptimized indexes..." | Database optimization, memory profiling |
| "Best way to structure a team?" | "Cross-functional squads, matrix orgs, pod-based models..." | Org design, team topologies |
| "How do I not lose money?" | "Diversify investments, maintain emergency funds..." | Investment strategies, risk management |
Other Query Transformations
When to add: When users submit vague, incomplete, or poorly worded queries.
| Technique | What It Does | Example |
|---|---|---|
| Query Expansion | Rewrite query into multiple variants, search all | "JS not working" → "JavaScript errors", "JS debugging", "script not loading" |
| Step-Back Prompting | Abstract the query first, then search | "Is 140/90 BP bad?" → "Blood pressure ranges and health implications" |
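A sketch of the query-expansion row above: rewrite the query into variants, search each, and merge the results. The prompt wording is an assumption, and `search(query, top_k)` and `llm(prompt)` are placeholders for your own retriever and LLM call.

```python
def expand_and_search(query, search, llm, n_variants=3, top_k=5):
    """Rewrite the query into several variants, search all, and merge."""
    prompt = (f"Rewrite this search query {n_variants} different ways, "
              f"one per line, keeping the same intent:\n{query}")
    variants = [query] + [v.strip() for v in llm(prompt).splitlines() if v.strip()]

    seen, merged = set(), []
    for variant in variants:
        for doc_id in search(variant, top_k):
            if doc_id not in seen:   # deduplicate across variants
                seen.add(doc_id)
                merged.append(doc_id)
    return merged
```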
Hierarchical / Parent-Child Retrieval
Embed small chunks for precision, but retrieve the parent chunk (or full document) for context.
```
Index:    Small chunks (256 tokens)
Retrieve: Parent chunks (1024 tokens) or full sections
```
When to add: When retrieved chunks lack sufficient context for good generation.
| Small Chunk | Problem | Parent Chunk Provides |
|---|---|---|
| "The fee is 2.5% per transaction" | 2.5% of what? Which transactions? | Full pricing section with tiers, caps |
| "Patients should avoid this if pregnant" | Avoid what? What medication? | Full drug info with name, dosage |
| "Returns must be initiated within 30 days" | 30 days from what? | Full returns policy with definitions |
Contextual Retrieval
Prepend LLM-generated context to each chunk before embedding, explaining where the chunk fits in the source document.
1Original: "The company reported $5M revenue."2Contextual: "This chunk is from Q3 2024 earnings report, Revenue section. The company reported $5M revenue."
When to add: When chunks lose meaning without document context (pronouns, references, abbreviations).
| Original Chunk | Problem | With Context |
|---|---|---|
| "It increased by 15% YoY" | What increased? Which year? | "From Acme Corp 2024 Annual Report, Operating Expenses section: ..." |
| "Users must complete this before proceeding" | Complete what? Where? | "From Onboarding Guide, Step 3 - Identity Verification: ..." |
| "The API returns a 429 error in this case" | Which API? What case? | "From Payment Gateway Docs, Rate Limiting section: ..." |
Agentic RAG
Instead of one-shot retrieval, the LLM orchestrates multi-step search:
- Analyze query
- Decide which index/tool to search
- Evaluate results
- Iterate if insufficient
When to add: Complex, multi-part questions that can't be answered in a single retrieval pass.
| Query | Agent Steps |
|---|---|
| "Compare Q3 revenue to competitors, suggest pricing changes" | Internal docs → Market data → Pricing strategy → Synthesize |
| "What caused Tuesday's outage? Fixed similar issues before?" | Incident reports → Extract cause → Historical incidents → Check status |
| "Find FAANG candidates with ML experience, draft outreach" | Resume search → Rank → Email templates → Personalize |
Self-RAG / Corrective RAG
After retrieval, the LLM critiques whether retrieved docs actually answer the question. If not, it reformulates and retries.
```
Retrieve → Evaluate Relevance →
    If sufficient   → Generate
    If insufficient → Reformulate query → Retrieve again
```
When to add: When hallucination reduction is critical and you can tolerate extra latency.
| Query | First Retrieval | Self-Correction |
|---|---|---|
| "Cancellation fee for enterprise plans?" | General pricing (no enterprise) | Reformulates → "enterprise cancellation terms" |
| "Integrate Salesforce with OAuth?" | Generic OAuth + unrelated Salesforce | Reformulates → "Salesforce OAuth integration tutorial" |
| "Tax implications of RSU vesting?" | General RSU overview (no tax) | Reformulates → "RSU taxation at vesting" |
Level 3: Graph + Vector Database
When to use: Data is interlinked in ways you know and understand
This is essential when relationships between entities are first-class citizens in your domain and users ask relational or comparative questions.
Ideal Use Cases
| Domain | Why Graph + Vector |
|---|---|
| Organizational Data | "Who reports to X's manager?" requires traversing relationships |
| Legal / Regulatory | Rules depend on each other: "If A applies, what exceptions exist under B?" |
| Medical / Scientific | Entities interconnect: bacteria → resistance → drugs → diseases |
| Product Catalogs | "Find alternatives to X that are compatible with Y" |
| Knowledge Bases | Multi-hop reasoning across connected concepts |
When Graph Adds Value
- Relationships are explicit and known
- Users ask questions requiring multi-hop traversal
- Accuracy on entity relationships is critical
- You need audit trails for why something was retrieved
Graph + Vector increases answer correctness when your data has inherent structure that pure semantic search would lose.
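A minimal graph-plus-vector sketch using networkx: resolve entities mentioned in the query, expand a hop or two along known relationships, and restrict vector search to documents attached to those entities. The graph contents and the `vector_search` callable are invented for the example.

```python
import networkx as nx

# Toy knowledge graph: drugs, conditions, and the documents that mention them.
G = nx.Graph()
G.add_edge("ibuprofen", "nsaid", relation="is_a")
G.add_edge("nsaid", "stomach_bleeding", relation="risk_of")
G.add_edge("stomach_bleeding", "doc_interaction_warnings", relation="described_in")

def graph_expand(entities, hops=2):
    """Collect every node within `hops` of the query entities."""
    expanded = set(entities)
    for entity in entities:
        if entity in G:
            expanded |= set(nx.single_source_shortest_path_length(G, entity, cutoff=hops))
    return expanded

def graph_filtered_search(query, query_entities, vector_search):
    # 1. Multi-hop traversal from the entities detected in the query.
    related = graph_expand(query_entities)
    allowed_docs = {n for n in related if n.startswith("doc_")}
    # 2. Vector search restricted to documents reachable in the graph.
    #    `vector_search(query, allowed_ids)` is a placeholder for your vector DB call.
    return vector_search(query, allowed_ids=allowed_docs)

print(graph_expand({"ibuprofen"}))  # reaches nsaid and stomach_bleeding
```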
Decision Flowchart
```
START
  │
  ▼
Is your data structured with known query patterns?
  │
  ├─ YES → Level 1: Statistical/Rule-Based
  │        (BM25 + metadata filters)
  │
  └─ NO
      │
      ▼
     Are relationships between entities critical to answers?
      │
      ├─ YES → Level 3: Graph + Vector
      │        (Knowledge graph + embeddings)
      │
      └─ NO → Level 2: Vector Database
              (Choose appropriate chunking method)
```
Quick Reference
| Data Characteristic | Recommended Approach |
|---|---|
| FAQ / Support tickets | Level 1: BM25 + popularity ranking |
| Technical documentation | Level 1: BM25 + metadata filters |
| Books / Long-form content | Level 2: Vector DB + semantic chunking |
| Mixed document corpus | Level 2: Vector DB + recursive chunking |
| Org charts / Hierarchies | Level 3: Graph + Vector |
| Legal with dependencies | Level 3: Graph + Vector |
| Medical knowledge base | Level 3: Graph + Vector |
Summary
- Start simple: If your queries are predictable, Level 1 (statistical methods) often produces the highest accuracy with the least complexity.
- Scale to vectors: When content is unstructured and self-contained, a well-chunked vector database handles semantic similarity effectively.
- Add graphs for relationships: When entities interconnect in meaningful ways, combining graph traversal with vector search significantly improves answer correctness.
The right approach depends on your data's nature, not on what's most technically sophisticated.
Thoughts? Hit me up at [email protected]