
How to Design Your Knowledge Base for RAG

A practical framework for designing knowledge bases that power RAG systems. Covers when to use BM25, vector databases, and knowledge graphs.

Krishna C

January 20, 2026

9 min read

TL;DR

Your knowledge base design should match your data, not the hype. Start with BM25 if queries are predictable. Use vector search for unstructured content. Add a knowledge graph only when entity relationships matter for answering questions.

I've designed knowledge bases that ranged from simple BM25 search to full graph+vector hybrids. The biggest lesson? Most teams overcomplicate this. They jump straight to vector databases when a well-tuned keyword search would work better for their data.

Here's the framework I use to decide how to design a knowledge base.

The Three-Level RAG Hierarchy

Level 1: Statistical and Rule-Based Retrieval

When to use: Predictable queries, structured data, known access patterns

When your domain is well-understood, this is the highest-accuracy approach, and it requires no vector database.

| Algorithm | Best For |
| --- | --- |
| BM25 / TF-IDF | Keyword-heavy domains (technical documentation) |
| Recency Weighting | News, logs, time-sensitive content |
| Frequency / Popularity | FAQs, common support queries |
| Rule-Based Routing | When you know which document answers which query type |
| Metadata Filtering | Date ranges, categories, authors, document types |

Many production systems use BM25 + metadata filters and outperform naive vector search because they're precise when the domain is well-understood.
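
In practice, Level 1 is a few dozen lines. Here's a minimal sketch using the rank_bm25 package; the documents and filter fields are illustrative, and a real system would build the index once rather than per query:

```python
# Minimal Level 1 retrieval: a metadata pre-filter plus BM25 scoring.
# pip install rank-bm25
from rank_bm25 import BM25Okapi

docs = [
    {"text": "How to reset your password", "category": "account"},
    {"text": "Troubleshooting VPN connection errors", "category": "network"},
    {"text": "Password policy and rotation rules", "category": "account"},
]

def search(query, category=None, top_k=2):
    # Apply the metadata filter first so BM25 only scores eligible docs.
    pool = [d for d in docs if category is None or d["category"] == category]
    bm25 = BM25Okapi([d["text"].lower().split() for d in pool])
    scores = bm25.get_scores(query.lower().split())
    ranked = sorted(zip(pool, scores), key=lambda pair: pair[1], reverse=True)
    return [d["text"] for d, _ in ranked[:top_k]]

print(search("reset password", category="account"))
```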

Level 2: Vector Database Retrieval

When to use: Substantial but not interrelated data, such as books, images, and documents whose content is self-contained.

This is the right choice when semantic similarity matters and you need to find conceptually related content rather than exact keyword matches.

Chunking Methods

| Method | How It Works | Best For |
| --- | --- | --- |
| Fixed Size | Split every N tokens with overlap | General purpose, simple baseline |
| Sentence Based | Split on sentence boundaries | Conversational content, Q&A |
| Paragraph / Section | Respect document structure | Well-formatted docs (markdown, HTML) |
| Semantic Chunking | Split when embedding similarity drops | Varied content, topic shifts |
| Recursive / Hierarchical | Try large chunks, split if too big | Mixed document types |
| Document-Specific | Code: by function; Legal: by clause | Domain-specific corpora |
| Agentic Chunking | LLM decides chunk boundaries | High-value, complex documents |

Chunk Size Guidelines

  • Smaller chunks (128-256 tokens): Higher precision, better for specific facts
  • Larger chunks (512-1024 tokens): More context, but noisier retrieval

Chunk size matters as much as chunking method. Start with 256-512 tokens and adjust based on retrieval quality.
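
As a concrete baseline, here's a minimal fixed-size chunker with overlap. It splits on whitespace for simplicity; swap in a real tokenizer (e.g., tiktoken) if you need exact model-token counts:

```python
def chunk_fixed(text, chunk_size=256, overlap=32):
    """Split text into overlapping fixed-size chunks (whitespace tokens)."""
    tokens = text.split()
    step = chunk_size - overlap  # each chunk repeats `overlap` tokens
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):  # final chunk reached the end
            break
    return chunks
```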

Add-Ons to Enhance Level 2

These techniques layer on top of basic vector retrieval to improve accuracy and handle edge cases.

Hybrid Search (Sparse + Dense)

Combine BM25 (keyword) and vector search with weighted fusion. Catches both exact keyword matches and semantic similarities.

Query → [BM25 Top-K] ──┐
                       ├→ RRF Fusion → Results
Query → [Dense Top-K] ─┘

When to add: Almost always. This is becoming the default in production systems.

Example scenarios:

  • Query: "Error code 0x8007045D". BM25 catches exact error code match, Vector catches related I/O error docs
  • Query: "Python async await tutorial". BM25 catches exact "async await" keywords, Vector catches conceptually similar concurrency docs
  • Query: "GDPR Article 17 compliance". BM25 catches exact "Article 17" reference, Vector catches related "right to erasure" content

Reranking Layer

Retrieve broadly (top 50) with fast vector search, then rerank with a cross-encoder model to return the top N.

Query → Vector Search (Top 50) → Cross-Encoder Rerank → Top 5

When to add: When precision matters more than raw speed. A reranking pass dramatically improves result quality for a modest latency cost.

| Query | Vector retrieves (noisy) | Reranker promotes |
| --- | --- | --- |
| "Refund policy for damaged items?" | General refund docs, shipping docs | Specific damaged goods policy |
| "Terminate employee in California?" | HR docs, general termination, CA laws | CA-specific termination procedures |
| "Ibuprofen and alcohol side effects?" | Ibuprofen info, alcohol info | Specific interaction warnings |

HyDE (Hypothetical Document Embedding)

Generate a hypothetical answer first, embed that, then search for similar real documents.

1Query: "How do I handle database migrations in production?"
2
3
4LLM generates hypothetical answer:
5"To handle database migrations in production, you should use a
6migration tool like Flyway or Liquibase. Always backup your
7database first, run migrations during low-traffic periods..."
8
9
10Embed the hypothetical answer
11
12
13Search vector DB for real documents similar to this answer

Why it works: Queries are short and may not match document language. A hypothetical answer is longer, uses domain terminology, and is semantically closer to actual documents that answer the question.

When to add: Vague queries, abstract questions, or when query-document vocabulary mismatch is high.

| Query | Hypothetical Answer | Finds Documents About |
| --- | --- | --- |
| "Why is my app slow?" | "N+1 queries, memory leaks, unoptimized indexes..." | Database optimization, memory profiling |
| "Best way to structure a team?" | "Cross-functional squads, matrix orgs, pod-based models..." | Org design, team topologies |
| "How do I not lose money?" | "Diversify investments, maintain emergency funds..." | Investment strategies, risk management |

Other Query Transformations

When to add: When users submit vague, incomplete, or poorly worded queries.

| Technique | What It Does | Example |
| --- | --- | --- |
| Query Expansion | Rewrite query into multiple variants, search all | "JS not working" → "JavaScript errors", "JS debugging", "script not loading" |
| Step-Back Prompting | Abstract the query first, then search | "Is 140/90 BP bad?" → "Blood pressure ranges and health implications" |
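
Query expansion pairs naturally with the RRF helper from the hybrid-search section: generate variants, search each, fuse the ranked lists. generate() and search() are hypothetical placeholders:

```python
def expanded_search(query, generate, search, n_variants=3, k=60):
    prompt = (f"Rewrite this search query {n_variants} different ways, "
              f"one per line, using different likely vocabulary:\n{query}")
    variants = [query] + generate(prompt).splitlines()[:n_variants]
    # Search every variant and fuse the ranked lists with RRF.
    scores = {}
    for variant in variants:
        for rank, doc_id in enumerate(search(variant), start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```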

Hierarchical / Parent-Child Retrieval

Embed small chunks for precision, but retrieve the parent chunk (or full document) for context.

Index:    Small chunks (256 tokens)
Retrieve: Parent chunks (1024 tokens) or full sections

When to add: When retrieved chunks lack sufficient context for good generation.

| Small Chunk | Problem | Parent Chunk Provides |
| --- | --- | --- |
| "The fee is 2.5% per transaction" | 2.5% of what? Which transactions? | Full pricing section with tiers, caps |
| "Patients should avoid this if pregnant" | Avoid what? What medication? | Full drug info with name, dosage |
| "Returns must be initiated within 30 days" | 30 days from what? | Full returns policy with definitions |

Contextual Retrieval

Prepend each chunk with LLM-generated context explaining where it fits in the document before embedding.

1Original: "The company reported $5M revenue."
2Contextual: "This chunk is from Q3 2024 earnings report, Revenue section. The company reported $5M revenue."

When to add: When chunks lose meaning without document context (pronouns, references, abbreviations).

| Original Chunk | Problem | With Context |
| --- | --- | --- |
| "It increased by 15% YoY" | What increased? Which year? | "From Acme Corp 2024 Annual Report, Operating Expenses section: ..." |
| "Users must complete this before proceeding" | Complete what? Where? | "From Onboarding Guide, Step 3 - Identity Verification: ..." |
| "The API returns a 429 error in this case" | Which API? What case? | "From Payment Gateway Docs, Rate Limiting section: ..." |

Agentic RAG

Instead of one-shot retrieval, the LLM orchestrates multi-step search:

  1. Analyze query
  2. Decide which index/tool to search
  3. Evaluate results
  4. Iterate if insufficient

When to add: Complex, multi-part questions that can't be answered in a single retrieval pass.

| Query | Agent Steps |
| --- | --- |
| "Compare Q3 revenue to competitors, suggest pricing changes" | Internal docs → Market data → Pricing strategy → Synthesize |
| "What caused Tuesday's outage? Fixed similar issues before?" | Incident reports → Extract cause → Historical incidents → Check status |
| "Find FAANG candidates with ML experience, draft outreach" | Resume search → Rank → Email templates → Personalize |

Self-RAG / Corrective RAG

After retrieval, the LLM critiques whether retrieved docs actually answer the question. If not, it reformulates and retries.

Retrieve → Evaluate Relevance →
    If sufficient   → Generate
    If insufficient → Reformulate query → Retrieve again

When to add: When hallucination reduction is critical and you can tolerate extra latency.

| Query | First Retrieval | Self-Correction |
| --- | --- | --- |
| "Cancellation fee for enterprise plans?" | General pricing (no enterprise) | Reformulates → "enterprise cancellation terms" |
| "Integrate Salesforce with OAuth?" | Generic OAuth + unrelated Salesforce | Reformulates → "Salesforce OAuth integration tutorial" |
| "Tax implications of RSU vesting?" | General RSU overview (no tax) | Reformulates → "RSU taxation at vesting" |

Level 3: Graph + Vector Database

When to use: Data is interlinked in ways you know and understand

This is essential when relationships between entities are first-class citizens in your domain and users ask relational or comparative questions.

Ideal Use Cases

| Domain | Why Graph + Vector |
| --- | --- |
| Organizational Data | "Who reports to X's manager?" requires traversing relationships |
| Legal / Regulatory | Rules depend on each other: "If A applies, what exceptions exist under B?" |
| Medical / Scientific | Entities interconnect: bacteria → resistance → drugs → diseases |
| Product Catalogs | "Find alternatives to X that are compatible with Y" |
| Knowledge Bases | Multi-hop reasoning across connected concepts |

When Graph Adds Value

  • Relationships are explicit and known
  • Users ask questions requiring multi-hop traversal
  • Accuracy on entity relationships is critical
  • You need audit trails for why something was retrieved

Graph + Vector increases answer correctness when your data has inherent structure that pure semantic search would lose.
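
A minimal sketch of the combination using networkx: vector search supplies semantic entry points, and graph traversal pulls in explicitly related entities. The edges and the vector_db are illustrative placeholders:

```python
import networkx as nx

# Toy medical graph: bacteria → resistance mechanism → drug.
G = nx.DiGraph()
G.add_edge("E. coli", "beta-lactamase", relation="expresses")
G.add_edge("beta-lactamase", "penicillin", relation="confers_resistance_to")

def graph_vector_retrieve(query_vec, vector_db, hops=2):
    # 1. Semantic entry points: vector search returns entity IDs.
    seeds = vector_db.search(query_vec, top_k=3)
    # 2. Expand each seed along known relationships, up to `hops` steps.
    expanded = set(seeds)
    for seed in seeds:
        if seed in G:
            expanded |= set(nx.ego_graph(G, seed, radius=hops).nodes)
    return expanded
```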

Decision Flowchart

START
  ↓
Is your data structured with known query patterns?
  │
  ├─ YES → Level 1: Statistical/Rule-Based
  │        (BM25 + metadata filters)
  │
  └─ NO
      ↓
     Are relationships between entities critical to answers?
      │
      ├─ YES → Level 3: Graph + Vector
      │        (Knowledge graph + embeddings)
      │
      └─ NO → Level 2: Vector Database
              (Choose appropriate chunking method)

Quick Reference

| Data Characteristic | Recommended Approach |
| --- | --- |
| FAQ / Support tickets | Level 1: BM25 + popularity ranking |
| Technical documentation | Level 1: BM25 + metadata filters |
| Books / Long-form content | Level 2: Vector DB + semantic chunking |
| Mixed document corpus | Level 2: Vector DB + recursive chunking |
| Org charts / Hierarchies | Level 3: Graph + Vector |
| Legal with dependencies | Level 3: Graph + Vector |
| Medical knowledge base | Level 3: Graph + Vector |


Summary

  1. Start simple: If your queries are predictable, Level 1 (statistical methods) often produces the highest accuracy with the least complexity.

  2. Scale to vectors: When content is unstructured and self-contained, a well-chunked vector database handles semantic similarity effectively.

  3. Add graphs for relationships: When entities interconnect in meaningful ways, combining graph traversal with vector search significantly improves answer correctness.

The right approach depends on your data's nature, not on what's most technically sophisticated.

Thoughts? Hit me up at [email protected]

