
RAG: Stop Searching, Start Classifying


Why your RAG should be a library, not a search engine

Executive Summary

Observation: Similarity search (top-k) works in demos but breaks in production as soon as questions are ambiguous, multi-hop, global, or when you need to arbitrate contradictions.

Core idea: A strong RAG should behave like a library (index, categories, navigation, levels of abstraction), not a flat search engine based solely on semantic proximity.

Implication: You must structure the corpus (metadata, hierarchies, relationships) and use strategies adapted to the question type, with progressive retrieval to control cost and quality.

Options: Hierarchical indexing (levels), RAPTOR (summary tree), GraphRAG (knowledge graph + communities), Agentic RAG (librarian agent/router).

Glossary
RAG (Retrieval-Augmented Generation)
approach where an LLM generates answers using context retrieved from a corpus.
Chunk
a document fragment (often a few paragraphs) indexed for retrieval and injected into the LLM context.
Embedding
vector representation of text used to measure semantic proximity.
Vector store
database optimized to store embeddings and run similarity / k-NN searches.
Similarity search / top-k
retrieving the k “closest” items in embedding space.
Precision / recall
retrieval metrics: precision = fraction of retrieved items that are relevant; recall = fraction of relevant items that are retrieved.
Multi-hop
a question that requires chaining multiple facts/relations (e.g., entity → organization → attribute).
Reranking
re-sorting candidates (often with a cross-encoder) after an initial retrieval to improve precision.
Hybrid search
combining lexical search (BM25/keywords) and vector search.
Metadata filtering
constraining retrieval by attributes (date, source, domain, type) before or during search.
Hierarchical indexing
organizing the corpus into levels (summary → sections → details) to navigate from general to specific.
RAPTOR
method that builds a tree of summaries via recursive clustering, providing multiple abstraction levels.
Knowledge graph
graph of entities and relationships enabling explicit traversals (instead of similarity approximations).
GraphRAG
approach that combines entity/relation extraction, community clustering, and summaries to answer global and multi-hop questions.
Agentic RAG
an agent that dynamically chooses the best retrieval strategy (filters, search, graph traversal, etc.) based on the question.
Progressive retrieval
retrieving a summary/card first, then drilling down into details only if needed.


The problem: similarity search is not enough

You have a corpus, embeddings, a vector store, a top-k of 5. It works in demos. In production, it breaks.

The issue is not implementation — it’s the paradigm. Similarity search relies on an implicit assumption: the most semantically similar chunk is the most relevant. That’s often false.

Five recurring failures in production

| Problem | What happens | Example |
| --- | --- | --- |
| Low precision | The top-k returns semantically close chunks that aren’t relevant — noise, ambiguity, superficial false positives | “vestibular treatment” matches “water treatment” because “treatment” dominates the vector |
| Multi-hop impossible | Questions that require chaining multiple facts consistently fail | “What degree does the CEO of the company that makes the F-150 have?” — no single chunk contains the whole answer |
| No aggregation | Impossible to answer corpus-wide questions | “What are the main themes?”, “How many publications talk about GVS?” |
| Unresolved conflicts | Two contradictory chunks, no mechanism to decide | CEO of Twitter in 2022 vs 2023 — which is “more similar”? Both. |
| Embeddings ≠ meaning | Vectors capture semantic proximity, not business logic | Entity relations, temporality, domain hierarchy — none of that is in an embedding |

Common thread: we ask a similarity tool to do a structural understanding job. It’s like asking a spellchecker to validate an argument’s logic.

Two techniques have become reflexes to improve retrieval. They help — but they don’t change the paradigm.

Reranking adds a cross-encoder after top-k to better sort results. Precision improves, sometimes significantly. But the fundamental issue remains: if the right chunk isn’t in the initial candidate set, no reranker will make it appear. You optimize ordering, not coverage.

Hybrid search combines lexical search (BM25/keywords) and vector search. The gain is real — roughly a 20% recall improvement in common benchmarks. But the paradigm stays the same: flat search → sort → hope the right chunk is in the set. It’s a quick win, not an architectural shift.
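To make this baseline concrete, here is a minimal, self-contained sketch of hybrid retrieval followed by reranking. The toy corpus, the scoring functions, and the fusion weight `alpha` are stand-ins for a real BM25 index, vector store, and cross-encoder; the point is the shape of the pipeline, and the fact that reranking only reorders what retrieval already found.

```python
from collections import Counter
from math import sqrt

# Toy corpus: doc_id -> text. In production the lexical side would be a BM25
# index and the semantic side a vector store; here everything is in memory.
CORPUS = {
    "d1": "galvanic vestibular stimulation treatment for balance disorders",
    "d2": "water treatment plants and filtration protocols",
    "d3": "postural responses to galvanic vestibular stimulation",
}

def lexical_score(query: str, doc: str) -> float:
    """Crude stand-in for BM25: count occurrences of query terms in the document."""
    terms, counts = set(query.split()), Counter(doc.split())
    return float(sum(counts[t] for t in terms))

def vector_score(query: str, doc: str) -> float:
    """Crude stand-in for embedding similarity: bag-of-words cosine."""
    q, d = Counter(query.split()), Counter(doc.split())
    dot = sum(q[t] * d[t] for t in q)
    norm = sqrt(sum(v * v for v in q.values())) * sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def hybrid_retrieve(query: str, k: int = 2, alpha: float = 0.5) -> list[str]:
    """Fuse lexical and semantic scores, keep the top-k candidate ids."""
    fused = {
        doc_id: alpha * vector_score(query, text) + (1 - alpha) * lexical_score(query, text)
        for doc_id, text in CORPUS.items()
    }
    return sorted(fused, key=fused.get, reverse=True)[:k]

def rerank(query: str, candidates: list[str], top_n: int = 1) -> list[str]:
    """Re-sort the candidate set (a real system would use a cross-encoder here).
    If the right chunk is not already in `candidates`, no reranker can add it."""
    return sorted(candidates, key=lambda d: vector_score(query, CORPUS[d]), reverse=True)[:top_n]

print(rerank("vestibular treatment", hybrid_retrieve("vestibular treatment")))
```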

Both techniques are useful. They should be the minimum. But treating them as the final solution is like putting a better engine in a car whose problem is the steering.


The analogy: library vs search engine

To understand the required shift, take a simple analogy.

A search engine is: you type words, you get “close” results. No structure, no navigation, no context. It’s statistical matching.

A library is: an index, categories, shelves, cards, classification codes, cross-references. The librarian doesn’t search by similarity — they navigate an organized structure. They know galvanic vestibular stimulation belongs in neuroscience, under neurophysiology, and that recent publications are sorted by date.

Today’s RAG is someone walking into a library, ignoring shelves and index, and flipping books at random looking for sentences that “look like” the question.

The paradigm shift is one sentence: move from “find the closest” to “navigate to the most relevant.”

Data is not a pile of vectors — it’s a corpus you can organize. Structure costs time to build. But it makes retrieval reliable, explainable, and auditable.


What do we actually need?

Before choosing a tool or a framework, you must state the needs. Six capabilities define robust retrieval.

| # | Need | What it enables | What flat search can’t do |
| --- | --- | --- | --- |
| 1 | Navigation across abstraction levels | Zoom from macro (themes) to meso (summaries) to micro (chunks) | Top-k knows only one granularity level |
| 2 | Metadata filtering | Constrain search by date, source, domain, type, authority | “Most similar” ≠ “most similar within this category, after this date” |
| 3 | Entity relationships | Traverse links: author → institution → publications → themes | Every multi-hop link would need an embedding match — fragile and often impossible |
| 4 | Corpus-wide questions | “What are the main themes?”, “How many publications on X?” | Flat search is structurally unable to aggregate |
| 5 | Adaptive strategy | Retrieval method adapts to question type | Factual ≠ synthesis ≠ exploration — one pipeline won’t cover all |
| 6 | Progressive retrieval | Return a summary first, then details if relevant | Sending 15 full chunks saturates context, costs money, and drowns signal |

Need #6 deserves emphasis. In production, per-query cost and answer quality depend directly on the number of tokens sent to the LLM. Progressive retrieval — fetch a card first, evaluate relevance, then fetch details — is not a nice-to-have. It’s what makes the system viable. The librarian doesn’t hand you 15 full books. They hand you a card. You decide whether you want the book.
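As a rough sketch of the pattern, assuming a hypothetical summary store, a chunk store, and a placeholder `is_relevant` check (in practice an LLM judge or a reranker score):

```python
# Progressive retrieval sketch: hand over the "card" first, the "book" only if
# it looks relevant. SUMMARIES, DETAILS and is_relevant are illustrative
# placeholders, not a specific framework API.

SUMMARIES = {
    "paper_42": "Comparison of sinusoidal and stochastic GVS protocols for postural control.",
}
DETAILS = {
    "paper_42": ["methods: stimulation waveforms ...", "methods: participant setup ...", "results: sway metrics ..."],
}

def is_relevant(question: str, summary: str) -> bool:
    """Placeholder relevance check; in practice an LLM judge or a reranker score."""
    return bool(set(question.lower().split()) & set(summary.lower().split()))

def progressive_retrieve(question: str, doc_id: str) -> dict:
    summary = SUMMARIES[doc_id]                  # cheap: a few hundred tokens at most
    if not is_relevant(question, summary):
        return {"summary": summary, "details": []}            # stop: nothing more is sent to the LLM
    return {"summary": summary, "details": DETAILS[doc_id]}   # drill down only when justified

print(progressive_retrieve("compare sinusoidal and stochastic protocols", "paper_42"))
```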

Running example: a scientific corpus

To make this concrete, consider a realistic scenario: a corpus of 500 scientific publications about galvanic vestibular stimulation (GVS) covering neurophysiology, clinical applications, and virtual reality.

Here are questions researchers actually ask — and why flat search fails on each:

| User question | Primary need | Why flat search fails |
| --- | --- | --- |
| “What are the major research themes in GVS?” | #4 Global aggregation | No single chunk contains “the main themes” |
| “Summarize Fitzpatrick lab’s work on postural GVS” | #3 Relationships + #1 Abstraction | You must link author → lab → publications → summary |
| “Which post-2022 publications cover GVS in VR?” | #2 Metadata + similarity | Without a time filter, top-k mixes 2005 and 2023 |
| “Compare sinusoidal vs noise stimulation protocols” | #5 Adaptive strategy | Comparison ≠ factual lookup — you need structured sources on both sides |
| “Give me an overview of this paper, then details from the methods section” | #6 Progressive retrieval | Sending the whole paper wastes tokens; retrieving only “methods” chunks misses context |

Each solution in the next section is illustrated by its ability to answer one of these questions.


Documented solutions

Four approaches, each targeting different needs. None is universal — the right choice depends on the real complexity of your queries.

Hierarchical indexing — level-based index

The principle is simple: organize the corpus into levels, like a table of contents. Document summaries → sections → detailed chunks. Retrieval navigates from general to specific.

In our running example, when a researcher asks for an overview and then method details, the system returns the document summary first (macro), evaluates relevance, then drills down to the “methods” chunk (micro). No token waste, no noise.
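A minimal sketch of that navigation, with an illustrative two-level index and a crude lexical scorer standing in for whatever retriever you actually use:

```python
# Hierarchical navigation sketch: search summaries first, then descend into the
# matching document's sections. The index structure and the scorer are illustrative.

INDEX = {
    "paper_17": {
        "summary": "Effects of noisy GVS on postural sway in older adults.",
        "sections": {
            "methods": "participants, stimulation parameters, sway measurement",
            "results": "sway reduction under 0.4 mA noisy galvanic stimulation",
        },
    },
    "paper_23": {
        "summary": "Sinusoidal GVS as a locomotion cue in virtual reality.",
        "sections": {
            "methods": "VR headset setup, stimulation waveform, task design",
            "results": "improved heading perception with sinusoidal stimulation",
        },
    },
}

def score(query: str, text: str) -> int:
    """Crude lexical overlap; in practice an embedding or hybrid retriever."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def navigate(query: str):
    # Level 1: pick the document whose summary matches best (macro).
    doc_id = max(INDEX, key=lambda d: score(query, INDEX[d]["summary"]))
    # Level 2: pick the best section inside that document (micro).
    sections = INDEX[doc_id]["sections"]
    section = max(sections, key=lambda s: score(query, sections[s]))
    return doc_id, section, sections[section]

print(navigate("postural sway methods in older adults"))
```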

Needs covered: #1 Abstraction navigation, #6 Progressive retrieval.

The trade-off is stark: the hierarchy must be built upfront, and it must match the real structure of the content. A bad hierarchy can perform worse than flat search.

RAPTOR — recursive summary tree

RAPTOR pushes the idea further. Instead of an imposed hierarchy, it builds a bottom-up tree: chunks are clustered, each cluster is summarized by an LLM, summaries are clustered and summarized again, and so on. The result is a navigable tree where each node provides a different abstraction level.
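A toy sketch of the construction loop, with placeholder `cluster` and `summarize` functions standing in for the embedding-based clustering and LLM summarization the actual method uses:

```python
# RAPTOR-style tree construction sketch: cluster, summarize, recurse (bottom-up).

def cluster(nodes: list[str], size: int = 2) -> list[list[str]]:
    """Placeholder: group nodes in fixed-size batches instead of clustering embeddings."""
    return [nodes[i:i + size] for i in range(0, len(nodes), size)]

def summarize(texts: list[str]) -> str:
    """Placeholder: truncate and join instead of calling an LLM."""
    return " / ".join(t[:40] for t in texts)

def build_tree(chunks: list[str]) -> list[list[str]]:
    """Each pass produces one more abstraction level until a single root remains."""
    levels = [chunks]
    while len(levels[-1]) > 1:
        parents = [summarize(group) for group in cluster(levels[-1])]
        levels.append(parents)
    return levels  # levels[0] = raw chunks, levels[-1] = corpus-wide root summary

chunks = [
    "noisy GVS reduces postural sway in older adults",
    "sinusoidal GVS biases heading perception in VR",
    "stochastic stimulation thresholds in vestibular afferents",
    "clinical GVS protocols for bilateral vestibulopathy",
]
for depth, level in enumerate(build_tree(chunks)):
    print(f"level {depth}: {len(level)} node(s)")
```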

On our GVS corpus, “What are the major research themes?” is answered by upper nodes — where summaries capture macro trends without requiring any individual chunk to contain that information.

Needs covered: #1 Abstraction navigation, #4 Corpus-wide questions, #6 Progressive retrieval.

The cost is significant: clustering + LLM summarization at each level. Quality depends directly on summary quality. A bad intermediate summary contaminates everything above it.

GraphRAG — knowledge graph + communities

GraphRAG changes the representation. Instead of treating the corpus as a set of chunks, it extracts entities and relationships to build a knowledge graph. It then runs hierarchical community detection (the Leiden algorithm) to identify communities of entities and generates a summary per community.

This is the approach that best answers multi-hop questions. “Summarize Fitzpatrick lab’s work on postural GVS” requires a graph traversal: Fitzpatrick → University of New South Wales → publications → filter by postural GVS. No vector search can reliably do that path — graph traversal does it natively.
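A toy traversal over an illustrative hand-built graph (the entities, relations, and publication ids below are invented for the example; a real GraphRAG index is produced by LLM extraction over the corpus):

```python
# Multi-hop traversal sketch over a toy knowledge graph.
GRAPH = {
    ("Fitzpatrick", "affiliated_with"): ["University of New South Wales"],
    ("University of New South Wales", "published"): ["pub_1", "pub_2", "pub_3"],
    ("pub_1", "topic"): ["postural GVS"],
    ("pub_2", "topic"): ["vestibular thresholds"],
    ("pub_3", "topic"): ["postural GVS"],
}

def neighbors(entity: str, relation: str) -> list[str]:
    return GRAPH.get((entity, relation), [])

def lab_publications_on(author: str, topic: str) -> list[str]:
    """author -> institution -> publications -> filter by topic:
    explicit hops, no similarity approximation at any step."""
    results = []
    for institution in neighbors(author, "affiliated_with"):
        for pub in neighbors(institution, "published"):
            if topic in neighbors(pub, "topic"):
                results.append(pub)
    return results

print(lab_publications_on("Fitzpatrick", "postural GVS"))  # ['pub_1', 'pub_3']
```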

Needs covered: #3 Entity relationships, #4 Corpus-wide questions, #1 Abstraction navigation.

The trade-off is heavy: high indexing cost (LLM-based entity extraction over the full corpus), maintenance complexity, and — often ignored — GraphRAG can underperform vanilla RAG on simple factual questions. The overhead is justified only if queries are truly complex.

Agentic RAG — the librarian agent

Agentic RAG doesn’t propose a new data structure — it adds a decision layer. An agent analyzes the question, chooses the optimal retrieval strategy, and orchestrates available tools: vector search, metadata filters, SQL, graph traversal, or combinations.

That’s the librarian. For “Which post-2022 publications cover GVS in VR?”, it doesn’t run raw vector search — it filters by time (date > 2022), then by topic (domain = VR), then runs semantic search within the filtered subset.
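A deliberately simplified routing sketch, with a hard-coded rule in place of an LLM-based router and an in-memory list in place of a real metadata-aware store:

```python
# Librarian-style routing sketch: apply structured constraints first, similarity last.
PUBLICATIONS = [
    {"id": "p1", "year": 2023, "domain": "VR", "text": "sinusoidal GVS as a locomotion cue"},
    {"id": "p2", "year": 2019, "domain": "VR", "text": "GVS-induced vection in head-mounted displays"},
    {"id": "p3", "year": 2024, "domain": "clinical", "text": "noisy GVS for bilateral vestibulopathy"},
]

def semantic_score(query: str, text: str) -> int:
    """Stand-in for vector similarity."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def route(question: str) -> list[dict]:
    """Toy router: a real agent would classify the question with an LLM."""
    if "post-2022" in question and "VR" in question:
        candidates = [p for p in PUBLICATIONS if p["year"] > 2022 and p["domain"] == "VR"]
    else:
        candidates = PUBLICATIONS
    return sorted(candidates, key=lambda p: semantic_score(question, p["text"]), reverse=True)

print([p["id"] for p in route("Which post-2022 publications cover GVS in VR?")])  # ['p1']
```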

Needs covered: #5 Adaptive strategy — and potentially all others through orchestration.

Trade-off: implementation complexity, multi-step latency, and a critical dependency on routing quality. If the agent chooses poorly, results can be worse than a simple top-k.

Comparison of solutions

| Solution | Needs covered | Indexing cost | Complexity | Best use case |
| --- | --- | --- | --- | --- |
| Hierarchical indexing | #1, #6 | Medium | Low | Well-structured corpus, factual questions needing progressive zoom |
| RAPTOR | #1, #4, #6 | High (LLM) | Medium | Unstructured corpus, multi-level summaries and corpus-wide questions |
| GraphRAG | #1, #3, #4 | Very high (LLM) | High | Multi-hop queries, entity relations, dense technical/narrative corpora |
| Agentic RAG | #5 (+ all via orchestration) | Variable | High | Heterogeneous queries requiring heterogeneous strategies |

In practice: custom is often the best choice

Frameworks cover generic patterns

RAPTOR, GraphRAG, LlamaIndex offer ready-to-use architectures. They are well documented, tested, and a good starting point. But every domain has its own knowledge structure. The hierarchy of a medical corpus is nothing like a regulatory database or customer support content.

Decision synthesis

How do you choose between these approaches? Here is a decision matrix based on the nature of your corpus and your queries.

| Your situation | Recommended architecture | Why |
| --- | --- | --- |
| Documentary corpus with clear structure (reports, regulations) | Hierarchical indexing + metadata filtering | Natural fit for an existing table of contents |
| Dense scientific corpus, frequent synthesis questions | RAPTOR | Can answer “What do we know about X?” without manual navigation |
| Highly relational data (entities, collaborations, causalities) | GraphRAG | Traversing Author–Protocol–Result relationships is essential |
| Unpredictable heterogeneous queries (sometimes SQL, sometimes semantic) | Agentic RAG | Maximum flexibility, at the cost of latency |
| Small volume (<1000 docs), simple queries | Hybrid search + reranking | Don’t over-engineer; complexity must match the need |

Build your own layer

The real work isn’t choosing a tool — it’s designing the index. Understand the domain structure, identify key relationships, choose meaningful abstraction levels. A tailored structuring layer often outperforms a plug-and-play framework applied as-is.

Custom’s advantage: full control over retrieval cost, granularity, and navigation logic. Drawback: you must know what you’re doing and be ready to invest design time.


Conclusion

What changes

Moving from “embed everything and top-k” to “structure, index, categorize, and let an agent navigate” is not an optimization — it’s a paradigm shift. Structured data is not overhead. It’s an investment. Progressive retrieval is not a nice-to-have. It’s what makes the system viable in production.

The question to ask

How would a human expert search in this corpus?

If the answer is “they flip things at random and take what looks similar” — you have a problem. If the answer is “they consult the index, identify the category, read the summary, then drill down” — build that.


Sources

| Reference | Type | URL |
| --- | --- | --- |
| Seven Failure Points When Engineering a RAG System (2024) | Paper | arxiv.org |
| RAPTOR — Sarthi et al. (Stanford, 2024) | Paper | arxiv.org |
| GraphRAG — Edge et al. (Microsoft, 2024) | Paper | arxiv.org |
| IBM — RAG Problems Persist | Article | ibm.com |
| RAG Is a Data Engineering Problem | Article | substack.com |
| VectorHub — Hybrid Search & Reranking | Article | superlinked.com |
| 5 RAG Failures + Knowledge Graphs | Article | freecodecamp.org |
| PIXION — Hierarchical Index Retrieval | Article | pixion.co |
| NirDiamant/RAG_Techniques | Repo | github.com |
| Beyond Vector Search — Next-Gen RAG | Article | machinelearningmastery.com |
| LlamaIndex — Structured Hierarchical Retrieval | Doc | llamaindex.ai |
| Microsoft GraphRAG | Repo | github.com |
| RAPTOR | Repo | github.com |

