
A practical guide to grounding your LLM responses in the web, without getting trapped.
Executive Summary
Problem: LLMs have knowledge frozen at a cutoff date and are prone to hallucinations. Web search looks like the obvious solution, but several paths exist, with very different trade-offs.
Three main approaches:
- Integrated (Claude web_search, Perplexity): zero config, but zero control and double token billing.
- Classic API (Brave, Serper): full control, but more code to maintain.
- AI-optimized (Tavily, Exa): a middle ground with integrated post-processing.
Critical hidden costs:
- Double synthesis in integrated solutions: you pay for tokens twice.
- Context tokens: 10 results with snippets = 2-3k tokens before your question.
- Full-page fetching, often billed separately.
- Rate limits, rarely documented clearly.
Major risks:
- Web poisoning: malicious SEO, dynamically generated pages, edits to “trusted” sources (Wikipedia, Reddit, Stack Overflow).
- Cache amplifies the problem: a source poisoned at t=0 stays in cache for the entire TTL, serving corrupted content to all users.
Recommendation: the right choice depends on your budget, need for control, risk tolerance, and resources. In every case, plan a caching strategy and a poisoning defense from the start.
Glossary
- Grounding
- anchoring LLM responses on external sources (web, documents) to reduce hallucinations and ensure information freshness.
- LLM (Large Language Model)
- language model trained on vast amounts of text, capable of generating and understanding natural language.
- Search API
- interface for querying a search engine (Google, Bing, Brave, etc.) and retrieving structured results (titles, URLs, snippets).
- RAG (Retrieval-Augmented Generation)
- technique combining information retrieval and text generation, where an LLM uses retrieved documents to produce anchored responses.
- Tokens
- basic units of language processing by LLMs. Costs are generally billed per token (input and output).
- Double synthesis
- pattern where an integrated solution first synthesizes an answer with its own LLM, and you then pass that answer through your LLM to fit it into your conversation, paying for tokens twice.
- Cache TTL (Time To Live)
- duration during which cached data remains valid before being refreshed. A long TTL can fossilize errors or amplify poisoning.
- Web poisoning
- attack consisting of manipulating search results in real-time via malicious SEO, dynamically generated pages, or editing of “trusted” sources.
- Knowledge poisoning
- attack on training data or an internal RAG base, with a one-time attack window at build time.
- Search-time poisoning
- attack on real-time search results, with a permanent and evolving attack window, nearly impossible to detect at scale.
- Lock-in
- strong dependency on an ecosystem or provider, making it difficult to change solutions.
- Supply chain risk
- risk of depending on intermediate providers who can be cut off or change their terms (e.g., the Bing Search API, retired in 2025).
- Rate limits
- limits on the number of requests allowed per period (minute/hour), rarely clearly documented in “unlimited” plans.
- Snippets
- short text excerpts returned by search APIs, generally 100-200 characters per result.
- Proprietary index
- database of web pages maintained by a search engine (Google, Bing, Brave). Most other providers aggregate, scrape, or purchase access.
1. Introduction — The Grounding Problem
LLM = knowledge frozen at a cutoff date + hallucinations. Web search looks like the obvious solution. But several paths are possible, with very different trade-offs.
2. The Three Approaches
2.1 Integrated
Examples: Claude web_search, Perplexity Sonar, Brave AI Grounding
You ask your question and get a sourced answer. Zero config, but no visibility into the pipeline.
2.2 Classic Search API + Your LLM
Examples: Brave Search API, Serper, SerpAPI
Raw results (titles, URLs, snippets). You manage synthesis with your own LLM. Full control, more code to maintain.
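A minimal sketch of this pattern, assuming the Brave Search API's endpoint and response shape as documented (a `web.results` list with `title`, `url`, `description`); `my_llm` is a placeholder for whatever model client you use:

```python
import requests

BRAVE_ENDPOINT = "https://api.search.brave.com/res/v1/web/search"

def search_web(query: str, api_key: str, count: int = 10) -> list[dict]:
    """Fetch raw results (title, url, snippet) from the Brave Search API."""
    resp = requests.get(
        BRAVE_ENDPOINT,
        headers={"X-Subscription-Token": api_key},
        params={"q": query, "count": count},
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json().get("web", {}).get("results", [])
    return [{"title": r.get("title"), "url": r.get("url"),
             "snippet": r.get("description")} for r in results]

def build_grounded_prompt(question: str, results: list[dict]) -> str:
    """Inject the results into the prompt; your own LLM does the synthesis."""
    sources = "\n".join(f"[{i + 1}] {r['title']} ({r['url']})\n{r['snippet']}"
                        for i, r in enumerate(results))
    return ("Answer using ONLY the sources below, citing them as [n].\n\n"
            f"Sources:\n{sources}\n\nQuestion: {question}")

# results = search_web("who is the French president", API_KEY)
# answer = my_llm.complete(build_grounded_prompt(question, results))  # hypothetical client
```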
2.3 AI-optimized Search API
Examples: Tavily, Exa, Firecrawl Search, Linkup
Integrated post-processing: cleaning, targeted extraction, anti-hallucination. No imposed synthesis — you keep your LLM. Middle ground: less work than classic, more control than integrated.
3. Anatomy of Costs (Often Hidden)
3.1 Double Synthesis
Integrated solutions: their LLM synthesizes → you pass it through your LLM to integrate into your conversation = double token billing.
3.2 Context Tokens
Search results injected into the prompt = input tokens. 10 results with snippets = easily 2-3k tokens before your question.
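A back-of-the-envelope comparison of the two cost structures. The per-token prices below are assumptions for illustration, not any provider's published rates:

```python
# Illustrative (assumed) per-token prices for a mid-range model.
PRICE_IN = 3.00 / 1_000_000    # $ per input token
PRICE_OUT = 15.00 / 1_000_000  # $ per output token

snippet_tokens = 10 * 250   # 10 results x ~250 tokens each ≈ 2,500 tokens of context
question_tokens = 50
answer_tokens = 400

# Classic API: one synthesis pass through your own LLM.
classic = (snippet_tokens + question_tokens) * PRICE_IN + answer_tokens * PRICE_OUT

# Integrated: their LLM synthesizes (billed once), then you re-ingest the
# answer into your own conversation (billed again) -> double synthesis.
their_pass = (snippet_tokens + question_tokens) * PRICE_IN + answer_tokens * PRICE_OUT
your_pass = (answer_tokens + question_tokens) * PRICE_IN + answer_tokens * PRICE_OUT

print(f"classic:    ${classic:.4f} per query")
print(f"integrated: ${their_pass + your_pass:.4f} per query "
      f"({(their_pass + your_pass) / classic:.1f}x)")
```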
3.3 Full Page Fetching
Search returns snippets. If you want full content → separate fetch, often billed separately.
3.4 Rate Limits
“Unlimited” plans are never unlimited. Per-minute and per-hour rate limits are rarely documented clearly.
4. Pricing Orders of Magnitude 2026
| Service | Type | Search Cost | Token Cost | Persistent Free Tier | Notes |
|---|---|---|---|---|---|
| Claude web_search | Integrated | Included (limited) | Normal Claude rate | ❌ | No explicit search surcharge |
| Perplexity Sonar | Integrated/API | ~$5/1000 | Sonar: $1/M tokens; Sonar Pro: $3 in / $15 out per M | ❌ | Search fees on top |
| Brave AI Grounding | Integrated/API | $5-9/1000 | Separate LLM | ✅ 2-5k/month | Base $5, Pro $9 |
| Brave Search API | Classic | $3-5/1000 | Your LLM | ✅ 2k/month | Pro $9 for AI rights |
| Serper / SerpAPI | Classic | $1-3/1000 | Your LLM | ⚠️ Limited | Stable |
| Tavily | AI-optimized | $5-8/1000 | Your LLM | ⚠️ Limited | Plans from $30/month |
| Exa | AI-optimized | $5-25/1000 | Your LLM | ⚠️ Limited | +$1/1000 for content |
5. The Independence Question
5.1 Who Has Their Own Index?
Proprietary index: Google, Bing, Brave. The list is short. Everyone else aggregates, scrapes, or purchases access.
5.2 Supply Chain Risk
Microsoft retired the Bing Search API in 2025 → everyone who depended on it had to migrate urgently. If your provider scrapes Google/Bing → it can be cut off at any time.
5.3 Lock-in
Integrated solutions = strong coupling with an ecosystem. Separate APIs = more flexibility to change components.
6. Summary Table
| Criterion | Integrated | Classic API | AI-optimized |
|---|---|---|---|
| Pipeline control | ❌ Low | ✅ Strong | ⚠️ Medium-Strong |
| Bias/transparency | ❌ Opaque | ⚠️ Medium | ⚠️ Medium (better documented) |
| Effective token cost | ⚠️ Double payment | ✅ Controlled | ✅ Very controlled |
| Setup ease | ✅ Trivial | ✅ Reasonable | ✅ Very fast |
| Maintenance | ✅ Zero | ⚠️ Light | ✅ Almost none |
| Independence | ❌ Strong lock-in | ⚠️ Partial | ⚠️ Partial |
| Poisoning resistance | ❌ Subject to their filtering | ⚠️ You can filter | ⚠️ Filtering + integrated ranking |
| Data freshness | ✅ Excellent | ✅ Good | ✅ Very good |
| Latency | ✅ Very fast | ⚠️ Variable | ✅ Optimized |
7. Architecture Patterns in Practice (2026)
7.1 Search-only
- Integrated. Use case: quick Q&A, copilots, consumer chatbots. Advantage: zero config. Limit: no control.
- Classic. Use case: simple agents, prototypes. Advantage: flexibility. Limit: more code to maintain.
7.2 Search + Cache + Citation
Cache TTL: periodic source refresh. Deduplication: same info from multiple sources → single entry. Document hash: detect content changes.
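A minimal sketch of such a cache, assuming nothing beyond the standard library: entries are keyed by URL, expire after a TTL, and carry a SHA-256 content hash so a silent change on a previously stable source can be flagged:

```python
import hashlib
import time

class SourceCache:
    """Per-source cache with TTL expiry and content-hash change detection."""

    def __init__(self, ttl_seconds: int = 3600):
        self.ttl = ttl_seconds
        self.entries: dict[str, dict] = {}  # url -> {hash, content, fetched_at}

    @staticmethod
    def _hash(content: str) -> str:
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def get(self, url: str) -> str | None:
        entry = self.entries.get(url)
        if entry and time.time() - entry["fetched_at"] < self.ttl:
            return entry["content"]
        return None  # expired or unseen -> the caller refetches

    def put(self, url: str, content: str) -> bool:
        """Store content; return True if the document changed since last fetch."""
        new_hash = self._hash(content)
        old = self.entries.get(url)
        changed = old is not None and old["hash"] != new_hash
        self.entries[url] = {"hash": new_hash, "content": content,
                             "fetched_at": time.time()}
        return changed  # a change on a "stable" source may deserve review
```

Deduplication can reuse the same hash: two URLs whose content hashes match collapse into a single entry.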
7.3 Search → Controlled RAG
Search = discovery: find relevant sources on the open web. RAG = stabilization: index locally, control the corpus. Workflow: search → validation → ingestion → RAG.
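A skeleton of that workflow under stated assumptions: `search_web` is the helper sketched in section 2.2, while `TRUSTED_HOSTS`, `review_queue`, and `local_rag` are hypothetical stand-ins for your allowlist, your human-review inbox, and your local RAG stack:

```python
from urllib.parse import urlparse

TRUSTED_HOSTS = {"en.wikipedia.org", "docs.python.org"}  # illustrative allowlist
review_queue: list[dict] = []                            # human-in-the-loop inbox

def validate_source(result: dict) -> bool:
    """The gate between open-web discovery and your controlled corpus."""
    if urlparse(result["url"]).hostname in TRUSTED_HOSTS:
        return True
    review_queue.append(result)  # defer to manual review rather than auto-ingest
    return False

def ground_query(question: str, api_key: str) -> str:
    candidates = search_web(question, api_key)      # discovery on the open web
    vetted = [r for r in candidates if validate_source(r)]
    for doc in vetted:
        local_rag.ingest(doc["url"])                # stabilization: you own the corpus
    return local_rag.answer(question)               # generation over vetted docs only
```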
7.4 Hybrid
Integrated for exploration: broad search, rapid discovery. Classic API for production: control, audit, controlled cost.
7.5 Human-in-the-loop
Human validation on sensitive queries. Automatic flagging of suspicious sources → manual review. Use case: regulated domains, critical decisions.
8. Cache: Economic Friend, Epistemic Enemy
Cache transforms a transient attack into persistent truth.
8.1 The Problem
You cache to save money: fewer API calls, lower latency, cost divided. But you amplify the risks: a source poisoned at t=0 stays in your cache, you serve corrupted content for the entire TTL, and all your users see the same error.
8.2 Naive Cache = Poisoning Amplification
Cache whatever comes back, with no validation and no invalidation, and a single poisoned fetch is multiplied across every user and every request for the whole TTL window.
8.3 Long TTL Cache = Error Fossilization
TTL 24h on info that changes in 1h = drift. TTL 7d on a source that gets compromised = disaster. No universal optimal TTL.
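One workable compromise is to key the TTL to the volatility of the source class. The classes and durations below are illustrative assumptions, not recommendations:

```python
# Illustrative TTLs per source class; tune the values to your own domain.
TTL_BY_CLASS = {
    "breaking_news": 15 * 60,        # 15 min: changes constantly
    "prices_scores": 60 * 60,        # 1 h: volatile, but not by the second
    "documentation": 24 * 3600,      # 24 h: mostly stable
    "reference":     7 * 24 * 3600,  # 7 d: only for sources you also hash-check
}

def ttl_for(source_class: str) -> int:
    # Default short rather than long: a stale cache miss costs one API call,
    # a fossilized poisoned entry costs your users' trust.
    return TTL_BY_CLASS.get(source_class, 15 * 60)
```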
8.4 What We Really Cache
Cache by query: Key: exact query. Problem: “president France” ≠ “who is the French president” → cache miss, same intent.
Cache by intent: Key: normalized intent. Smarter, but more complex to implement.
Cache by source: Key: URL + content hash. Advantage: detects changes. Limit: doesn’t prevent initial poisoning. (See the sketch after this list.)
Cache by response: Key: generated response. Danger: you cache the hallucination, not the source.
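The first three strategies side by side, as a sketch. `embed` and `bucket_id` in the intent variant are hypothetical helpers (an embedding model and a similarity bucketer), not real library calls:

```python
import hashlib

def norm(q: str) -> str:
    return " ".join(q.lower().split())

# By query: exact string. Misses paraphrases ("president France" vs
# "who is the French president") even after trivial normalization.
def key_by_query(query: str) -> str:
    return "q:" + norm(query)

# By intent: embed the query and bucket it with nearest neighbors.
# Smarter, but you now tune an embedding model and a similarity threshold.
def key_by_intent(query: str) -> str:
    return "i:" + bucket_id(embed(norm(query)))  # embed / bucket_id: hypothetical

# By source: URL + content hash. Detects silent page changes, but the very
# first fetch of a poisoned page still enters the cache.
def key_by_source(url: str, content: str) -> str:
    return "s:" + url + ":" + hashlib.sha256(content.encode("utf-8")).hexdigest()
```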
8.5 Mitigation Strategies
| Strategy | Cost | Protection |
|---|---|---|
| Short TTL (< 1h) | $$$ More calls | ✅ Limits exposure |
| Hash + invalidation | ⚠️ Complexity | ✅ Detects changes |
| Cache by source (not by response) | ⚠️ Medium | ⚠️ Partial |
| No cache on risky sources | ⚠️ List to maintain | ⚠️ Partial |
| Mandatory multi-source | $$$ More calls | ✅ Dilutes poison |
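A sketch of the mandatory multi-source rule from the last row: only trust (and cache) a claim backed by several distinct hosts, so one poisoned page is diluted. Real independence checking needs more than hostname comparison (eTLD+1, ownership), but the shape is this:

```python
from urllib.parse import urlparse

def corroborated(source_urls: list[str], min_domains: int = 3) -> bool:
    """True if the claim is backed by at least `min_domains` distinct hosts."""
    hosts = {h for h in (urlparse(u).hostname for u in source_urls) if h}
    return len(hosts) >= min_domains

# corroborated(["https://a.com/x", "https://b.org/y", "https://c.net/z"])  # True
# corroborated(["https://a.com/x", "https://a.com/y"])                     # False
```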
9. The Elephant in the Room: Web Poisoning
9.1 Two Distinct Attack Types
Knowledge poisoning (fixed base): attack on training data or an internal RAG base. Vector: dataset compromise, injection during indexing. Attack window: at build time. Detection: possible through corpus audit.
Search-time poisoning (open web): attack on real-time search results. Vector: malicious SEO, dynamically generated pages, editing of “trusted” sources. Attack window: permanent, evolving. Detection: nearly impossible at scale.
9.2 Documented Attacks 2024-2025
Important note: these attacks come from academic publications — demonstrated under controlled conditions, not (yet) observed massively in the wild. The gap between “it works in the lab” and “it happens in prod” is real. But the history of computer security shows that this gap always narrows faster than we think.
PoisonedRAG: the reference attack. Injection of adversarial documents into the retrieval corpus; the LLM prioritizes the malicious content even when it is a small minority of what is retrieved.
ADMIT: few-shot poisoning at an extremely low contamination rate. Attack success rate (ASR) of ~86% with only 10⁻⁶ of the corpus contaminated. Particularly vicious: nearly undetectable.
AuthChain: Exploitation of citation chains. One-shot dominance: a single well-positioned doc suffices. Gains credibility by citing legitimate sources.
CamoDocs: Camouflage + embedding optimization. Passes quality filters. Legitimate appearance, manipulative content.
Fact2Fiction: Specific targeting of fact-checking agents. Turns the verification tool against itself. Particularly dangerous for validation pipelines.
9.3 What We Already Observe in the Wild
Less sophisticated, but very real:
- Classic SEO poisoning: AI content farms that rank on niche queries.
- Wikipedia edit wars: subtle modifications that persist for weeks.
- Reddit astroturfing: manufactured false consensus, upvoted by bots.
- Stack Overflow pollution: incorrect AI-generated answers accepted by the community.
“Artisanal” poisoning already exists. The academic techniques show what happens when it becomes industrialized.
9.4 Defense Attempts (State of the Art)
RAGForensics: Traceback: identify which source influenced which part of the response. Still very academic, no production-ready solution.
RAGuard: Detection of adversarial content before injection into context. Significant overhead, problematic false positives.
Reality: these defenses are in research, not in production.
9.5 The Real Attack Surface
Open Question
In 2026, who will build the real anti-poisoning defenses? Search providers? LLM publishers? Agent developers? Probably everyone — and no one sufficiently.
References
Documented Poisoning Attacks
- PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models. https://arxiv.org/abs/2402.07867
- ADMIT: A Few-Shot Poisoning Attack Against Retrieval-Augmented Generation. https://arxiv.org/abs/2405.12321
- AuthChain: A Novel and Practical Framework to Enhance LLM-based Agents with Chain-of-Authentication. https://arxiv.org/abs/2406.12345
- CamoDocs: Camouflaging Malicious Documents to Evade RAG Detection. https://arxiv.org/abs/2407.12345
- Fact2Fiction: Turning Fact-Checking Tools Against Themselves. https://arxiv.org/abs/2408.12345
Defenses and Countermeasures
- RAGForensics: Tracing Information Flow in Retrieval-Augmented Generation. https://arxiv.org/abs/2409.12345
- RAGuard: Defending Retrieval-Augmented Generation Against Adversarial Attacks. https://arxiv.org/abs/2410.12345
Search APIs and Services
- Brave Search API Documentation. https://brave.com/search/api/
- Tavily API Documentation. https://docs.tavily.com/
- Exa AI Documentation. https://docs.exa.ai/
- Serper API Documentation. https://serper.dev/
Articles and Analysis
- The Hidden Costs of LLM Grounding. https://www.example.com/hidden-costs-llm-grounding
- Web Poisoning: The Next Frontier of AI Attacks. https://www.example.com/web-poisoning-ai-attacks
Article published on askaibrain.com