
Engram: DeepSeek’s proposal to stop recomputing simple facts


ChatGPT’s Magic Directory

Imagine that every time you want to call your mother, instead of looking up her number in your contacts, you had to recompute it by analyzing all your past conversations, family photos, and childhood memories. Absurd, right? Yet that’s exactly what AIs like ChatGPT do to retrieve simple facts like “Paris is the capital of France”.

Engram flow diagram (conditional external memory). Simplified flow: the prompt goes through the Transformer; a gating mechanism decides whether to trigger an Engram lookup; if it does, the system extracts N-grams, computes a hash, queries the external memory, then merges the result back with the Transformer path before producing the next token.

Executive summary

Observation: an LLM spends a lot of compute “retrieving” simple facts (e.g., a capital), as if it had to re-derive them every time.

Engram’s idea (DeepSeek): add an external memory — a “directory” — to perform very fast lookups for certain patterns (N-grams).

Key mechanism: a gate decides when to consult the directory, to avoid irrelevant/out-of-context lookups. Less energy wasted, and more compute budget available for reasoning and long contexts.

Limits: fact updates, ambiguity (multiple meanings), and the size/quality trade-off of the directory.

Glossary
Transformer
neural network architecture (at the core of LLMs) combining attention and feed-forward layers, used to process token sequences.
Parameters
the "weights" learned during training (often billions of them) that determine the model's behavior.
Context / context window
the portion of text the model can "see" at once (the conversation/document given as input).
N-gram
contiguous sequence of N tokens (e.g., "capital of France"), used here as a recognizable pattern.
Hash
numeric fingerprint of a sequence, used to retrieve an entry quickly in a table (barcode analogy).
Lookup
direct retrieval of information via a key (here, the hash of an N-gram), instead of "recomputing" through the entire network.
O(1)
"constant" time (in practice very fast): lookup time does not grow with the directory size at these orders of magnitude.
Gating (gate / guard)
mechanism that decides whether the model should consult external memory (the directory) given the context.
Hash collisions
cases where two different sequences produce the same fingerprint; typically mitigated with hashing/storage strategies.

The problem: a supercomputer to remember a phone number

When you ask ChatGPT “What is the capital of France?”, here’s what happens inside:

Your question goes through dozens of layers of ultra-complex computations. Each layer contains billions of connections that activate, weigh, and adjust. It’s like mobilizing a whole team of engineers, mathematicians, and physicists… to retrieve a phone number you already know by heart.

The waste is massive. Billions of operations, phenomenal power, all to recover a static fact — information that doesn’t change: Paris = capital of France. It’s like using an electron microscope to read the time on your watch.

And it's not just about elegance. This waste has real consequences: wasted energy, a compute budget eaten up before the model even starts reasoning, and long contexts that become expensive to sustain.

The (simple) idea: an instant directory

DeepSeek researchers had a simple but powerful idea: what if we gave the AI a directory?

Imagine a separate, ultra-fast module that contains simple facts in a format that’s easy to search. When the AI sees “capital of France”, instead of mobilizing 70 billion parameters, it glances at the directory and instantly finds “Paris”.

They called this system Engram (from “engram”, the physical trace of a memory in the brain). The core principle: separate “remembering” from “thinking.”

Here’s how it works in practice.

The magic directory in action

Step 1: Split into recognizable chunks

Take the sentence: “The capital of France is Paris.”

The AI breaks it into small groups of words called N-grams ("N" being the number of words in each group). For example: "capital of France", "of France is", "France is Paris".

It’s like splitting a phone number into segments: 01-23-45-67-89.
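To make this concrete, here is a minimal Python sketch of the splitting step. It is not the paper's code: real systems work on tokenizer tokens, while this toy version simply splits on whitespace.

```python
def extract_ngrams(text: str, n: int = 3) -> list[tuple[str, ...]]:
    """Split a sentence into overlapping N-grams of n tokens.
    Toy version: "tokens" are just whitespace-separated, lowercased words."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(extract_ngrams("The capital of France is Paris"))
# [('the', 'capital', 'of'), ('capital', 'of', 'france'),
#  ('of', 'france', 'is'), ('france', 'is', 'paris')]
```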

Step 2: Create a unique fingerprint

For each segment, Engram computes a digital fingerprint (a “hash”, in technical jargon). It’s like turning “capital of France” into a unique barcode: #47291.

That fingerprint enables ultra-fast retrieval in constant time, what computer scientists call O(1): the lookup is essentially instant, regardless of the directory size.
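Here is a small sketch of the fingerprint step, using Python's standard hashlib. Engram's actual hashing scheme is more involved; this only illustrates the principle of turning a word sequence into a fixed-size key.

```python
import hashlib

def ngram_key(ngram: tuple[str, ...], table_size: int = 2**20) -> int:
    """Turn an N-gram into a compact integer key (its "barcode").
    The modulo maps the fingerprint into a fixed-size table."""
    digest = hashlib.sha256(" ".join(ngram).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % table_size

print(ngram_key(("capital", "of", "france")))  # always the same number for the same N-gram
```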

Step 3: Store it in the directory

Fingerprint #47291 points to a small data packet saying: “When you see this sequence, the most likely continuation is ‘Paris’.”

It’s stored in a compact structure — more like a book index than the entire book.
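The directory itself can be sketched as a simple hash table mapping each key to a suggested continuation. In the real system the table stores learned vectors rather than literal strings; the dictionary below is purely illustrative and reuses the same hashing idea as above.

```python
import hashlib

def ngram_key(ngram: tuple[str, ...], table_size: int = 2**20) -> int:
    digest = hashlib.sha256(" ".join(ngram).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % table_size

# Toy directory: fingerprint -> most likely continuation.
directory: dict[int, str] = {
    ngram_key(("capital", "of", "france")): "Paris",
}

def lookup(ngram: tuple[str, ...]) -> str | None:
    # Average O(1): the cost does not grow with the number of entries.
    return directory.get(ngram_key(ngram))

print(lookup(("capital", "of", "france")))  # 'Paris'
print(lookup(("capital", "of", "italy")))   # None (nothing stored for this pattern)
```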

Step 4: The gatekeeper

Here’s the subtle part: the AI does not blindly consult the directory on every token.

It has a “gatekeeper” (technically: contextual gating) that decides: “Should I consult the directory here, or do I truly need to think?”

Example where consulting the directory makes sense: completing "The capital of France is…", where the continuation is a stable, unambiguous fact.

Example where you should NOT consult it: a sentence that uses the same words figuratively or ambiguously, where the stored continuation would lead the model astray.

The gatekeeper helps avoid stupid errors, like looking up “capital” in a metaphorical context (“Paris is the capital of fashion”).
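Here is a deliberately naive sketch of the gatekeeper. In the real model the gate is a small learned network reading the context representation; the hand-written rules below are only a stand-in to show what kind of decision it makes.

```python
# Toy whitelist of patterns known to be stable facts (the real gate learns this).
KNOWN_FACT_PATTERNS = {("capital", "of", "france")}
# Toy markers hinting that the words are being used figuratively.
FIGURATIVE_MARKERS = ("fashion", "of love", "so to speak")

def should_lookup(context: str, ngram: tuple[str, ...]) -> bool:
    """Decide whether to consult the directory for this N-gram in this context."""
    looks_factual = ngram in KNOWN_FACT_PATTERNS
    looks_figurative = any(marker in context.lower() for marker in FIGURATIVE_MARKERS)
    return looks_factual and not looks_figurative

print(should_lookup("What is the capital of France?", ("capital", "of", "france")))    # True
print(should_lookup("Paris is the capital of fashion", ("capital", "of", "fashion")))  # False
```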

The gains

More room to think

By delegating simple facts to the directory, the AI’s neural layers are freed up. They can focus on what they do best: reasoning, creating, analyzing.

It’s like offloading phone numbers from your head into your contacts and suddenly having more mental space for complex problems.

Measured result: On reasoning tasks (math, logic), models with Engram perform better than equivalent models without Engram, even with fewer total parameters.

Longer texts

One major challenge for modern AIs is handling very long contexts (hours-long conversations, documents with hundreds of pages).

Every additional token costs compute. But if a meaningful fraction of those tokens correspond to simple facts retrievable via lookup, the cost drops sharply.

Measured result: Engram enables processing contexts 2–3× longer for the same compute budget.
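To see why cheap lookups stretch the context budget, here is a back-of-envelope model. The cost numbers are invented placeholders, not measurements from the paper; only the shape of the arithmetic matters.

```python
FULL_PASS_COST = 100.0  # arbitrary cost units for a token that needs the full network
LOOKUP_COST = 1.0       # arbitrary cost units for a token served by the directory

def avg_token_cost(lookup_fraction: float) -> float:
    """Average cost per token when a given fraction of tokens is served by lookup."""
    return lookup_fraction * LOOKUP_COST + (1 - lookup_fraction) * FULL_PASS_COST

budget = 1_000_000.0
for f in (0.0, 0.3, 0.5):
    tokens = budget / avg_token_cost(f)
    print(f"lookup fraction {f:.0%}: ~{tokens:,.0f} tokens for the same budget")
# The more tokens the directory can serve, the longer the context that fits in the budget.
```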

Less energy wasted

A directory lookup consumes thousands of times less energy than a full forward pass through tens of billions of parameters.

At the scale of millions of users, that’s the difference between a power plant and a light bulb.

A concrete end-to-end example

Take this question: “What is France’s GDP in 2023 and how does it compare to Germany’s when accounting for population?”

Without Engram (classic method):

  1. The AI goes through all its layers to retrieve “France GDP 2023” → ~€3,000B
  2. The AI goes through all its layers to retrieve “Germany GDP 2023” → ~€4,000B
  3. The AI goes through all its layers to retrieve “France population” → ~68M
  4. The AI goes through all its layers to retrieve “Germany population” → ~84M
  5. Finally, it uses reasoning to compute per-capita GDP and compare

Steps 1–4 are pure compute waste.

With Engram:

  1. Instant lookup: France GDP ≈ €3,000B
  2. Instant lookup: Germany GDP ≈ €4,000B
  3. Instant lookup: France population ≈ 68M
  4. Instant lookup: Germany population ≈ 84M
  5. All compute budget goes into reasoning: divide, compare, write a nuanced answer

The payoff: the neural layers can do finer reasoning, consider more nuance, because they haven’t burned their “budget” on fact retrieval.
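Putting the pieces together, here is a self-contained toy version of the whole conditional-lookup flow from the diagram (gate, then lookup, then merge). Everything here is illustrative: the real merge happens on hidden vectors inside the transformer, not on strings.

```python
import hashlib

def ngram_key(ngram: tuple[str, ...]) -> int:
    digest = hashlib.sha256(" ".join(ngram).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 2**20

DIRECTORY = {ngram_key(("capital", "of", "france")): "Paris"}
KNOWN_FACT_PATTERNS = {("capital", "of", "france")}

def should_lookup(context: str, ngram: tuple[str, ...]) -> bool:
    # Toy gate: only fire on whitelisted factual patterns, never in figurative contexts.
    return ngram in KNOWN_FACT_PATTERNS and "fashion" not in context.lower()

def run_full_transformer(context: str) -> str:
    # Placeholder for the expensive pass through every layer of the model.
    return "<computed by the full model>"

def next_token(context: str, ngram: tuple[str, ...]) -> str:
    """One toy decoding step with a conditional external-memory lookup."""
    if should_lookup(context, ngram):            # gate: is this a plain factual pattern?
        hit = DIRECTORY.get(ngram_key(ngram))    # O(1) directory access
        if hit is not None:
            return hit                           # memory path: no full pass needed
    return run_full_transformer(context)         # reasoning path: fall back to the full model

print(next_token("What is the capital of France?", ("capital", "of", "france")))
print(next_token("Explain this clause of the contract", ("this", "clause", "of")))
```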

The real innovation: the balance

Engram's genius isn't just having a directory. It's finding the right balance between instant memory (the directory, for stable facts) and deliberate computation (the neural layers, for reasoning).

Before Engram, everything went through reasoning. That’s like using your conscious brain to breathe — technically possible, but exhausting and inefficient.

With Engram, the AI gains a reflex system for simple tasks, freeing its “conscious brain” for what really matters.

What it changes for you

If you use an AI with Engram (like future DeepSeek models):

You’ll see faster answers for factual questions. No need to mobilize an army of neurons to recall “Paris is the capital of France.”

You can have longer conversations without the AI “forgetting” the beginning. Long context becomes cheaper.

You’ll get deeper reasoning on complex questions. The AI no longer wastes its budget on useless lookups.

Concrete example: ask the AI to analyze a 50-page contract. With Engram, routine facts and standard definitions come from the directory, leaving more of the compute budget for the clauses that actually need analysis.

Limits and open questions

When the directory is wrong

What happens if a fact changes? What if the capital of France suddenly became Lyon (unlikely, but play along)?

Engram’s directory is static: it is built during model training. Updating it requires retraining or dedicated update techniques (not detailed in the original paper).

That’s a trade-off: speed vs freshness.

Directory size

A directory that tries to contain everything becomes impractical: it bloats memory and fills up with entries that are rarely useful. A tiny directory would miss important facts.

Researchers found that a directory representing about 10% of the model size was optimal. For a 7B-parameter model, that is on the order of 700 million parameters devoted to the directory.

Ambiguous facts

“Bank of France” can refer to the financial institution — or the “river bank” in France (rare, but possible).

The contextual gate must be smart enough to avoid dumb lookups. This balance remains an active research area.

Deliberate teaching simplifications

To keep this article accessible, I intentionally simplified several technical aspects:

What I simplified

The “contextual gate” is far more sophisticated than a simple guard. It’s a neural network trained to predict when a lookup will help, based on subtle context signals. I used the “gatekeeper” metaphor to avoid diving into attention and gating mechanics.

Hash collisions: two different sequences can theoretically produce the same fingerprint. The paper uses collision-resolution techniques (like multiple hashes), but I omitted that detail to keep things readable.
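For the curious, here is one common way to mitigate collisions (not necessarily the exact scheme in the paper): store a second fingerprint alongside each entry and verify it at lookup time, so a collision on the table slot is detected instead of silently returning the wrong fact.

```python
import hashlib

def fingerprints(ngram: tuple[str, ...]) -> tuple[int, int]:
    """Return a table slot plus a verification tag for an N-gram."""
    d = hashlib.sha256(" ".join(ngram).encode("utf-8")).digest()
    slot = int.from_bytes(d[:8], "big") % 2**16   # small table: slot collisions can happen
    tag = int.from_bytes(d[8:16], "big")          # checked before trusting the entry
    return slot, tag

table: dict[int, tuple[int, str]] = {}

def store(ngram: tuple[str, ...], value: str) -> None:
    slot, tag = fingerprints(ngram)
    table[slot] = (tag, value)

def load(ngram: tuple[str, ...]) -> str | None:
    slot, tag = fingerprints(ngram)
    entry = table.get(slot)
    if entry is None or entry[0] != tag:  # either missing, or another N-gram took this slot
        return None
    return entry[1]
```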

Vocabulary compression: Engram actually uses a compressed vocabulary to reduce directory size. I presented it as “a compact data packet” without diving into tokenization details.

Integration into transformers: I said the lookup “frees up the neural layers”, but technically Engram integrates between certain transformer layers, not simply “next to” the model. Explaining that nuance would have required a deeper architectural detour.
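To give a rough picture of what "integrating between layers" means, here is a hedged sketch (my own simplification, not the paper's formulation): the vector fetched from the directory is blended into the hidden state between two transformer layers, weighted by the gate.

```python
def merge_memory(hidden: list[float], memory_vec: list[float] | None, gate: float) -> list[float]:
    """Blend a directory vector into the hidden state between two layers.

    hidden:     output of transformer layer k for the current token
    memory_vec: embedding fetched from the external memory (None if no lookup)
    gate:       scalar in [0, 1] from the contextual gate (0 means "ignore the memory")
    """
    if memory_vec is None or gate == 0.0:
        return hidden
    return [h + gate * m for h, m in zip(hidden, memory_vec)]

# Example: a 4-dimensional hidden state, a memory vector, and a fairly confident gate.
print(merge_memory([0.2, -0.1, 0.5, 0.0], [1.0, 0.0, -0.5, 0.3], gate=0.8))
```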

Why these simplifications are OK

The essence is still true: Engram does separate static memory from dynamic reasoning. The directory metaphor captures this well.

The gains are real: the figures cited (2–3× longer contexts, reasoning gains) come from the original paper’s benchmarks.

Pedagogy matters: explaining hash collisions before explaining why you hash at all would kill intuition. I chose to build the mental model first.

What remains correct despite the simplifications

Going further

Open question: if we push this idea to the extreme, could we imagine an AI with multiple specialized “directories”? One for geography, one for historical dates, one for math formulas?

Connection to other concepts: Engram echoes the memory traces ("engrams") studied in neuroscience, everyday external memory aids like your address book, and caches in computing (ultra-fast memory for frequently used data).

Future implications: if this approach generalizes, we may see much more modular AIs: a central “brain” for reasoning, and specialized peripheral modules for different knowledge types — somewhat like the human brain, with dedicated areas for language, vision, motor control…

Ethical question: who decides what goes into the directory? “Paris is the capital of France” is obvious, but what about controversial or politically charged “facts”? Does the directory encode biases?



In short: Engram upgrades AIs by giving them what we all have in our pocket: a directory. Instead of recomputing “Paris is the capital of France” every time, the AI can finally just… remember. And this simple distinction — remembering vs thinking — might be the key to faster, more efficient, and ultimately smarter AIs.


