
King - Man + Woman = Queen: How AI Does Math with Words


Executive Summary

This article demystifies the famous analogy “king - man + woman = queen” by explaining how AI models, through embeddings, represent words as points in a multidimensional vector space. Semantic traits (royalty, gender, etc.) form axes, enabling arithmetic operations that capture linguistic relationships.

Key Points:

  • Words → numerical vectors (static embeddings like Word2Vec, contextual in Transformers).
  • Vector arithmetic: subtracting/adding modifies traits (e.g., changing gender).
  • Machine learning from textual contexts.
  • Limitations: pedagogical simplifications, potential biases.

Ideal for understanding the basics of computational semantics without advanced math.

Glossary: Understanding Technical Terms
Embedding
Representation of a word (or phrase) as a multidimensional numerical vector capturing its semantic traits. Close vectors correspond to similar meanings.
Word2Vec
Learning algorithm to generate static embeddings by analyzing word contexts in large text corpora.
Transformer
Neural network architecture (basis of GPT, BERT) using an attention mechanism to produce contextual embeddings, adaptive to the sentence context.
Semantic vector
List of numbers (e.g., [0.9, -0.2, 0.7]) defining a word’s position in the space of meanings.
Vector analogy
Operation like king - man + woman that navigates semantic space to find an analogous word (queen).
Attention mechanism
Component of transformers that weights the importance of neighboring words to dynamically adjust embeddings.

You may have already seen this strange equation: “king - man + woman = queen”. How can you subtract one word from another? How can artificial intelligence solve math problems… with vocabulary? It’s as if words had a hidden mathematical existence. And that’s exactly the case.

Words as Points on a Map

Imagine you need to place all the words of a language on a giant map. Not at random: similar words must end up close together. “Cat” near “dog”, “king” near “queen”, “Paris” near “France”.

To achieve this, you decide to use feature axes. Like on a geographical map with latitude and longitude, each word will have coordinates. But instead of “north-south” and “east-west”, your axes represent meaning traits:

The word “king” would have coordinates like: royalty 9/10, gender 2/10 (firmly on the masculine side), and so on for every other trait.

The word “queen” would sit almost in the same place, but with one change: its gender coordinate moves to 8/10 (the feminine side) while royalty stays at 9/10.

This is what we call an embedding: transforming a word into a list of numbers that capture its meaning.
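
To make this concrete, here is a minimal sketch in Python. The two axes and all the numbers are invented for illustration; real embeddings have hundreds of unnamed dimensions, as we will see later.

```python
# Toy embeddings: each word becomes a list of numbers, one per "meaning axis".
# The two axes (royalty, gender) and all values are invented for illustration.
embeddings = {
    #         royalty, gender (low = masculine, high = feminine)
    "king":   [0.9, 0.2],
    "queen":  [0.9, 0.8],
    "man":    [0.5, 0.2],
    "woman":  [0.5, 0.8],
    "cat":    [0.1, 0.5],
}

print(embeddings["king"])   # [0.9, 0.2]
```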

The Problem This Solves

For decades, computers treated words as unlinked labels. “King” and “queen” had nothing in common for a machine. Neither did “cat” and “dog”.

For an AI to understand language, it must grasp that:

  • “king” and “queen” refer to closely related concepts
  • “cat” and “dog” belong to the same family of meaning
  • “Paris” relates to “France” the way “Rome” relates to “Italy”

Embeddings solve this problem by giving geometry to language. Similar words become close points in a mathematical space. And this proximity enables calculations.
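
As a small sketch of what “proximity enables calculations” means, here is one standard way to measure closeness between two word vectors, cosine similarity, applied to toy two-axis vectors like those above.

```python
import math

def cosine_similarity(a, b):
    """Close to 1.0 when two vectors point in the same direction (similar meanings)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy [royalty, gender] vectors, same invented values as before.
king, queen, cat = [0.9, 0.2], [0.9, 0.8], [0.1, 0.5]

print(round(cosine_similarity(king, queen), 2))  # ~0.87: close meanings
print(round(cosine_similarity(king, cat), 2))    # ~0.40: more distant meanings
```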

Word Arithmetic Explained

Let’s return to our equation: king - man + woman = queen.

Imagine three simplified axes:

| Step | Word               | Royalty | Gender | Notes                               |
|------|--------------------|---------|--------|-------------------------------------|
| 1    | king               | 9/10    | 2/10   | Masculine / Position: top right     |
| 2a   | man (to subtract)  | 5/10    | 2/10   | Neutral (can be anyone) / Masculine |
| 2b   | After subtraction  | 4       | 0      | 9-5 / 2-2 (neutral now)             |
| 3a   | woman (to add)     | 5/10    | 8/10   | Neutral / Feminine                  |
| 3b   | Final result       | 9       | 8      | 4+5 / 0+8                           |

We removed the “masculine” trait (and, for a moment, the ordinary share of royalty that “man” carries), then added the “feminine” trait and got that ordinary share back.

These are exactly the coordinates of queen!
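
The same steps can be written as a few lines of Python; this is just the table above replayed with lists of numbers, nothing more.

```python
# king - man + woman, step by step, with the toy [royalty, gender] scores (out of 10).
king  = [9, 2]
man   = [5, 2]
woman = [5, 8]

after_subtraction = [k - m for k, m in zip(king, man)]                  # [4, 0]
result            = [s + w for s, w in zip(after_subtraction, woman)]   # [9, 8]

print(result)   # [9, 8] -- the toy coordinates of "queen"
```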

Why It Works: Directions Have Meaning

The magic is that differences between words capture pure relationships.

“Woman” - “man” creates a vector (an arrow) representing the change from masculine to feminine gender. This arrow has the same direction and length as “queen” - “king”, or “aunt” - “uncle”, or “actress” - “actor”.

It’s as if language had universal directions:

  • a “gender” direction, from masculine to feminine
  • a “royalty” direction, from ordinary person to monarch
  • a “capital” direction, from a country to its capital city

By navigating according to these directions, we can explore relationships between words mathematically.
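
Here is a small sketch of that idea with the same toy scores: the “masculine → feminine” arrow computed from one pair of words is the same arrow you get from another pair. (In real embeddings the arrows are only approximately equal.)

```python
# The gender "direction", computed from two different word pairs.
# Toy [royalty, gender] scores; real embeddings give only approximately equal arrows.
man, woman  = [5, 2], [5, 8]
king, queen = [9, 2], [9, 8]

direction_from_man_woman  = [w - m for w, m in zip(woman, man)]    # [0, 6]
direction_from_king_queen = [q - k for q, k in zip(queen, king)]   # [0, 6]

print(direction_from_man_woman == direction_from_king_queen)   # True in this toy example
```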

The Running Example: Paris and Capitals

Let’s take another case: “Paris - France + Italy = ?”

| Step | Word                 | Capital | Frenchness | Size | Notes                              |
|------|----------------------|---------|------------|------|------------------------------------|
| 1    | Paris                | 10/10   | 9/10       | 8/10 | Starting point                     |
| 2a   | France (to subtract) | 5/10    | 10/10      | 9/10 | Country, not city / Large country  |
| 2b   | After subtraction    | 5       | -1         | -1   | 10-5 / 9-10 / 8-9                  |
| 3a   | Italy (to add)       | 5/10    | 1/10       | 8/10 | Not French                         |
| 3b   | Final result         | 10      | 0          | 7    | 5+5 / -1+1 / -1+8                  |

We extracted “the essence of capital” by removing the French context, then added the Italian context.

The word closest to these coordinates? Rome.
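
That last step, “the word closest to these coordinates”, is a nearest-neighbor search. Here is a sketch using the toy scores from the table; the “Berlin” entry is invented just so the search has some competition.

```python
import math

# Toy [capital, Frenchness, size] scores from the table above (out of 10).
# "Berlin" is an invented extra entry so the nearest-neighbor search has competition.
vocab = {
    "Paris":  [10, 9, 8],
    "France": [5, 10, 9],
    "Italy":  [5, 1, 8],
    "Rome":   [10, 0, 7],
    "Berlin": [10, 0, 8],
}

target = [p - f + i for p, f, i in zip(vocab["Paris"], vocab["France"], vocab["Italy"])]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Exclude the query words themselves, as analogy searches usually do.
candidates = [w for w in vocab if w not in {"Paris", "France", "Italy"}]
closest = min(candidates, key=lambda w: distance(vocab[w], target))

print(target, "->", closest)   # [10, 0, 7] -> Rome
```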

How AI Learns These Coordinates

You might wonder: who decides that “king” is worth 9/10 in royalty? No one.

AI learns these coordinates automatically by reading billions of sentences. It uses a simple principle: words that appear in similar contexts have similar meanings.

If AI reads:

  • “The king wears a crown”
  • “The queen wears a crown”
  • “The monarch wears a crown”

It deduces that “king”, “queen”, and “monarch” must be close in coordinate space, because they share the same neighbors (“wears”, “crown”).
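
Here is a sketch of that “same neighbors” principle (not the real word2vec algorithm): count which words appear near each target word in a tiny invented corpus, then see how many context words two targets share.

```python
from collections import Counter

# A tiny invented corpus, just to illustrate the principle.
sentences = [
    "the king wears a crown",
    "the queen wears a crown",
    "the monarch wears a crown",
    "the cat chases a mouse",
]

def neighbors(word, window=3):
    """Count the words appearing within `window` positions of `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token == word:
                counts.update(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
    return counts

def shared_context(a, b):
    """How many context words two words have in common."""
    return sum((neighbors(a) & neighbors(b)).values())

print(shared_context("king", "queen"))  # 4 -- shares "the", "wears", "a", "crown"
print(shared_context("king", "cat"))    # 2 -- shares only "the" and "a"
```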

Algorithms like word2vec adjust the coordinates of millions of words so this rule is respected everywhere. After days of calculation, words have found their natural place on the map.
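
If you want to try this yourself, here is a minimal sketch using the open-source gensim library (assuming gensim 4.x is installed). The corpus below is far too small to produce meaningful vectors; it only shows the shape of the API.

```python
from gensim.models import Word2Vec

# Each sentence is a list of tokens; a real corpus would contain millions of sentences.
corpus = [
    ["the", "king", "wears", "a", "crown"],
    ["the", "queen", "wears", "a", "crown"],
    ["the", "monarch", "rules", "the", "kingdom"],
    ["the", "cat", "chases", "a", "mouse"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,   # number of dimensions (axes) per word
    window=2,         # how many neighbors on each side count as "context"
    min_count=1,      # keep even rare words (our corpus is tiny)
    epochs=100,       # passes over the corpus to adjust the coordinates
)

print(model.wv["king"][:5])                    # first 5 coordinates of "king"
print(model.wv.most_similar("king", topn=3))   # nearest words in the learned space
```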

The Difference with Modern Transformers

So far, we’ve talked about static embeddings: “king” always has the same coordinates.

But modern systems like ChatGPT use transformers, where coordinates change according to context.

Take the word “bank”:

  • “I deposited money at the bank” → a financial institution
  • “We sat on the bank of the river” → the edge of a river

In a transformer, “bank” doesn’t have a fixed position. Its coordinates are recalculated for each sentence, depending on neighboring words. The attention mechanism (another fascinating subject) enables these dynamic adjustments.
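
To see this with a real model, here is a sketch using the Hugging Face transformers library (assuming the transformers and torch packages are installed; bert-base-uncased is just one publicly available choice). It extracts the vector of “bank” in two different sentences and shows that the coordinates differ.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def vector_of(word, sentence):
    """Return the contextual vector of `word` inside `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # one vector per token
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]

money_bank = vector_of("bank", "i deposited money at the bank")
river_bank = vector_of("bank", "we sat on the bank of the river")

# Same word, two different sets of coordinates: the similarity stays below 1
# because the surrounding words are different.
similarity = torch.nn.functional.cosine_similarity(money_bank, river_bank, dim=0)
print(f"similarity between the two 'bank' vectors: {similarity.item():.2f}")
```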

The arithmetic “king - man + woman” still works, but it becomes more subtle: the coordinates of “king” now depend on the sentence where it appears.

Journey Recap

You’ve just understood how AI transforms words into mathematics:

  1. Each word becomes a point in a space with hundreds of dimensions
  2. Each dimension captures a meaning trait (royalty, gender, etc.)
  3. Similar words are close points
  4. Subtracting or adding words modifies these coordinates
  5. Relationships between words become geometric directions

“King - man + woman = queen” isn’t magic: it’s navigation in the space of meaning.

Pedagogical Simplifications

To make this concept accessible, I’ve made several intentional simplifications:

What has been simplified:

  1. The number of dimensions: I talked about 3 or 4 axes (royalty, gender, Frenchness, size) when real embeddings have 300 to 1,000 dimensions. It is impossible to visualize 768 dimensions, so we reduce to what our brain can picture.

  2. Dimension interpretability: I named the axes (“royalty”, “gender”). In reality, dimensions are learned automatically and don’t have a clear name. Dimension 247 doesn’t obviously mean “royalty”. Some dimensions capture fuzzy combinations of several traits.

  3. Calculation precision: I used scores out of 10 to simplify. Real embeddings are decimal numbers between -1 and 1 (or other scales), with extreme precision.

  4. Word2vec complexity: I said “AI reads sentences and learns”. In reality, word2vec uses neural networks trained to predict neighboring words, with mathematical machinery such as the softmax function and gradient descent.

  5. Transformers: I mentioned that embeddings become contextual, but I didn’t explain the attention mechanism that enables this. That’s another entire article.

Why these simplifications are OK:

  • They preserve the core intuition: words become points, and relationships become directions.
  • Real systems differ mainly in scale (hundreds of dimensions, billions of sentences) and precision, not in principle.

What remains rigorously exact:

  • Words really are represented as numerical vectors (embeddings).
  • Similar words really do end up close together in that space.
  • Analogies like “king - man + woman ≈ queen” really do work in learned embedding spaces.
  • Modern transformers really do recompute these vectors according to the sentence context.

If you remember that words have geometry and that relationships between words are directions, you’ve understood the essentials. The rest are technical details to refine this intuition.

Going Further

Now that you’ve grasped the principle, questions open up:

  • How does the attention mechanism decide which neighboring words matter?
  • What do the hundreds of unnamed dimensions actually capture?
  • Which biases from the training texts end up baked into the geometry?

You now have the foundations to explore these territories. Word arithmetic is only the beginning of a world where meaning becomes calculable.
