Word2Vec

The 2013 method from Mikolov et al. at Google that showed words can be represented as dense vectors whose geometry reflects meaning.

The Big Idea

Word2Vec learns word embeddings by predicting context. Words that appear in similar contexts get similar vectors.

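Famous example: king - man + woman ≈ queen
The result is not exactly the queen vector; queen is the nearest word vector to it once the input words are excluded. Vector arithmetic captures semantic relationships!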
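
To make the arithmetic concrete, here is a minimal sketch using hand-made 2-D vectors. The vectors, the axis meanings, and the helper names (cosine, analogy) are assumptions chosen for illustration; real Word2Vec vectors are learned from text and have far more dimensions.

```python
import numpy as np

# Hypothetical toy embeddings: axis 0 ~ "royalty", axis 1 ~ "gender".
# These values are invented for illustration, not learned by Word2Vec.
vectors = {
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.2]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

result = analogy("king", "man", "woman", vectors)
print(result)
```

Excluding the three input words from the candidates matters: without that step the nearest neighbor of the result is often one of the inputs themselves.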

Two Architectures

CBOW (Continuous Bag of Words)

Predict the target word from surrounding context words.

Context: "The cat ___ on the mat"
Predict: "sat"
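
A rough sketch of how CBOW training examples could be built from a tokenized sentence. The function name cbow_pairs and the window value are assumptions for illustration; this covers only the data preparation, not the network that does the prediction.

```python
def cbow_pairs(tokens, window=2):
    """Return a list of (context_words, target_word), one per position in tokens."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]       # up to `window` words before the target
        right = tokens[i + 1:i + 1 + window]      # up to `window` words after the target
        pairs.append((left + right, target))
    return pairs

tokens = "the cat sat on the mat".split()
pairs = cbow_pairs(tokens, window=2)

# The example for the target word "sat" (position 2).
print(pairs[2])
```

With window=2 the context for "sat" is the two words on each side; a larger window (for instance 5) would take in the whole sentence, as in the example above.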

Skip-gram

Predict context words from the target word.

Input: "sat"
Predict: "The", "cat", "on", "the", "mat"
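
A rough sketch of how skip-gram training pairs could be generated. Each target word is paired with every word inside its window, giving one (target, context) pair per neighbor. The function name skipgram_pairs and the window value are assumptions for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center_word, context_word) pairs for every position in tokens."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:                      # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the cat sat on the mat".split()
pairs = skipgram_pairs(tokens, window=2)

# All pairs whose center word is "sat".
sat_pairs = [p for p in pairs if p[0] == "sat"]
print(sat_pairs)
```

Compared with CBOW, the same sentence produces many more training examples, because every (target, neighbor) combination is a separate prediction.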

Training

Train on large text corpora (billions of words)
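Window size determines context (typically 5-10 words on each side of the target)
Typical vector dimensions: 100-300
Negative sampling speeds up training by scoring each true pair against a few randomly sampled words instead of the whole vocabulary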
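
As a sketch of why negative sampling is cheap, the loss for one (target, context) pair touches only the true context vector and a handful of sampled negatives, not every word in the vocabulary. The function name sgns_loss, the random initial vectors, and the sizes below are assumptions for illustration; the original implementation draws negatives from a smoothed unigram distribution, which is not shown here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(center_vec, context_vec, negative_vecs):
    """Skip-gram negative-sampling loss for one (center, context) pair.

    loss = -log sigmoid(u_o . v_c) - sum_k log sigmoid(-u_k . v_c)
    where v_c is the center vector, u_o the true context vector,
    and u_k the vectors of the sampled negative words.
    """
    pos = np.log(sigmoid(np.dot(context_vec, center_vec)))
    neg = np.sum(np.log(sigmoid(-(negative_vecs @ center_vec))))
    return float(-(pos + neg))

# Hypothetical sizes: 50-dimensional vectors, 5 negatives per positive pair.
rng = np.random.default_rng(0)
dim, num_negatives = 50, 5
center = rng.normal(scale=0.1, size=dim)
context = rng.normal(scale=0.1, size=dim)
negatives = rng.normal(scale=0.1, size=(num_negatives, dim))

loss = sgns_loss(center, context, negatives)
print(loss)
```

Each update costs on the order of (1 + number of negatives) dot products, whereas a full softmax would need one dot product per vocabulary word.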

Properties

Similar words cluster together
Many analogies can be solved approximately via vector arithmetic
Captures syntactic AND semantic relationships (e.g. walk/walked as well as France/Paris)

Limitations

One vector per word (no context sensitivity)
"bank" has the same vector whether it means a financial institution or a river bank
Out-of-vocabulary words have no vector at all
Largely superseded by contextual embeddings (BERT, etc.)
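
The first three limitations follow from the fact that a trained model is, in effect, a fixed lookup table. A small sketch, assuming a hypothetical four-word vocabulary with random vectors standing in for learned ones:

```python
import numpy as np

# Hypothetical tiny embedding table; the vectors are random stand-ins,
# not the output of a trained Word2Vec model.
rng = np.random.default_rng(0)
vocab = ["bank", "river", "money", "loan"]
embeddings = {word: rng.normal(size=4) for word in vocab}

def embed(sentence):
    """Look up one static vector per token; unknown tokens map to None."""
    return [embeddings.get(tok) for tok in sentence.lower().split()]

financial = embed("bank loan")    # "bank" in a financial context
river = embed("river bank")       # "bank" in a river context

# The lookup ignores the surrounding words, so both senses share one vector.
same_vector = np.array_equal(financial[0], river[1])
print(same_vector)

# A word never seen in training has no entry in the table.
oov = embed("cryptocurrency")[0]
print(oov)
```

Contextual models differ precisely here: they compute a vector from the whole sentence rather than looking one up per word.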

Legacy

Word2Vec showed that a simple neural network trained on raw text can learn meaningful word representations, paving the way for modern embeddings like BERT and sentence transformers.