The Big Idea
Word2Vec learns dense word embeddings by training a shallow neural network to predict words from their context. Words that appear in similar contexts end up with similar vectors.
Famous example: king - man + woman ≈ queen (the result is not exactly the "queen" vector, but "queen" is its nearest neighbor)
Vector arithmetic captures semantic relationships!
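The analogy can be sketched in a few lines. This is a toy illustration only: the 2-D vectors below are hand-made (hypothetical, not trained embeddings), and the `analogy` helper is a name introduced here. It searches for the nearest word by cosine similarity and leaves out the three input words, which is how analogy queries are usually evaluated.

```python
import numpy as np

# Hand-made 2-D toy vectors (hypothetical, NOT trained embeddings).
# Axis 0 loosely encodes "royalty", axis 1 loosely encodes "gender".
vectors = {
    "king":  np.array([0.9, 0.8]),
    "queen": np.array([0.9, -0.8]),
    "man":   np.array([0.1, 0.8]),
    "woman": np.array([0.1, -0.8]),
    "apple": np.array([-0.7, 0.0]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, vectors):
    """Return the word closest to a - b + c, excluding the three inputs."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

result = analogy("king", "man", "woman", vectors)
print(result)
```

With real embeddings the match is approximate, so the nearest-neighbor search (rather than an equality check) is the important part.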
Two Architectures
CBOW (Continuous Bag of Words)
Predict the target word from surrounding context words.
Context: "The cat ___ on the mat"
Predict: "sat"
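A minimal sketch of the CBOW forward pass, assuming the usual formulation: average the context word vectors, then score every vocabulary word against that average. The weights are random and untrained, so the probabilities come out close to uniform. The names `W_in`, `W_out`, and `cbow_probs` are introduced here for illustration, and a full softmax is used for clarity even though practical training avoids it.

```python
import numpy as np

tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens))
word_to_id = {w: i for i, w in enumerate(vocab)}

rng = np.random.default_rng(0)
dim = 8  # toy embedding size (assumption; real models use far more)
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # context (input) vectors
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # target (output) vectors

def cbow_probs(context_words):
    """Average the context vectors, then softmax over the vocabulary."""
    ids = [word_to_id[w] for w in context_words]
    hidden = W_in[ids].mean(axis=0)
    scores = W_out @ hidden
    exp = np.exp(scores - scores.max())  # subtract max for numerical stability
    return exp / exp.sum()

# Context of "sat" in the example sentence (lowercased).
probs = cbow_probs(["the", "cat", "on", "the", "mat"])
for word, p in zip(vocab, probs):
    print(f"{word}: {p:.3f}")
```

Training would adjust `W_in` and `W_out` so that the probability assigned to "sat" rises for this context.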
Skip-gram
Predict context words from the target word.
Input: "sat"
Predict: "The", "cat", "on", "the", "mat"
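Skip-gram training data can be sketched as (target, context) pairs drawn from a sliding window. The helper name `skipgram_pairs` and the window of 3 are choices made for this example. A window of 3 is just wide enough to reach every other word from "sat" in the sentence above.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for every word within the window."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

tokens = "the cat sat on the mat".split()
pairs = skipgram_pairs(tokens, window=3)

# Context words generated for the target "sat".
sat_contexts = [context for target, context in pairs if target == "sat"]
print(sat_contexts)
```

Each pair becomes one training example: the model sees the target word and is asked to score the context word highly.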
Training
- Train on large text corpora (billions of words)
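- Window size determines how many neighboring words count as context (typically 5-10 words on each side of the target)
- Typical embedding dimensions: 100-300
- Negative sampling speeds up training: each example updates only the true word and a few sampled "negative" words instead of computing a softmax over the whole vocabulary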
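To make the negative-sampling bullet concrete, here is a rough sketch of one skip-gram update with negative sampling, under stated assumptions. The vocabulary size, word counts, learning rate, and word ids are made up. Negatives are drawn from the unigram distribution raised to the 0.75 power, as in the original Word2Vec work. Excluding the true context word from the draw is a simplification chosen here.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, num_negatives, lr = 10, 16, 5, 0.05  # toy values (assumptions)

W_in = rng.normal(scale=0.1, size=(vocab_size, dim))   # target-word vectors
W_out = rng.normal(scale=0.1, size=(vocab_size, dim))  # context-word vectors

# Hypothetical word counts; noise distribution is unigram ** 0.75.
counts = np.arange(1, vocab_size + 1, dtype=float)
noise = counts ** 0.75
noise /= noise.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_negatives(context, k):
    """Draw k negative word ids, never the true context word."""
    p = noise.copy()
    p[context] = 0.0
    p /= p.sum()
    return rng.choice(vocab_size, size=k, p=p)

def sgns_loss(target, context, negatives):
    """-log s(u_ctx . v) - sum over negatives of log s(-u_neg . v)."""
    v = W_in[target]
    pos = np.log(sigmoid(W_out[context] @ v))
    neg = np.log(sigmoid(-(W_out[negatives] @ v))).sum()
    return float(-(pos + neg))

def sgns_step(target, context, negatives):
    """One SGD update touching only 1 + len(negatives) output rows."""
    v = W_in[target].copy()
    ids = np.concatenate(([context], negatives))
    labels = np.zeros(len(ids))
    labels[0] = 1.0                              # only the true context is positive
    errors = sigmoid(W_out[ids] @ v) - labels    # d(loss)/d(score) for each row
    grad_v = errors @ W_out[ids]                 # computed before W_out changes
    np.subtract.at(W_out, ids, lr * np.outer(errors, v))  # handles repeated ids
    W_in[target] -= lr * grad_v

target, context = 3, 7  # arbitrary word ids for the demo
negatives = sample_negatives(context, num_negatives)
before = sgns_loss(target, context, negatives)
for _ in range(20):
    sgns_step(target, context, negatives)
after = sgns_loss(target, context, negatives)
print(before, after)
```

The cost of each step depends on the number of negatives, not on the vocabulary size, which is where the speedup comes from.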
Properties
- Similar words cluster together
- Analogies work via vector arithmetic
- Captures syntactic AND semantic relationships
Limitations
- One vector per word (no context sensitivity)
- "bank" has same vector whether financial or river
- Out-of-vocabulary words are not handled (a word unseen during training has no vector)
- Largely superseded by contextual embeddings (BERT, etc.)
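The first three limitations follow from the fact that a Word2Vec-style model is, at lookup time, just a fixed table from word to vector. The sketch below uses random vectors as a hypothetical stand-in for trained ones. The `embed` helper and the example sentences are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "bank", "river", "loan", "approved", "flooded"]
# Random stand-ins for trained vectors (hypothetical values).
embeddings = {word: rng.normal(size=4) for word in vocab}

def embed(sentence):
    """Look up one fixed vector per word; the context is never consulted."""
    return [embeddings[word] for word in sentence.lower().split()]

financial = embed("the bank approved the loan")
river = embed("the river bank flooded")

# "bank" is token 1 in the first sentence and token 2 in the second.
same_vector = np.array_equal(financial[1], river[2])
print("same vector for both senses of 'bank':", same_vector)

# A word outside the vocabulary has no entry at all.
try:
    embed("the fintech bank")
    oov_handled = True
except KeyError:
    oov_handled = False
print("out-of-vocabulary word handled:", oov_handled)
```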
Legacy
Word2Vec showed that shallow neural networks can learn meaningful word representations from raw text, paving the way for modern embeddings like BERT and sentence transformers.