Word2Vec

Introduced by Mikolov et al. at Google in 2013, Word2Vec showed that words can be represented as dense vectors whose geometry reflects meaning.

The Big Idea

Word2Vec learns word embeddings by predicting context. Words that appear in similar contexts get similar vectors.

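Famous example: king - man + woman ≈ queen
Vector arithmetic captures semantic relationships: the result is not exactly the queen vector, but queen is its nearest neighbor among the remaining words.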
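
To make the arithmetic concrete, here is a minimal sketch using hand-made 3-dimensional toy vectors. These are not learned embeddings; the values, the `toy_vectors` table, and the `analogy` helper are assumptions chosen only so the steps are easy to follow. The query words are excluded from the candidates, as analogy evaluations usually do.

```python
import math

# Hand-made toy vectors (NOT learned embeddings). The axes loosely mean
# [royalty, maleness, femaleness] so the arithmetic is easy to inspect.
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

result = analogy("king", "man", "woman", toy_vectors)
print(result)  # queen
```

With real embeddings the same nearest-neighbor search runs over the whole vocabulary, and the analogy does not always succeed.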

Two Architectures

CBOW (Continuous Bag of Words)

Predict the target word from surrounding context words.

Context: "The cat ___ on the mat"
Predict: "sat"
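
As a rough sketch of how CBOW training examples could be built from raw text, the function below slides a window over a token list and pairs each target word with its neighbors. The name `cbow_examples` and the window of 2 words per side are illustrative assumptions, not part of any library.

```python
def cbow_examples(tokens, window=2):
    """Return a list of (context_words, target_word) pairs, one per position."""
    examples = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]      # up to `window` words before
        right = tokens[i + 1:i + 1 + window]     # up to `window` words after
        examples.append((left + right, target))
    return examples

tokens = "the cat sat on the mat".split()
examples = cbow_examples(tokens, window=2)
print(examples[2])  # (['the', 'cat', 'on', 'the'], 'sat')
```

In the model itself, the context word vectors are averaged (or summed) and that single vector is used to predict the target.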

Skip-gram

Predict each surrounding context word from the target word, one (target, context) pair at a time.

Input: "sat"
Predict: "The", "cat", "on", "the", "mat"
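
The mirror-image sketch for skip-gram turns each position into several (target, context) pairs. The helper name `skipgram_pairs` is hypothetical, and the window of 3 is chosen only so that "sat" sees every other word in this short sentence.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center_word, context_word) pairs for every position."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:                  # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the cat sat on the mat".split()
pairs = skipgram_pairs(tokens, window=3)
sat_pairs = [p for p in pairs if p[0] == "sat"]
print(sat_pairs)
# [('sat', 'the'), ('sat', 'cat'), ('sat', 'on'), ('sat', 'the'), ('sat', 'mat')]
```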

Training

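  • Train on large text corpora (billions of words)
  • Window size sets how many words on each side of the target count as context (typically 5-10)
  • Typical embedding dimensions: 100-300
  • Negative sampling speeds up training by scoring a few random "noise" words per example instead of the whole vocabulary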
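
To show why negative sampling is cheap, here is a minimal sketch of the skip-gram negative-sampling loss for a single (center, context) pair: the true pair is pushed toward a high score and a handful of sampled noise words toward a low one, so the cost depends on the number of negatives rather than the vocabulary size. The vectors below are made-up numbers, and the function names are assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def negative_sampling_loss(center_vec, context_vec, negative_vecs):
    """Loss = -log s(u_ctx . v_c) - sum_k log s(-u_neg_k . v_c), s = sigmoid."""
    loss = -math.log(sigmoid(dot(context_vec, center_vec)))
    for neg in negative_vecs:
        loss -= math.log(sigmoid(-dot(neg, center_vec)))
    return loss

# Made-up vectors: one center word, its true context word, two noise words.
center = [0.5, -0.2, 0.1]
true_context = [0.4, -0.1, 0.3]
negatives = [[-0.3, 0.2, 0.0], [0.1, 0.5, -0.4]]

loss = negative_sampling_loss(center, true_context, negatives)
print(round(loss, 4))
```

Training would follow the gradient of this loss to update the center, context, and noise word vectors; that step is omitted here.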

Properties

  • Similar words cluster together
  • Analogies work via vector arithmetic
  • Captures syntactic AND semantic relationships (e.g. walk : walked as well as country : capital)

Limitations

  • One vector per word (no context sensitivity)
  • "bank" has the same vector whether it means a financial institution or a riverbank
  • Out-of-vocabulary words are not handled: a word unseen in training has no vector
  • Largely superseded by contextual embeddings (BERT, etc.)
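
The first three limitations follow directly from the fact that a trained model is, in effect, a fixed lookup table. The sketch below uses a hypothetical table with made-up values to show both the single "bank" vector and the missing entry for an unseen word.

```python
# Hypothetical static lookup table; the values are made up for illustration.
embeddings = {
    "bank":  [0.2, 0.7, -0.1],
    "river": [0.1, 0.9, 0.3],
    "money": [0.8, -0.2, 0.4],
}

def embed(sentence, table):
    """Look up each token; return None for out-of-vocabulary words."""
    return [table.get(tok) for tok in sentence.lower().split()]

financial = embed("money bank", embeddings)
riverside = embed("river bank", embeddings)
unknown = embed("riverbank", embeddings)

print(financial[1] == riverside[1])  # True: "bank" gets the same vector in both
print(unknown)                       # [None]: no vector for an unseen word
```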

Legacy

Word2Vec showed that a simple neural model trained on raw text can learn meaningful word representations at scale, paving the way for modern embeddings like BERT and sentence transformers.