From Words to Sentences
While Word2Vec embeddings represent individual words, sentence embeddings map an entire sentence, paragraph, or short document to a single vector that captures its overall meaning.
Popular Models
| Model | Dimensions | Notes |
|---|---|---|
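| all-MiniLM-L6-v2 | 384 | Open source; small and fast, good general-purpose quality |
| all-mpnet-base-v2 | 768 | Open source; higher quality than MiniLM, but slower |
| text-embedding-3-small | 1536 | OpenAI API; lower cost |
| text-embedding-3-large | 3072 | OpenAI API; highest quality of the OpenAI embedding models |
| Cohere embed-v3 | 1024 | Cohere API; English and multilingual variants |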
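The dimension column also drives storage cost. A rough sketch, assuming vectors are stored uncompressed as float32 (4 bytes per value) and ignoring any index overhead: one million 384-dimensional vectors take about 1.5 GB, while the same number at 3072 dimensions take about 12.3 GB.

```python
# Dimensions taken from the table above
MODEL_DIMS = {
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "Cohere embed-v3": 1024,
}


def index_size_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for num_vectors embeddings; defaults to float32 (4 bytes per value)."""
    return num_vectors * dims * bytes_per_value


for name, dims in MODEL_DIMS.items():
    gb = index_size_bytes(1_000_000, dims) / 1e9
    print(f"{name:<24} {dims:>5} dims  ~{gb:.2f} GB per 1M vectors")
```

Real vector stores add index structures on top of this and may quantize vectors to fewer bytes, so treat these numbers as a lower-bound estimate for float32 storage.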
How They Work
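- Built on transformer encoders (e.g., BERT, MPNet, MiniLM)
- Trained on sentence pairs labeled or mined as similar or dissimilar
- Contrastive learning pulls the embeddings of similar sentences together and pushes dissimilar ones apart
- Token embeddings are pooled (commonly by averaging) into a single fixed-size vector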
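The last two steps can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the code any particular library runs: `mean_pool` averages token vectors while ignoring padding (mean pooling is one common choice; some models use the `[CLS]` token instead), and `in_batch_contrastive_loss` is an InfoNCE-style loss in which each anchor's own pair is the positive and every other pair in the batch serves as a negative. The function names and the `scale=20.0` temperature are assumptions chosen for illustration.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim) transformer output
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    returns:          (batch, dim), one vector per sentence
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)


def in_batch_contrastive_loss(anchors: np.ndarray, positives: np.ndarray,
                              scale: float = 20.0) -> float:
    """Anchor i should match positive i; positives j != i act as negatives."""
    a, p = l2_normalize(anchors), l2_normalize(positives)
    logits = scale * (a @ p.T)                      # (batch, batch) scaled cosine scores
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))      # cross-entropy on matching pairs


# Usage: 2 sentences, 5 token slots, 8-dim token vectors (random stand-ins)
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 8))
mask = np.array([[1, 1, 1, 0, 0],    # first sentence has 3 real tokens
                 [1, 1, 1, 1, 1]])
sentence_vecs = mean_pool(tokens, mask)  # shape (2, 8), regardless of seq_len
```

Minimizing the loss raises the score of each matching pair relative to the mismatched pairs in the same batch, which is the "pull together, push apart" behaviour described above.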
Using Sentence Transformers
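```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = [
    "I love programming",
    "Coding is my passion",
    "The weather is nice",
]
# encode() returns one 384-dimensional vector per sentence for this model
embeddings = model.encode(sentences)

# Pairwise cosine similarity between all sentences
scores = util.cos_sim(embeddings, embeddings)
# scores[0][1] (programming vs. coding) should be much higher than scores[0][2]
```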
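"Similar" here means a high cosine similarity. The computation itself is simple; a minimal NumPy sketch follows, using hypothetical 3-dimensional vectors as stand-ins for real 384-dimensional model outputs. When embeddings are already L2-normalized, cosine similarity reduces to a plain dot product.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity for an (n, dim) array; result is (n, n)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


# Hypothetical toy vectors, not real model outputs
toy = np.array([
    [0.9, 0.1, 0.0],   # "I love programming"
    [0.8, 0.2, 0.1],   # "Coding is my passion"
    [0.0, 0.1, 0.9],   # "The weather is nice"
])
sims = cosine_similarity_matrix(toy)
print(np.round(sims, 2))
```

Scores range from -1 to 1; the two programming sentences score close to 1, while the weather sentence scores near 0 against both.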
Key Advantages
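- Captures the meaning of the whole sentence in context, not just its individual words
- Handles synonyms and paraphrases ("I love programming" and "Coding is my passion" land close together)
- Works across languages when a multilingual model is used
- Fixed-size output regardless of input length (text beyond the model's maximum sequence length is truncated)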
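Because every input maps to a vector of the same length, a whole corpus can be stacked into one `(n, dim)` matrix and scored against a query with a single matrix-vector product. The brute-force sketch below uses made-up 4-dimensional vectors in place of real `model.encode(...)` output, and `top_k` is a hypothetical helper. Large collections typically use an approximate nearest-neighbor index rather than scoring every row.

```python
import numpy as np


def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 2) -> list[tuple[int, float]]:
    """Return (row index, cosine score) for the k corpus rows most similar to query.

    query: (dim,) vector; corpus: (n, dim) matrix with the same dim.
    """
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                       # one cosine score per corpus row
    order = np.argsort(-scores)[:k]      # highest scores first
    return [(int(i), float(scores[i])) for i in order]


# Hypothetical embeddings; in practice these come from model.encode(...)
corpus = np.array([
    [0.1, 0.9, 0.0, 0.1],
    [0.8, 0.1, 0.1, 0.0],
    [0.0, 0.2, 0.9, 0.1],
])
query = np.array([0.7, 0.2, 0.0, 0.1])
print(top_k(query, corpus, k=2))
```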
Choosing a Model
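- Speed critical: all-MiniLM-L6-v2
- Higher quality, open source: all-mpnet-base-v2
- Hosted API for production: OpenAI text-embedding-3 or Cohere embed-v3
- Multilingual: multilingual-e5-large (expects "query: " / "passage: " prefixes on its inputs)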