Sentence Embeddings

Represent entire sentences and paragraphs as meaningful vectors.

From Words to Sentences

While Word2Vec embeddings represent individual words, sentence embeddings capture the meaning of entire sentences, paragraphs, or documents.

Popular Models

Model                    Dimensions  Notes
all-MiniLM-L6-v2         384         Fast, good quality
all-mpnet-base-v2        768         Best open-source
text-embedding-3-small   1536        OpenAI, excellent
text-embedding-3-large   3072        OpenAI, highest quality
Cohere embed-v3          1024        Multilingual
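
The dimension count affects more than quality: it also sets how much space a stored vector takes. The sketch below estimates raw storage per million vectors for the models listed above. It assumes each value is stored as a 4-byte float32 and ignores index overhead and compression; the helper name index_size_gb is made up for this illustration.

```python
# Assumption: vectors are stored as raw float32 values (4 bytes each),
# with no index overhead, quantization, or compression.
BYTES_PER_FLOAT32 = 4

# Dimensions taken from the model list above.
DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "Cohere embed-v3": 1024,
}


def index_size_gb(num_vectors, dim, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage in decimal gigabytes for num_vectors vectors of size dim."""
    return num_vectors * dim * bytes_per_value / 1e9


for name, dim in DIMENSIONS.items():
    size = index_size_gb(1_000_000, dim)
    print(f"{name}: {size:.2f} GB per million vectors")
```

Under these assumptions, storage grows linearly with dimension, so a 3072-dimensional model needs eight times the space of a 384-dimensional one for the same number of documents.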

How They Work

  • Built on transformer encoders (BERT and similar models)
  • Trained on sentence pairs labeled as similar or dissimilar
  • Contrastive learning pulls similar sentences together and pushes dissimilar ones apart
  • Token embeddings are pooled (for example, averaged) into a single vector
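
The pooling and contrastive steps above can be sketched in plain NumPy. This is a minimal illustration, not how any particular model is implemented: mean pooling is only one common pooling choice, the loss is an in-batch InfoNCE-style objective picked as a representative example, and the function names, the temperature value, and the toy arrays are all assumptions made for this sketch.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sentence, ignoring padding positions.

    token_embeddings: array of shape (batch, tokens, dim)
    attention_mask:   array of shape (batch, tokens), 1 = real token, 0 = padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def contrastive_loss(anchors, positives, temperature=0.05):
    """In-batch contrastive loss (InfoNCE-style).

    Row i of `positives` is the paraphrase of row i of `anchors`;
    every other row in the batch is treated as a negative.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                       # cosine similarities, scaled
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


# Toy input: one sentence, three token slots, the last slot is padding.
tokens = np.array([[[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
pooled = mean_pool(tokens, mask)  # average of the first two token vectors only

# Toy batch: correctly matched pairs should give a lower loss than mismatched ones.
batch = np.eye(3)
loss_matched = contrastive_loss(batch, batch)
loss_mismatched = contrastive_loss(batch, batch[[1, 2, 0]])
```

Training minimizes a loss of this kind, which is what pushes paraphrases toward nearby points in the vector space.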

Using Sentence Transformers

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

sentences = [
    "I love programming",
    "Coding is my passion",
    "The weather is nice"
]

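embeddings = model.encode(sentences)  # NumPy array of shape (3, 384)
# embeddings[0] and embeddings[1] should be closer to each other
# (by cosine similarity) than either is to embeddings[2]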
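
"Similar" here usually means a high cosine similarity between the two vectors. The sketch below shows the calculation on hand-written 3-dimensional toy vectors, which are made up for illustration and are not real model output. The cosine_similarity helper is also defined just for this example.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy stand-ins for real embeddings (invented values, 3 dimensions only).
programming = [0.9, 0.1, 0.0]
coding = [0.8, 0.2, 0.1]
weather = [0.0, 0.1, 0.9]

sim_related = cosine_similarity(programming, coding)
sim_unrelated = cosine_similarity(programming, weather)
print(sim_related > sim_unrelated)
```

With real output from model.encode, the same function can be called as cosine_similarity(embeddings[0], embeddings[1]). The sentence_transformers package also provides util.cos_sim for this purpose.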

Key Advantages

  • Captures full sentence meaning, not just words
  • Handles synonyms and paraphrases
  • Works across languages (multilingual models)
  • Fixed-size output regardless of input length (each model still has a maximum input length, so longer text must be truncated or split)

Choosing a Model

  • Speed critical: all-MiniLM-L6-v2
  • Best quality (open): all-mpnet-base-v2
  • Production (API): OpenAI or Cohere
  • Multilingual: multilingual-e5-large