What is Cosine Similarity?
Cosine similarity measures how closely two vectors point in the same direction, ignoring their magnitude. It is the cosine of the angle between them.
\[\text{similarity}(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}| \cdot |\vec{b}|} = \frac{\sum a_i b_i}{\sqrt{\sum a_i^2} \cdot \sqrt{\sum b_i^2}}\]
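As a worked instance of the formula, take the illustrative vectors \(\vec{a} = (1, 2, 3)\) and \(\vec{b} = (2, 3, 4)\), chosen here only because the arithmetic is easy to follow:
\[\vec{a} \cdot \vec{b} = 1 \cdot 2 + 2 \cdot 3 + 3 \cdot 4 = 20, \qquad |\vec{a}| = \sqrt{1 + 4 + 9} = \sqrt{14}, \qquad |\vec{b}| = \sqrt{4 + 9 + 16} = \sqrt{29}\]
\[\text{similarity}(\vec{a}, \vec{b}) = \frac{20}{\sqrt{14} \cdot \sqrt{29}} = \frac{20}{\sqrt{406}} \approx 0.9926\]
A value this close to 1 corresponds to an angle of roughly 7 degrees, so the two vectors point in nearly the same direction.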
Value Range
| Value | Meaning | Interpretation |
|---|---|---|
| 1 | Identical | Same direction |
| 0 | Orthogonal | No similarity |
| -1 | Opposite | Opposite direction |
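The three landmark values can be reproduced with a small sketch in plain Python. The helper `cosine_similarity` and the 2-dimensional vectors are illustrative choices, not part of any library, and the helper assumes equal-length, non-zero inputs:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

v = [1.0, 2.0]
same = cosine_similarity(v, [2.0, 4.0])         # v scaled by 2: same direction, ~1
orthogonal = cosine_similarity(v, [-2.0, 1.0])  # dot product is 0, so the result is 0
opposite = cosine_similarity(v, [-1.0, -2.0])   # v negated: opposite direction, ~-1
```

Because of floating-point rounding, the first and last results may differ from exactly 1 and -1 in the last few digits.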
Why Cosine for Embeddings?
Cosine similarity is preferred for embeddings because:
- Magnitude invariant: Focus on direction, not length
- Bounded output: Always between -1 and 1
- Works in high dimensions: Scales well to 1000+ dimensions
- Standard: The metric many embedding models are trained and evaluated with
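Magnitude invariance is the easiest of these properties to check by hand. The sketch below uses two made-up vectors that point in the same direction but differ in length by a factor of 10; the names `short_doc` and `long_doc` are hypothetical stand-ins for embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

short_doc = [1.0, 2.0, 3.0]
long_doc = [10.0, 20.0, 30.0]  # same direction, 10x the magnitude

cos = cosine_similarity(short_doc, long_doc)     # ~1.0: direction is identical
dist = euclidean_distance(short_doc, long_doc)   # sqrt(1134), about 33.67
```

Cosine similarity treats the two vectors as a perfect match, while Euclidean distance reports them as far apart purely because of the difference in length.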
Cosine vs Euclidean Distance
| Aspect | Cosine Similarity | Euclidean Distance |
|---|---|---|
| Measures | Angle | Straight-line distance |
| Range | -1 to 1 | 0 to infinity |
| Magnitude matters? | No | Yes |
| Best for | Text, embeddings | Spatial data |
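Note: For normalized vectors (length = 1), cosine similarity and Euclidean distance give equivalent rankings, because the squared distance reduces to \(|\vec{a} - \vec{b}|^2 = 2 - 2 \cdot \text{similarity}(\vec{a}, \vec{b})\): the higher the similarity, the smaller the distance.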
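The note on normalized vectors can be verified numerically. This is a minimal sketch with invented 3-dimensional vectors; `normalize`, `dot`, and the candidate labels are illustrative names only:

```python
import math

def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = normalize([1.0, 2.0, 3.0])
candidates = {
    "a": normalize([2.0, 3.0, 4.0]),
    "b": normalize([3.0, 1.0, 0.0]),
    "c": normalize([-1.0, 0.0, 1.0]),
}

# For unit vectors the dot product equals the cosine similarity.
by_cosine = sorted(candidates, key=lambda k: dot(query, candidates[k]), reverse=True)
by_distance = sorted(candidates, key=lambda k: euclidean_distance(query, candidates[k]))

# Squared distance and cosine are tied by: d^2 = 2 - 2 * cos
identity_holds = all(
    math.isclose(euclidean_distance(query, v) ** 2, 2 - 2 * dot(query, v))
    for v in candidates.values()
)
```

Sorting by descending cosine and by ascending distance yields the same order, which is why some vector stores normalize embeddings once and then use whichever metric is cheaper.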
Code Examples
Python
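import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# scikit-learn expects 2D arrays of shape (n_samples, n_features)
a = np.array([[1, 2, 3]])
b = np.array([[2, 3, 4]])
# The result is a 1x1 similarity matrix; take its single entry
similarity = cosine_similarity(a, b)[0][0]
# ≈ 0.9926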
JavaScript
// Assumes a and b are equal-length, non-zero numeric arrays
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, ai) => sum + ai * ai, 0));
  const magB = Math.sqrt(b.reduce((sum, bi) => sum + bi * bi, 0));
  return dot / (magA * magB);
}

cosineSimilarity([1, 2, 3], [2, 3, 4]); // ≈ 0.9926
Applications in AI
- Semantic search: Find documents similar to a query
- Recommendations: Find items similar to user preferences
- Deduplication: Find near-duplicate content
- Clustering: Group similar items together
- RAG: Retrieve relevant context for LLMs
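Most of these applications reduce to the same step: score every stored vector against a query vector and keep the best matches. The following is a rough sketch of that step, assuming the embeddings already exist. The document IDs, the 3-dimensional vectors, and the `top_k` helper are all hypothetical; a real embedding model would produce vectors with hundreds of dimensions or more.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Made-up toy "embeddings" for illustration only.
docs = {
    "pasta-recipe": [0.9, 0.1, 0.0],
    "pizza-recipe": [0.8, 0.3, 0.1],
    "tax-guide": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # stands in for an embedded cooking question

results = top_k(query, docs, k=2)
```

This brute-force scan is fine for small collections. Larger ones typically rely on an approximate nearest-neighbor index, which is outside the scope of this sketch.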
Typical Thresholds
Roughly what similarity scores mean in practice for text embeddings (exact cutoffs vary by embedding model):
- > 0.9: Very similar / near duplicate
- 0.7 - 0.9: Related content
- 0.5 - 0.7: Somewhat related
- < 0.5: Likely unrelated
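The bands above translate directly into a small lookup function. This is only a sketch: the name `describe_similarity` is made up, and the treatment of scores that fall exactly on 0.9, 0.7, or 0.5 is an assumption, since the list does not say which band a boundary value belongs to.

```python
def describe_similarity(score):
    """Map a cosine similarity score to the rough bands listed above."""
    if score > 0.9:
        return "very similar / near duplicate"
    if score >= 0.7:  # assumption: 0.7 and 0.9 themselves count as "related"
        return "related content"
    if score >= 0.5:  # assumption: 0.5 itself counts as "somewhat related"
        return "somewhat related"
    return "likely unrelated"

labels = [describe_similarity(s) for s in (0.95, 0.8, 0.6, 0.2)]
```

In practice the cutoffs would be tuned against labeled examples for the specific embedding model in use.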