Why Different Metrics?
Different problems require different definitions of "distance." The best metric depends on what you're measuring and the structure of your data.
Euclidean Distance (L2)
The straight-line distance. Most intuitive and commonly used.
- Use when: Physical distance matters
- Example: Location data, image similarity
Manhattan Distance (L1)
Sum of absolute differences. Named for city-block navigation.
- Use when: Grid-like movement, sparse data
- Example: Taxi routing, feature selection
Cosine Distance
Based on the angle between vectors. Ignores magnitude.
- Use when: Direction matters more than magnitude
- Example: Text similarity, recommendations, embeddings
Dot Product Distance
Negative dot product. Used when vectors are already normalized.
- Use when: Pre-normalized embeddings
- Example: Fast similarity search
Comparison Table
| Metric | Magnitude Sensitive | Best For |
|---|---|---|
| Euclidean | Yes | Physical measurements |
| Manhattan | Yes | Grid data, sparse vectors |
| Cosine | No | Text, embeddings |
| Dot Product | Yes* | Normalized vectors |
In AI/ML
Vector databases typically support multiple distance metrics. Cosine similarity is most common for text embeddings, while Euclidean works well for image features.