Distance Metrics

Different ways to measure the "distance" between vectors.

Why Different Metrics?

Different problems require different definitions of "distance." The best metric depends on what you're measuring and the structure of your data.

Euclidean Distance (L2)

The straight-line distance. Most intuitive and commonly used.

\[d(\vec{a}, \vec{b}) = \sqrt{\sum_{i=1}^{n}(a_i - b_i)^2}\]
  • Use when: Physical distance matters
  • Example: Location data, image similarity
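
The formula above translates almost directly into code. A minimal sketch in plain Python, assuming vectors are equal-length lists of numbers (the function name `euclidean_distance` is illustrative, not from any particular library):

```python
import math

def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Square each per-dimension difference, sum them, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0 (the 3-4-5 right triangle)
```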

Manhattan Distance (L1)

Sum of absolute differences. Named for city-block navigation.

\[d(\vec{a}, \vec{b}) = \sum_{i=1}^{n}|a_i - b_i|\]
  • Use when: Grid-like movement, sparse data
  • Example: Taxi routing, feature selection
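
As a rough illustration of the formula above, here is a plain-Python sketch, again assuming equal-length lists of numbers (`manhattan_distance` is an illustrative name):

```python
def manhattan_distance(a, b):
    """Sum of absolute per-dimension differences (L1 distance)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))

# Three blocks east plus four blocks north.
print(manhattan_distance([0, 0], [3, 4]))  # 7
```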

Cosine Distance

Based on the angle between vectors. Ignores magnitude.

\[d(\vec{a}, \vec{b}) = 1 - \frac{\vec{a} \cdot \vec{b}}{|\vec{a}||\vec{b}|}\]
  • Use when: Direction matters more than magnitude
  • Example: Text similarity, recommendations, embeddings
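
A small sketch of the formula above in plain Python, assuming non-zero, equal-length vectors (`cosine_distance` is an illustrative name). It shows that scaling a vector leaves the result essentially unchanged:

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)

# Same direction, different magnitude: the distance is (numerically) zero.
same_direction = cosine_distance([1, 2, 3], [2, 4, 6])
# Perpendicular vectors: the distance is 1.
orthogonal = cosine_distance([1, 0], [0, 1])
print(same_direction, orthogonal)
```

Floating-point rounding can leave the first value a tiny distance from zero, so compare with a tolerance rather than with `==`.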

Dot Product Distance

Negative dot product. Used when vectors are already normalized.

\[d(\vec{a}, \vec{b}) = -\vec{a} \cdot \vec{b}\]
  • Use when: Pre-normalized embeddings
  • Example: Fast similarity search
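
The sketch below, using an illustrative `dot_product_distance` function and made-up vectors, suggests why normalization matters here: on unnormalized input, a long vector can score as "closer" than a perfectly aligned short one.

```python
def dot_product_distance(a, b):
    """Negative dot product: smaller values mean 'more similar'."""
    return -sum(x * y for x, y in zip(a, b))

query = [1.0, 0.0]
aligned_short = [1.0, 0.0]  # same direction as the query, length 1
off_axis_long = [3.0, 4.0]  # different direction, length 5

print(dot_product_distance(query, aligned_short))  # -1.0
print(dot_product_distance(query, off_axis_long))  # -3.0 (ranked closer)
```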

Comparison Table

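| Metric      | Magnitude Sensitive | Best For                  |
|-------------|---------------------|---------------------------|
| Euclidean   | Yes                 | Physical measurements     |
| Manhattan   | Yes                 | Grid data, sparse vectors |
| Cosine      | No                  | Text, embeddings          |
| Dot Product | Yes*                | Normalized vectors        |

* The dot product depends on magnitude in general; on unit-length vectors it ranks results the same way as cosine.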

In AI/ML

Vector databases typically support multiple distance metrics. Cosine similarity is most common for text embeddings, while Euclidean works well for image features.

Pro tip: Many embedding models produce normalized (unit-length) vectors, for which cosine similarity and the dot product give the same value.
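
To see why, substitute unit lengths into the cosine distance formula above. This short derivation assumes both vectors have length exactly 1; the labels \(d_{\cos}\) and \(d_{\text{dot}}\) are introduced here for the cosine and dot product distances defined earlier:

\[|\vec{a}| = |\vec{b}| = 1 \;\Rightarrow\; d_{\cos}(\vec{a}, \vec{b}) = 1 - \frac{\vec{a} \cdot \vec{b}}{1 \cdot 1} = 1 - \vec{a} \cdot \vec{b} = 1 + d_{\text{dot}}(\vec{a}, \vec{b})\]

The two distances differ only by the constant 1, so they order neighbors identically.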

Distance Metrics Explained Simply

Imagine you need to get from point A to point B in a city. There are different ways to measure "how far" that is:

  • Euclidean Distance (Straight Line): This is like a bird flying directly from A to B. It's the shortest possible path - a straight line through the air. In math terms, it's the familiar "as the crow flies" distance you learned in school.
  • Manhattan Distance (City Blocks): Now imagine you're in a taxi in Manhattan. You can't fly - you have to drive along streets, making turns at corners. The distance is the total blocks traveled (up/down + left/right). It's called Manhattan distance because NYC is laid out in a grid!
  • Cosine Similarity (Direction Only): This one is different - it doesn't care how far apart two things are, only whether they're pointing in the same direction. Think of two arrows: even if one is tiny and one is huge, if they point the same way, cosine says they're similar. This is perfect for comparing documents - a short article and a long book about the same topic will have similar "directions" in meaning.
  • Dot Product: When your vectors are already normalized (all scaled to length 1), this is a fast shortcut that gives you the same answer as cosine similarity but with less computation.

Quick rule of thumb: Use Euclidean for physical things (locations, images), Manhattan for grid-like or sparse data, and Cosine for text and semantic meaning where the "size" of data doesn't matter - only its direction in meaning-space.

Frequently Asked Questions

What is Euclidean distance?

Euclidean distance is the straight-line distance between two points in space - the length of a line segment connecting them. It's calculated using the Pythagorean theorem extended to any number of dimensions. For two vectors, you find the difference in each dimension, square them, sum them up, and take the square root. It's the most intuitive distance metric and is also known as L2 distance, because it equals the L2 norm of the difference between the two vectors.

What is the difference between Euclidean and Manhattan distance?

Euclidean distance measures the shortest straight-line path between two points (like flying), while Manhattan distance measures the path along grid lines (like walking city blocks). Euclidean uses squared differences and a square root; Manhattan uses absolute differences summed directly. Manhattan is often better for sparse or high-dimensional data because it's less affected by the "curse of dimensionality" and individual outlier dimensions have less impact.
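
To make the point about outlier dimensions concrete, here is a small sketch with made-up numbers. It measures how much of each distance comes from a single large difference (for Euclidean, the share is taken of the squared sum, before the square root):

```python
import math

a = [0, 0, 0, 0]
b = [1, 1, 1, 10]  # the last dimension is the outlier

diffs = [abs(x - y) for x, y in zip(a, b)]
manhattan = sum(diffs)                    # 1 + 1 + 1 + 10 = 13
squared_sum = sum(d * d for d in diffs)   # 1 + 1 + 1 + 100 = 103
euclidean = math.sqrt(squared_sum)

# Share of each total contributed by the outlier dimension alone.
manhattan_share = diffs[-1] / manhattan          # 10 / 13
euclidean_share = diffs[-1] ** 2 / squared_sum   # 100 / 103

print(f"Manhattan: {manhattan}, outlier share {manhattan_share:.0%}")      # 13, 77%
print(f"Euclidean: {euclidean:.2f}, outlier share {euclidean_share:.0%}")  # 10.15, 97%
```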

When should I use cosine similarity?

Use cosine similarity when the direction of vectors matters more than their magnitude. This is ideal for text similarity (comparing documents of different lengths), recommendation systems, semantic search with embeddings, and any scenario where you care about the relationship or pattern in the data rather than absolute values. It's the go-to metric for NLP applications and vector embeddings from language models.

What is L1 vs L2 norm?

L1 norm (also called Manhattan norm or taxicab norm) is the sum of absolute values of vector components. L2 norm (also called Euclidean norm) is the square root of the sum of squared components. Applied to the difference between two vectors, L1 gives Manhattan distance and L2 gives Euclidean distance. Used as a loss or regularization penalty in machine learning, L1 is more robust to outliers and promotes sparsity, while L2 gives smoother solutions and penalizes large values more heavily.
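
The relationship between norms and distances can be sketched as follows. The helper names (`l1_norm`, `l2_norm`, `difference`) are illustrative; the point is that each distance is the corresponding norm of the difference vector:

```python
import math

def l1_norm(v):
    """Sum of absolute values of the components."""
    return sum(abs(x) for x in v)

def l2_norm(v):
    """Square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))

def difference(a, b):
    """Component-wise a - b."""
    return [x - y for x, y in zip(a, b)]

a, b = [1, 5], [4, 1]
delta = difference(a, b)  # [-3, 4]
print(l1_norm(delta))     # 7   -> Manhattan distance between a and b
print(l2_norm(delta))     # 5.0 -> Euclidean distance between a and b
```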

What is the difference between cosine similarity and cosine distance?

Cosine similarity measures how similar two vectors are (ranging from -1 to 1, where 1 means identical direction). Cosine distance is simply 1 minus the cosine similarity, which turns the score into a dissimilarity (ranging from 0 to 2, where 0 means identical direction). They carry the same information on reversed scales: use similarity when you want to rank the most similar items, and distance when an algorithm such as k-NN expects smaller values to mean closer. Note that cosine distance is not a metric in the strict mathematical sense, because it can violate the triangle inequality.
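
A quick sketch of the two scales, using illustrative function names and assuming non-zero vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """1 minus the cosine similarity, in [0, 2]."""
    return 1 - cosine_similarity(a, b)

a = [1, 0]
cases = [("same", [2, 0]), ("orthogonal", [0, 3]), ("opposite", [-4, 0])]
for label, other in cases:
    print(label, cosine_similarity(a, other), cosine_distance(a, other))
# same 1.0 0.0
# orthogonal 0.0 1.0
# opposite -1.0 2.0
```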

Which distance metric is best for high-dimensional data?

For high-dimensional data, cosine similarity or Manhattan distance often work better than Euclidean distance. This is due to the "curse of dimensionality" - in high dimensions, Euclidean distances tend to become similar for all pairs of points, losing discriminative power. Cosine similarity sidesteps part of the problem by comparing direction rather than magnitude. Manhattan distance is often more stable as well because it doesn't square the differences, making it less sensitive to noise in individual dimensions.
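
The concentration effect can be observed with a small experiment. The sketch below is an illustration on uniformly random data, not a benchmark; `relative_contrast` is a hypothetical helper that reports how much farther the farthest point is than the nearest one, relative to the nearest:

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest Euclidean distance from a random query."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.sqrt(sum((q - p) ** 2 for q, p in zip(query, point))))
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

On data like this, the printed contrast typically shrinks as the dimension grows, meaning the nearest and farthest points become harder to tell apart.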

How do vector databases use distance metrics?

Vector databases use distance metrics to find the most similar vectors to a query vector. When you perform a similarity search, the database calculates distances between your query and stored vectors using the configured metric. Most vector databases (Pinecone, Weaviate, Qdrant, Milvus) support multiple metrics. Cosine similarity is most common for text embeddings, while Euclidean is often used for image features. The choice of metric significantly impacts search results and should match your embedding model's training.
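
Conceptually, a similarity search with a configurable metric looks like the brute-force sketch below. This is an illustration only: the `search` function, the `METRICS` table, and the stored vectors are all hypothetical, and production vector databases generally rely on approximate indexes rather than scanning every vector.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

METRICS = {"euclidean": euclidean, "cosine": cosine}

def search(query, stored, metric="cosine", k=1):
    """Return the ids of the k stored vectors closest to the query."""
    distance = METRICS[metric]
    ranked = sorted(stored, key=lambda item_id: distance(query, stored[item_id]))
    return ranked[:k]

stored = {
    "same_direction_far": [10.0, 10.0],
    "nearby_off_axis": [1.0, 0.0],
}
query = [1.0, 1.0]

print(search(query, stored, metric="euclidean"))  # ['nearby_off_axis']
print(search(query, stored, metric="cosine"))     # ['same_direction_far']
```

The same query returns a different top result under each metric, which is why the configured metric matters.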

What is dot product distance and when is it equivalent to cosine?

Dot product distance is the negative of the dot product between two vectors. When vectors are normalized (have unit length), dot product becomes equivalent to cosine similarity because the denominator in the cosine formula equals 1. Many embedding models output normalized vectors, making dot product a faster alternative to cosine (fewer computations). Always check if your embeddings are normalized - if so, use dot product for speed; if not, use cosine for correct results.
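
One way to run that check is sketched below. The helper names and the tolerance of `1e-6` are assumptions chosen for illustration; pick a tolerance that suits the precision of your embeddings.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def is_normalized(v, tol=1e-6):
    """True if v has unit length, within an absolute tolerance."""
    return math.isclose(l2_norm(v), 1.0, abs_tol=tol)

def normalize(v):
    """Scale v to unit length."""
    norm = l2_norm(v)
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]

raw = [3.0, 4.0]
unit = normalize(raw)  # approximately [0.6, 0.8]
print(is_normalized(raw), is_normalized(unit))  # False True

# For unit vectors, the dot product matches cosine similarity.
other = normalize([1.0, 2.0])
dot = sum(x * y for x, y in zip(unit, other))
cosine_similarity = dot / (l2_norm(unit) * l2_norm(other))
print(math.isclose(dot, cosine_similarity))  # True
```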

Can I use different distance metrics for the same data?

Yes, you can use different metrics on the same data, and it often makes sense to experiment. Different metrics can rank similarities differently and may surface different nearest neighbors. However, the metric should match your use case and ideally match what the embedding model was trained with. For example, OpenAI recommends cosine similarity for its embeddings. Because those embeddings are normalized to unit length, Euclidean distance returns the same ranking there, but on unnormalized embeddings from other models a mismatched metric can give suboptimal results.
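
How much two metrics disagree depends on the vectors. As a worked identity, assuming both vectors have unit length, expanding the squared Euclidean distance gives:

\[|\vec{a} - \vec{b}|^2 = |\vec{a}|^2 + |\vec{b}|^2 - 2\,\vec{a} \cdot \vec{b} = 2 - 2\,\vec{a} \cdot \vec{b} \quad \text{when } |\vec{a}| = |\vec{b}| = 1\]

So on unit-length vectors, a smaller Euclidean distance always means a larger dot product (and cosine similarity), and the neighbor ranking is the same. On vectors of varying length the identity does not hold, and rankings can differ.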

What is Hamming distance and when is it used?

Hamming distance counts the number of positions where corresponding elements differ between two vectors. It's primarily used for binary vectors or strings. In machine learning, it's common for binary hash codes, error detection/correction in communications, and comparing binary feature vectors. For example, comparing two binary strings "1010" and "1001" gives a Hamming distance of 2 (the last two positions differ). It's extremely fast to compute: XOR the two bit patterns, then count the 1 bits in the result.
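
Both views can be sketched in a few lines of Python. The function names are illustrative; the integer version assumes non-negative integers whose bits encode the vectors:

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def hamming_distance_bits(x, y):
    """Hamming distance between two non-negative integers read as bit strings."""
    # XOR leaves a 1 bit exactly where the inputs differ; count those bits.
    return bin(x ^ y).count("1")

print(hamming_distance("1010", "1001"))       # 2
print(hamming_distance_bits(0b1010, 0b1001))  # 2
```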