Cosine Similarity: Measure Vector Similarity

What is Cosine Similarity?

Cosine similarity measures the angle between two vectors, ignoring their magnitude. It's the cosine of the angle between them.

\[\text{similarity}(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}| \cdot |\vec{b}|} = \frac{\sum a_i b_i}{\sqrt{\sum a_i^2} \cdot \sqrt{\sum b_i^2}}\]

Value Range

Identical

Same direction

Orthogonal

No similarity

-1

Opposite

Opposite direction

Why Cosine for Embeddings?

Cosine similarity is preferred for embeddings because:

Magnitude invariant: Focus on direction, not length
Bounded output: Always between -1 and 1
Works in high dimensions: Scales well to 1000+ dimensions
Standard: What embedding models are optimized for

Cosine vs Euclidean Distance

Aspect	Cosine Similarity	Euclidean Distance
Measures	Angle	Straight-line distance
Range	-1 to 1	0 to infinity
Magnitude matters?	No	Yes
Best for	Text, embeddings	Spatial data

Note: For normalized vectors (length = 1), cosine similarity and Euclidean distance give equivalent rankings.

Code Examples

Python

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

a = np.array([[1, 2, 3]])
b = np.array([[2, 3, 4]])

similarity = cosine_similarity(a, b)[0][0]
# 0.9926

JavaScript

function cosineSimilarity(a, b) {
    const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
    const magA = Math.sqrt(a.reduce((sum, ai) => sum + ai * ai, 0));
    const magB = Math.sqrt(b.reduce((sum, bi) => sum + bi * bi, 0));
    return dot / (magA * magB);
}

cosineSimilarity([1, 2, 3], [2, 3, 4]); // 0.9926

Applications in AI

Semantic search: Find documents similar to query
Recommendations: Find items similar to user preferences
Deduplication: Find near-duplicate content
Clustering: Group similar items together
RAG: Retrieve relevant context for LLMs

Typical Thresholds

What similarity scores mean in practice (for text embeddings):

> 0.9: Very similar / near duplicate
0.7 - 0.9: Related content
0.5 - 0.7: Somewhat related
< 0.5: Likely unrelated

Try It: Cosine Similarity Calculator

What is Cosine Similarity?

Value Range

Why Cosine for Embeddings?

Cosine vs Euclidean Distance

Code Examples

Python

JavaScript

Applications in AI

Typical Thresholds