How to Calculate Canberra Distance in Python (With Example)

Canberra distance is a measure of dissimilarity between two numerical vectors, often used in clustering algorithms. In Python, it can be calculated by using the canberra() function from SciPy’s scipy.spatial.distance module. This function takes two 1-dimensional vectors of equal length as arguments and returns a single number: the Canberra distance between them. To compute the distances between every pair of vectors drawn from two collections, SciPy’s cdist() function can be used with metric='canberra'; it takes two 2-dimensional arrays and returns a 2-dimensional array with one entry per pair of rows.


The Canberra distance between two vectors, A and B, is calculated as:

Canberra distance = Σ |Ai-Bi| / (|Ai| + |Bi|)

where:

  • Ai: The ith value in vector A
  • Bi: The ith value in vector B
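The formula above can be sketched directly in NumPy. This is a minimal illustration, not SciPy's implementation: the function name canberra_distance is hypothetical, and the choice to count a term as 0 when both Ai and Bi are 0 (a 0/0 term) is an assumption made here to avoid division by zero.

```python
import numpy as np

def canberra_distance(a, b):
    """Sum of |Ai - Bi| / (|Ai| + |Bi|) over all positions i."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("vectors must have the same shape")

    numerator = np.abs(a - b)
    denominator = np.abs(a) + np.abs(b)

    # Assumption: where the denominator is 0 (both values are 0),
    # the term is left at 0 instead of dividing by zero.
    terms = np.divide(numerator, denominator,
                      out=np.zeros_like(numerator),
                      where=denominator != 0)
    return float(terms.sum())

result = canberra_distance([2, 4, 4, 6], [5, 5, 7, 8])
print(round(result, 5))
```

Because each term is divided by the size of the values being compared, a difference of 3 between small values counts for more than the same difference between large values.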

For example, suppose we have the following two vectors:

  • A = [2, 4, 4, 6]
  • B = [5, 5, 7, 8]

We would calculate the Canberra distance between A and B as:

  • Canberra Distance = |2-5|/(2+5) + |4-5|/(4+5) + |4-7|/(4+7) + |6-8|/(6+8)
  • Canberra Distance = 3/7 + 1/9 + 3/11 + 2/14
  • Canberra Distance = 0.95527

The Canberra distance between these two vectors is 0.95527.
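The hand calculation can be double-checked with exact fractions. The sketch below uses Python's standard fractions module; the variable names are illustrative only.

```python
from fractions import Fraction

a = [2, 4, 4, 6]
b = [5, 5, 7, 8]

# One exact term |Ai - Bi| / (|Ai| + |Bi|) per position
terms = [Fraction(abs(x - y), abs(x) + abs(y)) for x, y in zip(a, b)]
total = sum(terms)

print(total)                   # 662/693
print(round(float(total), 5))  # 0.95527
```

Note that Fraction reduces 2/14 to 1/7 automatically, so the individual terms are 3/7, 1/9, 3/11, and 1/7.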

The following example shows how to calculate the Canberra distance between these exact two vectors in Python.

Example: Calculating Canberra Distance in Python

First, let’s create a NumPy array to hold each of our vectors:

import numpy as np

#define two arrays
array1 = np.array([2, 4, 4, 6])
array2 = np.array([5, 5, 7, 8])

Next, we can use the canberra() function from the SciPy package in Python to calculate the Canberra distance between the two vectors:

from scipy.spatial import distance

#calculate Canberra distance between the arrays
distance.canberra(array1, array2)

0.9552669552

The Canberra distance between the two vectors is 0.95527.

Notice that this value matches the one we calculated earlier by hand.
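The canberra() function compares a single pair of vectors. To compare several vectors at once, SciPy's cdist() function accepts metric='canberra' and returns a 2-dimensional array of pairwise distances. The sketch below is illustrative: the names group_a and group_b are hypothetical, and the second row of group_b is simply a copy of the first vector, added to show a distance of zero.

```python
import numpy as np
from scipy.spatial import distance

# Each row is one vector
group_a = np.array([[2, 4, 4, 6]])
group_b = np.array([[5, 5, 7, 8],
                    [2, 4, 4, 6]])

# Entry [i, j] is the Canberra distance between
# row i of group_a and row j of group_b
pairwise = distance.cdist(group_a, group_b, metric='canberra')

print(pairwise.shape)  # (1, 2)
print(pairwise)
```

The first entry matches the value returned by canberra() above, and the second entry is 0 because the two vectors are identical.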
