How to calculate Levenshtein Distance in Python?

Levenshtein distance measures how different two strings are, and it is widely used in natural language processing to compare strings. It can be calculated in Python using the Levenshtein library, whose distance() function takes two strings as parameters and returns the Levenshtein distance between them. The lower the distance, the more similar the two strings are, and a distance of 0 means the strings are identical. The distance can never exceed the length of the longer string.


The Levenshtein distance between two strings is the minimum number of single-character edits required to turn one word into the other.

The word “edits” includes substitutions, insertions, and deletions.

For example, suppose we have the following two words:

  • PARTY
  • PARK

The Levenshtein distance between the two words (i.e. the number of edits we have to make to turn one word into the other) would be 2: substitute the T with a K, then delete the Y.

[Figure: Levenshtein distance example]
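
To make the definition concrete, here is a minimal pure-Python sketch of the standard dynamic-programming approach to counting edits. The function name levenshtein is hypothetical and chosen for illustration; this is not the code used inside the python-Levenshtein module, only a small reference version that needs no installation.

```python
def levenshtein(s, t):
    """Return the minimum number of single-character edits that turn s into t."""
    # prev[j] = distance between the part of s processed so far and t[:j].
    # Before any character of s is read, reaching t[:j] takes j insertions.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute, or keep a match
        prev = curr
    return prev[-1]

print(levenshtein("PARTY", "PARK"))  # 2
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second string rather than with the product of both lengths.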

In practice, the Levenshtein distance is used in many different applications including approximate string matching, spell-checking, and natural language processing.

This tutorial explains how to calculate the Levenshtein distance between strings in Python by using the python-Levenshtein module.

You can use the following command to install this module:

pip install python-Levenshtein

You can then load the function to calculate the Levenshtein distance:

from Levenshtein import distance as lev

The following examples show how to use this function in practice.

Example 1: Levenshtein Distance Between Two Strings

The following code shows how to calculate the Levenshtein distance between the two strings “party” and “park”:

#calculate Levenshtein distance
lev('party', 'park')

2

The Levenshtein distance turns out to be 2.
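
Because the distance can never exceed the length of the longer string, it can be rescaled into a score between 0 and 1 when strings of different lengths need to be compared. The sketch below shows one possible way to do this. The helper name normalized_similarity is hypothetical and not part of the python-Levenshtein module, and it takes an already computed distance as its first argument.

```python
def normalized_similarity(distance, s, t):
    """Rescale an edit distance to the range [0, 1]; 1.0 means identical strings."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1 - distance / longest

# 'party' and 'park' have distance 2 and the longer string has 5 characters
print(normalized_similarity(2, "party", "park"))  # 0.6
```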

Example 2: Levenshtein Distance Between Two Arrays

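The following code shows how to calculate the Levenshtein distance between the strings at matching positions in two different arrays (Python lists). Note that zip() pairs the first string of one list with the first string of the other, the second with the second, and so on: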

#define arrays
a = ['Mavs', 'Spurs', 'Lakers', 'Cavs']
b = ['Rockets', 'Pacers', 'Warriors', 'Celtics']

#calculate Levenshtein distance between two arrays
for i,k in zip(a, b):
  print(lev(i, k))

6
4
5
5

The way to interpret the output is as follows:

  • The Levenshtein distance between ‘Mavs’ and ‘Rockets’ is 6.
  • The Levenshtein distance between ‘Spurs’ and ‘Pacers’ is 4.
  • The Levenshtein distance between ‘Lakers’ and ‘Warriors’ is 5.
  • The Levenshtein distance between ‘Cavs’ and ‘Celtics’ is 5.
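
The zip() loop compares only strings that sit at the same position in each list. If the distance between every combination of strings is needed instead, one possible approach is itertools.product from the standard library, sketched below. The try/except fallback is an assumption added so the snippet also runs where the module is not installed; it is a plain recursive stand-in suited to short strings, not the library's implementation.

```python
from functools import lru_cache
from itertools import product

try:
    from Levenshtein import distance as lev
except ImportError:
    # Assumption: stand-in used only when the library is missing.
    @lru_cache(maxsize=None)
    def lev(s, t):
        if not s or not t:
            return len(s) + len(t)
        return min(lev(s[1:], t) + 1,                   # delete
                   lev(s, t[1:]) + 1,                   # insert
                   lev(s[1:], t[1:]) + (s[0] != t[0]))  # substitute or match

a = ['Mavs', 'Spurs', 'Lakers', 'Cavs']
b = ['Rockets', 'Pacers', 'Warriors', 'Celtics']

# product(a, b) yields all 4 x 4 = 16 combinations of one string from each list
distances = {(x, y): lev(x, y) for x, y in product(a, b)}

for (x, y), d in distances.items():
    print(x, y, d)
```

Storing the results in a dictionary keyed by the pair of strings makes it easy to look up any single combination later, for example distances[('Mavs', 'Pacers')].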
