How to Bin Variables in Python Using numpy.digitize()

Binning variables in Python can be done using the numpy.digitize() function. This function takes an array of values and a set of bin edges that you specify, then returns, for each element, the index of the bin it falls into. The bins do not need to be of equal width. This is a useful way to convert continuous data into discrete categories for further analysis.


Often you may be interested in placing the values of a variable into “bins” in Python.

Fortunately this is easy to do using the numpy.digitize() function, which uses the following syntax:

numpy.digitize(x, bins, right=False)

where:

  • x: Array of values to be binned.
  • bins: Array of bin edges. It must be one-dimensional and monotonic.
  • right: Whether the intervals include the right or the left bin edge. By default (right=False), each interval includes the left edge and excludes the right edge.
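To make the effect of the right argument concrete, here is a minimal sketch using a few illustrative values (not taken from the examples in this tutorial) that sit exactly on the bin edges. It also checks the relationship, described in the NumPy documentation for increasing bins, between numpy.digitize() and numpy.searchsorted().

```python
import numpy as np

# illustrative values, chosen so that 10 and 20 land exactly on the bin edges
data = np.array([2, 10, 15, 20, 25])
bins = np.array([10, 20])

# right=False (default): a value equal to an edge goes to the bin above it
left_closed = np.digitize(data, bins)

# right=True: a value equal to an edge goes to the bin below it
right_closed = np.digitize(data, bins, right=True)

# for increasing bins, np.searchsorted returns the same indices
same_default = np.array_equal(left_closed, np.searchsorted(bins, data, side="right"))
same_right = np.array_equal(right_closed, np.searchsorted(bins, data, side="left"))

print(left_closed)   # [0 1 1 2 2]
print(right_closed)  # [0 0 1 1 2]
print(same_default, same_right)  # True True
```

Only the values that equal a bin edge (10 and 20) change bins when right is switched.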

This tutorial shows several examples of how to use this function in practice.

Example 1: Place All Values into Two Bins

The following code shows how to place the values of an array into two bins:

  • 0 if x < 20
  • 1 if x ≥ 20

import numpy as np

#create data
data = [2, 4, 4, 7, 12, 14, 19, 20, 24, 31, 34]

#place values into bins
np.digitize(data, bins=[20])

array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1])

Example 2: Place All Values into Three Bins

The following code shows how to place the values of an array into three bins:

  • 0 if x < 10
  • 1 if 10 ≤ x < 20
  • 2 if x ≥ 20

import numpy as np

#create data
data = [2, 4, 4, 7, 12, 14, 20, 22, 24, 31, 34]

#place values into bins
np.digitize(data, bins=[10, 20])

array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2])

Note that if we specify right=True then the values would be placed into the following bins:

  • 0 if x ≤ 10
  • 1 if 10 < x ≤ 20
  • 2 if x > 20

Each interval would include the right bin edge. Here’s what that looks like:

import numpy as np

#create data
data = [2, 4, 4, 7, 12, 14, 20, 22, 24, 31, 34]

#place values into bins
np.digitize(data, bins=[10, 20], right=True)

array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2])

Example 3: Place All Values into Four Bins

The following code shows how to place the values of an array into four bins:

  • 0 if x < 10
  • 1 if 10 ≤ x < 20
  • 2 if 20 ≤ x < 30
  • 3 if x ≥ 30

import numpy as np

#create data
data = [2, 4, 4, 7, 12, 14, 20, 22, 24, 31, 34]

#place values into bins
np.digitize(data, bins=[10, 20, 30])

array([0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3])
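Because numpy.digitize() returns integer bin indices, those indices can be used to look up a readable label for each value. The following is a minimal sketch of that idea; the label strings are hypothetical and chosen only for illustration.

```python
import numpy as np

# same data and bin edges as in the four-bin example
data = [2, 4, 4, 7, 12, 14, 20, 22, 24, 31, 34]
bin_idx = np.digitize(data, bins=[10, 20, 30])

# hypothetical labels, one per bin index (0 through 3)
labels = np.array(["<10", "10-19", "20-29", "30+"])

# index the label array with the bin indices to get one label per value
labeled = labels[bin_idx]

print(labeled)
```

This works because the label array has exactly one entry for each possible bin index, which is always one more than the number of bin edges.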

Example 4: Count the Frequency of Each Bin

Another useful NumPy function that complements the numpy.digitize() function is the numpy.bincount() function, which counts the frequency of each bin.

The following code shows how to place the values of an array into three bins and then count the frequency of each bin:

import numpy as np

#create data
data = [2, 4, 4, 7, 12, 14, 20, 22, 24, 31, 34]

#place values into bins
bin_data = np.digitize(data, bins=[10, 20])

#view binned data
bin_data

array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2])

#count frequency of each bin
np.bincount(bin_data)

array([4, 2, 5])

The output tells us that:

  • Bin “0” contains 4 data values.
  • Bin “1” contains 2 data values.
  • Bin “2” contains 5 data values.
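One thing to watch for: numpy.bincount() only counts up to the largest index that actually appears, so empty bins at the top end are left out of the result. The following sketch, which uses a small illustrative dataset rather than the data above, shows how the minlength argument can be used to get one count per bin.

```python
import numpy as np

# illustrative values; none of them are 20 or greater, so the top bin is empty
data = [2, 4, 12]
bins = [10, 20]

bin_data = np.digitize(data, bins=bins)

# without minlength, the empty top bin is missing from the counts
counts_short = np.bincount(bin_data)

# there are len(bins) + 1 possible bins, so ask for at least that many counts
counts_full = np.bincount(bin_data, minlength=len(bins) + 1)

print(counts_short)  # [2 1]
print(counts_full)   # [2 1 0]
```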
