How can I extract a number from a string in Pandas?

Extracting a number from a string in pandas means pulling a numeric value out of a text column. This is usually done with a regular expression passed to one of the pandas string methods, though plain string manipulation also works for simple cases. It is particularly useful when numbers are embedded within strings, such as product codes, and the extracted values can then be used for further analysis.

Extract Number from String in Pandas


You can use the following basic syntax to extract numbers from a string in pandas:

df['my_column'].str.extract(r'(\d+)')

This particular syntax extracts the first run of digits from each string in a column called my_column in a pandas DataFrame.

Note: In a regular expression, \d represents “any digit” and + stands for “one or more.” The parentheses define the capture group that str.extract returns.
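To see what the pattern matches, here is a minimal sketch on a small made-up Series (the values and the name s are illustrative and are not part of the example below). Passing expand=False returns a Series rather than a one-column DataFrame, and a string with no digits yields a missing value.

```python
import pandas as pd

# hypothetical values, chosen only to illustrate the pattern
s = pd.Series(['A33', 'no digits', 'X7Y88'])

# \d+ matches the first run of one or more digits in each string
first_number = s.str.extract(r'(\d+)', expand=False)

print(first_number.tolist())
# ['33', nan, '7']
```

Only the first match is returned, which is why 'X7Y88' gives '7' and not '88'.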

The following example shows how to use this function in practice.

Example: Extract Number from String in Pandas

Suppose we have the following pandas DataFrame that contains information about the sales of various products:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'product': ['A33', 'B34', 'A22', 'A50', 'C200', 'D7', 'A9', 'A13'],
                   'sales': [18, 22, 19, 14, 14, 11, 20, 28]})

#view DataFrame
print(df)

  product  sales
0     A33     18
1     B34     22
2     A22     19
3     A50     14
4    C200     14
5      D7     11
6      A9     20
7     A13     28

Suppose we would like to extract the number from each string in the product column.

We can use the following syntax to do so:

#extract numbers from strings in 'product' column
df['product'].str.extract(r'(\d+)')

     0
0   33
1   34
2   22
3   50
4  200
5    7
6    9
7   13

The result is a DataFrame with a single column, named 0, that contains only the number from each row in the product column. Note that the extracted values are strings, not integers.

For example:

  • The expression extracts 33 from the string A33 in the first row.
  • The expression extracts 34 from the string B34 in the second row.
  • The expression extracts 22 from the string A22 in the third row.

And so on.
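Each product code in this example contains exactly one number. If a string may contain several, str.extract returns only the first. A hedged sketch of one alternative, using str.findall on hypothetical values (the Series codes is invented for illustration), collects every match instead:

```python
import pandas as pd

# hypothetical strings, some containing more than one number
codes = pd.Series(['A33-7', 'B34', 'no digits'])

# str.extract keeps only the first match per string
first = codes.str.extract(r'(\d+)', expand=False)

# str.findall returns a list of every match per string
every = codes.str.findall(r'\d+')

print(every.tolist())
# [['33', '7'], ['34'], []]
```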

If you’d like, you can also store these numerical values in a new column in the DataFrame:

#extract numbers from strings in 'product' column and store them in new column
df['product_numbers'] = df['product'].str.extract(r'(\d+)')

#view updated DataFrame
print(df)

  product  sales product_numbers
0     A33     18              33
1     B34     22              34
2     A22     19              22
3     A50     14              50
4    C200     14             200
5      D7     11               7
6      A9     20               9
7     A13     28              13
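The values stored this way are still strings. If they are needed for arithmetic or sorting, one possible approach (a sketch, not the only option) is to convert them with pd.to_numeric. The intermediate name digits is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'product': ['A33', 'B34', 'A22', 'A50', 'C200', 'D7', 'A9', 'A13'],
                   'sales': [18, 22, 19, 14, 14, 11, 20, 28]})

# extract the digits as a Series of strings
digits = df['product'].str.extract(r'(\d+)', expand=False)

# convert to a numeric dtype; errors='coerce' turns unparseable values into NaN
df['product_numbers'] = pd.to_numeric(digits, errors='coerce')

print(df['product_numbers'].tolist())
# [33, 34, 22, 50, 200, 7, 9, 13]
```

If some rows had no digits at all, the result would contain NaN for those rows and the column would be stored as floats rather than integers.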


Cite this article

stats writer (2024). How can I extract a number from a string in Pandas?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/how-can-i-extract-a-number-from-a-string-in-pandas/

stats writer. "How can I extract a number from a string in Pandas?." PSYCHOLOGICAL SCALES, 25 Jun. 2024, https://scales.arabpsychology.com/stats/how-can-i-extract-a-number-from-a-string-in-pandas/.
