How can I convert a categorical variable to numeric in Pandas?

To convert a categorical variable to a numeric variable in pandas, you can use the pd.factorize() function, which assigns an integer code to each unique category. An alternative is the pd.get_dummies() function, which creates one dummy (0/1) variable per category; when the dummies are used in a regression model, one of them is usually dropped to avoid the "dummy variable trap." Either way, the categorical data ends up in a numeric format that can be used for analysis and modeling.
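The dummy-variable approach mentioned above can be sketched as follows. This is a minimal illustration, not part of the examples below: the small DataFrame and the 'team' prefix are made up for the demonstration, and dtype=int is passed so the output is 0/1 integers regardless of pandas version.

```python
import pandas as pd

# small illustrative DataFrame (hypothetical data)
df = pd.DataFrame({'team': ['A', 'A', 'B', 'C']})

# create one 0/1 column per category; drop_first=True drops the first
# category ('A') so the remaining columns are not perfectly collinear
dummies = pd.get_dummies(df['team'], prefix='team', drop_first=True, dtype=int)

print(dummies)
#    team_B  team_C
# 0       0       0
# 1       0       0
# 2       1       0
# 3       0       1
```

A row of all zeros then stands for the dropped category 'A'.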

Convert Categorical Variable to Numeric in Pandas


You can use the following basic syntax to convert a categorical variable to a numeric variable in a pandas DataFrame:

df['column_name'] = pd.factorize(df['column_name'])[0]

You can also use the following syntax to convert every categorical variable in a DataFrame to a numeric variable:

#identify all categorical variables
cat_columns = df.select_dtypes(['object']).columns

#convert all categorical variables to numeric
df[cat_columns] = df[cat_columns].apply(lambda x: pd.factorize(x)[0])

The following examples show how to use this syntax in practice.

Example 1: Convert One Categorical Variable to Numeric

Suppose we have the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'position': ['G', 'G', 'F', 'G', 'F', 'C', 'G', 'F', 'C'],
                   'points': [5, 7, 7, 9, 12, 9, 9, 4, 13],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12, 10]})

#view DataFrame
df

  team position  points  rebounds
0    A        G       5        11
1    A        G       7         8
2    A        F       7        10
3    B        G       9         6
4    B        F      12         6
5    B        C       9         5
6    C        G       9         9
7    C        F       4        12
8    C        C      13        10

We can use the following syntax to convert the ‘team’ column to numeric:

#convert 'team' column to numeric
df['team'] = pd.factorize(df['team'])[0]

#view updated DataFrame
df

   team position  points  rebounds
0     0        G       5        11
1     0        G       7         8
2     0        F       7        10
3     1        G       9         6
4     1        F      12         6
5     1        C       9         5
6     2        G       9         9
7     2        F       4        12
8     2        C      13        10

Here is how the conversion worked:

  • Each team that had a value of 'A' was converted to 0.
  • Each team that had a value of 'B' was converted to 1.
  • Each team that had a value of 'C' was converted to 2.
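The mapping listed above can also be recovered in code, because pd.factorize() returns two objects: the integer codes and the unique labels, where the position of each label is its code. The following is a minimal sketch; the variable names (teams, mapping, decoded) are chosen here for illustration.

```python
import pandas as pd

teams = pd.Series(['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'])

# factorize returns (codes, uniques); the examples keep only element [0]
codes, uniques = pd.factorize(teams)

# build a label -> code lookup from the position of each unique label
mapping = {label: code for code, label in enumerate(uniques)}
print(mapping)  # {'A': 0, 'B': 1, 'C': 2}

# convert the codes back to the original labels
decoded = uniques.take(codes)
print(list(decoded))  # ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']
```

Keeping the uniques around is useful whenever the numeric codes need to be translated back to category names later.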

Example 2: Convert Multiple Categorical Variables to Numeric

Once again suppose we have the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'position': ['G', 'G', 'F', 'G', 'F', 'C', 'G', 'F', 'C'],
                   'points': [5, 7, 7, 9, 12, 9, 9, 4, 13],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12, 10]})

#view DataFrame
df

  team position  points  rebounds
0    A        G       5        11
1    A        G       7         8
2    A        F       7        10
3    B        G       9         6
4    B        F      12         6
5    B        C       9         5
6    C        G       9         9
7    C        F       4        12
8    C        C      13        10

We can use the following syntax to convert every categorical variable in the DataFrame to a numeric variable:

#get all categorical columns
cat_columns = df.select_dtypes(['object']).columns

#convert all categorical columns to numeric
df[cat_columns] = df[cat_columns].apply(lambda x: pd.factorize(x)[0])

#view updated DataFrame
df

   team  position  points  rebounds
0     0         0       5        11
1     0         0       7         8
2     0         1       7        10
3     1         0       9         6
4     1         1      12         6
5     1         2       9         5
6     2         0       9         9
7     2         1       4        12
8     2         2      13        10
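In the output above, the codes follow the order in which each category first appears in the column, which is why 'G' became 0 and 'F' became 1 in the position column. The sketch below, using a made-up Series, shows two behaviors worth knowing about: missing values receive the code -1 by default, and the sort=True argument assigns codes in sorted label order instead.

```python
import pandas as pd

# illustrative Series with a missing value (hypothetical data)
s = pd.Series(['B', 'A', None, 'B', 'C'])

# default: codes follow order of first appearance, missing values get -1
codes, uniques = pd.factorize(s)
print(codes.tolist())  # [0, 1, -1, 0, 2]

# sort=True: codes follow the sorted order of the labels
sorted_codes, sorted_uniques = pd.factorize(s, sort=True)
print(sorted_codes.tolist())  # [1, 0, -1, 1, 2]
print(list(sorted_uniques))   # ['A', 'B', 'C']
```

If -1 should not be treated as a valid category downstream, it may be worth handling missing values before converting.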

Note: The complete documentation for the factorize() function is available in the official pandas API reference.

Cite this article

stats writer (2024). How can I convert a categorical variable to numeric in Pandas?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/how-can-i-convert-a-categorical-variable-to-numeric-in-pandas/

stats writer. "How can I convert a categorical variable to numeric in Pandas?." PSYCHOLOGICAL SCALES, 2 Jul. 2024, https://scales.arabpsychology.com/stats/how-can-i-convert-a-categorical-variable-to-numeric-in-pandas/.

stats writer. "How can I convert a categorical variable to numeric in Pandas?." PSYCHOLOGICAL SCALES, 2024. https://scales.arabpsychology.com/stats/how-can-i-convert-a-categorical-variable-to-numeric-in-pandas/.

stats writer (2024) 'How can I convert a categorical variable to numeric in Pandas?', PSYCHOLOGICAL SCALES. Available at: https://scales.arabpsychology.com/stats/how-can-i-convert-a-categorical-variable-to-numeric-in-pandas/.

[1] stats writer, "How can I convert a categorical variable to numeric in Pandas?," PSYCHOLOGICAL SCALES, vol. X, no. Y, ص Z-Z, July, 2024.

stats writer. How can I convert a categorical variable to numeric in Pandas?. PSYCHOLOGICAL SCALES. 2024;vol(issue):pages.
