How can I perform One-Hot Encoding in Python?

One-hot encoding is a common data preprocessing technique that converts categorical data into numerical data. Each category of a categorical variable is represented by a binary vector whose length equals the total number of categories, with a 1 in the position for that category and 0s everywhere else. This lets machine learning algorithms that expect numerical input work with categorical data. In Python, one-hot encoding can be performed with the get_dummies function from the pandas library or with the OneHotEncoder class from the scikit-learn library. Both approaches produce a set of 0/1 dummy variables that stand in for the original categorical column. Because many machine learning models cannot use text categories directly, one-hot encoding is often a necessary preprocessing step.

Perform One-Hot Encoding in Python


One-hot encoding is used to convert categorical variables into a format that can be readily used by machine learning algorithms.

The basic idea of one-hot encoding is to create new variables that take on values 0 and 1 to represent the original categorical values.

For example, one-hot encoding a categorical variable that contains the team names A, B, and C produces three new variables, one per team, that contain only 0 and 1 values.
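To make the idea concrete before bringing in any library, here is a minimal sketch in plain Python. The column names `team_A`, `team_B`, and `team_C` are illustrative choices, not something a library produces here, and the team values match the dataset used in the steps below.

```python
# Team names from the example dataset
teams = ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C']

# Each distinct category gets its own 0/1 indicator variable
categories = sorted(set(teams))

# For every category, mark 1 where the row belongs to it and 0 elsewhere
# (the 'team_' prefix is just an illustrative naming choice)
one_hot = {
    f'team_{category}': [1 if team == category else 0 for team in teams]
    for category in categories
}

for name, values in one_hot.items():
    print(name, values)

# team_A [1, 1, 0, 0, 0, 0, 0, 0]
# team_B [0, 0, 1, 1, 1, 1, 0, 0]
# team_C [0, 0, 0, 0, 0, 0, 1, 1]
```

Every row has exactly one 1 across the three indicator lists, which is where the name "one-hot" comes from.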

The following step-by-step example shows how to perform one-hot encoding for this exact dataset in Python.

Step 1: Create the Data

First, let’s create the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29]})

#view DataFrame
print(df)

  team  points
0    A      25
1    A      12
2    B      15
3    B      14
4    B      19
5    B      23
6    C      25
7    C      29

Step 2: Perform One-Hot Encoding

Next, let’s import the OneHotEncoder class from the sklearn library and use it to perform one-hot encoding on the ‘team’ variable in the pandas DataFrame:

from sklearn.preprocessing import OneHotEncoder

#creating instance of one-hot-encoder
encoder = OneHotEncoder(handle_unknown='ignore')

#perform one-hot encoding on 'team' column 
encoder_df = pd.DataFrame(encoder.fit_transform(df[['team']]).toarray())

#merge one-hot encoded columns back with original DataFrame
final_df = df.join(encoder_df)

#view final df
print(final_df)

  team  points    0    1    2
0    A      25  1.0  0.0  0.0
1    A      12  1.0  0.0  0.0
2    B      15  0.0  1.0  0.0
3    B      14  0.0  1.0  0.0
4    B      19  0.0  1.0  0.0
5    B      23  0.0  1.0  0.0
6    C      25  0.0  0.0  1.0
7    C      29  0.0  0.0  1.0

Notice that three new columns were added to the DataFrame since the original ‘team’ column contained three unique values.

Note: You can find the complete documentation for the OneHotEncoder class in the scikit-learn documentation.
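The encoder above was created with handle_unknown='ignore', which the example never exercises. The following sketch shows what that setting does when the fitted encoder meets a category it did not see during fitting. The team 'D' is a made-up value used only for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# fit on the teams from the example dataset
train = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C']})
encoder = OneHotEncoder(handle_unknown='ignore')
encoder.fit(train)

# 'D' is a hypothetical team that was not present when fitting
new_data = pd.DataFrame({'team': ['B', 'D']})
encoded_new = encoder.transform(new_data).toarray()
print(encoded_new)

# [[0. 1. 0.]
#  [0. 0. 0.]]
```

The known team 'B' is encoded as usual, while the unseen team 'D' becomes a row of all zeros instead of raising an error.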

Step 3: Drop the Original Categorical Variable

Lastly, we can drop the original ‘team’ variable from the DataFrame since we no longer need it:

#drop 'team' column
final_df.drop('team', axis=1, inplace=True)

#view final df
print(final_df)

   points    0    1    2
0      25  1.0  0.0  0.0
1      12  1.0  0.0  0.0
2      15  0.0  1.0  0.0
3      14  0.0  1.0  0.0
4      19  0.0  1.0  0.0
5      23  0.0  1.0  0.0
6      25  0.0  0.0  1.0
7      29  0.0  0.0  1.0

 

We can also rename the new columns to make them easier to read:

#rename columns
final_df.columns = ['points', 'teamA', 'teamB', 'teamC']

#view final df
print(final_df)

   points  teamA  teamB  teamC
0      25    1.0    0.0    0.0
1      12    1.0    0.0    0.0
2      15    0.0    1.0    0.0
3      14    0.0    1.0    0.0
4      19    0.0    1.0    0.0
5      23    0.0    1.0    0.0
6      25    0.0    0.0    1.0
7      29    0.0    0.0    1.0

The one-hot encoding is complete and we can now feed this pandas DataFrame into any machine learning algorithm that we’d like.
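The introduction also mentions the get_dummies function from pandas as an alternative. A minimal sketch for the same dataset might look like the following. The dtype=int argument is an optional choice made here so that the output shows 0 and 1 rather than boolean values.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29]})

# encode 'team' and drop the original column in one call;
# dtype=int is an optional choice to get 0/1 instead of booleans
dummies_df = pd.get_dummies(df, columns=['team'], dtype=int)

print(dummies_df)
```

Here get_dummies replaces the 'team' column with team_A, team_B, and team_C, so no separate join, drop, or rename step is needed. Unlike a fitted OneHotEncoder, it does not remember the categories it saw, so applying it separately to new data can yield a different set of columns.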

Cite this article

stats writer (2024). How can I perform One-Hot Encoding in Python?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/how-can-i-perform-one-hot-encoding-in-python/

stats writer. "How can I perform One-Hot Encoding in Python?." PSYCHOLOGICAL SCALES, 12 May. 2024, https://scales.arabpsychology.com/stats/how-can-i-perform-one-hot-encoding-in-python/.
