In pandas, you can create a new column from multiple if else conditions with the NumPy select() function. This function takes a list of boolean conditions, a list of results of the same length, and an optional default value. Each row receives the result that corresponds to the first condition it satisfies, or the default value if it satisfies none of them. For a single if else condition, the np.where() function is enough: it takes a boolean condition, the value to use where the condition is True, and the value to use where it is False.
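For comparison, here is a minimal sketch of np.where() handling a single if else condition, and of what happens when it is nested to cover a third outcome. The DataFrame and the column names rating and tier are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({'points': [15, 18, 22, 24]})

# Single if/else: np.where(condition, value_if_true, value_if_false)
df['rating'] = np.where(df['points'] >= 20, 'Good', 'Bad')

# A third outcome requires nesting np.where calls, which gets hard to read
df['tier'] = np.where(df['points'] >= 22, 'High',
                      np.where(df['points'] >= 18, 'Medium', 'Low'))

print(df)
```

Nesting works, but every extra condition adds another level, which is why a list of conditions is usually easier to maintain.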
You can use the following syntax to create a new column in a pandas DataFrame using multiple if else conditions:
    import numpy as np

    #define conditions
    conditions = [
        (df['column1'] == 'A') & (df['column2'] < 20),
        (df['column1'] == 'A') & (df['column2'] >= 20),
        (df['column1'] == 'B') & (df['column2'] < 20),
        (df['column1'] == 'B') & (df['column2'] >= 20)
    ]

    #define results
    results = ['result1', 'result2', 'result3', 'result4']

    #create new column based on conditions in column1 and column2
    df['new_column'] = np.select(conditions, results)
This particular example creates a column called new_column whose values are based on the values in column1 and column2 in the DataFrame.
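One detail worth knowing is that np.select() checks the conditions in order and uses the result of the first one that is True. The sketch below illustrates this with overlapping conditions on a made-up points column; the labels and the level column are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({'points': [8, 12, 20, 28]})

# The conditions overlap: 20 and 28 satisfy both, but the first match wins
conditions = [
    df['points'] >= 20,
    df['points'] >= 10
]
results = ['high', 'medium']

# default supplies the value for rows that satisfy no condition
df['level'] = np.select(conditions, results, default='low')

print(df)
```

If the two conditions were listed in the opposite order, the rows with 20 and 28 points would be labeled medium instead, so the most specific condition should generally come first.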
The following example shows how to use this syntax in practice.
Example: Create New Column Using Multiple If Else Conditions in Pandas
Suppose we have the following pandas DataFrame that contains information about various basketball players:
    import pandas as pd

    #create DataFrame
    df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                       'points': [15, 18, 22, 24, 12, 17, 20, 28]})

    #view DataFrame
    print(df)

      team  points
    0    A      15
    1    A      18
    2    A      22
    3    A      24
    4    B      12
    5    B      17
    6    B      20
    7    B      28
Now suppose we would like to create a new column called class that classifies each player into one of the following four groups:
- Bad_A if team is A and points < 20
- Good_A if team is A and points ≥ 20
- Bad_B if team is B and points < 20
- Good_B if team is B and points ≥ 20
We can use the following syntax to do so:
    import numpy as np

    #define conditions
    conditions = [
        (df['team'] == 'A') & (df['points'] < 20),
        (df['team'] == 'A') & (df['points'] >= 20),
        (df['team'] == 'B') & (df['points'] < 20),
        (df['team'] == 'B') & (df['points'] >= 20)
    ]

    #define results
    results = ['Bad_A', 'Good_A', 'Bad_B', 'Good_B']

    #create new column based on conditions in the team and points columns
    df['class'] = np.select(conditions, results)

    #view updated DataFrame
    print(df)

      team  points   class
    0    A      15   Bad_A
    1    A      18   Bad_A
    2    A      22  Good_A
    3    A      24  Good_A
    4    B      12   Bad_B
    5    B      17   Bad_B
    6    B      20  Good_B
    7    B      28  Good_B
The new column called class displays the classification of each player based on the values in the team and points columns.
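In this example every row matches one of the four conditions. When that is not guaranteed, it may help to pass an explicit default to np.select(). Without one, unmatched rows receive the function's default of 0, and some NumPy versions may refuse to mix that integer with string results. The sketch below assumes a hypothetical extra team C and a made-up label Other.

```python
import numpy as np
import pandas as pd

# Smaller DataFrame with a hypothetical team 'C' that no condition covers
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'C'],
                   'points': [15, 22, 17, 28, 19]})

conditions = [
    (df['team'] == 'A') & (df['points'] < 20),
    (df['team'] == 'A') & (df['points'] >= 20),
    (df['team'] == 'B') & (df['points'] < 20),
    (df['team'] == 'B') & (df['points'] >= 20)
]
results = ['Bad_A', 'Good_A', 'Bad_B', 'Good_B']

# 'Other' is an illustrative label for rows that match none of the conditions
df['class'] = np.select(conditions, results, default='Other')

print(df)
```

Using a string default keeps the new column consistent with the other labels and makes unmatched rows easy to spot.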
Note: You can find the complete documentation for the NumPy select() function in the official NumPy reference.