To create a column in pandas only if it does not already exist, first check whether the column name is among the DataFrame's columns (for example, if 'new_column' not in df.columns). If it is not, create the column with an assignment such as df['new_column'] = .... Because assigning to an existing column name overwrites that column, this check ensures that existing data is never replaced.
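A minimal sketch of the membership-check approach described above, assuming a small DataFrame with the placeholder columns col1 and col2 used later in this tutorial:

```python
import pandas as pd

# Small example DataFrame (col1 and col2 are placeholder column names)
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})

# Create 'my_column' only if it is not already one of df's columns
if 'my_column' not in df.columns:
    df['my_column'] = df['col1'] * df['col2']

# Running the same check again skips the assignment,
# so the existing values in 'my_column' are kept
if 'my_column' not in df.columns:
    df['my_column'] = 0

print(df['my_column'].tolist())  # [4, 10, 18]
```

The second if block never runs, so my_column keeps the products computed by the first block.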
Pandas: Create Column If It Doesn’t Exist
You can use the following basic syntax to create a column in a pandas DataFrame if it doesn’t already exist:
df['my_column'] = df.get('my_column', df['col1'] * df['col2'])
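This syntax creates a new column called my_column, defined as the product of the existing columns col1 and col2, only if my_column doesn't already exist in the DataFrame. If my_column already exists, df.get() returns the existing column, so the column is reassigned to itself and its values are unchanged.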
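The sketch below runs the df.get() syntax twice on a small DataFrame with placeholder columns col1 and col2. DataFrame.get(key, default) returns the column when key exists and default otherwise. Note that the default expression is an ordinary Python argument, so it is evaluated on every call, even when the column already exists and the result is discarded.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})

# First call: 'my_column' is missing, so df.get returns the default (the product)
df['my_column'] = df.get('my_column', df['col1'] * df['col2'])

# Second call: 'my_column' now exists, so df.get returns the existing column;
# the sum col1 + col2 is still computed, but it is not used
df['my_column'] = df.get('my_column', df['col1'] + df['col2'])

print(df['my_column'].tolist())  # [4, 10, 18]
```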
The following example shows how to use this syntax in practice.
Example: Create Column in Pandas If It Doesn’t Exist
Suppose we have the following pandas DataFrame:
import pandas as pd
#create DataFrame
df = pd.DataFrame({'day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'sales': [4, 6, 5, 8, 14, 13, 13, 12, 9, 8, 19, 14],
                   'price': [1, 2, 2, 1, 2, 4, 4, 3, 3, 2, 2, 3]})
#view DataFrame
print(df)
day sales price
0 1 4 1
1 2 6 2
2 3 5 2
3 4 8 1
4 5 14 2
5 6 13 4
6 7 13 4
7 8 12 3
8 9 9 3
9 10 8 2
10 11 19 2
11 12 14 3
Now suppose we attempt to add a column called price if it doesn’t already exist and define it as a column in which each value is equal to 100:
#attempt to add column called 'price'
df['price'] = df.get('price', 100)
#view updated DataFrame
print(df)
day sales price
0 1 4 1
1 2 6 2
2 3 5 2
3 4 8 1
4 5 14 2
5 6 13 4
6 7 13 4
7 8 12 3
8 9 9 3
9 10 8 2
10 11 19 2
11 12 14 3
Since a column called price already exists, df.get() returns the existing price column instead of 100, so the column is simply reassigned to itself and its values remain unchanged.
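For contrast, a minimal sketch using a trimmed version of the DataFrame above plus a hypothetical missing column named fee. When the column is missing, df.get() returns the scalar default 100, and pandas broadcasts that scalar to every row.

```python
import pandas as pd

# First three rows of the example data
df = pd.DataFrame({'sales': [4, 6, 5], 'price': [1, 2, 2]})

# 'price' exists, so df.get returns the existing column and nothing changes
df['price'] = df.get('price', 100)

# 'fee' is a hypothetical column that does not exist yet, so df.get
# returns the scalar 100, which is broadcast to every row
df['fee'] = df.get('fee', 100)

print(df)
```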
However, suppose we attempt to add a new column called revenue if it doesn’t already exist and define it as a column in which the values are the product of the sales and price columns:
#attempt to add column called 'revenue'
df['revenue'] = df.get('revenue', df['sales'] * df['price'])
#view updated DataFrame
print(df)
day sales price revenue
0 1 4 1 4
1 2 6 2 12
2 3 5 2 10
3 4 8 1 8
4 5 14 2 28
5 6 13 4 52
6 7 13 4 52
7 8 12 3 36
8 9 9 3 27
9 10 8 2 16
10 11 19 2 38
11 12 14 3 42
The revenue column is added to the DataFrame because it did not already exist.
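Because df.get() always evaluates its default, an expensive calculation such as sales * price is computed even when revenue already exists. One possible alternative, sketched below with a hypothetical helper named add_column_if_missing, passes a function that is called only when the column is absent.

```python
from typing import Callable

import pandas as pd


def add_column_if_missing(df: pd.DataFrame, name: str,
                          make_values: Callable[[pd.DataFrame], object]) -> pd.DataFrame:
    """Hypothetical helper: add column `name` computed by make_values(df)
    only if `name` is not already a column. Modifies df in place and returns it."""
    if name not in df.columns:
        df[name] = make_values(df)
    return df


df = pd.DataFrame({'sales': [4, 6, 5], 'price': [1, 2, 2]})

# 'revenue' is missing, so the lambda runs and the column is created
add_column_if_missing(df, 'revenue', lambda d: d['sales'] * d['price'])

# 'revenue' now exists, so this lambda is never called
add_column_if_missing(df, 'revenue', lambda d: d['sales'] + d['price'])

print(df['revenue'].tolist())  # [4, 12, 10]
```

The membership check keeps the values from the first call, and the second calculation is skipped entirely rather than computed and discarded.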
Cite this article
stats writer (2024). How can I create a column in Pandas only if it does not already exist?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/how-can-i-create-a-column-in-pandas-only-if-it-does-not-already-exist/
