You can use the following methods in PySpark to check if a particular column exists in a DataFrame:
Method 1: Check if Column Exists (Case-Sensitive)
'points' in df.columns
Method 2: Check if Column Exists (Not Case-Sensitive)
'points'.upper() in (name.upper() for name in df.columns)
The examples below show how to use each method in practice with this PySpark DataFrame:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['A', 'East', 11, 4],
        ['A', None, 8, 9],
        ['A', 'East', 10, 3],
        ['B', 'West', None, 12],
        ['B', 'West', None, 4],
        ['C', 'East', 5, 2]]

#define column names
columns = ['team', 'conference', 'points', 'assists']

#create dataframe using data and column names
df = spark.createDataFrame(data, columns)

#view dataframe
df.show()

+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
|   A|      East|    11|      4|
|   A|      null|     8|      9|
|   A|      East|    10|      3|
|   B|      West|  null|     12|
|   B|      West|  null|      4|
|   C|      East|     5|      2|
+----+----------+------+-------+
Example 1: Check if Column Exists (Case-Sensitive)
We can use the following syntax to check if the column name points exists in the DataFrame:
#check if column name 'points' exists in the DataFrame
'points' in df.columns

True
The output returns True since the column name points does indeed exist in the DataFrame.
Note that this syntax is case-sensitive. If we search for the column name Points instead, the output is False, because the case we searched for doesn’t precisely match the case of the column name in the DataFrame:
#check if column name 'Points' exists in the DataFrame
'Points' in df.columns

False
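The same case-sensitive membership test extends naturally to checking several column names at once. The following is a minimal sketch rather than part of PySpark: missing_columns is a hypothetical helper name, and the plain list below stands in for df.columns, which is an ordinary Python list of strings.

```python
def missing_columns(columns, required):
    """Return the names in `required` that are absent from `columns` (case-sensitive)."""
    present = set(columns)
    return [name for name in required if name not in present]

# stand-in for df.columns (assumption: same names as the example DataFrame)
columns = ['team', 'conference', 'points', 'assists']

# 'points' exists; 'Points' and 'rebounds' do not
print(missing_columns(columns, ['points', 'Points', 'rebounds']))
# ['Points', 'rebounds']
```

An empty list means every required column is present, so the result can be used directly in a condition before running code that depends on those columns.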
Example 2: Check if Column Exists (Not Case-Sensitive)
We can use the following syntax to check if the column name Points exists in the DataFrame, regardless of case:
#check if column name 'Points' exists in the DataFrame, ignoring case
'Points'.upper() in (name.upper() for name in df.columns)

True
The output returns True even though the case of the column name that we searched for didn’t precisely match the column name of points in the DataFrame.
This allowed us to perform a case-insensitive search.
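When the check succeeds, it is often useful to know how the column is actually spelled in the DataFrame so that it can be referenced afterward. The sketch below is one possible way to do this: find_column is a hypothetical helper name, the list stands in for df.columns, and it compares with str.casefold() rather than the upper() call used above, which behaves the same for plain ASCII names like these.

```python
def find_column(columns, name):
    """Return the actual column name that matches `name` ignoring case, or None."""
    target = name.casefold()
    for col in columns:
        if col.casefold() == target:
            return col
    return None

# stand-in for df.columns (assumption: same names as the example DataFrame)
columns = ['team', 'conference', 'points', 'assists']

print(find_column(columns, 'Points'))    # points
print(find_column(columns, 'rebounds'))  # None
```

Returning the stored name instead of a boolean lets the caller pass the result straight to something like df.select(), while a None result still signals that no column matched.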