How can I check if a PySpark DataFrame is empty?

To determine whether a PySpark DataFrame is empty, you can use the df.count() method, which returns the number of rows in the DataFrame. If the count equals 0, the DataFrame is empty. Alternatively, in PySpark 3.3.0 and later you can call the df.isEmpty() method, which returns a boolean indicating whether the DataFrame is empty.
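
For example, here is a minimal sketch of the isEmpty() approach, assuming PySpark 3.3.0 or later (the single-column DataFrame below is only illustrative):

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#create an empty DataFrame with one string column for illustration
df = spark.createDataFrame([], 'team string')

#isEmpty() returns True when the DataFrame has no rows
print(df.isEmpty())

True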

PySpark: Check if DataFrame is Empty


You can use the following syntax to check if a PySpark DataFrame is empty:

print(df.count() == 0)

This will return True if the DataFrame is empty or False if the DataFrame is not empty.

Note that df.count() counts the number of rows in the DataFrame, so we’re effectively checking whether the total number of rows is equal to zero.
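
Keep in mind that counting every row can be expensive on a large DataFrame. A lighter-weight alternative (a sketch, not the syntax used in the examples below) is to check whether the DataFrame has at least one row:

#fetch at most one row; an empty list means the DataFrame has no rows
print(len(df.head(1)) == 0)

This returns True when no rows exist, and Spark can typically stop scanning once it finds the first row.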

The following examples show how to use this syntax in practice.

Example 1: Check if Empty DataFrame is Empty

Suppose we create the following empty PySpark DataFrame with specific column names:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

from pyspark.sql.types import StructType, StructField, StringType, FloatType

#create empty RDD
empty_rdd = spark.sparkContext.emptyRDD()

#specify column names and types
my_columns = [StructField('team', StringType(), True),
              StructField('position', StringType(), True),
              StructField('points', FloatType(), True)]

#create DataFrame with specific column names
df = spark.createDataFrame(empty_rdd, schema=StructType(my_columns))

#view DataFrame
df.show()

+----+--------+------+
|team|position|points|
+----+--------+------+
+----+--------+------+

We can use the following syntax to check if the DataFrame is empty:

#check if DataFrame is empty
print(df.count() == 0)

True

We receive a value of True, which indicates that the DataFrame is indeed empty.
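
As a side note, on PySpark 3.3.0 or later df.isEmpty() returns the same answer, and on older versions the check can go through the underlying RDD (a sketch using the DataFrame created above):

#check emptiness via the underlying RDD (works on older PySpark versions)
print(df.rdd.isEmpty())

True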

Example 2: Check if Non-Empty DataFrame is Empty

Suppose we create the following PySpark DataFrame that contains information about various basketball players:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['Mavs', 18], 
        ['Nets', 33], 
        ['Lakers', 12], 
        ['Mavs', 15], 
        ['Cavs', 19],
        ['Wizards', 24]]
  
#define column names
columns = ['team', 'points'] 
  
#create DataFrame using data and column names
df = spark.createDataFrame(data, columns)

#view DataFrame
df.show()

+-------+------+
|   team|points|
+-------+------+
|   Mavs|    18|
|   Nets|    33|
| Lakers|    12|
|   Mavs|    15|
|   Cavs|    19|
|Wizards|    24|
+-------+------+

We can use the following syntax to check if the DataFrame is empty:

#check if DataFrame is empty
print(df.count() == 0)

False

We receive a value of False, which indicates that the DataFrame is not empty.
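
In practice, this check is often used to guard downstream logic so that aggregations or writes only run when data is present. Here is a minimal sketch (the groupBy aggregation is only illustrative):

#only process the DataFrame if it contains at least one row
if df.count() == 0:
    print('DataFrame is empty, nothing to process')
else:
    #example downstream step: total points per team
    df.groupBy('team').sum('points').show()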
