You can use the following syntax to explode a column that contains arrays in a PySpark DataFrame into multiple rows:

from pyspark.sql.functions import explode

#explode points column into rows
df_new = df.withColumn('points', explode(df.points))

This particular example explodes the arrays in the points column of a DataFrame into multiple rows.
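
To make the row-expansion behavior concrete, here is a minimal plain-Python sketch of what explode does to each row. It is only a model of the semantics, not how Spark implements it: the helper name explode_rows and the 'Bench' row are hypothetical and used purely for illustration. It also mirrors one detail worth knowing: explode produces no output rows for an array that is null or empty.

```python
def explode_rows(rows, column):
    """Return one output row per element of the array stored in `column`."""
    exploded = []
    for row in rows:
        values = row[column]
        # like Spark's explode, a null (None) or empty array yields no rows
        for value in values or []:
            new_row = dict(row)      # copy the other columns unchanged
            new_row[column] = value  # replace the array with a single element
            exploded.append(new_row)
    return exploded

# hypothetical sample rows; the second one has an empty array
rows = [
    {'team': 'A', 'position': 'Guard', 'points': [11, 8, 25]},
    {'team': 'B', 'position': 'Bench', 'points': []},
]

result = explode_rows(rows, 'points')
for r in result:
    print(r)
```

The first row becomes three rows, one per value, while the row with the empty array disappears from the result. In PySpark, the explode_outer function can be used instead when such rows should be kept with a null value.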

The following example shows how to use this syntax in practice.

Example: How to Explode Array into Rows in a PySpark DataFrame

Suppose we have the following PySpark DataFrame that contains information about points scored in three different games by various basketball players:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['A', 'Guard', [11, 8, 25]], 
        ['A', 'Forward', [14, 20, 22]], 
        ['B', 'Guard', [21, 30, 6]], 
        ['B', 'Forward', [22, 12, 34]]] 
  
#define column names
columns = ['team', 'position', 'points'] 
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe
df.show()

+----+--------+------------+
|team|position|      points|
+----+--------+------------+
|   A|   Guard| [11, 8, 25]|
|   A| Forward|[14, 20, 22]|
|   B|   Guard| [21, 30, 6]|
|   B| Forward|[22, 12, 34]|
+----+--------+------------+

Notice that the points column currently contains arrays.

We can use the following syntax to explode the values from each of these arrays into their own rows:

from pyspark.sql.functions import explode

#explode points column into rows
df_new = df.withColumn('points', explode(df.points))

#view new DataFrame
df_new.show()

+----+--------+------+
|team|position|points|
+----+--------+------+
|   A|   Guard|    11|
|   A|   Guard|     8|
|   A|   Guard|    25|
|   A| Forward|    14|
|   A| Forward|    20|
|   A| Forward|    22|
|   B|   Guard|    21|
|   B|   Guard|    30|
|   B|   Guard|     6|
|   B| Forward|    22|
|   B| Forward|    12|
|   B| Forward|    34|
+----+--------+------+

Notice that each of the values in the arrays from the points column has been exploded into its own row, while the team and position values are repeated for every new row.

Note: You can find the complete documentation for the PySpark explode function in the official PySpark API reference.
