In PySpark, columns can be renamed with the `withColumnRenamed()` function. It takes two arguments, the existing column name and the new column name, and returns a new DataFrame with the column renamed. For example, if we have a DataFrame named `df` with columns "id" and "name", we can rename the "name" column to "full_name" with `df.withColumnRenamed("name", "full_name")`. This is useful when working with large datasets where you want column names to be more descriptive or standardized.
You can use the following methods to rename columns in a PySpark DataFrame:
Method 1: Rename One Column
```python
# rename 'conference' column to 'conf'
df = df.withColumnRenamed('conference', 'conf')
```
Method 2: Rename Multiple Columns
```python
# rename 'conference' and 'team' columns
df = df.withColumnRenamed('conference', 'conf') \
       .withColumnRenamed('team', 'team_name')
```
Method 3: Rename All Columns
```python
# specify new column names to use
col_names = ['the_team', 'the_conf', 'points_scored', 'total_assists']

# rename all columns with the new names
df = df.toDF(*col_names)
```
The following examples show how to use each of these methods in practice with the following PySpark DataFrame:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# define data
data = [['A', 'East', 11, 4],
        ['A', 'East', 8, 9],
        ['A', 'East', 10, 3],
        ['B', 'West', 6, 12],
        ['B', 'West', 6, 4],
        ['C', 'East', 5, 2]]

# define column names
columns = ['team', 'conference', 'points', 'assists']

# create DataFrame using data and column names
df = spark.createDataFrame(data, columns)

# view DataFrame
df.show()
```

```
+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
|   A|      East|    11|      4|
|   A|      East|     8|      9|
|   A|      East|    10|      3|
|   B|      West|     6|     12|
|   B|      West|     6|      4|
|   C|      East|     5|      2|
+----+----------+------+-------+
```
Example 1: Rename One Column in PySpark
We can use the following syntax to rename just the conference column in the DataFrame:
```python
# rename 'conference' column to 'conf'
df = df.withColumnRenamed('conference', 'conf')

# view updated DataFrame
df.show()
```

```
+----+----+------+-------+
|team|conf|points|assists|
+----+----+------+-------+
|   A|East|    11|      4|
|   A|East|     8|      9|
|   A|East|    10|      3|
|   B|West|     6|     12|
|   B|West|     6|      4|
|   C|East|     5|      2|
+----+----+------+-------+
```
Notice that only the conference column has been renamed.
Example 2: Rename Multiple Columns in PySpark
We can use the following syntax to rename the conference and team columns in the DataFrame:
```python
# rename 'conference' and 'team' columns
df = df.withColumnRenamed('conference', 'conf') \
       .withColumnRenamed('team', 'team_name')

# view updated DataFrame
df.show()
```

```
+---------+----+------+-------+
|team_name|conf|points|assists|
+---------+----+------+-------+
|        A|East|    11|      4|
|        A|East|     8|      9|
|        A|East|    10|      3|
|        B|West|     6|     12|
|        B|West|     6|      4|
|        C|East|     5|      2|
+---------+----+------+-------+
```
Notice that the conference and team columns have been renamed while all other column names have remained the same.
Example 3: Rename All Columns in PySpark
We can use the following syntax to rename all columns in the DataFrame:
```python
# specify new column names to use
col_names = ['the_team', 'the_conf', 'points_scored', 'total_assists']

# rename all columns with the new names
df = df.toDF(*col_names)

# view updated DataFrame
df.show()
```

```
+--------+--------+-------------+-------------+
|the_team|the_conf|points_scored|total_assists|
+--------+--------+-------------+-------------+
|       A|    East|           11|            4|
|       A|    East|            8|            9|
|       A|    East|           10|            3|
|       B|    West|            6|           12|
|       B|    West|            6|            4|
|       C|    East|            5|            2|
+--------+--------+-------------+-------------+
```
Notice that all of the column names have been renamed based on the new names that we specified.
Additional Resources
The following tutorials explain how to perform other common tasks in PySpark: