How can columns be reordered in PySpark?

Columns in a PySpark DataFrame can be reordered with the select() function: pass the column names in the desired order, and select() returns a new DataFrame with the columns in that order. The original DataFrame is not modified, so assign the result to a variable to keep the new order.

Reorder Columns in PySpark (With Examples)


You can use the following methods to reorder columns in a PySpark DataFrame:

Method 1: Reorder Columns in Specific Order

df = df.select('col3', 'col2', 'col4', 'col1')

Method 2: Reorder Columns Alphabetically

df = df.select(sorted(df.columns))

The examples below show how to use each method with this PySpark DataFrame:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['A', 'East', 11, 4], 
        ['A', 'East', 8, 9], 
        ['A', 'East', 10, 3], 
        ['B', 'West', 6, 12], 
        ['B', 'West', 6, 4], 
        ['C', 'East', 5, 2]] 
  
#define column names
columns = ['team', 'conference', 'points', 'assists'] 
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe
df.show()

+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
|   A|      East|    11|      4|
|   A|      East|     8|      9|
|   A|      East|    10|      3|
|   B|      West|     6|     12|
|   B|      West|     6|      4|
|   C|      East|     5|      2|
+----+----------+------+-------+

Example 1: Reorder Columns in Specific Order

We can use the following syntax to reorder the columns in the DataFrame based on a specific order:

#reorder columns by specific order
df = df.select('conference', 'team', 'assists', 'points')

#view updated DataFrame
df.show()

+----------+----+-------+------+
|conference|team|assists|points|
+----------+----+-------+------+
|      East|   A|      4|    11|
|      East|   A|      9|     8|
|      East|   A|      3|    10|
|      West|   B|     12|     6|
|      West|   B|      4|     6|
|      East|   C|      2|     5|
+----------+----+-------+------+

The columns now appear in the exact order that we specified.
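
When only a few columns need to move, typing out every column name can get tedious. The sketch below uses a hypothetical helper, move_to_front (it is not part of PySpark), to build the full column order from the names that should come first. It works on a plain Python list of names, such as the one returned by df.columns.

```python
def move_to_front(columns, first):
    """Return the names in `first`, followed by all other columns in their original order."""
    # fail early if a requested name is not an existing column
    missing = [c for c in first if c not in columns]
    if missing:
        raise ValueError(f"unknown columns: {missing}")
    rest = [c for c in columns if c not in first]
    return list(first) + rest

# original column names of the example DataFrame
columns = ['team', 'conference', 'points', 'assists']

# put the numeric columns first and keep the rest in place
new_order = move_to_front(columns, ['points', 'assists'])
print(new_order)  # ['points', 'assists', 'team', 'conference']
```

The resulting list can then be passed to select(), for example df.select(new_order).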

Example 2: Reorder Columns Alphabetically

We can use the following syntax to reorder the columns in the DataFrame alphabetically:

#reorder columns alphabetically
df = df.select(sorted(df.columns)) 

#view updated DataFrame
df.show()

+-------+----------+------+----+
|assists|conference|points|team|
+-------+----------+------+----+
|      4|      East|    11|   A|
|      9|      East|     8|   A|
|      3|      East|    10|   A|
|     12|      West|     6|   B|
|      4|      West|     6|   B|
|      2|      East|     5|   C|
+-------+----------+------+----+

The columns now appear in alphabetical order.
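
Note that Python's sorted() compares strings by character code, so uppercase names sort before lowercase ones. The sketch below uses made-up column names with mixed case (they are not from the example DataFrame) to show the default behavior next to a case-insensitive sort and a reverse alphabetical sort.

```python
# hypothetical column names with mixed case
columns = ['team', 'Conference', 'points', 'assists']

# default sort: uppercase letters come before lowercase letters
default_order = sorted(columns)

# case-insensitive sort
ci_order = sorted(columns, key=str.lower)

# case-insensitive sort in reverse alphabetical order
desc_order = sorted(columns, key=str.lower, reverse=True)

print(default_order)  # ['Conference', 'assists', 'points', 'team']
print(ci_order)       # ['assists', 'Conference', 'points', 'team']
print(desc_order)     # ['team', 'points', 'Conference', 'assists']
```

Any of these lists can be passed to select() in the same way as sorted(df.columns).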
