To reorder columns in a PySpark DataFrame, pass the column names to the select() function in the order you want. select() returns a new DataFrame with the columns in that order; the original DataFrame is not modified, so assign the result to a variable to keep it.
Reorder Columns in PySpark (With Examples)
You can use the following methods to reorder columns in a PySpark DataFrame:
Method 1: Reorder Columns in Specific Order
df = df.select('col3', 'col2', 'col4', 'col1')
Method 2: Reorder Columns Alphabetically
df = df.select(sorted(df.columns))
The following examples show how to use each method with the following PySpark DataFrame:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

#define data
data = [['A', 'East', 11, 4],
        ['A', 'East', 8, 9],
        ['A', 'East', 10, 3],
        ['B', 'West', 6, 12],
        ['B', 'West', 6, 4],
        ['C', 'East', 5, 2]]

#define column names
columns = ['team', 'conference', 'points', 'assists']

#create dataframe using data and column names
df = spark.createDataFrame(data, columns)

#view dataframe
df.show()

+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
|   A|      East|    11|      4|
|   A|      East|     8|      9|
|   A|      East|    10|      3|
|   B|      West|     6|     12|
|   B|      West|     6|      4|
|   C|      East|     5|      2|
+----+----------+------+-------+
Example 1: Reorder Columns in Specific Order
We can use the following syntax to reorder the columns in the DataFrame based on a specific order:
#reorder columns by specific order
df = df.select('conference', 'team', 'assists', 'points')
#view updated DataFrame
df.show()
+----------+----+-------+------+
|conference|team|assists|points|
+----------+----+-------+------+
|      East|   A|      4|    11|
|      East|   A|      9|     8|
|      East|   A|      3|    10|
|      West|   B|     12|     6|
|      West|   B|      4|     6|
|      East|   C|      2|     5|
+----------+----+-------+------+
The columns now appear in the exact order that we specified.
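When only a few columns need to move, typing out every column name can be tedious. The sketch below builds the full order from a short list of columns that should come first. The helper name move_to_front is hypothetical (it is not part of PySpark), and it works on a plain list of names so that it runs without a Spark session.

```python
def move_to_front(columns, first):
    """Return the names in `first`, then the remaining columns in original order.

    Hypothetical helper for illustration; not part of the PySpark API.
    """
    #fail early if a requested column does not exist
    missing = [c for c in first if c not in columns]
    if missing:
        raise ValueError(f"columns not found: {missing}")
    return list(first) + [c for c in columns if c not in first]

#column names of the example DataFrame in its original order
columns = ['team', 'conference', 'points', 'assists']

#put the numeric columns first and keep the rest in place
new_order = move_to_front(columns, ['points', 'assists'])
print(new_order)
```

With a DataFrame, the resulting list could be passed straight to select(), for example df = df.select(move_to_front(df.columns, ['points', 'assists'])).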
Example 2: Reorder Columns Alphabetically
We can use the following syntax to reorder the columns in the DataFrame alphabetically:
#reorder columns alphabetically
df = df.select(sorted(df.columns))
#view updated DataFrame
df.show()
+-------+----------+------+----+
|assists|conference|points|team|
+-------+----------+------+----+
|      4|      East|    11|   A|
|      9|      East|     8|   A|
|      3|      East|    10|   A|
|     12|      West|     6|   B|
|      4|      West|     6|   B|
|      2|      East|     5|   C|
+-------+----------+------+----+
The columns now appear in alphabetical order.
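Note that Python's sorted() compares strings by code point, so uppercase names sort before lowercase ones. If the column names mix cases, a key function gives a case-insensitive order, and reverse=True gives descending order. The column names in this sketch are made up for illustration and differ from the example DataFrame.

```python
#hypothetical column names with mixed case
columns = ['team', 'Points', 'assists', 'conference']

#default sort: uppercase letters come before lowercase letters
default_order = sorted(columns)

#case-insensitive sort, ascending and descending
case_insensitive = sorted(columns, key=str.lower)
descending = sorted(columns, key=str.lower, reverse=True)

print(default_order)      # ['Points', 'assists', 'conference', 'team']
print(case_insensitive)   # ['assists', 'conference', 'Points', 'team']
print(descending)         # ['team', 'Points', 'conference', 'assists']
```

Any of these lists can be passed to select() in the same way as sorted(df.columns).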