The easiest way to select all columns except specific ones in a PySpark DataFrame is to use the drop function, which returns a new DataFrame without the named columns.
Here are two common ways to do so:
Method 1: Select All Columns Except One
#select all columns except 'conference' column
df.drop('conference').show()
Method 2: Select All Columns Except Several Specific Ones
#select all columns except 'conference' and 'assists' columns
df.drop('conference', 'assists').show()
The examples below show how to use each method in practice with this PySpark DataFrame:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
#define data
data = [['A', 'East', 11, 4],
        ['A', 'East', 8, 9],
        ['A', 'East', 10, 3],
        ['B', 'West', 6, 12],
        ['B', 'West', 6, 4],
        ['C', 'East', 5, 2]]
#define column names
columns = ['team', 'conference', 'points', 'assists']
#create dataframe using data and column names
df = spark.createDataFrame(data, columns)
#view dataframe
df.show()
+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
|   A|      East|    11|      4|
|   A|      East|     8|      9|
|   A|      East|    10|      3|
|   B|      West|     6|     12|
|   B|      West|     6|      4|
|   C|      East|     5|      2|
+----+----------+------+-------+
Example 1: Select All Columns Except One
We can use the following syntax to select all columns in the DataFrame except for the conference column:
#select all columns except 'conference' column
df.drop('conference').show()
+----+------+-------+
|team|points|assists|
+----+------+-------+
|   A|    11|      4|
|   A|     8|      9|
|   A|    10|      3|
|   B|     6|     12|
|   B|     6|      4|
|   C|     5|      2|
+----+------+-------+
Notice that the resulting DataFrame contains all columns from the original DataFrame except for the conference column.
Example 2: Select All Columns Except Several Specific Ones
We can use the following syntax to select all columns in the DataFrame except for the conference and assists columns:
#select all columns except 'conference' and 'assists' columns
df.drop('conference', 'assists').show()
+----+------+
|team|points|
+----+------+
|   A|    11|
|   A|     8|
|   A|    10|
|   B|     6|
|   B|     6|
|   C|     5|
+----+------+
Notice that the resulting DataFrame contains all columns from the original DataFrame except for the conference and assists columns.