You can use the following syntax to convert a column to lowercase in a PySpark DataFrame:
from pyspark.sql.functions import lower
df = df.withColumn('my_column', lower(df['my_column']))
The following example shows how to use this syntax in practice.
Example: How to Convert Column to Lowercase in PySpark
Suppose we create the following PySpark DataFrame that contains information about various basketball players:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
#define data
data = [['A', 'East', 11, 4],
['A', 'East', 8, 9],
['A', 'East', 10, 3],
['B', 'West', 6, 12],
['B', 'West', 6, 4],
['C', 'East', 5, 2]]
#define column names
columns = ['team', 'conference', 'points', 'assists']
#create dataframe using data and column names
df = spark.createDataFrame(data, columns)
#view dataframe
df.show()
+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
| A| East| 11| 4|
| A| East| 8| 9|
| A| East| 10| 3|
| B| West| 6| 12|
| B| West| 6| 4|
| C| East| 5| 2|
+----+----------+------+-------+
Suppose we would like to convert all strings in the conference column to lowercase.
We can use the following syntax to do so:
from pyspark.sql.functions import lower
#convert 'conference' column to lowercase
df = df.withColumn('conference', lower(df['conference']))
#view updated DataFrame
df.show()
+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
| A| east| 11| 4|
| A| east| 8| 9|
| A| east| 10| 3|
| B| west| 6| 12|
| B| west| 6| 4|
| C| east| 5| 2|
+----+----------+------+-------+
Notice that all strings in the conference column of the updated DataFrame are now lowercase.
Note #1: We used the withColumn function to return a new DataFrame with the conference column modified and all other columns left unchanged.
Note #2: You can find the complete documentation for the PySpark withColumn function in the official PySpark API reference.