Convert String to Date in PySpark (With Example)


You can use the following syntax to convert a string column to a date column in a PySpark DataFrame:

from pyspark.sql import functions as F

df = df.withColumn('my_date_column', F.to_date('my_date_column'))

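This particular example converts the values in my_date_column from strings to dates. When no format is given, to_date parses the strings using Spark's default date casting rules, which handle ISO-style strings such as 2023-01-15.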

The following example shows how to use this syntax in practice.

Example: How to Convert String to Date in PySpark

Suppose we have the following PySpark DataFrame that contains information about sales made on various dates at a company:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['2023-01-15', 225],
        ['2023-02-24', 260],
        ['2023-07-14', 413],
        ['2023-10-30', 368]]

#define column names
columns = ['date', 'sales']

#create dataframe using data and column names
df = spark.createDataFrame(data, columns)

#view dataframe
df.show()

+----------+-----+
|      date|sales|
+----------+-----+
|2023-01-15|  225|
|2023-02-24|  260|
|2023-07-14|  413|
|2023-10-30|  368|
+----------+-----+

We can use the following syntax to display the data type of each column in the DataFrame:

#check data type of each column
df.dtypes

[('date', 'string'), ('sales', 'bigint')]

We can see that the date column currently has a data type of string.

To convert this column from a string to a date, we can use the following syntax:

from pyspark.sql import functions as F

#convert 'date' column from string to date
df = df.withColumn('date', F.to_date('date'))

#view updated DataFrame
df.show()

+----------+-----+
|      date|sales|
+----------+-----+
|2023-01-15|  225|
|2023-02-24|  260|
|2023-07-14|  413|
|2023-10-30|  368|
+----------+-----+

We can use the dtypes attribute once again to view the data type of each column in the DataFrame:

#check data type of each column
df.dtypes

[('date', 'date'), ('sales', 'bigint')]

We can see that the date column now has a data type of date.

We have successfully converted a string column to a date column.
