How can I convert a string to a date in PySpark with an example?

In PySpark, you can convert a string to a date with the to_date function. It takes a string column and an optional date format pattern, and it returns a column with the date data type. For example, applying to_date with the format "yyyy-MM-dd" to the string "2021-10-21" returns the date October 21, 2021. Converting strings to dates makes date data easier to manipulate and analyze in PySpark.

Convert String to Date in PySpark (With Example)


You can use the following syntax to convert a string column to a date column in a PySpark DataFrame:

from pyspark.sql import functions as F

df = df.withColumn('my_date_column', F.to_date('my_date_column'))

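This particular example converts the values in the my_date_column column from strings to dates. Because no format is passed to to_date, the strings are expected to be in a standard form such as yyyy-MM-dd.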

The following example shows how to use this syntax in practice.

Example: How to Convert String to Date in PySpark

Suppose we have the following PySpark DataFrame that contains information about sales made on various dates at some company:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['2023-01-15', 225],
        ['2023-02-24', 260],
        ['2023-07-14', 413],
        ['2023-10-30', 368]] 
  
#define column names
columns = ['date', 'sales'] 
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe
df.show()

+----------+-----+
|      date|sales|
+----------+-----+
|2023-01-15|  225|
|2023-02-24|  260|
|2023-07-14|  413|
|2023-10-30|  368|
+----------+-----+

We can use the following syntax to display the data type of each column in the DataFrame:

#check data type of each column
df.dtypes

[('date', 'string'), ('sales', 'bigint')]

We can see that the date column currently has a data type of string.

To convert this column from a string to a date, we can use the following syntax:

from pyspark.sql import functions as F

#convert 'date' column from string to date
df = df.withColumn('date', F.to_date('date'))

#view updated DataFrame 
df.show()

+----------+-----+
|      date|sales|
+----------+-----+
|2023-01-15|  225|
|2023-02-24|  260|
|2023-07-14|  413|
|2023-10-30|  368|
+----------+-----+

We can use the dtypes attribute once again to view the data type of each column in the DataFrame:

#check data type of each column
df.dtypes

[('date', 'date'), ('sales', 'bigint')]

We can see that the date column now has a data type of date.

We have successfully converted a string column to a date column.
