How to convert a column from date to string in PySpark?

To convert a column from date to string in PySpark, you can use the date_format() function. This function takes the date column and a format pattern as arguments and returns the dates as formatted strings. You can assign the result to the same column or to a new column. For example, to convert the column 'date' to a string, you can write df.withColumn('date_string', date_format('date', 'MM/dd/yyyy')). This creates a new column called 'date_string' and fills it with the converted string values.


You can use the following syntax to convert a column from a date to a string in PySpark:

from pyspark.sql.functions import date_format

df_new = df.withColumn('date_string', date_format('date', 'MM/dd/yyyy'))

This particular example converts the dates in the date column to strings in a new column called date_string, using MM/dd/yyyy as the date format.

The following example shows how to use this syntax in practice.

Example: How to Convert Column from Date to String in PySpark

Suppose we have the following PySpark DataFrame that contains information about sales made on various days for some company:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

import datetime

#define data
data = [[datetime.date(2023, 10, 30), 136], 
        [datetime.date(2023, 11, 14), 223], 
        [datetime.date(2023, 11, 22), 450], 
        [datetime.date(2023, 11, 25), 290], 
        [datetime.date(2023, 12, 19), 189]]
  
#define column names
columns = ['date', 'sales'] 
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe with full column content
df.show()

+----------+-----+
|      date|sales|
+----------+-----+
|2023-10-30|  136|
|2023-11-14|  223|
|2023-11-22|  450|
|2023-11-25|  290|
|2023-12-19|  189|
+----------+-----+

We can use the dtypes attribute to check the data type of each column in the DataFrame:

#check data type of each column
df.dtypes

[('date', 'date'), ('sales', 'bigint')]

We can see that the date column currently has a data type of date.

To convert this column from a date to a string, we can use the following syntax:

from pyspark.sql.functions import date_format

#create new column that converts dates to strings
df_new = df.withColumn('date_string', date_format('date', 'MM/dd/yyyy'))

#view new DataFrame
df_new.show()

+----------+-----+-----------+
|      date|sales|date_string|
+----------+-----+-----------+
|2023-10-30|  136| 10/30/2023|
|2023-11-14|  223| 11/14/2023|
|2023-11-22|  450| 11/22/2023|
|2023-11-25|  290| 11/25/2023|
|2023-12-19|  189| 12/19/2023|
+----------+-----+-----------+

We can use the dtypes attribute once again to view the data type of each column in the new DataFrame:

#check data type of each column
df_new.dtypes

[('date', 'date'), ('sales', 'bigint'), ('date_string', 'string')]

We can see that the date_string column has a data type of string.

We have successfully created a string column from a date column.

Note: We used MM/dd/yyyy as the date format within the date_format function but feel free to use whatever date format you’d like.
