How can I add time to a datetime object in PySpark?

The most common way to add time to a datetime column in PySpark is the "withColumn" function combined with an INTERVAL expression created by F.expr. Note that the built-in "date_add" and "date_sub" functions only shift a date by whole days, so an INTERVAL expression is the way to add or subtract hours, minutes, or seconds. This makes it easy to manipulate datetime values in PySpark for various data analysis and processing tasks.

PySpark: Add Time to Datetime

You can use the following syntax to add time to a datetime in PySpark:

from pyspark.sql import functions as F

df = df.withColumn('new_ts', df.ts + F.expr('INTERVAL 3 HOURS 5 MINUTES 2 SECONDS'))

This particular example creates a new column called new_ts that adds 3 hours, 5 minutes and 2 seconds to each datetime in the ts column.

Note: To subtract time, simply use a subtraction sign instead of an addition sign.
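
For instance, here is a minimal sketch of the subtraction version (assuming the same df and ts column as above):

from pyspark.sql import functions as F

#subtract 3 hours, 5 minutes and 2 seconds from each datetime in 'ts' column
df = df.withColumn('new_ts', df.ts - F.expr('INTERVAL 3 HOURS 5 MINUTES 2 SECONDS'))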

The following example shows how to use this syntax in practice.

Example: How to Add Time to Datetime in PySpark

Suppose we have the following PySpark DataFrame that contains information about sales made at various times at some company:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

from pyspark.sql import functions as F

#define data
data = [['2023-01-15 04:14:22', 225],
        ['2023-02-24 10:55:01', 260],
        ['2023-07-14 18:34:59', 413],
        ['2023-10-30 22:20:05', 368]] 
  
#define column names
columns = ['ts', 'sales'] 
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns)

#convert string column to timestamp
df = df.withColumn('ts', F.to_timestamp('ts', 'yyyy-MM-dd HH:mm:ss'))

#view DataFrame
df.show()

+-------------------+-----+
|                 ts|sales|
+-------------------+-----+
|2023-01-15 04:14:22|  225|
|2023-02-24 10:55:01|  260|
|2023-07-14 18:34:59|  413|
|2023-10-30 22:20:05|  368|
+-------------------+-----+

We can use the following syntax to create a new column called new_time that adds 3 hours, 5 minutes and 2 seconds to each datetime in the ts column of the DataFrame:

from pyspark.sql import functions as F

#add 3 hours, 5 minutes and 2 seconds to each datetime in 'ts' column
df = df.withColumn('new_time', df.ts + F.expr('INTERVAL 3 HOURS 5 MINUTES 2 SECONDS'))

#view updated DataFrame
df.show()

+-------------------+-----+-------------------+
|                 ts|sales|           new_time|
+-------------------+-----+-------------------+
|2023-01-15 04:14:22|  225|2023-01-15 07:19:24|
|2023-02-24 10:55:01|  260|2023-02-24 14:00:03|
|2023-07-14 18:34:59|  413|2023-07-14 21:40:01|
|2023-10-30 22:20:05|  368|2023-10-31 01:25:07|
+-------------------+-----+-------------------+

The new_time column shows each time from the ts column with 3 hours, 5 minutes and 2 seconds added to it.

Note that you can also add only hours by using the following syntax:

from pyspark.sql import functions as F

#add 3 hours to each datetime in 'ts' column
df = df.withColumn('new_time', df.ts + F.expr('INTERVAL 3 HOURS'))

#view updated DataFrame
df.show()

+-------------------+-----+-------------------+
|                 ts|sales|           new_time|
+-------------------+-----+-------------------+
|2023-01-15 04:14:22|  225|2023-01-15 07:14:22|
|2023-02-24 10:55:01|  260|2023-02-24 13:55:01|
|2023-07-14 18:34:59|  413|2023-07-14 21:34:59|
|2023-10-30 22:20:05|  368|2023-10-31 01:20:05|
+-------------------+-----+-------------------+

Now the new_time column shows each time from the ts column with 3 hours added to it.

Feel free to use this syntax to add or subtract as much time as you’d like from a datetime column.
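
If the amount of time to add varies by row, one option is to build the interval from a column using Spark SQL's make_interval function inside F.expr. The following is a minimal sketch that assumes a hypothetical integer column named hours_to_add; adjust the argument positions for other units:

from pyspark.sql import functions as F

#add a per-row number of hours from a hypothetical 'hours_to_add' column
#make_interval arguments: years, months, weeks, days, hours, mins, secs
df = df.withColumn('new_time', F.expr('ts + make_interval(0, 0, 0, 0, hours_to_add, 0, 0)'))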
