How can Epoch be converted to Datetime in PySpark?


You can use the following syntax to convert epoch time to a recognizable datetime in PySpark:

from pyspark.sql import functions as f
from pyspark.sql import types as t

#convert the epoch column (seconds since 1970-01-01) to a timestamp column called 'datetime'
df.withColumn('datetime', f.to_timestamp(df.epoch.cast(dataType=t.TimestampType())))

This particular example creates a new column called datetime that contains the epoch times from the epoch column, converted to a recognizable datetime format.

For example, this syntax will convert an epoch time of 1655439422 to a PySpark datetime of 2022-06-17 00:17:02.
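
An equivalent approach uses from_unixtime(). The following is a minimal sketch, assuming the epoch column holds seconds (not milliseconds):

from pyspark.sql import functions as f

#from_unixtime() formats the epoch seconds as a 'yyyy-MM-dd HH:mm:ss' string
#and to_timestamp() parses that string back into a timestamp
df.withColumn('datetime', f.to_timestamp(f.from_unixtime(df.epoch)))

#if your epoch values are in milliseconds, divide by 1000 first:
#df.withColumn('datetime', f.to_timestamp(f.from_unixtime((df.epoch/1000).cast('long'))))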

The following example shows how to use this syntax in practice.

Example: How to Convert Epoch to Datetime in PySpark

Suppose we have the following PySpark DataFrame that contains information about sales made on various epoch times at some company:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [[1655439422, 18], 
        [1655638422, 33], 
        [1664799422, 12], 
        [1668439411, 15], 
        [1669939422, 19],
        [1669993948, 24]]

#define column names
columns = ['epoch', 'sales']

#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe
df.show()

+----------+-----+
|     epoch|sales|
+----------+-----+
|1655439422|   18|
|1655638422|   33|
|1664799422|   12|
|1668439411|   15|
|1669939422|   19|
|1669993948|   24|
+----------+-----+
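
Note that createDataFrame() infers the epoch column as a long (bigint), and casting a long to TimestampType() treats the value as seconds since the Unix epoch. You can verify the inferred types with printSchema(); the output should look something like this:

#view the inferred column types
df.printSchema()

root
 |-- epoch: long (nullable = true)
 |-- sales: long (nullable = true)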

We can use the following syntax to create a new DataFrame that contains a column called datetime, which holds each time from the epoch column converted to a recognizable datetime format:

from pyspark.sql import functions as f
from pyspark.sql import types as t

#create new column called 'datetime' that converts epoch to datetime
df_new = df.withColumn('datetime', f.to_timestamp(df.epoch.cast(dataType=t.TimestampType())))

#view new DataFrame
df_new.show()

+----------+-----+-------------------+
|     epoch|sales|           datetime|
+----------+-----+-------------------+
|1655439422|   18|2022-06-17 00:17:02|
|1655638422|   33|2022-06-19 07:33:42|
|1664799422|   12|2022-10-03 08:17:02|
|1668439411|   15|2022-11-14 10:23:31|
|1669939422|   19|2022-12-01 19:03:42|
|1669993948|   24|2022-12-02 10:12:28|
+----------+-----+-------------------+

Notice that the datetime column now contains recognizable datetimes.

For example:

  • The epoch time 1655439422 is equivalent to 2022-06-17 00:17:02.
  • The epoch time 1655638422 is equivalent to 2022-06-19 07:33:42.
  • The epoch time 1664799422 is equivalent to 2022-10-03 08:17:02.

And so on.
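
Because datetime is now a true timestamp column, you can use it directly in date-based operations. For example, here is a minimal sketch (using the df_new DataFrame created above) that keeps only the sales made on or after October 1, 2022:

from pyspark.sql import functions as f

#filter for rows where the datetime falls on or after October 1, 2022
df_new.filter(f.col('datetime') >= '2022-10-01').show()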

Note: PySpark displays timestamps in the Spark session time zone, which defaults to your machine's local time zone, so the exact datetime values you see may differ from the ones shown here.
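
If you would rather display the datetimes in UTC (or any other time zone), you can change the Spark session time zone via the spark.sql.session.timeZone config. A minimal sketch:

#display timestamps in UTC instead of the machine's local time zone
spark.conf.set('spark.sql.session.timeZone', 'UTC')

df_new.show()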
