How to extract minutes from timestamp in pyspark?

In PySpark, the simplest way to extract the minutes from a timestamp column is the minute() function, which returns the minute component as an integer. If you instead want the full timestamp rounded down to the start of the minute, use date_trunc() with the 'minute' unit. Separately, if your timestamps are stored as Unix epochs (long integers) rather than as timestamp columns, you can use from_unixtime() with the 'mm' format pattern, which returns the minutes as a formatted string column.
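For the Unix-epoch case, here is a minimal sketch. It assumes a hypothetical column named epoch_ts that holds epoch seconds; the cast to int is needed because from_unixtime() returns a string:

from pyspark.sql import functions as F

#from_unixtime() formats the epoch seconds as a string; the 'mm' pattern
#keeps only the minutes, which we then cast to an integer
df_new = df.withColumn('minutes', F.from_unixtime(F.col('epoch_ts'), 'mm').cast('int'))

The two methods below work directly on a timestamp column and are usually the more convenient choice.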


You can use the following methods to extract the minutes from a timestamp in PySpark:

Method 1: Extract Minutes from Timestamp

from pyspark.sql import functions as F

df_new = df.withColumn('minutes', F.minute(df['ts']))

If the timestamp is 2023-01-15 04:14:22, then this syntax would return 14.

Method 2: Extract Timestamp Truncated to Minutes

from pyspark.sql import functions as F

df_new = df.withColumn('minutes', F.date_trunc('minute', df['ts']))

If the timestamp is 2023-01-15 04:14:22, then this syntax would return 2023-01-15 04:14:00.

The following examples show how to use each method in practice with a PySpark DataFrame that contains information about sales made at various times at a company:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

from pyspark.sql import functions as F

#define data
data = [['2023-01-15 04:14:22', 225],
        ['2023-02-24 10:55:01', 260],
        ['2023-07-14 18:34:59', 413],
        ['2023-10-30 22:20:05', 368]] 
  
#define column names
columns = ['ts', 'sales'] 
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns)

#convert string column to timestamp
df = df.withColumn('ts', F.to_timestamp('ts', 'yyyy-MM-dd HH:mm:ss'))
  
#view dataframe
df.show()

+-------------------+-----+
|                 ts|sales|
+-------------------+-----+
|2023-01-15 04:14:22|  225|
|2023-02-24 10:55:01|  260|
|2023-07-14 18:34:59|  413|
|2023-10-30 22:20:05|  368|
+-------------------+-----+

Example 1: Extract Minutes from Timestamp

We can use the following syntax to extract only the minutes from each timestamp in the ts column of the DataFrame:

from pyspark.sql import functions as F

#extract minutes from each timestamp in 'ts' column
df_new = df.withColumn('minutes', F.minute(df['ts']))

#view new DataFrame
df_new.show()

+-------------------+-----+-------+
|                 ts|sales|minutes|
+-------------------+-----+-------+
|2023-01-15 04:14:22|  225|     14|
|2023-02-24 10:55:01|  260|     55|
|2023-07-14 18:34:59|  413|     34|
|2023-10-30 22:20:05|  368|     20|
+-------------------+-----+-------+

The new minutes column shows only the minutes from each timestamp in the ts column.
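Because minutes is a regular integer column, it can be used directly in filters or aggregations. As a minimal sketch (using the df_new created above), the following keeps only the sales that occurred during the first half of an hour:

from pyspark.sql import functions as F

#keep only rows where the sale occurred before the 30-minute mark
df_first_half = df_new.filter(F.col('minutes') < 30)

df_first_half.show()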

Example 2: Extract Timestamp Truncated to Minutes

We can use the following syntax to return each timestamp from the ts column truncated to the minutes:

from pyspark.sql import functions as F

#create new column that contains timestamp truncated to the minutes
df_new = df.withColumn('minutes', F.date_trunc('minute', df['ts']))

#view new DataFrame
df_new.show()

+-------------------+-----+-------------------+
|                 ts|sales|            minutes|
+-------------------+-----+-------------------+
|2023-01-15 04:14:22|  225|2023-01-15 04:14:00|
|2023-02-24 10:55:01|  260|2023-02-24 10:55:00|
|2023-07-14 18:34:59|  413|2023-07-14 18:34:00|
|2023-10-30 22:20:05|  368|2023-10-30 22:20:00|
+-------------------+-----+-------------------+

The new minutes column shows each timestamp from the ts column truncated to the minutes.
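Truncating to the minute is useful when you want to treat all events that occurred within the same minute as one group. As a minimal sketch (using the df_new created above), the following sums the sales per truncated minute; in this small example each minute contains a single sale, so the totals match the original rows:

from pyspark.sql import functions as F

#sum the sales for all rows that fall within the same truncated minute
df_per_minute = df_new.groupBy('minutes').agg(F.sum('sales').alias('total_sales'))

df_per_minute.show()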
