How can PySpark be used to round a date to the first day of the week?


You can use the following syntax to round dates to the first day of the week in a PySpark DataFrame:

import pyspark.sql.functions as F

#add new column that rounds date to first day of week
df_new = df.withColumn('first_day_of_week', F.trunc('date', 'week'))

This particular example creates a new column named first_day_of_week that rounds each date in the date column down to the first day (Monday) of the week that contains it.

The following example shows how to use this syntax in practice.

Example: How to Round Date to First Day of Week in PySpark

Suppose we have the following PySpark DataFrame that contains information about the sales made on various days at some company:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['2023-04-11', 22],
        ['2023-04-15', 14],
        ['2023-04-17', 12],
        ['2023-05-21', 15],
        ['2023-05-23', 30],
        ['2023-10-26', 45],
        ['2023-10-28', 32],
        ['2023-10-29', 47]]
  
#define column names
columns = ['date', 'sales']
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe
df.show()

+----------+-----+
|      date|sales|
+----------+-----+
|2023-04-11|   22|
|2023-04-15|   14|
|2023-04-17|   12|
|2023-05-21|   15|
|2023-05-23|   30|
|2023-10-26|   45|
|2023-10-28|   32|
|2023-10-29|   47|
+----------+-----+

Suppose we would like to round each date in the date column to the first day of the week.

We can use the following syntax to do so:

import pyspark.sql.functions as F

#add new column that rounds date to first day of week
df_new = df.withColumn('first_day_of_week', F.trunc('date', 'week'))

#view new DataFrame
df_new.show()

+----------+-----+-----------------+
|      date|sales|first_day_of_week|
+----------+-----+-----------------+
|2023-04-11|   22|       2023-04-10|
|2023-04-15|   14|       2023-04-10|
|2023-04-17|   12|       2023-04-17|
|2023-05-21|   15|       2023-05-15|
|2023-05-23|   30|       2023-05-22|
|2023-10-26|   45|       2023-10-23|
|2023-10-28|   32|       2023-10-23|
|2023-10-29|   47|       2023-10-23|
+----------+-----+-----------------+

The new first_day_of_week column contains each date from the date column rounded to the first day of the week.

Note: The trunc function treats Monday as the first day of the week, so a date that already falls on a Monday is returned unchanged.

For example, we can see:

  • The first day of the week for the date 2023-04-11 is 2023-04-10.
  • The first day of the week for the date 2023-04-15 is 2023-04-10.
  • The first day of the week for the date 2023-04-17 is 2023-04-17.

And so on.
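To make the arithmetic behind these results concrete, here is a minimal plain-Python sketch that reproduces the same Monday-based rounding with the standard datetime module. The helper name monday_of_week is hypothetical and is not part of PySpark. It only illustrates the rule seen in the output above, not how Spark implements trunc internally.

```python
from datetime import date, timedelta

def monday_of_week(d: date) -> date:
    """Return the Monday on or before d (hypothetical helper, not a PySpark function)."""
    # date.weekday() returns 0 for Monday through 6 for Sunday,
    # so subtracting that many days lands on the Monday of the same week
    return d - timedelta(days=d.weekday())

# the first three dates from the example DataFrame
for d in [date(2023, 4, 11), date(2023, 4, 15), date(2023, 4, 17)]:
    print(d, "->", monday_of_week(d))
```

A check like this can be handy for spot-checking a few rows of the first_day_of_week column without starting a Spark session.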

Note: You can find the complete documentation for the PySpark trunc function in the official PySpark API reference.
