How can I round a date to the first day of the week in PySpark?

To round a date to the first day of the week in PySpark, the simplest approach is the ‘trunc’ function with the ‘week’ format, which truncates each date to the Monday of its week. Alternatively, you can use the ‘dayofweek’ function to determine the day of the week for a given date, then subtract the appropriate number of days with ‘date_sub’ to land on the first day of the week. Either way, the result is each date rounded down to the start of its week, which is useful for data manipulation and analysis, especially when aggregating daily data by week.
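The dayofweek-based arithmetic mentioned above can be sketched in plain Python, just to make the offset calculation concrete (the helper name is mine; the Sunday=1 convention mirrors Spark's dayofweek function):

```python
from datetime import date, timedelta

# Hypothetical helper that mirrors the dayofweek-then-subtract approach.
# Spark's dayofweek() returns 1 for Sunday through 7 for Saturday, so
# (dayofweek + 5) % 7 is the number of days to step back to reach Monday.
def monday_of_week(d: date) -> date:
    spark_dayofweek = d.isoweekday() % 7 + 1  # Sunday=1, Monday=2, ..., Saturday=7
    return d - timedelta(days=(spark_dayofweek + 5) % 7)

print(monday_of_week(date(2023, 4, 11)))  # Tuesday -> 2023-04-10
print(monday_of_week(date(2023, 5, 21)))  # Sunday  -> 2023-05-15
```

In PySpark itself the equivalent expression would be roughly F.date_sub('date', (F.dayofweek('date') + 5) % 7), assuming Spark 3.0+ where date_sub accepts a column for the day count.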

PySpark: Round Date to First Day of Week


You can use the following syntax to round dates to the first day of the week in a PySpark DataFrame:

import pyspark.sql.functions as F

#add new column that rounds date to first day of week
df_new = df.withColumn('first_day_of_week', F.trunc('date', 'week'))

This particular example creates a new column named first_day_of_week that rounds each date in the date column to the first day of the week.

The following example shows how to use this syntax in practice.

Example: How to Round Date to First Day of Week in PySpark

Suppose we have the following PySpark DataFrame that contains information about the sales made on various days at some company:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['2023-04-11', 22],
        ['2023-04-15', 14],
        ['2023-04-17', 12],
        ['2023-05-21', 15],
        ['2023-05-23', 30],
        ['2023-10-26', 45],
        ['2023-10-28', 32],
        ['2023-10-29', 47]]
  
#define column names
columns = ['date', 'sales']
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe
df.show()

+----------+-----+
|      date|sales|
+----------+-----+
|2023-04-11|   22|
|2023-04-15|   14|
|2023-04-17|   12|
|2023-05-21|   15|
|2023-05-23|   30|
|2023-10-26|   45|
|2023-10-28|   32|
|2023-10-29|   47|
+----------+-----+

Suppose we would like to round each date in the date column to the first day of the week.

We can use the following syntax to do so:

import pyspark.sql.functions as F

#add new column that rounds date to first day of week
df_new = df.withColumn('first_day_of_week', F.trunc('date', 'week'))

#view new DataFrame
df_new.show()

+----------+-----+-----------------+
|      date|sales|first_day_of_week|
+----------+-----+-----------------+
|2023-04-11|   22|       2023-04-10|
|2023-04-15|   14|       2023-04-10|
|2023-04-17|   12|       2023-04-17|
|2023-05-21|   15|       2023-05-15|
|2023-05-23|   30|       2023-05-22|
|2023-10-26|   45|       2023-10-23|
|2023-10-28|   32|       2023-10-23|
|2023-10-29|   47|       2023-10-23|
+----------+-----+-----------------+

The new first_day_of_week column contains each date from the date column rounded to the first day of the week.

Note: The trunc function with the ‘week’ format always treats Monday as the first day of the week.

For example, we can see:

  • The first day of the week for the date 2023-04-11 is 2023-04-10.
  • The first day of the week for the date 2023-04-15 is 2023-04-10.
  • The first day of the week for the date 2023-04-17 is 2023-04-17.

And so on.
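Note that trunc offers no format for weeks starting on Sunday. One workaround (my own suggestion, not from the example above) is F.date_sub(F.next_day('date', 'Sun'), 7), since next_day returns the first matching day strictly after the input date. The arithmetic behind that idiom can be checked in plain Python:

```python
from datetime import date, timedelta

# Mirrors date_sub(next_day(d, 'Sun'), 7): find the first Sunday strictly
# after d, then step back seven days to land on the Sunday on or before d.
def sunday_of_week(d: date) -> date:
    days_ahead = (6 - d.weekday()) % 7 or 7  # d.weekday(): Monday=0 ... Sunday=6
    return d + timedelta(days=days_ahead) - timedelta(days=7)

print(sunday_of_week(date(2023, 4, 11)))  # Tuesday -> 2023-04-09
print(sunday_of_week(date(2023, 4, 16)))  # Sunday  -> 2023-04-16
```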

Note: You can find the complete documentation for the PySpark trunc function in the official PySpark API reference.
