How can I find the day of the week in PySpark?

PySpark is a powerful tool for data analysis that allows users to manipulate large datasets using Python. One common task in data analysis is finding the day of the week for a particular date. In PySpark, this can be achieved with the “dayofweek” function, which takes a date column as input and returns the corresponding day of the week as a number (Sunday = 1). This function is part of the pyspark.sql.functions module and can be easily integrated into data processing pipelines. When you need the day name rather than a number, the related “date_format” function can format a date column using a day-of-week pattern, as shown in the methods below.

Find Day of the Week in PySpark


You can use the following methods to find the day of the week for dates in a PySpark DataFrame:

Method 1: Get Day of Week as Number (Sunday=1)

import pyspark.sql.functions as F

df_new = df.withColumn('day_of_week', F.dayofweek('date'))

Method 2: Get Day of Week as Number (Monday=1)

import pyspark.sql.functions as F

df_new = df.withColumn('day_of_week', ((F.dayofweek('date')+5)%7)+1)

Method 3: Get Day of Week as Abbreviated Name (e.g. Mon)

import pyspark.sql.functions as F

df_new = df.withColumn('day_of_week', F.date_format('date', 'E'))

Method 4: Get Day of Week as Full Name (e.g. Monday)

import pyspark.sql.functions as F

df_new = df.withColumn('day_of_week', F.date_format('date', 'EEEE'))

The following examples show how to use each method in practice with the following PySpark DataFrame:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['2023-04-11', 22],
        ['2023-04-15', 14],
        ['2023-04-17', 12],
        ['2023-05-21', 15],
        ['2023-05-23', 30],
        ['2023-10-26', 45],
        ['2023-10-28', 32],
        ['2023-10-29', 47]]
  
#define column names
columns = ['date', 'sales']
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe
df.show()

+----------+-----+
|      date|sales|
+----------+-----+
|2023-04-11|   22|
|2023-04-15|   14|
|2023-04-17|   12|
|2023-05-21|   15|
|2023-05-23|   30|
|2023-10-26|   45|
|2023-10-28|   32|
|2023-10-29|   47|
+----------+-----+

Example 1: Get Day of Week as Number (Sunday=1)

We can use the following syntax to get the day of the week as a number between 1 and 7, assuming Sunday is the start of the week:

import pyspark.sql.functions as F

#add new column that displays day of week as number
df_new = df.withColumn('day_of_week', F.dayofweek('date'))

#view new DataFrame
df_new.show()

+----------+-----+-----------+
|      date|sales|day_of_week|
+----------+-----+-----------+
|2023-04-11|   22|          3|
|2023-04-15|   14|          7|
|2023-04-17|   12|          2|
|2023-05-21|   15|          1|
|2023-05-23|   30|          3|
|2023-10-26|   45|          5|
|2023-10-28|   32|          7|
|2023-10-29|   47|          1|
+----------+-----+-----------+
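As a quick sanity check, the same Sunday=1 numbering can be reproduced in plain Python from datetime.isoweekday(), which numbers Monday=1 through Sunday=7. This is just an illustrative cross-check of the convention Spark uses, not part of the PySpark pipeline:

```python
from datetime import date

#Spark's dayofweek() numbers days Sunday=1 .. Saturday=7.
#isoweekday() numbers Monday=1 .. Sunday=7, so taking it mod 7
#and adding 1 reproduces Spark's convention.
def spark_dayofweek(d):
    return d.isoweekday() % 7 + 1

#check a few dates from the DataFrame above
print(spark_dayofweek(date(2023, 4, 11)))  #Tuesday  -> 3
print(spark_dayofweek(date(2023, 5, 21)))  #Sunday   -> 1
print(spark_dayofweek(date(2023, 4, 15)))  #Saturday -> 7
```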

Example 2: Get Day of Week as Number (Monday=1)

We can use the following syntax to get the day of the week as a number between 1 and 7, assuming Monday is the start of the week:

import pyspark.sql.functions as F

#add new column that displays day of week as number
df_new = df.withColumn('day_of_week', ((F.dayofweek('date')+5)%7)+1)

#view new DataFrame
df_new.show()

+----------+-----+-----------+
|      date|sales|day_of_week|
+----------+-----+-----------+
|2023-04-11|   22|          2|
|2023-04-15|   14|          6|
|2023-04-17|   12|          1|
|2023-05-21|   15|          7|
|2023-05-23|   30|          2|
|2023-10-26|   45|          4|
|2023-10-28|   32|          6|
|2023-10-29|   47|          7|
+----------+-----+-----------+
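The expression ((F.dayofweek('date')+5)%7)+1 works by shifting Spark's Sunday=1 numbering so that Monday maps to 1 and Sunday maps to 7. The shift can be seen with ordinary Python arithmetic (no Spark needed):

```python
#Spark's dayofweek gives Sunday=1 .. Saturday=7.
#Adding 5, taking mod 7, then adding 1 rotates the numbering
#so Monday=1 .. Sunday=7.
for n, name in enumerate(['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'], start=1):
    print(name, n, '->', (n + 5) % 7 + 1)
```

For example, Monday is 2 under the Sunday-first convention, and (2+5)%7+1 = 1; Sunday is 1, and (1+5)%7+1 = 7.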

Example 3: Get Day of Week as Abbreviated Name

We can use the following syntax to get the day of the week as an abbreviated name:

import pyspark.sql.functions as F

#add new column that displays day of week as abbreviated name
df_new = df.withColumn('day_of_week', F.date_format('date', 'E'))

#view new DataFrame
df_new.show()

+----------+-----+-----------+
|      date|sales|day_of_week|
+----------+-----+-----------+
|2023-04-11|   22|        Tue|
|2023-04-15|   14|        Sat|
|2023-04-17|   12|        Mon|
|2023-05-21|   15|        Sun|
|2023-05-23|   30|        Tue|
|2023-10-26|   45|        Thu|
|2023-10-28|   32|        Sat|
|2023-10-29|   47|        Sun|
+----------+-----+-----------+

Example 4: Get Day of Week as Full Name

We can use the following syntax to get the day of the week as a full name:

import pyspark.sql.functions as F

#add new column that displays day of week as full name
df_new = df.withColumn('day_of_week', F.date_format('date', 'EEEE'))

#view new DataFrame
df_new.show()

+----------+-----+-----------+
|      date|sales|day_of_week|
+----------+-----+-----------+
|2023-04-11|   22|    Tuesday|
|2023-04-15|   14|   Saturday|
|2023-04-17|   12|     Monday|
|2023-05-21|   15|     Sunday|
|2023-05-23|   30|    Tuesday|
|2023-10-26|   45|   Thursday|
|2023-10-28|   32|   Saturday|
|2023-10-29|   47|     Sunday|
+----------+-----+-----------+
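The 'E' and 'EEEE' patterns correspond to the abbreviated and full English day names. As an illustrative cross-check (assuming the default English locale), Python's strftime produces the same names via '%a' and '%A':

```python
from datetime import date

#'%a' gives the abbreviated day name, '%A' the full name,
#matching what date_format('date', 'E') and 'EEEE' return
#in Spark's default en locale
d = date(2023, 4, 11)
print(d.strftime('%a'))  #Tue
print(d.strftime('%A'))  #Tuesday
```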
