How can I extract the quarter from a date using PySpark?


You can use the following methods to extract the quarter from a date in a PySpark DataFrame:

Method 1: Extract Quarter from Date

from pyspark.sql.functions import quarter

df_new = df.withColumn('quarter', quarter(df['date']))

Method 2: Extract Year and Quarter from Date

from pyspark.sql.functions import quarter, year, concat, lit

df_new = df.withColumn('year-quarter',
                       concat(year(df['date']), lit('Q'), quarter(df['date'])))

The following examples show how to use each method in practice with the following PySpark DataFrame:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['2022-01-31', 6], 
        ['2022-02-28', 8], 
        ['2022-03-31', 10], 
        ['2022-04-30', 5], 
        ['2022-06-30', 4],
        ['2022-09-30', 8],
        ['2022-11-30', 8],
        ['2023-01-31', 3],
        ['2023-02-28', 5],
        ['2023-03-31', 14]]
  
#define column names
columns = ['date', 'sales'] 
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe
df.show()

+----------+-----+
|      date|sales|
+----------+-----+
|2022-01-31|    6|
|2022-02-28|    8|
|2022-03-31|   10|
|2022-04-30|    5|
|2022-06-30|    4|
|2022-09-30|    8|
|2022-11-30|    8|
|2023-01-31|    3|
|2023-02-28|    5|
|2023-03-31|   14|
+----------+-----+

Example 1: Extract Quarter from Date in PySpark

We can use the following syntax to extract only the quarter from each string in the date column of the DataFrame:

from pyspark.sql.functions import quarter

#extract quarter from each string in 'date' column
df_new = df.withColumn('quarter', quarter(df['date']))

#view new DataFrame
df_new.show()

+----------+-----+-------+
|      date|sales|quarter|
+----------+-----+-------+
|2022-01-31|    6|      1|
|2022-02-28|    8|      1|
|2022-03-31|   10|      1|
|2022-04-30|    5|      2|
|2022-06-30|    4|      2|
|2022-09-30|    8|      3|
|2022-11-30|    8|      4|
|2023-01-31|    3|      1|
|2023-02-28|    5|      1|
|2023-03-31|   14|      1|
+----------+-----+-------+

The new quarter column contains the quarter of each date in the date column.

Example 2: Extract Year and Quarter from Date in PySpark

We can use the following syntax to extract the year and quarter from each string in the date column of the DataFrame:

from pyspark.sql.functions import quarter, year, concat, lit

#extract year and quarter from each string in 'date' column
df_new = df.withColumn('year-quarter',
                       concat(year(df['date']), lit('Q'), quarter(df['date'])))

#view new DataFrame
df_new.show()

+----------+-----+------------+
|      date|sales|year-quarter|
+----------+-----+------------+
|2022-01-31|    6|      2022Q1|
|2022-02-28|    8|      2022Q1|
|2022-03-31|   10|      2022Q1|
|2022-04-30|    5|      2022Q2|
|2022-06-30|    4|      2022Q2|
|2022-09-30|    8|      2022Q3|
|2022-11-30|    8|      2022Q4|
|2023-01-31|    3|      2023Q1|
|2023-02-28|    5|      2023Q1|
|2023-03-31|   14|      2023Q1|
+----------+-----+------------+

The new year-quarter column contains the year and quarter of each date in the date column.

Note that we used the concat function to concatenate the year and the quarter for each date along with the lit function to add a “Q” in front of the quarter number.

Additional Resources

x