How can I round the values in a column in PySpark to 2 decimal places?

To round the values in a column of a PySpark DataFrame to 2 decimal places, you can use the round function from pyspark.sql.functions, passing the column and the desired number of decimal places. The following tutorial shows the syntax along with a worked example.

PySpark: Round Column Values to 2 Decimal Places


You can use the following syntax to round the values in a column of a PySpark DataFrame to 2 decimal places:

from pyspark.sql.functions import round

#create new column that rounds values in points column to 2 decimal places
df_new = df.withColumn('points2', round(df.points, 2))

This particular example creates a new column named points2 that rounds each of the values in the points column of the DataFrame to 2 decimal places.

The following example shows how to use this syntax in practice.

Example: Round Column Values to 2 Decimal Places in PySpark

Suppose we have the following PySpark DataFrame that contains information about points scored by various basketball players:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['Mavs', 18.3494], 
        ['Nets', 33.5541], 
        ['Lakers', 12.6711], 
        ['Kings', 15.6588], 
        ['Hawks', 19.3215],
        ['Wizards', 24.0399],
        ['Magic', 28.6843],
        ['Jazz', 40.0001],
        ['Thunder', 24.2365],
        ['Spurs', 13.9446]]
  
#define column names
columns = ['team', 'points'] 
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe
df.show()

+-------+-------+
|   team| points|
+-------+-------+
|   Mavs|18.3494|
|   Nets|33.5541|
| Lakers|12.6711|
|  Kings|15.6588|
|  Hawks|19.3215|
|Wizards|24.0399|
|  Magic|28.6843|
|   Jazz|40.0001|
|Thunder|24.2365|
|  Spurs|13.9446|
+-------+-------+

Suppose we would like to round each of the values in the points column to 2 decimal places.

We can use the following syntax to do so:

from pyspark.sql.functions import round

#create new column that rounds values in points column to 2 decimal places
df_new = df.withColumn('points2', round(df.points, 2))

#view new DataFrame
df_new.show()

+-------+-------+-------+
|   team| points|points2|
+-------+-------+-------+
|   Mavs|18.3494|  18.35|
|   Nets|33.5541|  33.55|
| Lakers|12.6711|  12.67|
|  Kings|15.6588|  15.66|
|  Hawks|19.3215|  19.32|
|Wizards|24.0399|  24.04|
|  Magic|28.6843|  28.68|
|   Jazz|40.0001|   40.0|
|Thunder|24.2365|  24.24|
|  Spurs|13.9446|  13.94|
+-------+-------+-------+

Notice that the new column named points2 contains each of the values from the points column rounded to 2 decimal places.

For example:

  • 18.3494 has been rounded to 18.35.
  • 33.5541 has been rounded to 33.55.
  • 12.6711 has been rounded to 12.67.

And so on.

Note: You can find the complete documentation for the PySpark round function in the pyspark.sql.functions API reference.
