How to Round Column Values to 2 Decimal Places in PySpark?

In PySpark, if you want to round the values in a column to two decimal places, you can use the round() function. The round() function takes two arguments: the column (or column name) and the number of decimal places to round to. For example, to round the values in a column called "x" to two decimal places, you could use the following code: df.select(round("x", 2).alias("x")).show(). This returns all the values in the column "x" rounded to two decimal places.


You can use the following syntax to round the values in a column of a PySpark DataFrame to 2 decimal places:

from pyspark.sql.functions import round

#create new column that rounds values in points column to 2 decimal places
df_new = df.withColumn('points2', round(df.points, 2))

This particular example creates a new column named points2 that rounds each of the values in the points column of the DataFrame to 2 decimal places.

The following example shows how to use this syntax in practice.

Example: Round Column Values to 2 Decimal Places in PySpark

Suppose we have the following PySpark DataFrame that contains information about points scored by various basketball players:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['Mavs', 18.3494], 
        ['Nets', 33.5541], 
        ['Lakers', 12.6711], 
        ['Kings', 15.6588], 
        ['Hawks', 19.3215],
        ['Wizards', 24.0399],
        ['Magic', 28.6843],
        ['Jazz', 40.0001],
        ['Thunder', 24.2365],
        ['Spurs', 13.9446]]
  
#define column names
columns = ['team', 'points'] 
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe
df.show()

+-------+-------+
|   team| points|
+-------+-------+
|   Mavs|18.3494|
|   Nets|33.5541|
| Lakers|12.6711|
|  Kings|15.6588|
|  Hawks|19.3215|
|Wizards|24.0399|
|  Magic|28.6843|
|   Jazz|40.0001|
|Thunder|24.2365|
|  Spurs|13.9446|
+-------+-------+

Suppose we would like to round each of the values in the points column to 2 decimal places.

We can use the following syntax to do so:

from pyspark.sql.functions import round

#create new column that rounds values in points column to 2 decimal places
df_new = df.withColumn('points2', round(df.points, 2))

#view new DataFrame
df_new.show()

+-------+-------+-------+
|   team| points|points2|
+-------+-------+-------+
|   Mavs|18.3494|  18.35|
|   Nets|33.5541|  33.55|
| Lakers|12.6711|  12.67|
|  Kings|15.6588|  15.66|
|  Hawks|19.3215|  19.32|
|Wizards|24.0399|  24.04|
|  Magic|28.6843|  28.68|
|   Jazz|40.0001|   40.0|
|Thunder|24.2365|  24.24|
|  Spurs|13.9446|  13.94|
+-------+-------+-------+

Notice that the new column named points2 contains each of the values from the points column rounded to 2 decimal places.

For example:

  • 18.3494 has been rounded to 18.35.
  • 33.5541 has been rounded to 33.55.
  • 12.6711 has been rounded to 12.67.

And so on.

Note: You can find the complete documentation for the PySpark round function in the official PySpark API reference.
