To round the values in a column of a PySpark DataFrame to 2 decimal places, use the round function from pyspark.sql.functions and pass 2 as the number of decimal places. The result is a new column expression, so the original column is left unchanged unless you overwrite it.
PySpark: Round Column Values to 2 Decimal Places
You can use the following syntax to round the values in a column of a PySpark DataFrame to 2 decimal places:
    from pyspark.sql.functions import round

    # create new column that rounds values in points column to 2 decimal places
    df_new = df.withColumn('points2', round(df.points, 2))
This particular example creates a new column named points2 that rounds each of the values in the points column of the DataFrame to 2 decimal places.
The following example shows how to use this syntax in practice.
Example: Round Column Values to 2 Decimal Places in PySpark
Suppose we have the following PySpark DataFrame that contains information about points scored by various basketball players:
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()

    # define data
    data = [['Mavs', 18.3494],
            ['Nets', 33.5541],
            ['Lakers', 12.6711],
            ['Kings', 15.6588],
            ['Hawks', 19.3215],
            ['Wizards', 24.0399],
            ['Magic', 28.6843],
            ['Jazz', 40.0001],
            ['Thunder', 24.2365],
            ['Spurs', 13.9446]]

    # define column names
    columns = ['team', 'points']

    # create dataframe using data and column names
    df = spark.createDataFrame(data, columns)

    # view dataframe
    df.show()

    +-------+-------+
    |   team| points|
    +-------+-------+
    |   Mavs|18.3494|
    |   Nets|33.5541|
    | Lakers|12.6711|
    |  Kings|15.6588|
    |  Hawks|19.3215|
    |Wizards|24.0399|
    |  Magic|28.6843|
    |   Jazz|40.0001|
    |Thunder|24.2365|
    |  Spurs|13.9446|
    +-------+-------+
Suppose we would like to round each of the values in the points column to 2 decimal places.
We can use the following syntax to do so:
    from pyspark.sql.functions import round

    # create new column that rounds values in points column to 2 decimal places
    df_new = df.withColumn('points2', round(df.points, 2))

    # view new DataFrame
    df_new.show()

    +-------+-------+-------+
    |   team| points|points2|
    +-------+-------+-------+
    |   Mavs|18.3494|  18.35|
    |   Nets|33.5541|  33.55|
    | Lakers|12.6711|  12.67|
    |  Kings|15.6588|  15.66|
    |  Hawks|19.3215|  19.32|
    |Wizards|24.0399|  24.04|
    |  Magic|28.6843|  28.68|
    |   Jazz|40.0001|   40.0|
    |Thunder|24.2365|  24.24|
    |  Spurs|13.9446|  13.94|
    +-------+-------+-------+
Notice that the new column named points2 contains each of the values from the points column rounded to 2 decimal places.
For example:
- 18.3494 has been rounded to 18.35.
- 33.5541 has been rounded to 33.55.
- 12.6711 has been rounded to 12.67.
And so on.
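The roundings listed above can be checked without a Spark session. The PySpark documentation describes round as using the HALF_UP rounding mode, so the sketch below reproduces that mode with Python's standard decimal module. The helper name round_half_up is hypothetical and is not part of PySpark; it works on decimal strings rather than on DataFrame columns.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: str, places: int = 2) -> Decimal:
    """Round a decimal string to `places` digits using HALF_UP (hypothetical helper)."""
    quantum = Decimal(10) ** -places  # Decimal('0.01') when places == 2
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


# The three values called out in the list above
samples = ["18.3494", "33.5541", "12.6711"]
rounded = [round_half_up(s) for s in samples]
print(rounded)
```

Passing the values as strings keeps the check exact; it is meant only as a way to reason about the expected results, not as a replacement for rounding inside Spark.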
Note: The complete documentation for the PySpark round function is available in the official PySpark API reference.