Convert Integer to String in PySpark (With Example)


You can use the following syntax to convert an integer column to a string column in a PySpark DataFrame:

from pyspark.sql.types import StringType

df = df.withColumn('my_string', df['my_integer'].cast(StringType()))

This particular example creates a new column called my_string that contains the values from the my_integer column converted to strings.

The following example shows how to use this syntax in practice.

Example: How to Convert Integer to String in PySpark

Suppose we have the following PySpark DataFrame that contains information about points scored by various basketball players:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['A', 11], 
        ['B', 19], 
        ['C', 22], 
        ['D', 25], 
        ['E', 12], 
        ['F', 41],
        ['G', 32],
        ['H', 20]] 
  
#define column names
columns = ['team', 'points']
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe
df.show()

+----+------+
|team|points|
+----+------+
|   A|    11|
|   B|    19|
|   C|    22|
|   D|    25|
|   E|    12|
|   F|    41|
|   G|    32|
|   H|    20|
+----+------+

We can use the following syntax to display the data type of each column in the DataFrame:

#check data type of each column
df.dtypes

[('team', 'string'), ('points', 'bigint')]

We can see that the points column currently has a data type of bigint, which is PySpark's 64-bit integer type (LongType).

To convert this column from an integer to a string, we can use the following syntax:

from pyspark.sql.types import StringType

#create string column from integer column
df = df.withColumn('points_string', df['points'].cast(StringType()))

#view updated DataFrame
df.show()

+----+------+-------------+
|team|points|points_string|
+----+------+-------------+
|   A|    11|           11|
|   B|    19|           19|
|   C|    22|           22|
|   D|    25|           25|
|   E|    12|           12|
|   F|    41|           41|
|   G|    32|           32|
|   H|    20|           20|
+----+------+-------------+

We can use the dtypes attribute once again to view the data type of each column in the DataFrame:

#check data type of each column
df.dtypes

[('team', 'string'), ('points', 'bigint'), ('points_string', 'string')]

We can see that the points_string column has a data type of string.

We have successfully created a string column from an integer column.
