How can I convert the columns of a PySpark DataFrame into a MapType (Dict) format?


In PySpark, a MapType column stores a set of key-value pairs in each row, much like a Python dictionary. To convert DataFrame columns into a MapType column, use the create_map function from the pyspark.sql.functions module. It takes an alternating sequence of key and value column expressions (key1, value1, key2, value2, ...) and returns a single map column. Folding several related columns into one map is useful when a downstream step, such as feature engineering or data transformation, expects a dictionary-like structure.

Let’s create a DataFrame to work with:


from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

# Sample rows: (id, dept, salary, location)
data = [("36636", "Finance", 3000, "USA"),
        ("40288", "Finance", 5000, "IND"),
        ("42114", "Sales", 3900, "USA"),
        ("39192", "Marketing", 2500, "CAN"),
        ("34534", "Sales", 6500, "USA")]

# Explicit schema: salary is an integer, every other column is a string
schema = StructType([
    StructField('id', StringType(), True),
    StructField('dept', StringType(), True),
    StructField('salary', IntegerType(), True),
    StructField('location', StringType(), True)
])

df = spark.createDataFrame(data=data, schema=schema)
df.printSchema()
df.show(truncate=False)

This yields the following output:


root
 |-- id: string (nullable = true)
 |-- dept: string (nullable = true)
 |-- salary: integer (nullable = true)
 |-- location: string (nullable = true)

+-----+---------+------+--------+
|id   |dept     |salary|location|
+-----+---------+------+--------+
|36636|Finance  |3000  |USA     |
|40288|Finance  |5000  |IND     |
|42114|Sales    |3900  |USA     |
|39192|Marketing|2500  |CAN     |
|34534|Sales    |6500  |USA     |
+-----+---------+------+--------+

Convert DataFrame Columns to MapType

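Now use the create_map() SQL function to convert the salary and location columns into a single MapType column. Each key is a string literal created with lit(), and each value is the corresponding column. Because all values in a map must share one type, the integer salary is converted to a string, as the schema in the output shows.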


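# Convert columns to a map
from pyspark.sql.functions import col, lit, create_map

# Keys are string literals; values come from the existing columns.
# The source columns are dropped once they are folded into the map.
df = df.withColumn("propertiesMap", create_map(
        lit("salary"), col("salary"),
        lit("location"), col("location")
        )).drop("salary", "location")
df.printSchema()
df.show(truncate=False)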

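This yields the following output. Spark 3.0 and later print map values in curly braces, for example {salary -> 3000, location -> USA}, rather than the square brackets shown here.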


root
 |-- id: string (nullable = true)
 |-- dept: string (nullable = true)
 |-- propertiesMap: map (nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

+-----+---------+---------------------------------+
|id   |dept     |propertiesMap                    |
+-----+---------+---------------------------------+
|36636|Finance  |[salary -> 3000, location -> USA]|
|40288|Finance  |[salary -> 5000, location -> IND]|
|42114|Sales    |[salary -> 3900, location -> USA]|
|39192|Marketing|[salary -> 2500, location -> CAN]|
|34534|Sales    |[salary -> 6500, location -> USA]|
+-----+---------+---------------------------------+

Happy Learning !!

Cite this article

stats writer. "How can I convert the columns of a PySpark DataFrame into a MapType (Dict) format?." PSYCHOLOGICAL SCALES, 24 Jun. 2024, https://scales.arabpsychology.com/stats/how-can-i-convert-the-columns-of-a-pyspark-dataframe-into-a-maptype-dict-format/.

