Print One Column of a PySpark DataFrame


You can use the following methods to print one specific column of a PySpark DataFrame:

Method 1: Print Column Values with Column Name

df.select('my_column').show()

Method 2: Print Column Values Only

print(df.select('my_column').rdd.flatMap(list).collect())

The examples below show how to use each method in practice with this PySpark DataFrame:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['A', 'East', 11, 4], 
        ['A', 'East', 8, 9], 
        ['A', 'East', 10, 3], 
        ['B', 'West', 6, 12], 
        ['B', 'West', 6, 4], 
        ['C', 'East', 5, 2]] 
  
#define column names
columns = ['team', 'conference', 'points', 'assists'] 
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe
df.show()

+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
|   A|      East|    11|      4|
|   A|      East|     8|      9|
|   A|      East|    10|      3|
|   B|      West|     6|     12|
|   B|      West|     6|      4|
|   C|      East|     5|      2|
+----+----------+------+-------+

Example 1: Print Column Values with Column Name

We can use the following syntax to print the column values along with the column name for the conference column of the DataFrame:

#print 'conference' column (with column name)
df.select('conference').show()

+----------+
|conference|
+----------+
|      East|
|      East|
|      East|
|      West|
|      West|
|      East|
+----------+

Notice that the output contains only the conference column: the column name appears as the header, followed by the column values.

Example 2: Print Column Values Only

We can use the following syntax to print only the column values of the conference column of the DataFrame:

#print values only from 'conference' column
print(df.select('conference').rdd.flatMap(list).collect())

['East', 'East', 'East', 'West', 'West', 'East']

Notice that only the values from the conference column are printed and the name of the column is not included.
