You can use the following methods to print one specific column of a PySpark DataFrame:
Method 1: Print Column Values with Column Name
df.select('my_column').show()
Method 2: Print Column Values Only
df.select('my_column').rdd.flatMap(list).collect()
The examples below show how to use each method in practice with this PySpark DataFrame:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
#define data
data = [['A', 'East', 11, 4],
        ['A', 'East', 8, 9],
        ['A', 'East', 10, 3],
        ['B', 'West', 6, 12],
        ['B', 'West', 6, 4],
        ['C', 'East', 5, 2]]
#define column names
columns = ['team', 'conference', 'points', 'assists']
#create dataframe using data and column names
df = spark.createDataFrame(data, columns)
#view dataframe
df.show()
+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
|   A|      East|    11|      4|
|   A|      East|     8|      9|
|   A|      East|    10|      3|
|   B|      West|     6|     12|
|   B|      West|     6|      4|
|   C|      East|     5|      2|
+----+----------+------+-------+
Example 1: Print Column Values with Column Name
We can use the following syntax to print the column values along with the column name for the conference column of the DataFrame:
#print 'conference' column (with column name)
df.select('conference').show()
+----------+
|conference|
+----------+
|      East|
|      East|
|      East|
|      West|
|      West|
|      East|
+----------+
Notice that the output contains the column name and the column values for the conference column only.
Example 2: Print Column Values Only
We can use the following syntax to print only the column values of the conference column of the DataFrame:
#print values only from 'conference' column
df.select('conference').rdd.flatMap(list).collect()
['East', 'East', 'East', 'West', 'West', 'East']
Notice that only the values from the conference column are printed and the name of the column is not included.
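To see why flatMap(list) returns a flat list of values rather than a list of rows, it can help to mimic the step in plain Python. The sketch below is only an analogy and does not use Spark: FakeRow is a hypothetical stand-in for a PySpark Row (which is also a tuple subclass), and itertools.chain plays the role of the concatenation that flatMap performs.

```python
from collections import namedtuple
from itertools import chain

# Hypothetical stand-in for pyspark.sql.Row, which is also a tuple subclass.
# No Spark session is needed for this sketch.
FakeRow = namedtuple('FakeRow', ['conference'])

# Conceptually, selecting one column yields one single-field row per record
rows = [FakeRow(c) for c in ['East', 'East', 'East', 'West', 'West', 'East']]

# flatMap(list): convert each row to a list of its values,
# then concatenate those lists into one flat sequence
values = list(chain.from_iterable(list(row) for row in rows))

print(values)
# ['East', 'East', 'East', 'West', 'West', 'East']
```

Because each row here holds a single field, flattening produces exactly one value per row, which is why the column name no longer appears in the result.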