Add New Rows to PySpark DataFrame (With Examples)


You can use the following methods to add new rows to a PySpark DataFrame:

Method 1: Add One New Row to DataFrame

#define new row to add with values 'C', 'Guard' and 14
new_row = spark.createDataFrame([('C', 'Guard', 14)], columns)

#add new row to DataFrame
df_new = df.union(new_row)

Method 2: Add Multiple New Rows to DataFrame

#define multiple new rows to add
new_rows = spark.createDataFrame([('C', 'Guard', 14),
                                  ('C', 'Forward', 32),
                                  ('D', 'Forward', 21)], columns)

#add new rows to DataFrame
df_new = df.union(new_rows)

The examples below show how to use each method in practice with this PySpark DataFrame:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['A', 'Guard', 11], 
        ['A', 'Guard', 8], 
        ['A', 'Forward', 22], 
        ['A', 'Forward', 22], 
        ['B', 'Guard', 14], 
        ['B', 'Guard', 14],
        ['B', 'Forward', 13],
        ['B', 'Forward', 7]] 
  
#define column names
columns = ['team', 'position', 'points'] 
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe
df.show()

+----+--------+------+
|team|position|points|
+----+--------+------+
|   A|   Guard|    11|
|   A|   Guard|     8|
|   A| Forward|    22|
|   A| Forward|    22|
|   B|   Guard|    14|
|   B|   Guard|    14|
|   B| Forward|    13|
|   B| Forward|     7|
+----+--------+------+

Example 1: Add One New Row to DataFrame

We can use the following syntax to add one new row to the end of the existing DataFrame:

#define new row to add
new_row = spark.createDataFrame([('C', 'Guard', 14)], columns)

#add new row to DataFrame
df_new = df.union(new_row)

#view updated DataFrame
df_new.show()

+----+--------+------+
|team|position|points|
+----+--------+------+
|   A|   Guard|    11|
|   A|   Guard|     8|
|   A| Forward|    22|
|   A| Forward|    22|
|   B|   Guard|    14|
|   B|   Guard|    14|
|   B| Forward|    13|
|   B| Forward|     7|
|   C|   Guard|    14|
+----+--------+------+

Notice that one new row with the values C, Guard and 14 has been added to the end of the DataFrame, just as we specified.

Example 2: Add Multiple New Rows to DataFrame

We can use the following syntax to add three new rows to the end of the existing DataFrame:

#define multiple new rows to add
new_rows = spark.createDataFrame([('C', 'Guard', 14),
                                  ('C', 'Forward', 32),
                                  ('D', 'Forward', 21)], columns)

#add new rows to DataFrame
df_new = df.union(new_rows)

#view updated DataFrame
df_new.show()

+----+--------+------+
|team|position|points|
+----+--------+------+
|   A|   Guard|    11|
|   A|   Guard|     8|
|   A| Forward|    22|
|   A| Forward|    22|
|   B|   Guard|    14|
|   B|   Guard|    14|
|   B| Forward|    13|
|   B| Forward|     7|
|   C|   Guard|    14|
|   C| Forward|    32|
|   D| Forward|    21|
+----+--------+------+

Notice that three new rows have been added to the end of the DataFrame.

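Note that both examples use the union function, which returns a new DataFrame containing the rows of the existing DataFrame together with the new row(s) that we specified. The original DataFrame is left unchanged. The union function matches columns by position rather than by name, so the new rows must list their values in the same column order as the existing DataFrame.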

You can find the complete documentation for the PySpark union function in the official PySpark API reference.
