How can I use NOT LIKE to filter rows in PySpark?


You can use the following syntax to filter a PySpark DataFrame using a NOT LIKE operator:

df.filter(~df.team.like('%avs%')).show()

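This particular example filters the DataFrame to only show rows where the string in the team column does not contain “avs” anywhere in the string. The % wildcard matches any sequence of zero or more characters, and the ~ operator negates the condition. Note that the match is case-sensitive.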

The following example shows how to use this syntax in practice.

Example: How to Filter Using NOT LIKE in PySpark

Suppose we have the following PySpark DataFrame that contains information about points scored by various basketball players:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['Mavs', 18], 
        ['Nets', 33], 
        ['Lakers', 12], 
        ['Mavs', 15], 
        ['Cavs', 19],
        ['Wizards', 24],
        ['Cavs', 28],
        ['Nets', 40],
        ['Mavs', 24],
        ['Spurs', 13]] 
  
#define column names
columns = ['team', 'points'] 
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe
df.show()

+-------+------+
|   team|points|
+-------+------+
|   Mavs|    18|
|   Nets|    33|
| Lakers|    12|
|   Mavs|    15|
|   Cavs|    19|
|Wizards|    24|
|   Cavs|    28|
|   Nets|    40|
|   Mavs|    24|
|  Spurs|    13|
+-------+------+

We can use the following syntax to filter the DataFrame to only contain rows where the team column does not contain “avs” anywhere in the string:

#filter DataFrame where team column does not contain pattern like 'avs'
df.filter(~df.team.like('%avs%')).show() 

+-------+------+
|   team|points|
+-------+------+
|   Nets|    33|
| Lakers|    12|
|Wizards|    24|
|   Nets|    40|
|  Spurs|    13|
+-------+------+

Notice that none of the rows in the resulting DataFrame contain “avs” in the team column.

Note that we used the like function to find all strings in the team column that contain “avs” and then used the ~ symbol to negate that condition.

The end result is that only the rows whose team name does not contain “avs” remain in the DataFrame.

Note: You can find the complete documentation for the PySpark like function (Column.like) in the official PySpark API reference.
