How to Use the setdiff Function in R (With Examples)


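The setdiff() function in R returns the values in one set that do not occur in a second set. The order of the arguments matters: setdiff(x, y) returns the values in x that are not in y. This function uses the following syntax: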

setdiff(x, y)

where:

  • x, y: Vectors or data frames containing a sequence of items

This tutorial provides several examples of how to use this function in practice.

Example 1: Setdiff with Numeric Vectors

The following code shows how to use setdiff() to identify all of the values in vector a that do not occur in vector b:

#define vectors
a <- c(1, 3, 4, 5, 9, 10)
b <- c(1, 2, 3, 4, 5, 6)

#find all values in a that do not occur in b
setdiff(a, b)

[1]  9 10

There are two values that occur in vector a that do not occur in vector b: 9 and 10.

If we reverse the order of the vectors in the setdiff() function, we can instead identify all of the values in vector b that do not occur in vector a:

#find all values in b that do not occur in a
setdiff(b, a)

[1] 2 6

There are two values that occur in vector b that do not occur in vector a: 2 and 6.
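
Because setdiff() only looks in one direction, the two results above can be combined to get every value that occurs in exactly one of the two vectors. The following is a minimal sketch that uses the base R union() function for this purpose; the variable name sym_diff is just an illustrative choice:

```r
#define vectors
a <- c(1, 3, 4, 5, 9, 10)
b <- c(1, 2, 3, 4, 5, 6)

#combine both one-directional differences
sym_diff <- union(setdiff(a, b), setdiff(b, a))
sym_diff

# [1]  9 10  2  6
```

The values appear in the order in which union() encounters them, so the values unique to a come first, followed by the values unique to b.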

Example 2: Setdiff with Character Vectors

The following code shows how to use setdiff() to identify all of the values in vector char1 that do not occur in vector char2:

#define character vectors
char1 <- c('A', 'B', 'C', 'D', 'E')
char2 <- c('A', 'B', 'E', 'F', 'G')

#find all values in char1 that do not occur in char2
setdiff(char1, char2)

[1] "C" "D"
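
There are two values that occur in vector char1 that do not occur in vector char2: "C" and "D".

Note that setdiff() treats its inputs as sets, so duplicated values are returned only once. The following sketch compares it with filtering by the %in% operator, which keeps duplicates; the vector dups is a hypothetical example added here for illustration:

```r
#define character vectors
char2 <- c('A', 'B', 'E', 'F', 'G')
dups <- c('C', 'C', 'D', 'A')

#setdiff returns each differing value once
set_result <- setdiff(dups, char2)
set_result

# [1] "C" "D"

#filtering with %in% keeps the repeated value
filter_result <- dups[!(dups %in% char2)]
filter_result

# [1] "C" "C" "D"
```

If the number of occurrences matters, the %in% approach is likely the better fit; otherwise setdiff() gives the more compact result.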

Example 3: Setdiff with Data Frames

The following code shows how to use setdiff() to identify all of the values in one data frame column that do not appear in the same column of a second data frame:

#define data frames
df1 <- data.frame(team=c('A', 'B', 'C', 'D'),
                 conference=c('West', 'West', 'East', 'East'),
                 points=c(88, 97, 94, 104))

df2 <- data.frame(team=c('A', 'B', 'C', 'D'),
                 conference=c('West', 'West', 'East', 'East'),
                 points=c(88, 97, 98, 99))

#find differences between the points columns in the two data frames
setdiff(df1$points, df2$points)

[1]  94 104

We can see that the values 94 and 104 occur in the points column of the first data frame, but not in the points column of the second data frame.
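
The result of setdiff() is a plain vector of values, so it does not show which rows those values came from. One possible way to recover the full rows, sketched here with base R only, is to use the result as a filter on the first data frame; the names only_in_df1 and rows_only_in_df1 are illustrative:

```r
#define data frames
df1 <- data.frame(team=c('A', 'B', 'C', 'D'),
                 conference=c('West', 'West', 'East', 'East'),
                 points=c(88, 97, 94, 104))

df2 <- data.frame(team=c('A', 'B', 'C', 'D'),
                 conference=c('West', 'West', 'East', 'East'),
                 points=c(88, 97, 98, 99))

#find points values that occur in df1 but not in df2
only_in_df1 <- setdiff(df1$points, df2$points)

#keep the rows of df1 that contain those values
rows_only_in_df1 <- df1[df1$points %in% only_in_df1, ]
rows_only_in_df1
```

This returns the rows for teams C and D. Keep in mind that this compares the points values as sets, not row by row, so a value counts as "shared" if it appears anywhere in the points column of the second data frame.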
