How to Remove Rows Using dplyr

Using the dplyr package in R, you can remove rows from a data frame with the filter() function. Note that filter() keeps the rows that satisfy the conditions you specify and drops all others, including rows where a condition evaluates to NA. To remove rows, you therefore write a condition that is true only for the rows you want to keep, typically by negating a test with the ! operator. Conditions can refer to column values or to row positions through row_number().


You can use the following basic syntax to remove rows from a data frame in R using dplyr:

1. Remove any row with NA’s

df %>%
  na.omit()

2. Remove any row with NA’s in specific column

df %>%
  filter(!is.na(column_name))

3. Remove duplicates

df %>%
  distinct()

4. Remove rows by index position

df %>%
  filter(!row_number() %in% c(1, 2, 4))

5. Remove rows that do not meet a condition

df %>%
  filter(column1=='A' | column2 > 8)

The following examples show how to use each of these methods in practice with the following data frame:

library(dplyr)

#create data frame
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(4, NA, 7, 5, 9, 9),
                 assists=c(1, 3, 5, NA, 2, 2))

#view data frame
df

  team points assists
1    A      4       1
2    A     NA       3
3    B      7       5
4    B      5      NA
5    C      9       2
6    C      9       2

Example 1: Remove Any Row with NA’s

The following code shows how to remove any row with NA values from the data frame:

#remove any row with NA
df %>%
  na.omit()

  team points assists
1    A      4       1
3    B      7       5
5    C      9       2
6    C      9       2
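
The same rows can also be selected with dplyr verbs alone. The sketch below assumes dplyr 1.0.4 or later, where if_all() is available; the variable name complete_rows is only illustrative.

```r
library(dplyr)

#create data frame (same as above)
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(4, NA, 7, 5, 9, 9),
                 assists=c(1, 3, 5, NA, 2, 2))

#keep only rows where every column is non-missing
complete_rows <- df %>%
  filter(if_all(everything(), ~ !is.na(.x)))

complete_rows
```

Unlike na.omit(), which keeps the original row names (1, 3, 5, 6), filter() renumbers the remaining rows from 1.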

Example 2: Remove Any Row with NA’s in Specific Columns

The following code shows how to remove any row with an NA value in a specific column:

#remove any row with NA in 'points' column
df %>%
  filter(!is.na(points))

  team points assists
1    A      4       1
2    B      7       5
3    B      5      NA
4    C      9       2
5    C      9       2

Example 3: Remove Duplicate Rows

The following code shows how to remove duplicate rows:

#remove duplicate rows
df %>%
  distinct()

  team points assists
1    A      4       1
2    A     NA       3
3    B      7       5
4    B      5      NA
5    C      9       2
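
distinct() can also look for duplicates in selected columns only. As a minimal sketch, the code below treats rows as duplicates when they share the same team and uses the .keep_all argument to retain the other columns; the variable name first_per_team is only illustrative.

```r
library(dplyr)

#create data frame (same as above)
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(4, NA, 7, 5, 9, 9),
                 assists=c(1, 3, 5, NA, 2, 2))

#keep the first row for each team, retaining all columns
first_per_team <- df %>%
  distinct(team, .keep_all = TRUE)

first_per_team
```

Without .keep_all = TRUE, the result would contain only the team column.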

Example 4: Remove Rows by Index Position

The following code shows how to remove rows based on index position:

#remove rows 1, 2, and 4
df %>%
  filter(!row_number() %in% c(1, 2, 4))

  team points assists
1    B      7       5
2    C      9       2
3    C      9       2
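
An alternative worth knowing is dplyr's slice() function, which selects rows by position and drops the positions given as negative numbers. The sketch below should remove the same three rows; the variable name remaining is only illustrative.

```r
library(dplyr)

#create data frame (same as above)
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(4, NA, 7, 5, 9, 9),
                 assists=c(1, 3, 5, NA, 2, 2))

#drop rows 1, 2, and 4 by position
remaining <- df %>%
  slice(-c(1, 2, 4))

remaining
```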

Example 5: Remove Rows Based on Condition

The following code shows how to keep only the rows that meet specific conditions and remove all others:

#only keep rows where team is equal to 'A' or points is greater than 8
df %>%
  filter(team=='A' | points > 8)

  team points assists
1    A      4       1
2    A     NA       3
3    C      9       2
4    C      9       2
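
Because filter() keeps the rows that match, removing the matching rows instead means negating the whole condition. A minimal sketch of that inverse, using the same data frame (the variable name removed_matches is only illustrative):

```r
library(dplyr)

#create data frame (same as above)
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C', 'C'),
                 points=c(4, NA, 7, 5, 9, 9),
                 assists=c(1, 3, 5, NA, 2, 2))

#remove rows where team is equal to 'A' or points is greater than 8
removed_matches <- df %>%
  filter(!(team == 'A' | points > 8))

removed_matches
```

Here only the two team 'B' rows remain. Keep in mind that filter() also drops any row for which the condition evaluates to NA, so a negated condition does not always return the exact complement of the original one when the columns involved contain missing values.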
