Using the dplyr package in R, you can remove rows with NA values from a data frame with the filter() function. filter() takes a logical condition and keeps only the rows for which that condition is TRUE, so combining it with is.na() lets you drop rows that have missing values. The basic syntax is filter(dataset, !is.na(column_name)), which returns only the rows where column_name is not NA. Rows that have NA values in other columns are kept.
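You can use the following methods in a dplyr pipeline to remove rows with NA values. Methods 2 and 3 use dplyr functions, while Method 1 pipes the data frame into na.omit(), which comes from base R's stats package: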
Method 1: Remove Rows with NA Values in Any Column
library(dplyr)

#remove rows with NA value in any column
df %>%
  na.omit()
Method 2: Remove Rows with NA Values in Certain Columns
library(dplyr)

#remove rows with NA value in 'col1' or 'col2'
df %>%
  filter_at(vars(col1, col2), all_vars(!is.na(.)))
Method 3: Remove Rows with NA Values in One Specific Column
library(dplyr)

#remove rows with NA value in 'col1'
df %>%
  filter(!is.na(col1))
The examples below show how to use each method in practice with the following data frame:
#create data frame with some missing values
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C'),
                 points=c(99, 90, 86, 88, NA),
                 assists=c(33, NA, 31, 39, 34),
                 rebounds=c(NA, 28, 24, 24, 28))

#view data frame
df

  team points assists rebounds
1    A     99      33       NA
2    A     90      NA       28
3    B     86      31       24
4    B     88      39       24
5    C     NA      34       28
Method 1: Remove Rows with NA Values in Any Column
The following code shows how to remove rows with NA values in any column of the data frame:
library(dplyr)

#remove rows with NA value in any column
df %>%
  na.omit()

  team points assists rebounds
3    B     86      31       24
4    B     88      39       24
The only two rows that are left are the ones without any NA values in any column.
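For readers who prefer to stay entirely within dplyr, the same result can be sketched with filter() and if_all(). This is an alternative to the na.omit() call above, not part of the original method, and it assumes dplyr 1.0.4 or later, where if_all() is available. The name complete_rows is chosen here only for illustration.

```r
library(dplyr)

# same example data frame used throughout this tutorial
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C'),
                 points=c(99, 90, 86, 88, NA),
                 assists=c(33, NA, 31, 39, 34),
                 rebounds=c(NA, 28, 24, 24, 28))

# keep only rows where every column is non-missing
# (assumes dplyr >= 1.0.4 for if_all())
complete_rows <- df %>%
  filter(if_all(everything(), ~ !is.na(.x)))

#view result
complete_rows
```

This keeps the same two team 'B' rows that na.omit() keeps. One difference to expect is in the row labels: na.omit() preserves the original labels (3 and 4), while filter() renumbers the rows starting from 1.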
Method 2: Remove Rows with NA Values in Certain Columns
The following code shows how to remove rows with NA values in certain columns of the data frame:
library(dplyr)

#remove rows with NA value in 'points' or 'assists' columns
df %>%
  filter_at(vars(points, assists), all_vars(!is.na(.)))

  team points assists rebounds
1    A     99      33       NA
2    B     86      31       24
3    B     88      39       24
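The only rows left are the ones with no NA values in either the ‘points’ or ‘assists’ column. The first row is kept even though its ‘rebounds’ value is NA, because that column was not checked.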
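The dplyr documentation marks filter_at() and the other scoped verbs as superseded in favor of if_all() and if_any() used inside filter(). The sketch below shows how the same filter might be written in that newer style. It assumes dplyr 1.0.4 or later, and the name kept_rows is used here only for illustration.

```r
library(dplyr)

# same example data frame used throughout this tutorial
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C'),
                 points=c(99, 90, 86, 88, NA),
                 assists=c(33, NA, 31, 39, 34),
                 rebounds=c(NA, 28, 24, 24, 28))

# keep rows where both 'points' and 'assists' are non-missing
# (assumes dplyr >= 1.0.4 for if_all())
kept_rows <- df %>%
  filter(if_all(c(points, assists), ~ !is.na(.x)))

#view result
kept_rows
```

Swapping if_all() for if_any() would instead keep rows where at least one of the two columns is non-missing, which is a looser condition than the one used in this method.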
Method 3: Remove Rows with NA Values in One Specific Column
The following code shows how to remove rows with NA values in one specific column of the data frame:
library(dplyr)

#remove rows with NA value in 'points' column
df %>%
  filter(!is.na(points))

  team points assists rebounds
1    A     99      33       NA
2    A     90      NA       28
3    B     86      31       24
4    B     88      39       24
The only rows left are the ones without any NA values in the ‘points’ column.
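Note that filter() returns a new data frame and leaves df itself unchanged, so the result must be assigned to a name if you want to keep it. The following is a minimal sketch of that pattern, with a quick check of how many rows were dropped. The name df_clean is chosen here only for illustration.

```r
library(dplyr)

# same example data frame used throughout this tutorial
df <- data.frame(team=c('A', 'A', 'B', 'B', 'C'),
                 points=c(99, 90, 86, 88, NA),
                 assists=c(33, NA, 31, 39, 34),
                 rebounds=c(NA, 28, 24, 24, 28))

#count missing values in 'points' before filtering
sum(is.na(df$points))

#store the filtered result under a new name; df is not modified
df_clean <- df %>%
  filter(!is.na(points))

#compare row counts before and after
nrow(df)
nrow(df_clean)
```

With this data frame, ‘points’ has one missing value, so df_clean has four rows while df still has all five.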