R: invalid factor level, NA generated

The R message “invalid factor level, NA generated” is a warning, not an error. It occurs when you assign a value to a factor variable that is not one of that factor’s defined levels. R cannot store the value, so it inserts NA (not available) in its place. This typically happens when new data contains a category the factor was not created with, or when a value was entered incorrectly.


One warning message you may encounter when using R is:

Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "C") :
  invalid factor level, NA generated

This warning occurs when you attempt to add a value to a factor variable in R that does not already exist as a defined level.

The following example shows how to address this warning in practice.

How to Reproduce the Warning

Suppose we have the following data frame in R:

#create data frame
df <- data.frame(team=factor(c('A', 'A', 'B', 'B', 'B')),
                 points=c(99, 90, 86, 88, 95))

#view data frame
df

  team points
1    A     99
2    A     90
3    B     86
4    B     88
5    B     95

#view structure of data frame
str(df)

'data.frame':	5 obs. of  2 variables:
 $ team  : Factor w/ 2 levels "A","B": 1 1 2 2 2
 $ points: num  99 90 86 88 95

We can see that the team variable is a factor with two levels: “A” and “B”.

Now suppose we attempt to add a new row to the end of the data frame using a value of “C” for team:

#add new row to end of data frame
df[nrow(df) + 1,] <- c('C', 100)

Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "C") :
  invalid factor level, NA generated

We receive a warning message because the value “C” does not already exist as a factor level for the team variable.

It’s important to note that this is simply a warning message and R will still add the new row to the end of the data frame, but it will use a value of NA instead of “C”:

#view updated data frame
df

  team points
1    A     99
2    A     90
3    B     86
4    B     88
5    B     95
6   NA    100
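
To guard against this situation, it can help to check whether a value is already a defined level before assigning it. The snippet below is a minimal sketch of that check using the same data frame. The names new_team, is_valid_level, df_bad and na_count are illustrative and not part of the original example.

```r
#recreate the original data frame (team is a factor with levels "A" and "B")
df <- data.frame(team=factor(c('A', 'A', 'B', 'B', 'B')),
                 points=c(99, 90, 86, 88, 95))

#hypothetical new value we want to add
new_team <- 'C'

#check whether the value is already a defined level of the factor
is_valid_level <- new_team %in% levels(df$team)
is_valid_level

#work on a copy and repeat the failing assignment (warning suppressed here)
df_bad <- df
suppressWarnings(df_bad[nrow(df_bad) + 1, ] <- c('C', 100))

#count how many NA values ended up in the team column
na_count <- sum(is.na(df_bad$team))
na_count
```

If is_valid_level is FALSE, the assignment would produce NA, so the level needs to be handled before the row is added.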

How to Avoid the Warning

One way to avoid the invalid factor level warning is to first convert the factor variable to a character variable, add the new row, and then convert the variable back to a factor:

#convert team variable to character
df$team <- as.character(df$team)

#add new row to end of data frame
df[nrow(df) + 1,] <- c('C', 100)

#convert team variable back to factor
df$team <- as.factor(df$team)

#view updated data frame
df

  team points
1    A     99
2    A     90
3    B     86
4    B     88
5    B     95
6    C    100

Notice that we’re able to successfully add a new row to the end of the data frame and we avoid a warning message.

#view structure of updated data frame
str(df)

'data.frame':	6 obs. of  2 variables:
 $ team  : Factor w/ 3 levels "A","B","C": 1 1 2 2 2 3
 $ points: chr  "99" "90" "86" "88" ...
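
Note that in the structure above the points variable is now a character variable. This is because c('C', 100) builds a single character vector, so both columns receive character values. If that is not desired, one possible alternative is to register “C” as a level first and pass the new row as a list so that each column keeps its own type. The following is a sketch that assumes the data frame starts in its original state:

```r
#recreate the original data frame
df <- data.frame(team=factor(c('A', 'A', 'B', 'B', 'B')),
                 points=c(99, 90, 86, 88, 95))

#add "C" as a new level without changing the existing values
levels(df$team) <- c(levels(df$team), 'C')

#add new row as a list so that points stays numeric
df[nrow(df) + 1, ] <- list('C', 100)

#view structure of updated data frame
str(df)
```

With this approach, team should remain a factor with three levels and points should remain numeric, with no warning produced.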
