How to Interpolate Missing Values in R

Interpolation is a technique for estimating the values of a function between two known data points. In R, interpolation can be done using the approx() function. This function takes two arguments: an x vector of known values and a y vector of known values, and then returns a vector with the interpolated values. It can also accept a third argument, which is the type of interpolation to use, such as linear or cubic.


You can use the following basic syntax to interpolate missing values in a data frame column in R:

library(dplyr)
library(zoo)

df <- df %>%
        mutate(column_name = na.approx(column_name))

The following example shows how to use this syntax in practice.

Example: Interpolate Missing Values in R

Suppose we have the following data frame in R that shows the total sales made by a store during 15 consecutive days:

#create data frame
df <- data.frame(day=1:15,
                 sales=c(3, 6, 8, 10, 14, 17, 20, NA, NA, NA, NA, 35, 39, 44, 49))

#view data frame
df

   day sales
1    1     3
2    2     6
3    3     8
4    4    10
5    5    14
6    6    17
7    7    20
8    8    NA
9    9    NA
10  10    NA
11  11    NA
12  12    35
13  13    39
14  14    44
15  15    49

Notice that we’re missing sales numbers for four days in the data frame.

If we create a simple line chart to visualize the sales over time, here’s what it would look like:

#create line chart to visualize sales
plot(df$sales, type='o', pch=16, col='steelblue', xlab='Day', ylab='Sales')

interpolate missing values in R

To fill in the missing values, we can use the function from the zoo package along with the function from the dplyr package:

library(dplyr)
library(zoo)

#interpolate missing values in 'sales' column
df <- df %>%
        mutate(sales = na.approx(sales))

#view updated data frame
df

   day sales
1    1     3
2    2     6
3    3     8
4    4    10
5    5    14
6    6    17
7    7    20
8    8    23
9    9    26
10  10    29
11  11    32
12  12    35
13  13    39
14  14    44
15  15    49

Notice that each of the missing values has been replaced.

If we create another line chart to visualize the updated data frame, here’s what it would look like:

#create line chart to visualize sales
plot(df$sales, type='o', pch=16, col='steelblue', xlab='Day', ylab='Sales')

Notice that the values chosen by the na.approx() function seem to fit the trend in the data quite well.

The following tutorials provide additional information on how to handle missing values in R:

x