In R, correlation can be calculated even when the data contain missing values. The cor() function takes a use argument that controls how missing values are handled; the two most common choices are "complete.obs" and "pairwise.complete.obs". With "complete.obs", any row that has a missing value in any variable is dropped before the correlations are computed. With "pairwise.complete.obs", each correlation is computed from the rows in which both variables of that pair are present. In both cases the coefficients are based only on observed data, so a missing value no longer turns the result into NA.
Calculate Correlation in R with Missing Values
You can use the following methods to calculate correlation coefficients in R when one or more variables have missing values:
Method 1: Calculate Correlation Coefficient with Missing Values Present
cor(x, y, use='complete.obs')
Method 2: Calculate Correlation Matrix with Missing Values Present
cor(df, use='pairwise.complete.obs')
The following examples show how to use each method in practice.
Example 1: Calculate Correlation Coefficient with Missing Values Present
Suppose we attempt to use the cor() function to calculate the Pearson correlation coefficient between two variables when missing values are present:
#create two variables
x <- c(70, 78, 90, 87, 84, NA, 91, 74, 83, 85)
y <- c(90, NA, 79, 86, 84, 83, 88, 92, 76, 75)
#attempt to calculate correlation coefficient between x and y
cor(x, y)
[1] NA
The cor() function returns NA because its default setting, use='everything', propagates missing values: if either variable contains an NA, the result is NA.
To avoid this issue, we can use the argument use='complete.obs' so that R only uses the observations where both values are present:
#create two variables
x <- c(70, 78, 90, 87, 84, NA, 91, 74, 83, 85)
y <- c(90, NA, 79, 86, 84, 83, 88, 92, 76, 75)
#calculate correlation coefficient between x and y
cor(x, y, use='complete.obs')
[1] -0.4888749
The correlation coefficient between the two variables turns out to be -0.4888749.
Note that the cor() function only used the eight pairs in which both values were present; the pair with the missing x value and the pair with the missing y value were dropped.
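To make the effect of use='complete.obs' concrete, the following sketch repeats the calculation by hand: it keeps only the positions where both x and y are observed and then calls cor() on the filtered vectors. The helper names keep, n_used, r_manual and r_builtin are introduced here for illustration only.

```r
#create two variables
x <- c(70, 78, 90, 87, 84, NA, 91, 74, 83, 85)
y <- c(90, NA, 79, 86, 84, 83, 88, 92, 76, 75)

#TRUE only for positions where both x and y are observed
keep <- complete.cases(x, y)

#number of pairs that enter the calculation
n_used <- sum(keep)

#correlation on the manually filtered vectors
r_manual <- cor(x[keep], y[keep])

#correlation with missing values handled by cor() itself
r_builtin <- cor(x, y, use='complete.obs')

#check that the two approaches agree
print(n_used)
print(all.equal(r_manual, r_builtin))
```

Checking n_used before interpreting a coefficient is a reasonable habit, since a correlation based on very few complete pairs can be unstable.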
Example 2: Calculate Correlation Matrix with Missing Values Present
Suppose we attempt to use the cor() function to create a correlation matrix for a data frame with three variables when missing values are present:
#create data frame with some missing values
df <- data.frame(x=c(70, 78, 90, 87, 84, NA, 91, 74, 83, 85),
y=c(90, NA, 79, 86, 84, 83, 88, 92, 76, 75),
z=c(57, 57, 58, 59, 60, 78, 81, 83, NA, 90))
#attempt to create correlation matrix for variables in data frame
cor(df)
   x  y  z
x  1 NA NA
y NA  1 NA
z NA NA  1
To avoid this issue, we can use the argument use='pairwise.complete.obs' so that, for each pair of variables, R only uses the observations where both values are present:
#create data frame with some missing values
df <- data.frame(x=c(70, 78, 90, 87, 84, NA, 91, 74, 83, 85),
y=c(90, NA, 79, 86, 84, 83, 88, 92, 76, 75),
z=c(57, 57, 58, 59, 60, 78, 81, 83, NA, 90))
#create correlation matrix for variables using only pairwise complete observations
cor(df, use='pairwise.complete.obs')
           x          y          z
x  1.0000000 -0.4888749  0.1311651
y -0.4888749  1.0000000 -0.1562371
z  0.1311651 -0.1562371  1.0000000
The correlation coefficients for each pairwise combination of variables in the data frame are now shown.
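With pairwise deletion, each coefficient can be based on a different set of rows, so it can help to check how many observations sit behind each entry. The sketch below counts the complete pairs for every combination of variables and compares the result with use='complete.obs', which drops every row that has a missing value in any column. For this data frame, each pair of variables shares eight complete rows, while only seven rows are complete across all three columns. The names obs, n_pairs, n_listwise, r_pair and r_list are introduced here for illustration only.

```r
#create data frame with some missing values
df <- data.frame(x=c(70, 78, 90, 87, 84, NA, 91, 74, 83, 85),
                 y=c(90, NA, 79, 86, 84, 83, 88, 92, 76, 75),
                 z=c(57, 57, 58, 59, 60, 78, 81, 83, NA, 90))

#1 where a value is observed, 0 where it is missing
obs <- (!is.na(as.matrix(df))) * 1

#n_pairs[i, j] = number of rows in which both variable i and variable j are observed
n_pairs <- crossprod(obs)

#number of rows with no missing value in any column
n_listwise <- sum(complete.cases(df))

#pairwise deletion: each coefficient uses the rows available for that pair
r_pair <- cor(df, use='pairwise.complete.obs')

#listwise deletion: every coefficient uses the same fully observed rows
r_list <- cor(df, use='complete.obs')

print(n_pairs)
print(n_listwise)

#difference between the two estimates of the x-y correlation
print(r_pair['x', 'y'] - r_list['x', 'y'])
```

The two matrices need not agree: here the x-y coefficient under listwise deletion is computed without the row in which only z is missing, so it differs from the pairwise value shown above.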
