Table of Contents
In R, the polychoric correlation can be calculated using the polychor package. This package uses a maximum likelihood estimation procedure to calculate the correlation between two ordinal variables, taking into account the underlying continuous structure of the variables. This is done by calculating the correlation coefficient between the latent variables that are assumed to underlie the ordinal variables. This allows for a more accurate estimation of the correlation between two variables.
Polychoric correlation is used to calculate the correlation between ordinal variables.
Recall that are variables whose possible values are categorical and have a natural order.
Some examples of variables measured on an ordinal scale include:
- Satisfaction: Very unsatisfied, unsatisfied, neutral, satisfied, very satisfied
- Income level: Low income, medium income, high income
- Workplace status: Entry Analyst, Analyst I, Analyst II, Lead Analyst
- Degree of pain: Small amount, medium amount, high amount
The value for polychoric correlation ranges from -1 to 1 where:
- -1 indicates a perfect negative correlation
- 0 indicates no correlation
- 1 indicates a perfect positive correlation
We can use the polychor(x, y) function from the polycor package to calculate the polychoric correlation between two ordinal variables in R.
The following examples show how to use this function in practice.
Example 1: Calculate Polychoric Correlation for Movie Ratings
Suppose want to know whether or not two different movie ratings agencies have a high correlation between their movie ratings.
We ask each agency to rate 20 different movies on a scale of 1 to 3 where:
- 1 indicates “bad”
- 2 indicates “mediocre”
- 3 indicates “good”
We can use the following code in R to calculate the polychoric correlation between the ratings of the two agencies:
library(polycor) #define movie ratings for each agency agency1 <- c(1, 1, 2, 2, 3, 2, 2, 3, 2, 3, 3, 2, 1, 2, 2, 1, 1, 1, 2, 2) agency2 <- c(1, 1, 2, 1, 3, 3, 3, 2, 2, 3, 3, 3, 2, 2, 2, 1, 2, 1, 3, 3) #calculate polychoric correlation between ratings polychor(agency1, agency2) [1] 0.7828328
The polychoric correlation turns out to be 0.78.
This value is quite high, which indicates that there is a strong positive association between the ratings from each agency.
Example 2: Calculate Polychoric Correlation for Restaurant Ratings
We randomly survey 20 customers who ate at both restaurants and ask them to rate their overall satisfaction a scale of 1 to 5 where:
- 1 indicates “very unsatisfied”
- 2 indicates “unsatisfied”
- 3 indicates “neutral”
- 4 indicates “satisfied”
- 5 indicates “very satisfied”
We can use the following code in R to calculate the polychoric correlation between the ratings of the two restaurants:
library(polycor) #define ratings for each restaurant restaurant1 <- c(1, 1, 2, 2, 2, 3, 3, 3, 2, 2, 3, 4, 4, 5, 5, 4, 3, 4, 5, 5) restaurant2 <- c(4, 3, 3, 4, 3, 3, 4, 5, 4, 4, 4, 5, 5, 4, 2, 1, 1, 2, 1, 4) #calculate polychoric correlation between ratings polychor(restaurant1, restaurant2) [1] -0.1322774
The polychoric correlation turns out to be -0.13.
This value is close to zero, which indicates that there is very little (if any) association between the ratings of the restaurants.
The following tutorials explain how to calculate other common correlation coefficients in R: