The Matthews correlation coefficient (MCC) measures the correlation between two binary variables, typically the predicted and actual classes of a model, and is calculated from the four cells of the confusion matrix. The numerator is the product of the true positives and true negatives minus the product of the false positives and false negatives. The denominator is the square root of the product of four sums: true positives plus false positives, true positives plus false negatives, true negatives plus false positives, and true negatives plus false negatives. The value ranges from -1 to 1, where 1 represents a perfect prediction and -1 represents a perfectly inverted prediction.
Matthews correlation coefficient (MCC) is a metric we can use to assess the performance of a classification model.
It is calculated as:
MCC = (TP*TN - FP*FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))
where:
- TP: Number of true positives
- TN: Number of true negatives
- FP: Number of false positives
- FN: Number of false negatives
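The formula above translates directly into base R. The following is a minimal sketch rather than code from any package: the function name mcc_from_counts and its argument names are hypothetical, and the counts in the last line are made up purely for illustration.

```r
# Hypothetical helper (not from any package): MCC from the four
# cells of a binary confusion matrix.
mcc_from_counts <- function(tp, tn, fp, fn) {
  # Convert to double so large counts do not overflow integer arithmetic
  tp <- as.numeric(tp); tn <- as.numeric(tn)
  fp <- as.numeric(fp); fn <- as.numeric(fn)

  numerator   <- tp * tn - fp * fn
  denominator <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

  # If any of the four sums is zero, the denominator is 0 and MCC is
  # undefined; returning NA here is a choice made for this sketch
  if (denominator == 0) return(NA_real_)

  numerator / denominator
}

# Made-up counts, for illustration only
mcc_from_counts(tp = 40, tn = 50, fp = 10, fn = 0)
```

Returning NA for a zero denominator is only one possible convention; other implementations may handle that case differently.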
This metric is particularly useful when the two classes are imbalanced – that is, one class appears much more than the other.
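To see why this matters, consider a minimal sketch with made-up counts (they do not come from a real model): 400 observations, only 20 of which are positive, and a model that labels almost everything as negative. Accuracy comes out at 0.95, while MCC is only about 0.15.

```r
# Made-up confusion-matrix counts for a heavily imbalanced problem
TP <- 1    # positives correctly predicted
FN <- 19   # positives missed
FP <- 1    # negatives wrongly flagged as positive
TN <- 379  # negatives correctly predicted

# Accuracy: share of all predictions that are correct
accuracy <- (TP + TN) / (TP + TN + FP + FN)

# MCC: uses all four cells, so the missed positives pull it down
mcc_value <- (TP * TN - FP * FN) /
  sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

round(c(accuracy = accuracy, mcc = mcc_value), 4)
```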
The value for MCC ranges from -1 to 1 where:
- -1 indicates total disagreement between predicted classes and actual classes
- 0 indicates predictions that are no better than random guessing
- 1 indicates total agreement between predicted classes and actual classes
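The three reference points can be reproduced with a short sketch. The helper name mcc_counts and all of the counts below are hypothetical, chosen so that each case lands exactly on one end or the middle of the range.

```r
# Hypothetical helper: MCC computed from the four confusion-matrix counts
mcc_counts <- function(TP, TN, FP, FN) {
  (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
}

# Every prediction is correct
perfect <- mcc_counts(TP = 20, TN = 380, FP = 0, FN = 0)

# Every prediction is wrong
inverted <- mcc_counts(TP = 0, TN = 0, FP = 380, FN = 20)

# Predictions unrelated to the actual class:
# half of each class is predicted positive
unrelated <- mcc_counts(TP = 10, TN = 190, FP = 190, FN = 10)

c(perfect = perfect, inverted = inverted, unrelated = unrelated)
```

The three results are 1, -1, and 0 respectively.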
For example, suppose a sports analyst uses a classification model to predict whether or not 400 different college basketball players get drafted into the NBA.
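The confusion matrix for the model's predictions contains the following counts:
- True positives (predicted drafted, actually drafted): 15
- False positives (predicted drafted, actually not drafted): 5
- False negatives (predicted not drafted, actually drafted): 5
- True negatives (predicted not drafted, actually not drafted): 375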
To calculate the MCC of the model, we can use the following formula:
- MCC = (TP*TN - FP*FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))
- MCC = (15*375 - 5*5) / √((15+5)(15+5)(375+5)(375+5))
- MCC = 0.7368
Matthews correlation coefficient turns out to be 0.7368.
This value is somewhat close to one, which indicates that the model does a decent job of predicting whether or not players will get drafted.
The following example shows how to calculate MCC for this exact scenario using the mcc() function from the mltools package in R.
Example: Calculating Matthews Correlation Coefficient in R
```r
library(mltools)

# define vector of actual classes
actual <- rep(c(1, 0), times = c(20, 380))

# define vector of predicted classes
preds <- rep(c(1, 0, 1, 0), times = c(15, 5, 5, 375))

# calculate Matthews correlation coefficient
mcc(preds, actual)
# [1] 0.7368421
```
Matthews correlation coefficient is 0.7368.
This matches the value that we calculated earlier by hand.
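As an additional cross-check that does not rely on mltools: for two vectors coded as 0 and 1, MCC is equivalent to the Pearson correlation between them (also known as the phi coefficient). A minimal sketch using base R's cor() on the same vectors, where phi is just an illustrative variable name:

```r
# Same vectors as in the example above
actual <- rep(c(1, 0), times = c(20, 380))
preds  <- rep(c(1, 0, 1, 0), times = c(15, 5, 5, 375))

# Pearson correlation of two 0/1 vectors equals MCC
phi <- cor(preds, actual)
round(phi, 4)
```

This should agree with the 0.7368 obtained above.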
If you’d like to calculate Matthews correlation coefficient for a confusion matrix, you can use the confusionM argument as follows:
```r
library(mltools)

# create confusion matrix
conf_matrix <- matrix(c(15, 5, 5, 375), nrow = 2)

# view confusion matrix
conf_matrix
#      [,1] [,2]
# [1,]   15    5
# [2,]    5  375

# calculate Matthews correlation coefficient for confusion matrix
mcc(confusionM = conf_matrix)
# [1] 0.7368421
```
Once again, Matthews correlation coefficient is 0.7368.
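If you start from vectors of classes rather than a ready-made matrix, base R's table() can tabulate the four counts. The sketch below is one possible approach; the object name conf_table is illustrative, and the cells are looked up by label so that the result does not depend on how the rows and columns happen to be ordered.

```r
# Same vectors as in the first example
actual <- rep(c(1, 0), times = c(20, 380))
preds  <- rep(c(1, 0, 1, 0), times = c(15, 5, 5, 375))

# Cross-tabulate: rows are predicted classes, columns are actual classes
conf_table <- table(predicted = preds, actual = actual)

# Index by label ("1" = positive, "0" = negative) rather than by position
TP <- conf_table["1", "1"]
FP <- conf_table["1", "0"]
FN <- conf_table["0", "1"]
TN <- conf_table["0", "0"]

c(TP = TP, TN = TN, FP = FP, FN = FN)
```

These are the same counts (15, 375, 5, 5) used in the hand calculation.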