How can I create a confusion matrix in Google Sheets?

Creating a confusion matrix in Google Sheets involves organizing and analyzing data to visualize the performance of a classification model. This involves inputting the predicted and actual values of a dataset into a table, and using built-in functions to calculate the true positive, true negative, false positive, and false negative values. These values are then used to populate the confusion matrix, which displays the accuracy of the model’s predictions. Overall, creating a confusion matrix in Google Sheets provides a comprehensive and easily interpretable way to evaluate the effectiveness of a classification model.

Create a Confusion Matrix in Google Sheets


is a type of regression we use when the is binary.

One common way to evaluate the quality of a logistic regression model is to create a confusion matrix, which is a 2×2 table that shows the predicted values from the model vs. the actual values from the test dataset.

The following step-by-step example shows how to create a confusion matrix in Google Sheets.

Step 1: Enter the Data

First, let’s enter a column of actual values for a response variable along with the predicted values by a logistic regression model:

Step 2: Create the Confusion Matrix

Next, we will create the following outline for a confusion matrix:

Next, we will type the following formula into cell E3:

=COUNTIFS($A$2:$A$21,$D3,$B$2:$B$21,E$2)

We can then click and drag this formula to each remaining cell in the confusion matrix:

Google Sheets confusion matrix

Step 3: Calculate Accuracy, Precision and Recall

Once we’ve created the confusion matrix, we can calculate the following metrics:

  • Accuracy: Percentage of correct predictions
  • Precision: Correct positive predictions relative to total positive predictions
  • Recall: Correct positive predictions relative to total actual positives

 

The higher the accuracy, the better a model is able to correctly classify observations.

In this example, our model has an accuracy of 0.7 which tells us that it correctly classified 70% of observations.

If we’d like, we can compare this accuracy to that of other logistic regression models to determine which model is best at classifying observations into categories of 0 or 1.

Additional Resources

The following tutorials provide additional information about logistic regression:

x