The Exact Test of Goodness of Fit is a statistical method used to determine whether observed data differ significantly from an expected distribution. It is a formal way to assess whether a sample fits a specific theoretical probability distribution. The test calculates the probability of obtaining the observed data, or a more extreme result, assuming that the null hypothesis is true. Because it computes this probability directly instead of relying on a large-sample approximation, it stays accurate for small samples, where approximate goodness-of-fit tests can be unreliable. It is commonly used in fields such as biology, economics, and psychology to assess how well data fit a theoretical model.

## What is the Exact Test of Goodness of Fit?

**The Exact Test of Goodness of Fit** is a statistical test used to determine if the proportions of categories in a single qualitative variable significantly differ from an expected or known population proportion. To use it, you should have one group variable with only two options and you should have fewer than 10 values per cell. See more below.

*The Exact Test of Goodness of Fit is also called the Binomial Test, the One Sample Exact Test, the Goodness of Fit Test, and the Binomial Exact Test.*
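The probability calculation behind the test can be written out for the two-category case. The notation here is an assumption introduced for illustration and does not come from the text above: $n$ is the total number of observations, $k$ is the observed count in one category, and $p_0$ is the expected proportion for that category under the null hypothesis. The two-sided rule shown, which sums every outcome that is no more probable than the observed one, is one common convention and not the only one in use.

$$
P(X = i) = \binom{n}{i} \, p_0^{\,i} \, (1 - p_0)^{\,n - i},
\qquad
p\text{-value} = \sum_{i \,:\, P(X = i) \,\le\, P(X = k)} P(X = i)
$$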

## Assumptions for the Exact Test of Goodness of Fit

Every statistical method has assumptions. An assumption is a property that your data must satisfy for the results of the method to be accurate.

The assumptions for the Exact Test of Goodness of Fit include:

- Binary variable
- Independence
- Mutually exclusive groups

Let’s dive into what that means.

**Binary**

For this test, your variable must be binary. Binary means that your variable is a category with only two possible values. Some good examples of binary variables include gender (male/female) or any True/False or Yes/No variable.

**Independence**

Each of your observations (data points) should be independent. This means that each value of your variables doesn’t “depend” on any of the others. For example, this assumption is usually violated when there are multiple data points over time from the same unit of observation (e.g. subject/customer/store), because the data points from the same unit of observation are likely to be related or affect one another.

**Mutually Exclusive Groups**

The two groups of your categorical variable should be mutually exclusive. For example, if your categorical variable is hungry (yes/no), then your groups are mutually exclusive, because one person cannot belong to both groups at once.
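Two of these assumptions can be checked mechanically before running the test. The sketch below is a minimal illustration, not a standard routine: the function name `tally_binary` and the default labels `"yes"` and `"no"` are hypothetical. It confirms that every observation carries exactly one of two allowed labels and returns the count per group. Independence cannot be verified from the values alone; it depends on how the data were collected.

```python
from collections import Counter


def tally_binary(observations, labels=("yes", "no")):
    """Count observations per group, rejecting anything that is not binary.

    `labels` names the two allowed groups (hypothetical defaults).
    Each observation holds a single label, so the groups are mutually
    exclusive by construction.
    """
    if len(set(labels)) != 2:
        raise ValueError("exactly two distinct labels are required")
    counts = Counter(observations)
    unexpected = set(counts) - set(labels)
    if unexpected:
        raise ValueError(f"unexpected values: {sorted(map(str, unexpected))}")
    # Counter returns 0 for a label that never occurs, so an empty
    # group is still reported.
    return {label: counts[label] for label in labels}


# Illustrative data only.
print(tally_binary(["yes", "no", "yes", "yes", "yes", "yes"]))
```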

## When to use the Exact Test of Goodness of Fit?

You should use the Exact Test of Goodness of Fit in the following scenario:

- You want to know the **difference** between two variables
- Your variable of interest is **proportional or categorical**
- You have only **two options**
- You have **less than 10 in a cell**

Let’s clarify these to help you know when to use the Exact Test of Goodness of Fit.

**Difference**

You are looking for a statistical test that shows how the observed proportions of a variable differ from an expected or known proportion. Other types of analyses include testing for a relationship between two variables or predicting one variable using another variable (prediction).

**Proportional or Categorical**

For this test, your variable of interest must be proportional or categorical. A categorical variable is a variable that contains categories without a natural order. Examples of categorical variables are eye color, city of residence, type of dog, etc. Proportional variables are derived from categorical variables, for instance: the number of people that converted on two different versions of your website (10% vs 15%), percentages, the number of people who voted vs people who did not vote, the proportion of plants that died vs survived an experimental treatment, etc.

*If you have a continuous variable that you want to compare to an expected population, you may want to use a Single Sample Z-Test.*

**Two Options**

Your categorical variable should have only two possible options. Examples include made a purchase (yes/no), color (if just black/white), and recovered from disease (yes/no).

*If you have more than two options and less than 10 in a cell, you should consider using the Multinomial Exact Test of Goodness of Fit.*

**Less than 10 in a Cell**

The rule-of-thumb we recommend is to use this test when you have around 10 or fewer observations in each cell. “Cell” in this case refers simply to the count of values in each group. For example, if I have a list of survey responses with 5 “yes” and 1 “no”, there are 5 and 1 value(s) per cell, respectively.

*If you have more than 10 in a cell, we recommend using the One-Proportion Z-Test. And if you have more than 10 in every cell and more than 1000 total observations, we recommend using the G-Test of Goodness of Fit.*
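The rules of thumb in this section can be sketched as a small helper for a binary variable. The function name `suggest_test` is hypothetical, and two boundary choices are assumptions of this sketch, not rules stated above: a cell with exactly 10 observations counts as small, and a single small cell is enough to prefer the exact test.

```python
def suggest_test(count_a, count_b):
    """Suggest a goodness-of-fit test from the two cell counts.

    Thresholds follow the article's rules of thumb. Treating a count
    of exactly 10 as "small" is an assumption of this sketch.
    """
    if count_a < 0 or count_b < 0:
        raise ValueError("cell counts cannot be negative")
    smallest_cell = min(count_a, count_b)
    total = count_a + count_b
    if smallest_cell <= 10:
        return "Exact Test of Goodness of Fit"
    if total > 1000:
        return "G-Test of Goodness of Fit"
    return "One-Proportion Z-Test"


# The survey illustration from the text: 5 "yes" and 1 "no".
print(suggest_test(5, 1))
```

These cutoffs are guidelines, so counts that fall close to a threshold deserve a judgment call.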

## Exact Test of Goodness of Fit Example

**Variable**: Supports political leader (yes/no)

In this example, we are interested in investigating whether the responses to a survey question in our sample differ significantly from random (i.e. an expected split of 50-50). The null hypothesis is that “yes” and “no” responses are equally likely in the population.

Because our variable is binary with only two possible values (yes/no), we know that the Exact Test of Goodness of Fit is a suitable test.

The analysis will result in a probability or p-value. The p-value represents the chance of seeing results at least as extreme as ours if there were an actual 50-50 split in the population. A p-value less than or equal to 0.05 is conventionally treated as statistically significant: a difference this large would be unlikely to arise by chance alone if the true split were 50-50. It does not prove that the difference is real.
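As a rough illustration of how such a p-value could be computed, the sketch below runs the test by hand using only the Python standard library. The counts (5 “yes”, 1 “no”) are hypothetical and borrowed from the cell-count illustration earlier; they are not real survey results. The function names `binomial_pmf` and `exact_binomial_p_value` are assumptions made for this example. The two-sided rule used here, summing every outcome that is no more probable than the observed one, is one common convention.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_p_value(successes, n, expected_p=0.5):
    """Two-sided exact p-value for a binary variable.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under the null hypothesis.
    """
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    observed = binomial_pmf(successes, n, expected_p)
    # A small relative tolerance guards against floating-point ties.
    threshold = observed * (1 + 1e-7)
    total = sum(
        prob
        for prob in (binomial_pmf(k, n, expected_p) for k in range(n + 1))
        if prob <= threshold
    )
    return min(1.0, total)


# Hypothetical counts: 5 "yes" and 1 "no", tested against a 50-50 split.
p_value = exact_binomial_p_value(successes=5, n=6, expected_p=0.5)
print(p_value)  # 0.21875
```

With these illustrative counts the p-value is well above 0.05, so a sample this small would give no evidence against a 50-50 split.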