# Exact Test of Goodness of Fit (multinomial model)

The Exact Test of Goodness of Fit (multinomial model) is a statistical test used to determine whether a set of observed category counts is consistent with a specified theoretical distribution. It is commonly used when the expected frequencies in each category are small and the traditional chi-square approximation may not be appropriate. The Exact Test calculates the exact probability, under the specified distribution, of seeing data at least as extreme as the data observed. That probability (the p-value) is then compared to a pre-determined significance level to decide whether the observed and expected frequencies differ significantly. This test is particularly useful when small sample sizes or rare events make it difficult to accurately assess the fit of a theoretical model.
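As a sketch of the calculation described above, the probabilities involved can be written out as follows. The symbols are introduced here for illustration and are not from the original text: $x_1, \dots, x_k$ are the observed counts in $k$ categories, $n$ is their total, and $\pi_1, \dots, \pi_k$ are the proportions specified by the theoretical distribution. The probability of one exact set of counts under the multinomial model is

$$
P(x_1, \dots, x_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \prod_{i=1}^{k} \pi_i^{x_i}
$$

One common convention, assumed here, treats every outcome that is no more probable than the observed one as "at least as extreme", so the p-value is

$$
p = \sum_{y \,:\, P(y) \le P(x)} P(y)
$$

where the sum runs over all count vectors $y = (y_1, \dots, y_k)$ with $y_1 + \dots + y_k = n$. Other definitions of "extreme" exist (for example, ordering outcomes by a chi-square or likelihood-ratio statistic) and can give slightly different p-values.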

## What is the Exact Test of Goodness of Fit (multinomial model)?

The Exact Test of Goodness of Fit (multinomial model) is a statistical test used to determine if the proportions of categories in a single qualitative variable significantly differ from expected or known population proportions. To use it, you should have one categorical variable with more than two options and around 10 or fewer values per cell. See more below.

The Exact Test of Goodness of Fit (multinomial model) is also called the Multinomial Test, the Multinomial Model, the Goodness of Fit Test, and the Multinomial Exact Test.

## Assumptions for the Exact Test of Goodness of Fit (multinomial model)

Every statistical method has assumptions. Assumptions are properties that your data must satisfy in order for the results of the statistical method to be accurate.

The assumptions for the Exact Test of Goodness of Fit (multinomial model) include:

1. Categorical variable
2. Independence
3. Mutually exclusive groups

Let’s dive into what that means.

#### Categorical

For this test, your variable must be categorical with more than two categories. A categorical variable is a variable whose values are categories without a natural order. Examples of categorical variables are eye color, city of residence, type of dog, etc.

#### Independence

Each of your observations (data points) should be independent. This means that each value of your variables doesn’t “depend” on any of the others. For example, this assumption is usually violated when there are multiple data points over time from the same unit of observation (e.g. subject/customer/store), because the data points from the same unit of observation are likely to be related or affect one another.

#### Mutually Exclusive Groups

The groups of your categorical variable should be mutually exclusive. For example, if your categorical variable is city of residence, then your groups are mutually exclusive, because one person cannot live in multiple cities at once.

## When to use the Exact Test of Goodness of Fit (multinomial model)?

You should use the Exact Test of Goodness of Fit (multinomial model) in the following scenario:

1. You want to know the difference between your sample and an expected or known population
2. Your variable of interest is proportional or categorical
3. You have more than two options
4. You have fewer than 10 in a cell

Let’s clarify these to help you know when to use the Exact Test of Goodness of Fit (multinomial model).

#### Difference

You are looking for a statistical test of whether a variable in your sample differs from expected or known population proportions. Other types of analyses include testing for a relationship between two variables or predicting one variable using another variable (prediction).

#### Proportional or Categorical

For this test, your variable of interest must be proportional or categorical. A categorical variable is a variable that contains categories without a natural order. Examples of categorical variables are eye color, city of residence, type of dog, etc. Proportional variables are derived from categorical variables. Examples include the share of people who converted on two different versions of your website (10% vs 15%), the proportion of people who voted vs those who did not vote, and the proportion of plants that died vs survived an experimental treatment.

If you have a continuous variable that you want to compare to an expected population, you may want to use a Single Sample Z-Test.

#### More than Two Options

Your categorical variable should have more than two options. Some examples of variables like this are eye color, city of residence, and type of dog.

If you have only two options and fewer than 10 in a cell, you should consider using the Binomial Exact Test of Goodness of Fit.

#### Fewer than 10 in a Cell

The rule-of-thumb we recommend is to use this test when you have around 10 or fewer observations in each cell. “Cell” in this case refers simply to the count of values in each group. For example, if I have a list of survey responses with 5 “yes” and 1 “no”, there are 5 and 1 value(s) per cell, respectively.
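To make the idea of a "cell" concrete, here is a minimal Python sketch that counts the values per cell for the survey example above and checks them against the rule-of-thumb. The helper name `all_cells_small` and the default threshold of 10 are assumptions made for this illustration. The yes/no data only demonstrates the counting; the multinomial test itself expects more than two options.

```python
from collections import Counter

# The survey responses from the example above: 5 "yes" and 1 "no".
responses = ["yes", "yes", "yes", "yes", "yes", "no"]

# Each distinct answer is a cell; its count is the number of values in that cell.
cell_counts = Counter(responses)


def all_cells_small(counts, threshold=10):
    """Return True if every cell has `threshold` or fewer observations.

    Hypothetical helper: the threshold mirrors the rule-of-thumb in the text
    and is not a hard statistical cutoff.
    """
    return all(count <= threshold for count in counts.values())


print(dict(cell_counts))
print(all_cells_small(cell_counts))
```

Keeping the threshold as a parameter reflects that "around 10" is a guideline rather than a strict boundary.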

If you have more than 10 in a cell, we recommend using the Chi-Square Goodness of Fit Test, since the One-Proportion Z-Test applies only when there are two options. And if you have more than 10 in every cell and more than 1000 total observations, we recommend using the G-Test of Goodness of Fit.

## Exact Test of Goodness of Fit (multinomial model) Example

Variable: Political party

In this example, we have a group of subjects and are interested in investigating whether their political party alignment differs from the typical proportions of the population from which the sample was drawn. The null hypothesis is that the proportion of subjects in each political party is the same in the sample as in the population.

Because our variable is categorical with more than two values (one value for each political party), we know that the Exact Test of Goodness of Fit (multinomial model) is a suitable test.

The analysis will result in a probability or p-value. The p-value represents the chance of seeing results at least as extreme as ours if the sample was randomly selected from the population. The lower the p-value, the stronger the evidence that our sample proportions differ from those of the population. A p-value less than or equal to 0.05 is conventionally treated as statistically significant, in which case we reject the null hypothesis and conclude that our sample differs from the population on our variable of interest.
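The example above can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the party counts and population proportions are hypothetical numbers invented for the demonstration, the function names are our own, and "at least as extreme" is taken to mean "no more probable than the observed counts", which is one common convention. The code enumerates every possible split of the sample across the categories, so it is only practical for the small samples this test is meant for.

```python
from math import factorial


def multinomial_pmf(counts, probs):
    """Probability of one exact vector of category counts under the null."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # stays an integer at every step
    p = float(coef)
    for c, q in zip(counts, probs):
        p *= q ** c
    return p


def compositions(n, k):
    """Yield every way to split n observations across k categories."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def exact_multinomial_p_value(observed, null_probs):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    p_obs = multinomial_pmf(observed, null_probs)
    cutoff = p_obs * (1 + 1e-7)  # small tolerance so floating-point ties are included
    n, k = sum(observed), len(observed)
    return sum(
        p
        for p in (multinomial_pmf(y, null_probs) for y in compositions(n, k))
        if p <= cutoff
    )


# Hypothetical data: 12 subjects across three parties (made-up counts).
observed = [6, 2, 4]
# Hypothetical population proportions for the same three parties.
population = [0.4, 0.4, 0.2]

p_value = exact_multinomial_p_value(observed, population)
print(f"p-value = {p_value:.4f}")
print("significant at 0.05" if p_value <= 0.05 else "not significant at 0.05")
```

Established statistical packages provide tested implementations of this test and are preferable for real analyses; the sketch is meant only to show what the p-value is summing over.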
