A **Chi-Square Test of Independence** is used to determine whether or not there is a statistically significant association between two categorical variables. The test compares the frequencies observed in the data with the frequencies we would expect if the two variables were independent. It tells us whether an association is likely to exist, but it does not measure how strong that association is.

This tutorial explains the following:

- The motivation for performing a Chi-Square Test of Independence.
- The formula to perform a Chi-Square Test of Independence.
- An example of how to perform a Chi-Square Test of Independence.

**Chi-Square Test of Independence: Motivation**

A Chi-Square test of independence can be used to determine if there is an association between two categorical variables in many different settings. Here are a few examples:

- We want to know if gender is associated with political party preference, so we survey 500 voters and record their gender and political party preference.
- We want to know if a person’s favorite color is associated with their favorite sport, so we survey 100 people and ask them about their preferences for both.
- We want to know if education level and marital status are associated, so we collect data about these two variables on a simple random sample of 50 people.

In each of these scenarios we want to know if two categorical variables are associated with each other. In each scenario, we can use a Chi-Square test of independence to determine if there is a statistically significant association between the variables.

**Chi-Square Test of Independence: Formula**

A Chi-Square test of independence uses the following null and alternative hypotheses:

- **H_{0}:** (null hypothesis) The two variables are independent.
- **H_{1}:** (alternative hypothesis) The two variables are *not* independent (i.e. they are associated).

We use the following formula to calculate the Chi-Square test statistic X^{2}:

**X^{2} = Σ(O-E)^{2} / E**

where:

- **Σ:** a symbol that means “sum” (here, the sum over every cell in the table)
- **O:** observed value
- **E:** expected value

If the p-value that corresponds to the test statistic X^{2} with (#rows-1)*(#columns-1) degrees of freedom is less than your chosen significance level, then you can reject the null hypothesis.
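
As a rough illustration of the formula and the degrees-of-freedom rule above, here is a minimal Python sketch. The function names and the small 2x2 table are made up for illustration and are not taken from any real survey; the expected counts are simply supplied as inputs here.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of two equally shaped tables."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total


def degrees_of_freedom(n_rows, n_cols):
    """(#rows - 1) * (#columns - 1), not counting any totals row or column."""
    return (n_rows - 1) * (n_cols - 1)


# Hypothetical counts, chosen only to exercise the two functions.
observed = [[10, 20], [30, 40]]
expected = [[12, 18], [28, 42]]

statistic = chi_square_statistic(observed, expected)
df = degrees_of_freedom(len(observed), len(observed[0]))
print(round(statistic, 4), df)
```

Note that the row and column counts passed to `degrees_of_freedom` exclude the totals, which is why a 2x2 table has 1 degree of freedom.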

**Chi-Square Test of Independence: Example**

Suppose we want to know whether or not gender is associated with political party preference. We take a simple random sample of 500 voters and survey them on their political party preference. The following table shows the results of the survey:

| | Republican | Democrat | Independent | Total |
|---|---|---|---|---|
| Male | 120 | 90 | 40 | 250 |
| Female | 110 | 95 | 45 | 250 |
| Total | 230 | 185 | 85 | 500 |

Use the following steps to perform a Chi-Square test of independence to determine if gender is associated with political party preference.

**Step 1: Define the hypotheses.**

We will perform the Chi-Square test of independence using the following hypotheses:

- **H_{0}:** Gender and political party preference are independent.
- **H_{1}:** Gender and political party preference are *not* independent.

**Step 2: Calculate the expected values.**

Next, we will calculate the expected values for each cell in the contingency table using the following formula:

Expected value = (row sum * column sum) / table sum.

For example, the expected value for Male Republicans is: (230*250) / 500 = **115**.

We can repeat this formula to obtain the expected value for each cell in the table:

| | Republican | Democrat | Independent | Total |
|---|---|---|---|---|
| Male | 115 | 92.5 | 42.5 | 250 |
| Female | 115 | 92.5 | 42.5 | 250 |
| Total | 230 | 185 | 85 | 500 |
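
The expected-value calculation can be sketched in a few lines of Python. This is only one possible way to write it; the helper name `expected_counts` is an assumption made for this illustration, and the observed counts are the survey results from the example, entered without the totals.

```python
def expected_counts(observed):
    """Expected count for each cell: (row sum * column sum) / table sum."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    table_sum = sum(row_sums)
    return [[r * c / table_sum for c in col_sums] for r in row_sums]


# Observed counts from the survey (rows: Male, Female;
# columns: Republican, Democrat, Independent), without the totals.
observed = [[120, 90, 40], [110, 95, 45]]

for label, row in zip(["Male", "Female"], expected_counts(observed)):
    print(label, row)
```

Because both rows have the same total (250), the two rows of expected values come out identical.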

**Step 3: Calculate (O-E)^{2} / E for each cell in the table.**

Next we will calculate **(O-E)^{2} / E** for each cell in the table where:

- **O:** observed value
- **E:** expected value

For example, Male Republicans would have a value of: (120-115)^{2} / 115 = **0.2174**.

We can repeat this formula for each cell in the table:

| | Republican | Democrat | Independent |
|---|---|---|---|
| Male | 0.2174 | 0.0676 | 0.1471 |
| Female | 0.2174 | 0.0676 | 0.1471 |
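
The per-cell values can be reproduced with a short Python sketch. The variable names are illustrative only; the observed and expected counts are copied from the tables in this example, and each result is rounded to four decimal places to match the table.

```python
# Observed and expected counts (rows: Male, Female;
# columns: Republican, Democrat, Independent).
observed = [[120, 90, 40], [110, 95, 45]]
expected = [[115, 92.5, 42.5], [115, 92.5, 42.5]]

# (O - E)^2 / E for every cell, rounded to 4 decimal places.
contributions = [
    [round((o - e) ** 2 / e, 4) for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

for label, row in zip(["Male", "Female"], contributions):
    print(label, row)
```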

**Step 4: Calculate the test statistic X^{2} and the corresponding p-value.**

**X^{2}** = Σ(O-E)^{2} / E = 0.2174 + 0.2174 + 0.0676 + 0.0676 + 0.1471 + 0.1471 = **0.8642**

According to the Chi-Square Score to P Value Calculator, the p-value associated with X^{2} = 0.8642 and (2-1)*(3-1) = 2 degrees of freedom is **0.649198**.
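
If no calculator is at hand, the p-value can be approximated with the Python standard library alone. The sketch below is not how any particular calculator works; `chi2_sf` is a hypothetical helper that evaluates the upper-tail probability of the chi-square distribution through a series for the regularized lower incomplete gamma function. It is meant for small statistics like this one, since the subtraction at the end loses precision when the p-value is extremely small.

```python
import math


def chi2_sf(x, df):
    """Approximate P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    # Series for the regularized lower incomplete gamma function:
    # P(a, z) = z^a * e^(-z) / Gamma(a + 1) * sum_{n>=0} z^n / ((a+1)...(a+n))
    term = 1.0
    total = 1.0
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return 1.0 - lower


statistic = 0.8642          # test statistic from the hand calculation
df = (2 - 1) * (3 - 1)      # 2 rows and 3 columns of counts
p_value = chi2_sf(statistic, df)
print(round(p_value, 3))
```

With 2 degrees of freedom the upper-tail probability reduces to e^(-X^{2}/2), which gives a quick way to check the result. Small differences from a calculator in the later decimal places are expected, because the statistic used here is a sum of rounded cell values.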

**Step 5: Draw a conclusion.**

Since this p-value is not less than 0.05, we fail to reject the null hypothesis. This means we do not have sufficient evidence to say that there is an association between gender and political party preference.
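
The decision rule in this step is a single comparison. A minimal sketch, in which the function name `conclude` and its default significance level of 0.05 are assumptions made for illustration:

```python
def conclude(p_value, alpha=0.05):
    """Return the decision about the null hypothesis at significance level alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# p-value reported in Step 4
print(conclude(0.649198))
```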

**Note:** You can also perform this entire test by simply using the Chi-Square Test of Independence Calculator.

The following tutorials explain how to perform a Chi-Square test of independence using different statistical programs:

- How to Perform a Chi-Square Test of Independence in Stata
- How to Perform a Chi-Square Test of Independence in Excel
- How to Perform a Chi-Square Test of Independence in SPSS
- How to Perform a Chi-Square Test of Independence in Python
- How to Perform a Chi-Square Test of Independence in R
- Chi-Square Test of Independence on a TI-84 Calculator
- Chi-Square Test of Independence Calculator