What is a Chi-Square Test of Independence?

The Chi-Square Test of Independence is a statistical test used to determine whether there is a significant association between two categorical variables. It compares the frequencies observed for each combination of categories with the frequencies that would be expected if the two variables were independent, and assesses whether the differences are larger than chance alone would plausibly produce. The test indicates whether an association exists; it does not measure how strong that association is.



This tutorial explains the following:

  • The motivation for performing a Chi-Square Test of Independence.
  • The formula to perform a Chi-Square Test of Independence.
  • An example of how to perform a Chi-Square Test of Independence.

Chi-Square Test of Independence: Motivation

A Chi-Square test of independence can be used to determine if there is an association between two categorical variables in many different settings. Here are a few examples:

  • We want to know if gender is associated with political party preference, so we survey 500 voters and record their gender and political party preference.
  • We want to know if a person’s favorite color is associated with their favorite sport, so we survey 100 people and ask them about their preferences for both.
  • We want to know if education level and marital status are associated, so we collect data about these two variables on a simple random sample of 50 people.

In each of these scenarios we want to know if two categorical variables are associated with each other. In each scenario, we can use a Chi-Square test of independence to determine if there is a statistically significant association between the variables. 

Chi-Square Test of Independence: Formula

A Chi-Square test of independence uses the following null and alternative hypotheses:

  • H0: (null hypothesis) The two variables are independent.
  • H1: (alternative hypothesis) The two variables are not independent (i.e., they are associated).

We use the following formula to calculate the Chi-Square test statistic X²:

X² = Σ (O - E)² / E

where:

  • Σ: the sum taken over every cell in the contingency table
  • O: observed count in a cell
  • E: expected count in that cell if the two variables were independent

If the p-value that corresponds to the test statistic X² with (#rows - 1) * (#columns - 1) degrees of freedom is less than your chosen significance level, then you can reject the null hypothesis.
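As a rough illustration of the formula and the degrees-of-freedom rule, the sketch below computes X² for a table given as a list of rows. The function name `chi_square_statistic` and the 2x2 table are hypothetical, chosen only for this example. Each expected count is taken as (row sum * column sum) / table sum, the formula used in the worked example in this tutorial.

```python
def chi_square_statistic(observed):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `observed` is a list of rows, each a list of counts.
    """
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under independence for cell (i, j).
            e = row_sums[i] * col_sums[j] / total
            statistic += (o - e) ** 2 / e

    dof = (len(row_sums) - 1) * (len(col_sums) - 1)
    return statistic, dof


# Hypothetical 2x2 table, used only to exercise the function.
table = [[10, 20], [20, 10]]
stat, dof = chi_square_statistic(table)
print(stat, dof)
```

In this made-up table every expected count is 15, so each cell contributes 25 / 15 and the statistic is about 6.67 with 1 degree of freedom.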

Chi-Square Test of Independence: Example

Suppose we want to know whether or not gender is associated with political party preference. We take a simple random sample of 500 voters and survey them on their political party preference. The following table shows the results of the survey:

          Republican  Democrat  Independent  Total
Male             120        90           40    250
Female           110        95           45    250
Total            230       185           85    500

Use the following steps to perform a Chi-Square test of independence to determine if gender is associated with political party preference.

Step 1: Define the hypotheses.

We will perform the Chi-Square test of independence using the following hypotheses:

  • H0: Gender and political party preference are independent.
  • H1: Gender and political party preference are not independent.

Step 2: Calculate the expected values.

Next, we will calculate the expected values for each cell in the contingency table using the following formula:

Expected value = (row sum * column sum) / table sum.

For example, the expected value for Male Republicans is: (230*250) / 500 = 115.

We can repeat this formula to obtain the expected value for each cell in the table:

          Republican  Democrat  Independent  Total
Male             115      92.5         42.5    250
Female           115      92.5         42.5    250
Total            230       185           85    500
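The expected values above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative and not part of any particular package.

```python
# Observed counts from the survey table
# (rows: Male, Female; columns: Republican, Democrat, Independent).
observed = [
    [120, 90, 40],
    [110, 95, 45],
]

row_sums = [sum(row) for row in observed]        # [250, 250]
col_sums = [sum(col) for col in zip(*observed)]  # [230, 185, 85]
total = sum(row_sums)                            # 500

# Expected value = (row sum * column sum) / table sum, for every cell.
expected = [[r * c / total for c in col_sums] for r in row_sums]

for label, row in zip(["Male", "Female"], expected):
    print(label, row)
```

Because both rows have the same total (250), the expected counts are identical for Male and Female.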

Step 3: Calculate (O - E)² / E for each cell in the table.

Next, we will calculate (O - E)² / E for each cell in the table, where:

  • O: observed value
  • E: expected value

For example, Male Republicans would have a value of: (120 - 115)² / 115 = 0.2174.

We can repeat this formula for each cell in the table:

          Republican  Democrat  Independent
Male          0.2174    0.0676       0.1471
Female        0.2174    0.0676       0.1471
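The per-cell values can be checked with a short sketch like the one below. It hard-codes the observed counts and the expected counts from Step 2; the variable names are illustrative.

```python
observed = [[120, 90, 40], [110, 95, 45]]
expected = [[115.0, 92.5, 42.5], [115.0, 92.5, 42.5]]  # from Step 2

# (O - E)^2 / E for every cell, paired up row by row.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# Round to four decimals to match the table.
rounded = [[round(v, 4) for v in row] for row in contributions]
print(rounded)
```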

Step 4: Calculate the test statistic X² and the corresponding p-value.

X² = Σ (O - E)² / E = 0.2174 + 0.2174 + 0.0676 + 0.0676 + 0.1471 + 0.1471 = 0.8642

According to the Chi-Square Score to P Value Calculator, the p-value associated with X² = 0.8642 and (2 - 1) * (3 - 1) = 2 degrees of freedom is approximately 0.649.
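The statistic and p-value can also be computed directly. The sketch below uses only the standard library and relies on one assumption worth flagging: for exactly 2 degrees of freedom, the upper-tail probability of the Chi-Square distribution has the closed form exp(-x / 2). For other degrees of freedom a library routine would be needed (for instance, `scipy.stats.chi2_contingency` performs the whole test on a table of observed counts).

```python
import math

observed = [[120, 90, 40], [110, 95, 45]]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
total = sum(row_sums)

# Sum (O - E)^2 / E over every cell, without rounding intermediate values.
statistic = sum(
    (o - r * c / total) ** 2 / (r * c / total)
    for row, r in zip(observed, row_sums)
    for o, c in zip(row, col_sums)
)

dof = (len(row_sums) - 1) * (len(col_sums) - 1)

# Closed-form upper-tail probability; valid ONLY when dof == 2.
assert dof == 2
p_value = math.exp(-statistic / 2)

print(round(statistic, 4), dof, round(p_value, 4))
```

Working from unrounded cell values gives a statistic of roughly 0.8640 rather than 0.8642, since the hand calculation sums values that were already rounded to four decimals. The p-value is about 0.649 either way.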

Step 5: Draw a conclusion.

Since this p-value is not less than 0.05, we fail to reject the null hypothesis. This means we do not have sufficient evidence to say that there is an association between gender and political party preference.

Note: You can also perform this entire test by simply using the Chi-Square Test of Independence Calculator.

The following tutorials explain how to perform a Chi-Square test of independence using different statistical programs:

How to Perform a Chi-Square Test of Independence in Stata
How to Perform a Chi-Square Test of Independence in Excel
How to Perform a Chi-Square Test of Independence in SPSS
How to Perform a Chi-Square Test of Independence in Python
How to Perform a Chi-Square Test of Independence in R
Chi-Square Test of Independence on a TI-84 Calculator
Chi-Square Test of Independence Calculator
