How to Perform a Chi-Square Test of Independence in Stata

A chi-square test of independence is used to test the association between two categorical variables. In Stata, the test is performed by adding the “chi2” option to the “tab” (tabulate) command, which takes the two categorical variables as input. The output shows the two-way frequency table, the chi-square statistic with its degrees of freedom, and the associated p-value. The p-value can then be used to determine if the association between the two variables is significant.


A Chi-Square Test of Independence is used to determine whether or not there is a significant association between two categorical variables.

This tutorial explains how to perform a Chi-Square Test of Independence in Stata.

Example: Chi-Square Test of Independence in Stata

For this example we will use a dataset called auto, which contains information about 74 different automobiles from 1978.

Use the following steps to perform a Chi-Square Test of Independence to determine if there is a significant association between the following two variables:

  • rep78: the number of times the car received a repair in 1978 (ranges from 1 to 5)
  • foreign: whether or not the car type is foreign (0 = no, 1 = yes)

Step 1: Load and view the raw data.

First, we will load the data by typing in the following command:

sysuse auto

We can view the raw data by typing in the following command:

br

Raw data for auto dataset in Stata

Each row displays information for an individual car including price, mpg, weight, length, and a variety of other variables. The only two variables we need for this test are rep78 and foreign.
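Before running the test, it can help to look at just these two variables rather than the whole dataset. The following is a minimal sketch, assuming the auto dataset that ships with Stata; it only uses standard commands and is not part of the original steps.

```stata
* Load the example dataset, replacing anything currently in memory
sysuse auto, clear

* Show storage type and variable label for the two variables of interest
describe rep78 foreign

* One-way frequency tables; the missing option also lists missing values of rep78
tabulate rep78, missing
tabulate foreign
```

The one-way table for rep78 shows how many cars have no repair value recorded. Those cars are left out of the two-way table by default, so the test is based on fewer than 74 cars.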

Step 2: Perform the Chi-Square Test of Independence.

We will use the following syntax to perform the test:

tab first_variable second_variable, chi2

Here is the exact syntax we’ll use in our case:

tab rep78 foreign, chi2

Chi-Square test of independence output in Stata
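The chi-square p-value relies on a large-sample approximation, and some cells in this table are small (for example, only 2 domestic cars fall in the first repair category). As a hedged sketch of how one might check this, the same tab command accepts the expected and exact options; whether an exact test is the better choice here is a judgment call and not something the original example addresses.

```stata
sysuse auto, clear

* Same test, additionally printing the expected count under independence in each cell
tabulate rep78 foreign, chi2 expected

* Fisher's exact test, which does not depend on the large-sample approximation
tabulate rep78 foreign, exact
```

A common rule of thumb is to be cautious with the chi-square approximation when many expected counts fall below 5.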

Here is how to interpret the output:

Summary table: This table shows the number of cars for each combination of rep78 and foreign. For example:

  • There were 2 cars that were domestic and received 1 repair in 1978.
  • There were 8 cars that were domestic and received 2 repairs in 1978.
  • There were 27 cars that were domestic and received 3 repairs in 1978.

And so on.

Pearson chi2(4): This is the Chi-Square test statistic for the test, with its 4 degrees of freedom shown in parentheses. It turns out to be 27.2640.
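For reference, the statistic and the 4 in parentheses follow from the standard Pearson formulas, written out below. The notation is an assumption introduced here for illustration: O_ij is the observed count in row i and column j, E_ij the count expected under independence, R_i and C_j the row and column totals, N the number of cars in the table, and r and c the number of rows and columns.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\mathrm{df} = (r - 1)(c - 1) = (5 - 1)(2 - 1) = 4
```

With five levels of rep78 and two levels of foreign, the degrees of freedom work out to 4, which matches the value Stata prints.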

Pr: This is the p-value associated with the Chi-Square test statistic. Stata displays it as 0.000, which means it is smaller than 0.0005. Since this is less than 0.05, we reject the null hypothesis that the two variables are independent. We have sufficient evidence to conclude that there is a statistically significant association between whether or not a car was foreign and the total number of repairs it received.
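If the statistic or p-value is needed later in a do-file, one option is to read the results that tab leaves behind in r(). This is a minimal sketch, assuming the auto dataset is available; the scalar names chi2_stat and dof are illustrative and not part of the original tutorial.

```stata
sysuse auto, clear

* Run the test without printing the table
quietly tabulate rep78 foreign, chi2

* List everything tabulate stored, including r(chi2), r(p), r(N), r(r) and r(c)
return list

* Copy the statistic and compute the degrees of freedom from the table dimensions
scalar chi2_stat = r(chi2)
scalar dof = (r(r) - 1) * (r(c) - 1)

* Recompute the p-value with more decimals than the 0.000 shown in the table output
display "chi2(" scalar(dof) ") = " %8.4f scalar(chi2_stat)
display "p-value = " %12.10f chi2tail(scalar(dof), scalar(chi2_stat))
```

Printing the p-value with more decimal places makes clear that the 0.000 in the output is a rounded value, not an exact zero.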
