How to Create a Contingency Table in Python

Creating a contingency table in Python is relatively simple. Store the data in a pandas DataFrame, then pass the two variables of interest to the crosstab function, one as the index (rows) and one as the columns, to generate the table. After that, you can analyze the table to answer questions about the data.


A contingency table is a type of table that summarizes the relationship between two categorical variables.

To create a contingency table in Python, we can use the pandas.crosstab() function, which uses the following syntax:

pandas.crosstab(index, columns)

where:

  • index: the values (for example, a DataFrame column) to display in the rows of the contingency table
  • columns: the values (for example, a DataFrame column) to display in the columns of the contingency table
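Before the full example, here is a minimal sketch of the syntax above. The toy data and the names colors, sizes, and table are hypothetical and are not part of the dataset used in the steps that follow.

```python
import pandas as pd

# Hypothetical toy data: two categorical variables of equal length
colors = pd.Series(['red', 'blue', 'red', 'red'], name='color')
sizes = pd.Series(['S', 'S', 'L', 'S'], name='size')

# Rows come from `index`, columns come from `columns`;
# each cell counts how often the pair of values occurs together
table = pd.crosstab(index=colors, columns=sizes)
print(table)
```

Each cell holds the number of observations with that combination of row and column value, so the cells sum to the number of observations.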

The following step-by-step example shows how to use this function to create a contingency table in Python.

Step 1: Create the Data

First, let’s create a dataset that shows information for 20 different product orders, including the type of product purchased (TV, computer, or radio) along with the country (A, B, or C) that the product was purchased in:

import pandas as pd

#create data
df = pd.DataFrame({'Order': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                            11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
                   'Product': ['TV', 'TV', 'Comp', 'TV', 'TV', 'Comp',
                               'Comp', 'Comp', 'TV', 'Radio', 'TV', 'Radio', 'Radio',
                               'Radio', 'Comp', 'Comp', 'TV', 'TV', 'Radio', 'TV'],
                   'Country': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
                               'B', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C']})

#view data
df

	Order	Product	Country
0	1	TV	A
1	2	TV	A
2	3	Comp	A
3	4	TV	A
4	5	TV	B
5	6	Comp	B
6	7	Comp	B
7	8	Comp	B
8	9	TV	B
9	10	Radio	B
10	11	TV	B
11	12	Radio	B
12	13	Radio	C
13	14	Radio	C
14	15	Comp	C
15	16	Comp	C
16	17	TV	C
17	18	TV	C
18	19	Radio	C
19	20	TV	C

Step 2: Create the Contingency Table

The following code shows how to create a contingency table that counts the number of each product ordered from each country:

#create contingency table
pd.crosstab(index=df['Country'], columns=df['Product'])

Product	Comp	Radio	TV
Country			
A	1	0	3
B	3	2	3
C	2	3	3

Here’s how to interpret the table:

  • A total of 1 computer was purchased from country A.
  • A total of 3 computers were purchased from country B.
  • A total of 2 computers were purchased from country C.
  • A total of 0 radios were purchased from country A.
  • A total of 2 radios were purchased from country B.
  • A total of 3 radios were purchased from country C.
  • A total of 3 TVs were purchased from country A.
  • A total of 3 TVs were purchased from country B.
  • A total of 3 TVs were purchased from country C.
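The counts in the interpretation above can also be read programmatically. The sketch below rebuilds the dataset from Step 1 and looks up individual cells with .loc; the variable names table, comp_a, and radio_a are illustrative only.

```python
import pandas as pd

# Same data as in Step 1
df = pd.DataFrame({'Order': list(range(1, 21)),
                   'Product': ['TV', 'TV', 'Comp', 'TV', 'TV', 'Comp',
                               'Comp', 'Comp', 'TV', 'Radio', 'TV', 'Radio', 'Radio',
                               'Radio', 'Comp', 'Comp', 'TV', 'TV', 'Radio', 'TV'],
                   'Country': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
                               'B', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C']})

table = pd.crosstab(index=df['Country'], columns=df['Product'])

# Look up a single cell by row label (country) and column label (product)
comp_a = table.loc['A', 'Comp']
radio_a = table.loc['A', 'Radio']
print(comp_a, radio_a)

# Look up a whole row: all product counts for country B
print(table.loc['B'])
```

A combination that never occurs in the data, such as radios in country A, appears as 0 rather than being left out of the table.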

Step 3: Add Margin Totals to the Contingency Table

We can use the argument margins=True to add the margin totals to the contingency table:

#add margins to contingency table
pd.crosstab(index=df['Country'], columns=df['Product'], margins=True)

Product	Comp	Radio	TV	All
Country				
A	1	0	3	4
B	3	2	3	8
C	2	3	3	8
All	6	5	9	20 

Row Totals:

  • A total of 4 orders were made from country A.
  • A total of 8 orders were made from country B.
  • A total of 8 orders were made from country C.

Column Totals:

  • A total of 6 computers were purchased.
  • A total of 5 radios were purchased.
  • A total of 9 TVs were purchased.

The value in the bottom right corner of the table shows that a total of 20 products were ordered from all countries.
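One way to sanity-check the margin totals is to compare them with simple counts of each variable. The sketch below rebuilds the dataset from Step 1 and uses value_counts() for the comparison; the variable names are illustrative, and this check is an optional addition rather than a required step.

```python
import pandas as pd

# Same data as in Step 1
df = pd.DataFrame({'Order': list(range(1, 21)),
                   'Product': ['TV', 'TV', 'Comp', 'TV', 'TV', 'Comp',
                               'Comp', 'Comp', 'TV', 'Radio', 'TV', 'Radio', 'Radio',
                               'Radio', 'Comp', 'Comp', 'TV', 'TV', 'Radio', 'TV'],
                   'Country': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
                               'B', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C']})

table = pd.crosstab(index=df['Country'], columns=df['Product'], margins=True)

# Independent counts of each variable on its own
country_totals = df['Country'].value_counts()
product_totals = df['Product'].value_counts()

# The 'All' column should match the per-country counts
for country in ['A', 'B', 'C']:
    print(country, table.loc[country, 'All'], country_totals[country])

# The 'All' row should match the per-product counts
for product in ['Comp', 'Radio', 'TV']:
    print(product, table.loc['All', product], product_totals[product])

# The bottom-right cell should equal the number of rows in the data
grand_total = table.loc['All', 'All']
print(grand_total, len(df))
```

If the two sets of numbers disagree, a common cause is missing values in one of the columns, since crosstab only counts rows where both variables are present.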
