What is the Third Variable Problem?


In statistics, a third variable problem occurs when an observed correlation between two variables can actually be explained by a third variable that hasn’t been accounted for.

When this third variable is not taken into account, the correlation between the two variables under study can be misleading: it may suggest that one variable causes the other when neither has any effect on the other.

This tutorial provides several examples of third variable problems in different settings.

Example 1: Dogs & Fire Hydrants

A researcher observes that cities with more fire hydrants tend to also have more dogs.

However, the two variables are correlated only because both are strongly related to a third variable: population size.

Larger cities tend to have both more fire hydrants and more dogs. Conversely, smaller cities tend to have fewer fire hydrants and fewer dogs.
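
The pattern above can be reproduced with simulated data. The following is a minimal sketch, not real city data: the coefficients, noise levels, and the helper name residuals are all assumptions chosen for illustration. Population drives both counts, and the script compares the raw correlation with the correlation that remains once the straight-line effect of population is removed from each variable.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n_cities = 500

# Hypothetical data: population drives both counts.
# Hydrants and dogs never influence each other directly.
population = rng.uniform(10_000, 1_000_000, size=n_cities)
hydrants = 0.002 * population + rng.normal(0, 200, size=n_cities)
dogs = 0.1 * population + rng.normal(0, 10_000, size=n_cities)


def residuals(y, x):
    """Return what is left of y after subtracting its best straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)


# Correlation computed directly on the two observed variables.
raw_r = np.corrcoef(hydrants, dogs)[0, 1]

# Correlation between the parts of each variable that population does not explain.
partial_r = np.corrcoef(
    residuals(hydrants, population),
    residuals(dogs, population),
)[0, 1]

print(f"raw correlation: {raw_r:.2f}")
print(f"correlation after removing population: {partial_r:.2f}")
```

Under these assumptions the raw correlation should be strong, while the correlation between the residuals should sit close to zero, since nothing but population links the two variables.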

Example 2: Ice Cream Sales & Shark Attacks

A researcher finds that ice cream sales and shark attacks are highly positively correlated.

However, the two variables are correlated only because both are strongly related to a third variable: temperature.

When it’s warmer out, more people buy ice cream and more people swim in the ocean. This explains why both ice cream sales and shark attacks tend to increase at the same times of the year.
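
One common way to check for this is the first-order partial correlation, which adjusts the correlation between two variables using their pairwise correlations with the third. The sketch below applies that formula to synthetic daily data. The numbers are invented, the shark attack series is a rough continuous stand-in rather than real counts, and partial_correlation is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible
n_days = 365

# Hypothetical daily data: temperature is the only driver of both series.
temperature = rng.normal(20, 8, size=n_days)
ice_cream_sales = 50 + 10 * temperature + rng.normal(0, 40, size=n_days)
shark_attacks = 0.5 + 0.05 * temperature + rng.normal(0, 0.3, size=n_days)


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Pairwise Pearson correlations among the three variables.
r_sales_sharks = np.corrcoef(ice_cream_sales, shark_attacks)[0, 1]
r_sales_temp = np.corrcoef(ice_cream_sales, temperature)[0, 1]
r_sharks_temp = np.corrcoef(shark_attacks, temperature)[0, 1]

r_controlled = partial_correlation(r_sales_sharks, r_sales_temp, r_sharks_temp)

print(f"sales vs. shark attacks: {r_sales_sharks:.2f}")
print(f"same, controlling for temperature: {r_controlled:.2f}")
```

If temperature really is the only link, the controlled value should fall near zero even though the uncontrolled correlation is clearly positive.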

Example 3: Volunteers & Natural Disasters

A study finds that the more volunteers who show up after a natural disaster, the greater the damage.

However, the two variables are correlated only because both are strongly related to a third variable: the size of the natural disaster.

Larger natural disasters tend to cause more damage and also tend to draw more volunteers.
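
Another way to account for a third variable is to include it as a predictor in a regression. The sketch below uses made-up data and arbitrary units, and ols_coefficients is a hypothetical helper built on NumPy's least-squares solver. It fits damage on volunteers alone, then on volunteers together with disaster size, to show how the apparent effect of volunteers changes.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed so the sketch is reproducible
n_disasters = 300

# Hypothetical data: disaster size drives both volunteers and damage.
disaster_size = rng.uniform(1, 10, size=n_disasters)
volunteers = 100 * disaster_size + rng.normal(0, 100, size=n_disasters)
damage = 5 * disaster_size + rng.normal(0, 5, size=n_disasters)


def ols_coefficients(predictors, y):
    """Least-squares fit of y on the predictors; returns [intercept, slope_1, ...]."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Model 1: damage explained by volunteers only (third variable ignored).
naive = ols_coefficients([volunteers], damage)

# Model 2: damage explained by volunteers and disaster size together.
adjusted = ols_coefficients([volunteers, disaster_size], damage)

print(f"volunteer slope, ignoring disaster size: {naive[1]:.4f}")
print(f"volunteer slope, controlling for size:   {adjusted[1]:.4f}")
print(f"disaster size slope:                     {adjusted[2]:.2f}")
```

With these assumptions, the volunteer slope should be clearly positive in the first model and shrink toward zero in the second, where disaster size absorbs the relationship.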

Related Articles

What is a Confounding Variable?
