Log-Linear Analysis is a statistical method for analyzing relationships among categorical variables. It gets its name from the way it models the logarithm of the expected count in each cell of a frequency table as a linear combination of effects for each variable and their interactions. The technique is commonly used in fields such as sociology, psychology, and market research to identify patterns and associations among several variables at once. Its results show which variables, and which combinations of variables, are associated, which supports interpretation and informed decision-making.

## What is Log-Linear Analysis?

**Log-Linear Analysis** is a statistical test used to determine whether the proportions of categories in two or more group variables differ significantly from each other. To use this test, you should have two or more group variables, each with two or more categories.

*Log-Linear Analysis is also called Multi-Way Frequency Tables, Log-Linear Analysis of Frequency Tables, or Log Linear Models.*
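
As a sketch of what "log-linear" means, consider two categorical variables $A$ and $B$ (these symbols are illustrative and not taken from a specific dataset). The model writes the log of the expected count $\mu_{ij}$ in cell $(i, j)$ of the frequency table as a sum of effects:

$$
\log \mu_{ij} = \lambda + \lambda_i^{A} + \lambda_j^{B} + \lambda_{ij}^{AB}
$$

Setting the interaction term $\lambda_{ij}^{AB}$ to zero for every $i$ and $j$ gives the independence model, so testing that term is one way to test whether the two variables are associated. With three or more variables, the same idea extends to higher-order interaction terms.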

## Assumptions for Log-Linear Analysis

Every statistical method has assumptions. An assumption is a property that your data must satisfy for the method's results to be valid.

The assumptions for Log-Linear Analysis include:

- Random Sample
- Independence
- Mutually exclusive groups

Let’s dive into what that means.

**Random Sample**

The data points for each group in your analysis must come from a simple random sample. This is important because if your observations were not randomly selected, your results may not reflect the population you care about. In statistical terms this is called bias: a systematic tendency toward incorrect results because of how the data were collected.

**Independence**

Each of your observations (data points) should be independent. This means that each value of your variables doesn’t “depend” on any of the others. For example, this assumption is usually violated when there are multiple data points over time from the same unit of observation (e.g. subject/customer/store), because the data points from the same unit of observation are likely to be related or affect one another.

**Mutually Exclusive Groups**

The groups of each categorical variable should be mutually exclusive. For example, if your categorical variable is hungry (yes/no), then your groups are mutually exclusive, because one person cannot belong to both groups at once.
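
A quick sanity check on independence and mutual exclusivity is to confirm that each unit of observation appears only once in the data. The sketch below assumes a hypothetical list of `(unit_id, group)` records; the identifiers and values are made up for illustration.

```python
from collections import Counter

# Hypothetical records: (unit of observation, group). Not real data.
records = [
    ("bird_01", "large"),
    ("bird_02", "small"),
    ("bird_03", "small"),
    ("bird_02", "large"),  # bird_02 appears twice, and in two groups
]

# Count how many rows each unit contributes.
rows_per_unit = Counter(unit for unit, _ in records)

# Units with more than one row are either repeated measurements
# (an independence problem) or members of more than one group
# (a mutual-exclusivity problem).
repeated_units = sorted(unit for unit, n in rows_per_unit.items() if n > 1)

print("Units that appear more than once:", repeated_units)
```

If the list is not empty, the flagged units need a closer look before the analysis is run. This check cannot confirm random sampling, which depends on how the data were collected.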

## When to use Log-Linear Analysis?

You should use Log-Linear Analysis in the following scenario:

- You want to test the **difference** between two or more variables
- Your variable of interest is **proportional or categorical**
- You have **two or more options**

Let’s clarify these to help you know when to use Log-Linear Analysis.

**Difference**

You are looking for a statistical test to look at how a variable differs between two or more groups. Other types of analyses include testing for a relationship between two variables or predicting one variable using another variable (prediction).

**Proportional or Categorical**

For this test, your variable of interest must be proportional or categorical. A categorical variable is a variable that contains categories without a natural order. Examples of categorical variables are eye color, city of residence, type of dog, etc. Proportional variables are derived from categorical variables by counting how many observations fall into each category. Examples include the share of visitors who converted on two different versions of your website (10% vs 15%), the number of people who voted vs people who did not vote, and the proportion of plants that died vs survived an experimental treatment.
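
To make the link between categories and proportions concrete, here is a minimal sketch that turns counts into proportions. The visitor and conversion numbers are hypothetical, chosen only to reproduce the 10% vs 15% example.

```python
# Hypothetical counts for two versions of a website. Not real data.
visitors = {"A": 200, "B": 200}
conversions = {"A": 20, "B": 30}

# A proportion is the count in one category divided by the group total.
rates = {version: conversions[version] / visitors[version] for version in visitors}

# The same information as a frequency table: (version, converted?) -> count.
table = {}
for version in visitors:
    table[(version, "yes")] = conversions[version]
    table[(version, "no")] = visitors[version] - conversions[version]

for version, rate in rates.items():
    print(f"Version {version}: {rate:.0%} converted")
```

The analysis itself works on the counts in `table`, not on the percentages, because the counts also carry the sample size.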

*If you want to compare two or more continuous variables, you may want to use a One-Way ANOVA.*

**Two or more Options**

Your categorical variables should have two or more possible options. Some examples of variables like this are made a purchase (yes/no), color (black/white/red/etc), recovered from disease (yes/no).

## Log-Linear Analysis Example

**Group Variable 1**: Bird Size (large/small)

**Group Variable 2**: Bird Color (black/white/gray)

**Group Variable 3**: Bird Habitat (island/mainland)

In this example, we are interested in investigating whether there are significant relationships among our variables of bird size, color, and habitat. The null hypothesis is that there is no relationship among the variables.
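
Before any testing, the raw observations are tallied into a multi-way frequency table. The sketch below assumes each bird is recorded as a `(size, color, habitat)` tuple; the handful of observations shown are invented for illustration.

```python
from collections import Counter

# Hypothetical observations, one tuple per bird: (size, color, habitat).
observations = [
    ("large", "black", "island"),
    ("large", "black", "island"),
    ("small", "white", "mainland"),
    ("small", "gray", "island"),
    ("large", "gray", "mainland"),
    ("small", "white", "mainland"),
]

# Three-way frequency table: one count per combination of categories.
cell_counts = Counter(observations)

# Collapsing over habitat gives a two-way table of size by color.
size_by_color = Counter((size, color) for size, color, _ in observations)

for cell, count in sorted(cell_counts.items()):
    print(cell, count)
```

Each bird contributes to exactly one cell, which is what the independence and mutual-exclusivity assumptions require.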

Because each of our variables is categorical with two or more possible values (for example, large/small), and our data meet all other assumptions, we know that Log-Linear Analysis is appropriate to use.

The analysis will result in a probability or p-value for each interaction between variables. The p-value represents the chance of seeing results at least as extreme as ours if there were actually no relationship among the variables in question. A p-value less than or equal to 0.05 is conventionally treated as statistically significant, meaning the observed association is unlikely to be due to chance alone.
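
The sketch below illustrates the idea using only the Python standard library. It is a simplification: instead of a p-value for each interaction, it runs a single overall test of the null hypothesis that size, color, and habitat are mutually independent, using the likelihood-ratio statistic G². The counts are hypothetical, and `chi2_sf` is a helper written for this example using the closed-form chi-square tail for integer degrees of freedom.

```python
import math

# Hypothetical counts keyed by (size, color, habitat). Not real data.
counts = {
    ("large", "black", "island"): 20, ("large", "black", "mainland"): 10,
    ("large", "white", "island"): 15, ("large", "white", "mainland"): 15,
    ("large", "gray", "island"): 10,  ("large", "gray", "mainland"): 20,
    ("small", "black", "island"): 10, ("small", "black", "mainland"): 20,
    ("small", "white", "island"): 15, ("small", "white", "mainland"): 15,
    ("small", "gray", "island"): 20,  ("small", "gray", "mainland"): 10,
}

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite series.
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2))
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return tail + 2 * pdf * total

def margin(table, axis):
    """Total count for each category of one variable."""
    totals = {}
    for key, n in table.items():
        totals[key[axis]] = totals.get(key[axis], 0) + n
    return totals

n_total = sum(counts.values())
margins = [margin(counts, axis) for axis in range(3)]

# G^2 = 2 * sum(observed * ln(observed / expected)); under mutual
# independence, expected = product of the three margins / N^2.
g2 = 0.0
for (size, color, habitat), observed in counts.items():
    expected = (margins[0][size] * margins[1][color] * margins[2][habitat]
                / n_total ** 2)
    if observed > 0:
        g2 += 2 * observed * math.log(observed / expected)

# df = number of cells - 1 - sum over variables of (categories - 1).
n_cells = math.prod(len(m) for m in margins)
df = n_cells - 1 - sum(len(m) - 1 for m in margins)
p_value = chi2_sf(g2, df)

print(f"G^2 = {g2:.2f}, df = {df}, p = {p_value:.3f}")
```

With these made-up counts, G² is about 13.59 on 7 degrees of freedom, and the p-value is slightly above 0.05, so the null hypothesis of mutual independence would not be rejected at the 0.05 level. A full Log-Linear Analysis would go further and compare nested models to test each interaction separately, typically with statistical software.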