How to Interpret Log-Likelihood Values (With Examples)

Log-likelihood values are used to measure how well a statistical model fits a dataset: the larger the log-likelihood value, the better the fit of the model. The sign of the value carries no special meaning on its own, since a log-likelihood is simply the natural log of the likelihood and can be positive or negative depending on the data and the model. To interpret log-likelihood values, compare the log-likelihoods of two models fit to the same data and select the model with the higher value, as it is the better fit.

The log-likelihood value of a regression model is a way to measure the goodness of fit of the model. The higher the log-likelihood, the better the model fits the dataset.

The log-likelihood value for a given model can range from negative infinity to positive infinity. The actual log-likelihood value for a given model is mostly meaningless on its own, but it's useful for comparing two or more models fit to the same dataset.
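For instance, the raw log-likelihood depends on arbitrary details such as the units of the response variable. The following quick sketch (using R's built-in mtcars dataset) fits the same model twice, once with the response rescaled, to show that the raw value shifts even though the fit is identical:

#fit the same model with the response in two different scales
m_orig <- lm(mpg~wt, data=mtcars)
m_scaled <- lm(I(mpg*1000)~wt, data=mtcars)

logLik(m_orig)
logLik(m_scaled) #lower by n*log(1000), even though the fit is identical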

In practice, we often fit several regression models to a dataset and choose the model with the highest log-likelihood value as the model that fits the data best.

The following example shows how to interpret log-likelihood values for different regression models in practice.

Example: Interpreting Log-Likelihood Values

Suppose we have a dataset that shows the number of bedrooms, number of bathrooms, and selling price of 20 different houses in a particular neighborhood (the data is defined in the R code below).

Suppose we’d like to fit the following two regression models and determine which one offers a better fit to the data:

Model 1: Price = β0 + β1(number of bedrooms)

Model 2: Price = β0 + β1(number of bathrooms)

The following code shows how to fit each regression model and calculate the log-likelihood value of each model in R:

#define data
df <- data.frame(beds=c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
                        3, 3, 3, 3, 4, 4, 4, 5, 5, 6),
                 baths=c(2, 1, 4, 3, 2, 2, 3, 5, 4, 3,
                         4, 4, 3, 4, 2, 4, 3, 5, 6, 7),
                 price=c(120, 133, 139, 185, 148, 160, 192, 205, 244, 213,
                         236, 280, 275, 273, 312, 311, 304, 415, 396, 488))

#fit models
model1 <- lm(price~beds, data=df)
model2 <- lm(price~baths, data=df)

#calculate log-likelihood value of each model
logLik(model1)

'log Lik.' -91.04219 (df=3)

logLik(model2)

'log Lik.' -111.7511 (df=3)

The first model has a higher log-likelihood value (-91.04) than the second model (-111.75), which means the first model offers a better fit to the data.
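To see where these numbers come from: for a linear regression model, logLik() returns the sum of the Gaussian log-densities of the residuals, evaluated at the maximum likelihood estimate of the error variance (the residual sum of squares divided by n, not n minus the number of parameters). The following sketch reproduces the log-likelihood of Model 1 by hand:

#reproduce logLik(model1) by hand
res <- residuals(model1)
n <- length(res)
sigma2 <- sum(res^2)/n #MLE of the error variance (divides by n, not n-k)

sum(dnorm(res, mean=0, sd=sqrt(sigma2), log=TRUE))

[1] -91.04219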

Cautions on Using Log-Likelihood Values

When calculating log-likelihood values, it’s important to note that adding more predictor variables to a model will almost always increase the log-likelihood value even if the additional predictor variables aren’t statistically significant.
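To see this in action, here is a quick sketch that adds a pure-noise predictor (a hypothetical variable named noise, unrelated to price) to Model 1 and compares the log-likelihood values:

#add a pure-noise predictor to model 1
set.seed(1)
df$noise <- rnorm(20) #random values with no relationship to price

model3 <- lm(price~beds+noise, data=df)

logLik(model1) #beds only
logLik(model3) #slightly higher, even though noise is a useless predictor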

This means you should only compare the log-likelihood values between two regression models if each model has the same number of predictor variables.

To compare models with different numbers of predictor variables, you can perform a likelihood ratio test to compare the goodness of fit of two nested regression models.
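For example, Model 1 (bedrooms only) is nested inside a larger model that uses both bedrooms and bathrooms, so the two can be compared with a likelihood ratio test. Here is a minimal sketch using the lrtest() function from the lmtest package (assuming the package is installed):

#likelihood ratio test for two nested models
library(lmtest)

reduced <- lm(price~beds, data=df)
full <- lm(price~beds+baths, data=df)

lrtest(reduced, full)

A small p-value suggests the extra predictor significantly improves the fit, while a large p-value suggests the simpler model is adequate.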
