What is Zero-truncated Negative Binomial Regression, and how is it used in Mplus data analysis?

What is Zero-truncated Negative Binomial Regression, and how is it used in Mplus data analysis?

Zero-truncated Negative Binomial Regression is a statistical technique used in Mplus data analysis to model count data with excess zeros. It is a type of regression analysis that is suitable for analyzing data with a high proportion of zero values, where the traditional Negative Binomial Regression may not be appropriate. This method takes into account the truncation of the data at zero, which means it only models the counts above zero. It is often used in social science research to analyze data such as number of absences, number of hospitalizations, or number of criminal offenses. By incorporating the excess zeros, this method provides a more accurate estimation of the relationship between the independent variables and the count outcome. It is particularly useful in analyzing data with overdispersion, where the variance is greater than the mean. In summary, Zero-truncated Negative Binomial Regression is a specialized statistical tool that allows for the analysis of count data with a high proportion of zeros, providing researchers with a more precise understanding of the relationships between variables in their data.

Zero-truncated Negative Binomial Regression | Mplus Data Analysis Examples

Version info: Code for this page was tested in Mplus version 6.12.

Zero-truncated negative binomial regression is used to model count data for which the value zero
cannot occur and when there is evidence of over dispersion . 

Please Note: The purpose of this page is to show how to use various data analysis commands.
It does not cover all aspects of the research process which researchers are expected to do. In
particular, it does not cover data cleaning and verification, verification of assumptions, model
diagnostics and potential follow-up analyses.

Examples of zero-truncated negative binomial

Example 1.

A study of the length of hospital stay, in days, as a function of age, kind of health insurance and whether
or not the patient died while in the hospital.
Length of hospital stay is recorded as a minimum of at least one day.

Example 2.

A study of the number of journal articles published by tenured faculty as a function of
discipline (fine arts, science, social science, humanities, medical,
etc). To get tenure faculty must publish, i.e., there are no tenured faculty with
zero publications.

Example 3.

A study by the county traffic court on the number of tickets received by teenagers
as predicted by school performance, amount of driver training and gender. Only individuals
who have received at least one citation are in the traffic court files.

Description of the data

Let’s pursue Example 1 from above.

We have a hypothetical data file available here with 1,493 observations.
The variable describing length of hospital visit is stay.
The variable age gives the age group from 1 to 9 which will be treated as
interval in this example.
The variables hmo and died are binary indicator variables for HMO
insured patients and patients who died while in hospital, respectively.

Let’s look at the data.

Data:
  File is C:ztnb.dat;
Variable:
  Names are 
     stay age hmo died;
  Missing are all (-9999) ; 
Analysis: 
  Type = basic ;
Plot: 
  Type = plot1;

     ESTIMATED SAMPLE STATISTICS


           Means
              STAY          AGE           HMO           DIED
              ________      ________      ________      ________
      1         9.729         5.234         0.160         0.343


           Covariances
              STAY          AGE           HMO           DIED
              ________      ________      ________      ________
 STAY          66.100
 AGE           -0.615         2.785
 HMO           -0.169        -0.006         0.134
 DIED          -0.447         0.121         0.000         0.225


           Correlations
              STAY          AGE           HMO           DIED
              ________      ________      ________      ________
 STAY           1.000
 AGE           -0.045         1.000
 HMO           -0.057        -0.010         1.000
 DIED          -0.116         0.152         0.000         1.000

	
Image ztnb_hist_stayImage ztnb_hist_ageImage ztnb_hist_hmoImage ztnb_hist_died

Analysis methods you might consider

Before we show how you can analyze these data with a zero-truncated negative binomial analysis, let’s
consider some other methods that you might use.

Zero-trunacated negative binomial regression 

In the syntax below, we have indicated that stay is a count
variable by using the count statement.  The (nbt) option is
used to indicate 2 things: that we are modeling our count variable with a
negative binomial distribution, and that we are specifying a zero-truncated model. 
Without the (t) option we would be estimating a negative
binomial model without
zero-truncation.  Also, we do not need a usevariables statement
because
we are using all of the variables in the data set in the current model. 
We have omitted the missing statement because we have no missing data in
this data set.  The default estimation method is MLR – maximum likelihood
parameter estimates with standard errors and a chi-square test statistic that
are robust to non-normality and non-independence of observations. The MLR standard errors
are computed using a sandwich estimator. This is what we generally call robust
standard errors.  To get the “regular” standard errors, we use the estimator
= ml
on the analysis statement.  (In the next example, we will
omit the analysis statement and obtain the robust standard errors.) 
Our regression equations is specified in the model statement: we are predicting
length of stay using age, hmo status and whether the
patient died. 

Data:
  File is C:ztnb.dat ;
Variable:
  Names = stay age hmo died;
  Count = stay(nbt); 
Model:
  stay on age hmo died;
    
MODEL FIT INFORMATION

Number of Free Parameters                        5

Loglikelihood

          H0 Value                       -4755.280
          H0 Scaling Correction Factor       1.156
            for MLR

Information Criteria

          Akaike (AIC)                    9520.559
          Bayesian (BIC)                  9547.102
          Sample-Size Adjusted BIC        9531.218
            (n* = (n + 2) / 24)



MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 STAY       ON
    AGE               -0.016      0.013     -1.194      0.233
    HMO               -0.147      0.057     -2.571      0.010
    DIED              -0.218      0.053     -4.142      0.000

 Intercepts
    STAY               2.408      0.075     32.039      0.000

 Dispersion
    STAY               0.566      0.037     15.316      0.000

In the MODEL FIT INFORMATION portion of the output, you will find the log
likelihood for the final model as well as a number of fit statistics. In the
MODEL RESULTS section of the output you will find the negative binomial
regression coefficients (estimates) for each of the variables, standard errors
and the ratio of the estimate to its standard error.  This can be used as a
Z test, where values greater than 2 are considered to be statistically
significant.  We see that hmo and died but not age are
significant predictors of stay.  Thus, for example, for patients who
use HMO services compared to those who do not, the log count of days stayed is
about 0.147 less. 

Now let’s rerun the model without the analysis: estimator = ml statement in order to obtain robust standard errors.

Data:
  File is C:ztnb.dat ;
Variable:
  Names = stay age hmo died;
  Missing = all (-9999) ;
  Count = stay(nbt); 
Model:
  stay on age hmo died;
Analysis:
  estimator = ml;
    

MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 STAY       ON
    AGE               -0.016      0.013     -1.197      0.231
    HMO               -0.147      0.059     -2.483      0.013
    DIED              -0.218      0.046     -4.718      0.000

 Intercepts
    STAY               2.408      0.072     33.457      0.000

 Dispersion
    STAY               0.566      0.031     18.132      0.000

Robust standard errors tend to be larger than “regular” standard errors,
though not always as we see for the variable age.  The results
changed very little when using regular standard errors.

Things to consider

References

 

 

Cite this article

stats writer (2024). What is Zero-truncated Negative Binomial Regression, and how is it used in Mplus data analysis?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/what-is-zero-truncated-negative-binomial-regression-and-how-is-it-used-in-mplus-data-analysis/

stats writer. "What is Zero-truncated Negative Binomial Regression, and how is it used in Mplus data analysis?." PSYCHOLOGICAL SCALES, 29 Jun. 2024, https://scales.arabpsychology.com/stats/what-is-zero-truncated-negative-binomial-regression-and-how-is-it-used-in-mplus-data-analysis/.

stats writer. "What is Zero-truncated Negative Binomial Regression, and how is it used in Mplus data analysis?." PSYCHOLOGICAL SCALES, 2024. https://scales.arabpsychology.com/stats/what-is-zero-truncated-negative-binomial-regression-and-how-is-it-used-in-mplus-data-analysis/.

stats writer (2024) 'What is Zero-truncated Negative Binomial Regression, and how is it used in Mplus data analysis?', PSYCHOLOGICAL SCALES. Available at: https://scales.arabpsychology.com/stats/what-is-zero-truncated-negative-binomial-regression-and-how-is-it-used-in-mplus-data-analysis/.

[1] stats writer, "What is Zero-truncated Negative Binomial Regression, and how is it used in Mplus data analysis?," PSYCHOLOGICAL SCALES, vol. X, no. Y, ص Z-Z, June, 2024.

stats writer. What is Zero-truncated Negative Binomial Regression, and how is it used in Mplus data analysis?. PSYCHOLOGICAL SCALES. 2024;vol(issue):pages.

Download Post (.PDF)
Slide Up
x
PDF
Scroll to Top