Table of Contents
In order to fit a random intercept or mixed effects model with heteroskedastic errors in Stata, one must utilize the “xtmixed” command. This command allows for the inclusion of random effects and heteroskedastic errors in the model, which accounts for the varying levels of variability in the data. Additionally, the “robust” option can be added to the command in order to adjust for heteroskedasticity. By using the “xtmixed” command and specifying the appropriate options, one can effectively fit a random intercept or mixed effects model with heteroskedastic errors in Stata.
How can I fit a random intercept or mixed effects model with heteroskedastic errors in Stata? | Stata FAQ
It is common to fit a model where a variable (or variables) has an effect on the
expected mean. You may also want to fit a model where a variable has an effect
on the variance, that is a model with heteroskedastic errors.
One example is a hierarchical model where the error variance is not constant
over some categorical variable (e.g., gender). Another example is a longitudinal model where you might want to allow the errors
to vary across
time. This page describes how
to fit such a model and demonstrates with an example.
Example 1: Heteroskedastic errors by group
It may be useful to note that a multi-level model (or a
single level model) that allows different error terms between groups is similar
to a multi-group structural equation model in which all parameters in the model
are constrained to equality except for the error terms.
The dataset for this example includes data on 7185 students in 160
school. This dataset and model come from the HLM manual. The variable mathach is a measure of each student’s
math achievement, the variable female is a binary variable equal to one if the student is
female and zero otherwise, and the variable id contains the school
identifier.
Below we run a model that uses female to predict mathach. This
model assumes equal error variances for males and females.
use https://stats.idre.ucla.edu/stat/stata/faq/hsb, clear
mixed mathach female || id:, var mle nolog
Mixed-effects ML regression Number of obs = 7185
Group variable: id Number of groups = 160
Obs per group: min = 14
avg = 44.9
max = 67
Wald chi2(1) = 62.89
Log likelihood = -23526.66 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
mathach | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | -1.35939 .1714111 -7.93 0.000 -1.69535 -1.02343
_cons | 13.34526 .253935 52.55 0.000 12.84756 13.84297
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity |
var(_cons) | 8.108982 1.018274 6.339834 10.37181
-----------------------------+------------------------------------------------
var(Residual) | 38.84481 .6555316 37.58101 40.15112
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 936.66 Prob >= chibar2 = 0.0000In order to fit a model with heteroskedastic errors, we must modify the above model. The model shown above can be written as:
mathach_ij = b0 + b1*female_ij + u_i + e_ij (1)
This model assumes:
e_ij ~ N(0,s2) u_i ~ N(0,t2)
Where e_ij is the level 1 errors (i.e., residuals), and u_i is the random intercept across classrooms (i.e., the level 2
variance). We want to allow
the variance of e_ij (i.e., s2) to be different
for male and female students. One way to think about this would be to replace the single
e_ij
term, with
separate terms for males and females, in equation form:
e_ij = e(m)_ij*male + e(f)_ij*female (2)
Where male is a dummy variable equal to one if the student is male and
equal to zero otherwise, and e(m)_ij is the error term for males. The
variable female and the term e(f)_ij are the same except that they
are for females. Another way to write this is:
e_ij ~ N(0,s2_1) males
N(0,s2_2) femalesAnother way to think about this is to use one
group as the reference group, and estimate the difference between the variance
of the two groups. In equation form this would be:
e_ij = e(m)_ij + e(f)_ij*female (3)
Where e(m)_ij is the error term for males, and e(f)_ij is the
difference between the errors for males and females. This can also be written:
e_ij ~ N(0,s2 + s2_2*female)
This second way of looking
at the error structure is part of how we will restructure our model to allow for
heteroskedastic error terms.
Equation 3 says that the residual variance is a function of gender. So we can
write it as r_ijk, where i is the group, j is the subject, and k is the
gender.
mathach_ij = b0 + b1*female + u_i + r_ijk (4)
To model this, we add a level to our model. Now
the third level will be classrooms (previously level 2), the second level will be
students (previously level 1), and level 1 will be a single case within each
student.
The only random effect at level 1 is gender (even the intercept is fixed). The new model can be written as:
mathach_ij = b0 + b1*female_ij + u_i + e_ij*female + r_ij0
r_ijk = r_ij0 male
r_ij1 female=e_ij+r_ij0Where the level one error
is represented by r_ij0. Because males are the omitted category in the
random portion of the model, the variance of r_ij0 is the variance of the
errors males (i.e., female=0). This is similar to the interpretation of
the intercept in the fixed portion of the mode, where b0 represents the
intercept for males, since they are the omitted category. The difference between
the error terms for males and females is e_ij so the error term for females is
r_ij + e1_ij.
A final consideration before we can actually fit the model is
that Stata restricts the estimates of the error variances to be greater than zero. As a result, we must use the
group with the smallest variance as the omitted category, since this will result in the values of the error variances for the
group(s) being positive. To establish which group has the lowest variance
you can examine the residuals from a model without heteroskedastic errors. In this example,
examination of the residuals shows that the residual variance is smaller for females than
males.
The code below creates two new variables, male, and nid. The variable nid
will be the identifier for the students.
recode female (0=1) (1=0), gen(male) gen nid = _n
The resulting data set should look something like the one shown below. Where each classroom
(identified by the variable id) has multiple rows in the dataset, but each case
(i.e., each student, identified by the variable nid) has only one row.
id nid
1224 1
1224 2
1224 3
...
1288 48
1288 49
1288 50
...
1296 73
1296 74
1296 75Once the necessary variables are created, we can run the model as shown
below, which allows for a difference in the variance of the errors for males and
females. It is necessary to specify the nocons
option suppresses the random intercept at level 2, so that the only random
effect at level 2 is gender (i.e., male).
mixed mathach female || id: || nid: male, nocons var mle nolog
Mixed-effects ML regression Number of obs = 7185
-----------------------------------------------------------
| No. of Observations per Group
Group Variable | Groups Minimum Average Maximum
----------------+------------------------------------------
id | 160 14 44.9 67
nid | 7185 1 1.0 1
-----------------------------------------------------------
Wald chi2(1) = 63.03
Log likelihood = -23522.932 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
mathach | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | -1.363969 .1718025 -7.94 0.000 -1.700695 -1.027242
_cons | 13.34707 .2548619 52.37 0.000 12.84755 13.84659
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity |
var(_cons) | 8.090527 1.016457 6.324639 10.34946
-----------------------------+------------------------------------------------
nid: Identity |
var(male) | 3.622413 1.333432 1.76062 7.452988
-----------------------------+------------------------------------------------
var(Residual) | 37.13827 .8657943 35.47953 38.87456
------------------------------------------------------------------------------
LR test vs. linear regression: chi2(2) = 944.12 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.The second table in the output shows the estimated random effects. The residual variance for females is equal to var(Residuals) = 37.138,
while the variance for males is var(Residuals) + var(male) =
37.1383 + 3.622 = 40.7607. Since the 95% confidence interval for var(male)
does not include zero, we can say that the difference between the variances is
statistically significant at the p<0.05 level.
Traditionally, and in some other packages (e.g., HLM), the variances in models with heteroskedastic errors have been parameterized as:
s2_ij = exp(a_0 + a_1*x_ij)
Where s2_ij is the variance of the residuals, x_ij is a
dichotomous variable equal to one for one group and zero for the other, and a_0 and a_1 are estimated parameters for the
variance. a_0 is the variance in the group where x_ij
= 0, and a_1 is the difference in the variances between the two groups.
If you want to compare the results of your analysis in Stata to those from another
package, it is possible to translate between the two. As a first step, let’s
think about what Stata is estimating a little more. One way to write what Stata models
is:
s2_ij = var(Residuals) + var(x_ij)*x_ij
Where var(Residuals) is the variance of the level 1 errors, and
var(x_ij) is the random effect of the dummy variable x_ij. Because Stata models the natural log of the standard deviation of the error term,
the above is visually clear, but not quite correct. The actual model is:
s2_ij = exp(2*lnsd_0 ) + exp(2*lnsd_1)*x_ij
Where lnsd_0 is the natural log of the standard deviation of the level
1 errors, and lnsd_1 is the natural log of the standard deviation of
the level 2 random effect. We can convert the estimates Stata gives us to a_0
and a_1 using the formulas below (each formula is written twice, the
first for visual clarity, the second to reflect what is actually modeled by
Stata):
a_0 = ln(var(Residuals)) a_0 = ln(exp(2*lnsd_0)) a_1 = ln(var(Residuals)+var(x_ij)) - ln(var(Residuals)) a_1 = ln[exp(2*lnsd_0) + exp(2*lnsd_1)] - ln[exp(2*lnsd_0)]
We can use the display command to calculate these values. Stata stores the ln(sd) for the
level 1 residuals as [lnsig_e]_cons, and the ln(sd) of the random effect
of male in [lns2_1_1]_cons. Below we use these values to calculate a_0 and a_1.
di "a_0=" ln(exp(2 * [lnsig_e]_cons)) a_0=3.6147746 di "a_1=" ln(exp(2 * [lnsig_e]_cons)+exp(2 * [lns2_1_1]_cons)) - ln(exp(2 * [lnsig_e]_cons)) a_1=.09308814
Example 2: A growth curve model
When you run a growth curve model in the structural equation modeling
framework, it is not uncommon to allow the error variance to be different across
time points, or at least to test whether such differences exist. A longitudinal
model that allows different error variances across time points is similar to a
growth model in the structural equation modeling setting, where all parameters
except for the error terms across time points are constrained to equality. Below we show
how to allow for different error variances across time points, and how to test
whether the error variances are significantly different from each other.
The example dataset contains data on 239 subjects. The dataset includes
observations at up to five time points per subject, for a total of 1079 observations.
The variable id uniquely identifies subjects. The time at which an
observation was taken is indicated by the variable time, which takes on
the values 0 to 4.
Below we use mixed to fit a model where
the variables x and time
predict the variable attit. This model assumes the variance is the same
across time points.
use https://stats.idre.ucla.edu/stat/stata/faq/nys2, clear
mixed attit x time ||id:, var mle nolog
Mixed-effects ML regression Number of obs = 1079
Group variable: id Number of groups = 239
Obs per group: min = 1
avg = 4.5
max = 5
Wald chi2(2) = 324.65
Log likelihood = 140.31043 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
attit | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .0241245 .0033107 7.29 0.000 .0176357 .0306134
time | .0600884 .0039557 15.19 0.000 .0523353 .0678415
_cons | .1191882 .0184893 6.45 0.000 .0829498 .1554267
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity |
var(_cons) | .0308536 .003533 .0246511 .0386167
-----------------------------+------------------------------------------------
var(Residual) | .0311131 .0015204 .0282714 .0342405
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 324.66 Prob >= chibar2 = 0.0000The model above can be written as:
attit_it = b0 + b1*x_it + b2*time_it + u_i + e_it (4)
It is assumed that:
e_it ~ N(0,s2) u_i ~ N(0,t2)
Where u_i is the random intercept across individuals (i.e., the level 2
variance), and e_it represents the level 1 errors (i.e., residuals). We want to allow
the variance of e_it to be different
at each time point. One way to think about this would be to replace the single
e_ij
term, with
separate terms for each time point, in equation form:
e_it = e_i0*t0 + e_i1*t1 + e_i2*t2 + e_i3*t3 + e_i4*t4
Where t0 is a dummy variable equal to one if time = 0 (i.e. the
first time point) and equal to zero otherwise, and e_i0 is the error for
the first measurement occasion (time=0). The variables t1 to t4, are dummy variables for time
points 1 to 4. The terms
e_i1 to e_i4 are the error terms for time points 1 to 4.
In order to
model the heteroskedastic errors, we add a third level to our model. In this new model,
the third level will be individuals (previously level 2), the second level will be time
points (previously level 1), and level 1 will be a single case within each time point.
Since the effect of time is in the level at model 2, only random effects for
time are included at level 1. The new model can be written as:
attit_itk = b0 + b1*x_it + b2*time_it + e_i1*t1 + e_i2*t2 + e_i3*t3 + e_i4*t4 + r_ij0
Where the level one error
is represented by r_it0. Because the time=0 is the omitted category, the variance of
r_it0 is the variance of the
errors at time=0. The
random effects e_i1 to e_i4 represent the difference between the variance
of the errors and the variance of the errors at time=0 for each time
point.
A final consideration before we can actually fit the model is
that Stata restricts the estimates of the error variances to be greater than zero. As a result, we must use the time point (or other
category/group) with the smallest variance as the omitted category, since this will result in the values of the error variances for the other time points
being positive. To establish which time point or group has the lowest variance
you can examine the residuals from a model without heteroskedastic errors. In
this example, the residual variance is smallest when time=0.
In order to fit the model we will need dummy variables for each of the other
time points.
gen t1 = (time==1) gen t2 = (time==2) gen t3 = (time==3) gen t4 = (time==4)
Now we are ready to fit our model using mixed. The nocons option
suppresses the random intercept at level 2.
mixed attit x time ||id: ||time: t1 t2 t3 t4, nocons var mle nolog
Mixed-effects ML regression Number of obs = 1079
-----------------------------------------------------------
| No. of Observations per Group
Group Variable | Groups Minimum Average Maximum
----------------+------------------------------------------
id | 239 1 4.5 5
time | 1079 1 1.0 1
-----------------------------------------------------------
Wald chi2(2) = 319.37
Log likelihood = 143.67787 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
attit | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .0238029 .0033059 7.20 0.000 .0173235 .0302823
time | .0597567 .0039638 15.08 0.000 .0519878 .0675256
_cons | .1208 .0178047 6.78 0.000 .0859035 .1556965
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity |
var(_cons) | .0280981 .0034551 .0220805 .0357557
-----------------------------+------------------------------------------------
time: Independent |
var(t1) | .0027798 .0046901 .0001018 .0758888
var(t2) | .0058393 .0053178 .0009799 .0347972
var(t3) | .0125841 .0060606 .0048964 .0323421
var(t4) | .01354 .0059784 .0056988 .0321702
-----------------------------+------------------------------------------------
var(Residual) | .0247609 .0033388 .0190103 .032251
------------------------------------------------------------------------------
LR test vs. linear regression: chi2(5) = 331.40 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
Once we have run the model, we may want to test whether the estimated
variances are different either from zero or different from each other. Below nlcom is used to
get the estimated variance at each time point.
nlcom (time0: exp(2 * [lnsig_e]_cons)) ///
(time1: exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_1]_cons)) ///
(time2: exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_2]_cons)) ///
(time3: exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_3]_cons)) ///
(time4: exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_4]_cons))
time0: exp(2 * [lnsig_e]_cons)
time1: exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_1]_cons)
time2: exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_2]_cons)
time3: exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_3]_cons)
time4: exp(2 * [lnsig_e]_cons) + exp(2 * [lns2_1_4]_cons)
------------------------------------------------------------------------------
attit | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
time0 | .0247609 .0033388 7.42 0.000 .018217 .0313048
time1 | .0275406 .0034705 7.94 0.000 .0207385 .0343427
time2 | .0306002 .0035976 8.51 0.000 .0235491 .0376513
time3 | .037345 .0044345 8.42 0.000 .0286536 .0460364
time4 | .0383009 .0044824 8.54 0.000 .0295155 .0470862
------------------------------------------------------------------------------We can also use nlcom to estimate the difference in the variance between time points,
in this case, the difference between the variance at time = 1 and time = 4.
nlcom (t4_t1: exp(2 * [lns2_1_4]_cons) - exp(2 * [lns2_1_1]_cons))
t4_t1: exp(2 * [lns2_1_4]_cons) - exp(2 * [lns2_1_1]_cons)
------------------------------------------------------------------------------
attit | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
t4_t1 | .0107602 .0060541 1.78 0.076 -.0011055 .0226259
------------------------------------------------------------------------------
References
Gutierrez, Roberto G. (2008) Tricks of the Trade: Getting the most out of
xtmixed. 2008 Fall North American Stata Users Group Meeting, San Francisco, CA.
Raudenbush, Stephen, Anthony Bryk, Yuk Fai Cheong, and Richard Congdon
(2001) HLM 5: Hierarchical Linear and Nonlinear Modeling. Scientific
Software International. Lincolnwood, IL.
Cite this article
stats writer (2024). How can I fit a random intercept or mixed effects model with heteroskedastic errors in Stata?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/how-can-i-fit-a-random-intercept-or-mixed-effects-model-with-heteroskedastic-errors-in-stata/
stats writer. "How can I fit a random intercept or mixed effects model with heteroskedastic errors in Stata?." PSYCHOLOGICAL SCALES, 1 Jul. 2024, https://scales.arabpsychology.com/stats/how-can-i-fit-a-random-intercept-or-mixed-effects-model-with-heteroskedastic-errors-in-stata/.
stats writer. "How can I fit a random intercept or mixed effects model with heteroskedastic errors in Stata?." PSYCHOLOGICAL SCALES, 2024. https://scales.arabpsychology.com/stats/how-can-i-fit-a-random-intercept-or-mixed-effects-model-with-heteroskedastic-errors-in-stata/.
stats writer (2024) 'How can I fit a random intercept or mixed effects model with heteroskedastic errors in Stata?', PSYCHOLOGICAL SCALES. Available at: https://scales.arabpsychology.com/stats/how-can-i-fit-a-random-intercept-or-mixed-effects-model-with-heteroskedastic-errors-in-stata/.
[1] stats writer, "How can I fit a random intercept or mixed effects model with heteroskedastic errors in Stata?," PSYCHOLOGICAL SCALES, vol. X, no. Y, ص Z-Z, July, 2024.
stats writer. How can I fit a random intercept or mixed effects model with heteroskedastic errors in Stata?. PSYCHOLOGICAL SCALES. 2024;vol(issue):pages.
