Table of Contents
Robust regression is a technique used in linear regression when you suspect your data might have outliers or influential observations that could distort the results of ordinary least squares (OLS) regression.
Robust Regression | Stata Data Analysis Examples
Version info: Code for this page was tested in Stata 12.
Robust regression is an alternative to least squares
regression when data is contaminated with outliers or influential
observations and it can
also be used for the purpose of detecting influential observations.
Please note: The purpose of this page is to show how to use various data
analysis commands. It does not cover all aspects of the research process which
researchers are expected to do. In particular, it does not cover data
cleaning and checking, verification of assumptions, model diagnostics or
potential follow-up analyses.
Introduction
Let’s begin our discussion on robust regression with some terms in linear
regression.
Residual: The difference between the predicted value (based on
the regression equation) and the actual, observed value.
Outlier: In linear regression, an outlier is an observation with
large residual. In other words, it is an observation whose
dependent-variable value is unusual given its value on the predictor variables.
An outlier may indicate a sample peculiarity or may indicate a data entry error
or other problem.
Leverage: An observation with an extreme value on a predictor
variable is a point with high leverage. Leverage is a measure of how far
an independent variable deviates from its mean. High leverage points can
have a great amount of effect on the estimate of regression coefficients.
Influence: An observation is said to be influential if removing
the observation substantially changes the estimate of the regression coefficients.
Influence can be thought of as the product of leverage and outlierness.
Cook’s distance (or Cook’s D): A measure that combines the information
of leverage and residual of the observation.
Robust regression can be used in any situation in which you would use least
squares regression. When fitting a least squares regression, we
might find some outliers or high leverage data points. We have decided that
these data points are not data entry errors, neither they are from a
different population than most of our data. So we have no compelling reason to
exclude them from the analysis. Robust regression might be a good strategy
since it is
a compromise between excluding these points entirely from the analysis and
including all the data points and treating all them equally in OLS regression.
The idea of robust regression is to weigh the observations differently based on
how well behaved these observations are. Roughly speaking, it is a form of weighted
and reweighted least squares
regression.
Stata’s rreg command implements a version of robust regression. It
first runs the OLS regression, gets the Cook’s
D for each observation, and then drops any observation with Cook’s distance
greater than 1. Then iteration process begins in which weights are calculated based on
absolute residuals. The iterating stops when the maximum change between
the weights from one iteration to the next is below
tolerance. Two types of weights are used. In Huber weighting,
observations with small residuals get a weight of 1, the larger the residual,
the smaller the weight. With biweighting, all
cases with a non-zero residual get down-weighted at least a little. The two different kinds of weight are used because Huber weights can have
difficulties with severe outliers, and biweights can have difficulties
converging or may yield multiple solutions. Using the Huber weights first
helps to minimize problems with the biweights. You can see the iteration
history of both types of weights at the top of the robust regression output. Using the Stata defaults, robust regression is about 95% as efficient as OLS
(Hamilton, 1991).
In short, the most influential points are dropped, and then cases with large
absolute residuals are down-weighted.
Description of the data
For our data analysis below, we will use the crime data set. This dataset appears in
Statistical Methods for Social Sciences, Third Edition by Alan Agresti and
Barbara Finlay (Prentice Hall, 1997). The variables are state id (sid),
state name (state), violent crimes per 100,000 people (crime),
murders per 1,000,000 (murder), the percent of the population living in
metropolitan areas (pctmetro), the percent of the population that is
white (pctwhite), percent of population with a high school education or
above (pcths), percent of population living under poverty line (poverty),
and percent of population that are single parents (single). It has
51 observations. We are going to use poverty and single to predict
crime.
use https://stats.idre.ucla.edu/stat/stata/dae/crime, clearsummarize crime poverty single Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- crime | 51 612.8431 441.1003 82 2922 poverty | 51 14.25882 4.584242 8 26.4 single | 51 11.32549 2.121494 8.4 22.1
Robust regression analysis
In most cases, we begin by running an OLS regression and doing some
diagnostics. We will begin by running an OLS regression. The lvr2plot is used to create a graph showing the
leverage versus the squared residuals, and the mlabel option is used to
label the points on the graph with the two-letter abbreviation for each state.
regress crime poverty singleSource | SS df MS Number of obs = 51 -------------+------------------------------ F( 2, 48) = 57.96 Model | 6879872.44 2 3439936.22 Prob > F = 0.0000 Residual | 2848602.3 48 59345.8813 R-squared = 0.7072 -------------+------------------------------ Adj R-squared = 0.6950 Total | 9728474.75 50 194569.495 Root MSE = 243.61 ------------------------------------------------------------------------------ crime | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- poverty | 6.787359 8.988529 0.76 0.454 -11.28529 24.86001 single | 166.3727 19.42291 8.57 0.000 127.3203 205.425 _cons | -1368.189 187.2052 -7.31 0.000 -1744.59 -991.7874 ------------------------------------------------------------------------------ lvr2plot, mlabel(state)
As we can see, DC, Florida and Mississippi have either high leverage or large residuals.
Let’s compute Cook’s D and display the observations that have relatively large
values of Cook’s D. To this end, we use the predict command with the
cooksd option to create a new variable called d1 containing the values of Cook’s D.
Another conventional cut-off
point is 4/n, where n is the number of observations in the
data set. We will use this criterion to select the values to display.
predict d1, cooksd
clist state crime poverty single d1 if d1>4/51, noobs
state crime poverty single d1
ak 761 9.1 14.3 .125475
fl 1206 17.8 10.6 .1425891
ms 434 24.7 14.7 .6138721
dc 2922 26.4 22.1 2.636252
Since DC has a Cook’s D larger than 1, rreg will assign a missing
weight to it so it will be excluded from the robust regression analysis. We
probably should drop DC to begin with since it is not even a state. We include
it in the analysis just to show that it has large Cook’s D and will be dropped
by rreg. Now we will look at the residuals. We will again use the predict
command, this time with the rstandard option. We will generate a new
variable called absr1, which is the absolute value of the standardized residuals
(because the sign of the residual doesn’t matter). The gsort
command is used to sort the data by descending order.
predict r1, rstandard
gen absr1 = abs(r1)
gsort -absr1
clist state absr1 in 1/10, noobs
state absr1
ms 3.56299
fl 2.902663
dc 2.616447
vt 1.742409
mt 1.460884
me 1.426741
ak 1.397418
nj 1.354149
il 1.338192
md 1.287087Now let’s run our robust regression and we will make use of the generate option to have Stata save the
final weights to a new variable which we call weight in the data set.
rreg crime poverty single, gen(weight)
Huber iteration 1: maximum difference in weights = .66846346
Huber iteration 2: maximum difference in weights = .11288069
Huber iteration 3: maximum difference in weights = .01810715
Biweight iteration 4: maximum difference in weights = .29167992
Biweight iteration 5: maximum difference in weights = .10354281
Biweight iteration 6: maximum difference in weights = .01421094
Biweight iteration 7: maximum difference in weights = .0033545
Robust regression Number of obs = 50
F( 2, 47) = 31.15
Prob > F = 0.0000
------------------------------------------------------------------------------
crime | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
poverty | 10.36971 7.629288 1.36 0.181 -4.978432 25.71786
single | 142.6339 22.17042 6.43 0.000 98.03276 187.235
_cons | -1160.931 224.2564 -5.18 0.000 -1612.076 -709.7849
------------------------------------------------------------------------------Comparing the OLS regression and robust regression models, we can see that
the results are fairly different, especially with respect to the coefficients of
single. You will also notice that no R-squared, adjusted R-squared or root
MSE from rreg output.
Notice that the number of observations in the robust regression analysis is 50, instead of 51.
This is because observation for DC has been dropped since its Cook’s D is greater than 1. We can
also see that it is being dropped by looking at the final weight.
clist state weight if state =="dc", noobs
state weight
dc .Now let’s look at other observations with relatively small weight.
sort weight
clist sid state weight absr1 d1 in 1/10, noobs
sid state weight absr1 d1
25 ms .02638862 3.56299 .6138721
9 fl .11772218 2.902663 .1425891
46 vt .59144513 1.742409 .0427155
26 mt .66441582 1.460884 .016755
20 md .67960728 1.287087 .0356962
14 il .69124917 1.338192 .0126569
21 me .69766511 1.426741 .0223313
31 nj .74574796 1.354149 .0222918
19 ma .75392127 1.198541 .016399
5 ca .80179038 1.015206 .0123064
Roughly, as the residual goes down, the weight goes up. In other words,
cases with a large residuals tend to be down-weighted, and the values of Cook’s D
don’t closely correspond to the weights. This output shows us that the
observation for Mississippi will be down-weighted the most. Florida will
also be substantially down-weighted. In OLS regression, all
cases have a weight of 1. Hence, the more cases in the robust regression
that have a weight close to one, the closer the results of the OLS and robust
regressions. We can also visualize this relationship by graphing the data points with the weight information as
the size of circles.
twoway (scatter crime single [weight=weight], msymbol(oh)) if state !="dc"
Many post-estimation commands are available after running rreg, such
as test command and margins command. For example, we can get
the predicted values with respect to a set of values of variable single
holding poverty at its mean.
margins, at(single=(8(2)22)) vsquish
Predictive margins Number of obs = 50
Expression : Fitted values, predict()
1._at : single = 8
2._at : single = 10
3._at : single = 12
4._at : single = 14
5._at : single = 16
6._at : single = 18
7._at : single = 20
8._at : single = 22
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_at |
1 | 125.4825 74.88788 1.68 0.094 -21.29505 272.26
2 | 410.7503 38.20604 10.75 0.000 335.8678 485.6328
3 | 696.0181 35.2623 19.74 0.000 626.9053 765.1309
4 | 981.2859 70.42285 13.93 0.000 843.2596 1119.312
5 | 1266.554 112.2833 11.28 0.000 1046.482 1486.625
6 | 1551.821 155.5247 9.98 0.000 1246.999 1856.644
7 | 1837.089 199.25 9.22 0.000 1446.567 2227.612
8 | 2122.357 243.1982 8.73 0.000 1645.697 2599.017
------------------------------------------------------------------------------
This table shows that as the percent of single parents increases so does the
predicted crime rate.
Things to consider
See also
References
Cite this article
stats writer (2024). How to do a Robust Regression in Stata?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/what-does-the-word-rewrite-mean/
stats writer. "How to do a Robust Regression in Stata?." PSYCHOLOGICAL SCALES, 29 Jun. 2024, https://scales.arabpsychology.com/stats/what-does-the-word-rewrite-mean/.
stats writer. "How to do a Robust Regression in Stata?." PSYCHOLOGICAL SCALES, 2024. https://scales.arabpsychology.com/stats/what-does-the-word-rewrite-mean/.
stats writer (2024) 'How to do a Robust Regression in Stata?', PSYCHOLOGICAL SCALES. Available at: https://scales.arabpsychology.com/stats/what-does-the-word-rewrite-mean/.
[1] stats writer, "How to do a Robust Regression in Stata?," PSYCHOLOGICAL SCALES, vol. X, no. Y, ص Z-Z, June, 2024.
stats writer. How to do a Robust Regression in Stata?. PSYCHOLOGICAL SCALES. 2024;vol(issue):pages.


