The test of the overall survey regression model in Stata differs from the results in SAS and SUDAAN because the programs report different Wald statistics by default, not because the data, weights, or survey design differ. Stata reports an adjusted Wald F test, whose denominator degrees of freedom account for the number of terms being tested, while SAS and SUDAAN report the unadjusted Wald F test. In the example below, the coefficient estimates are the same in all three programs and the standard errors agree closely; only the overall test of the model differs, and either program's result can be reproduced in the other with the appropriate option.
Stata FAQ: Why doesn’t the test of the overall survey regression model in Stata match the results from SAS and SUDAAN?
Version info: Code for this page was tested in Stata 12.
NOTE: We will use the NHANES II data as an example.
The question
Let’s say that you ran an OLS regression model with survey data in Stata.
use http://www.stata-press.com/data/r12/nhanes2.dta, clear
svyset psu [pw=finalwgt], strata(strata)
pweight: finalwgt
VCE: linearized
Single unit: missing
Strata 1: strata
SU 1: psu
FPC 1: <zero>
svy: regress weight height age female
(running regress on estimation sample)
Survey: Linear regression
Number of strata = 31 Number of obs = 10351
Number of PSUs = 62 Population size = 117157513
Design df = 31
F( 3, 29) = 1177.18
Prob > F = 0.0000
R-squared = 0.2827
------------------------------------------------------------------------------
| Linearized
weight | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
height | .7405073 .027744 26.69 0.000 .6839229 .7970917
age | .1484546 .0116501 12.74 0.000 .124694 .1722153
female | -2.898197 .5888597 -4.92 0.000 -4.099184 -1.697209
_cons | -57.6088 4.955696 -11.62 0.000 -67.716 -47.50159
------------------------------------------------------------------------------
At the top of the output, you see the test of the overall regression model:
F(3, 29) = 1177.18, p < 0.0001.
Next, you run the same model in SAS.
proc surveyreg data = nhanes2;
  cluster psu;
  strata strata;
  weight finalwgt;
  model weight = height age female;
run;

The SURVEYREG Procedure

Regression Analysis for Dependent Variable weight

Data Summary
Number of Observations          10351
Sum of Weights              117157513
Weighted Mean of weight      71.90064
Weighted Sum of weight     8423699699

Design Summary
Number of Strata                   31
Number of Clusters                 62

Fit Statistics
R-square                       0.2827
Root MSE                      13.0725
Denominator DF                     31

Tests of Model Effects
Effect       Num DF    F Value    Pr > F
Model             3    1258.00    <.0001
Intercept         1     135.10    <.0001
height            1     712.19    <.0001
age               1     162.33    <.0001
female            1      24.22    <.0001
NOTE: The denominator degrees of freedom for the F tests is 31.

Estimated Regression Coefficients
Parameter      Estimate    Standard Error    t Value    Pr > |t|
Intercept    -57.608796        4.95641443     -11.62      <.0001
height         0.740507        0.02774807      26.69      <.0001
age            0.148455        0.01165183      12.74      <.0001
female        -2.898197        0.58894508      -4.92      <.0001
NOTE: The denominator degrees of freedom for the t tests is 31.
The results for the overall test of the regression model are reported as F(3,
31) = 1258.00, p < .0001. Both the test statistic and denominator degrees
of freedom are different from your Stata output, so you decide to run the model
in SUDAAN.
proc regress data = nhanes2 filetype = sas design = wr;
  weight finalwgt;
  nest strata psu;
  model weight = height age female;
run;

S U D A A N
Software for the Statistical Analysis of Correlated Data
Copyright Research Triangle Institute October 2009
Release 10.0.1

DESIGN SUMMARY: Variances will be computed using the Taylor Linearization Method, Assuming a With Replacement (WR) Design
Sample Weight: FINALWGT
Stratification Variables(s): STRATA
Primary Sampling Unit: PSU

Number of observations read          : 10351   Weighted count: 117157513
Observations used in the analysis    : 10351   Weighted count: 117157513
Denominator degrees of freedom       : 31

Maximum number of estimable parameters for the model is 4

File NHANES2 contains 62 Clusters
62 clusters were used to fit the model
Maximum cluster size is 288 records
Minimum cluster size is 67 records

Weighted mean response is 71.900636
Multiple R-Square for the dependent variable WEIGHT: 0.282704

------------------------------------------------------------------------------------------------
Independent                                                                          P-value
Variables and     Beta                  Lower 95%      Upper 95%      T-Test         T-Test
Effects           Coeff.     SE Beta    Limit Beta     Limit Beta     B=0            B=0
------------------------------------------------------------------------------------------------
Intercept         -57.61     4.96       -67.72         -47.50         -11.62         0.0000
HEIGHT              0.74     0.03         0.68           0.80          26.69         0.0000
AGE                 0.15     0.01         0.12           0.17          12.74         0.0000
FEMALE             -2.90     0.59        -4.10          -1.70          -4.92         0.0000
------------------------------------------------------------------------------------------------

-------------------------------------------------------
Contrast                 Degrees of              P-value
                         Freedom      Wald F     Wald F
-------------------------------------------------------
OVERALL MODEL                  4    58649.64     0.0000
MODEL MINUS INTERCEPT          3     1258.36     0.0000
INTERCEPT                      1      135.14     0.0000
HEIGHT                         1      712.39     0.0000
AGE                            1      162.38     0.0000
FEMALE                         1       24.22     0.0000
-------------------------------------------------------
The test of the overall model (the MODEL MINUS INTERCEPT contrast) is
F(3, 31) = 1258.36, p < 0.0001. The test statistic is pretty close to the
SAS output, and the denominator degrees of freedom match the SAS output.
What is going on?
The answer
By default, Stata reports an adjusted Wald F test in the output, while SAS
and SUDAAN do not. To have Stata match the results given by SAS and
SUDAAN, you can use the nosvyadjust option on the test command.
(We use the test command with all of the predictor variables in the model
to recreate the test of the overall regression shown at the top of the Stata output.)
svy: regress weight height age female
(running regress on estimation sample)
Survey: Linear regression
Number of strata = 31 Number of obs = 10351
Number of PSUs = 62 Population size = 117157513
Design df = 31
F( 3, 29) = 1177.18
Prob > F = 0.0000
R-squared = 0.2827
------------------------------------------------------------------------------
| Linearized
weight | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
height | .7405073 .027744 26.69 0.000 .6839229 .7970917
age | .1484546 .0116501 12.74 0.000 .124694 .1722153
female | -2.898197 .5888597 -4.92 0.000 -4.099184 -1.697209
_cons | -57.6088 4.955696 -11.62 0.000 -67.716 -47.50159
------------------------------------------------------------------------------
test height age female
Adjusted Wald test
( 1) height = 0
( 2) age = 0
( 3) female = 0
F( 3, 29) = 1177.18
Prob > F = 0.0000
The output from regress and test match.
test height age female, nosvyadjust
Unadjusted Wald test
( 1) height = 0
( 2) age = 0
( 3) female = 0
F( 3, 31) = 1258.36
Prob > F = 0.0000
The output from test, nosvyadjust is different from the above Stata
output but matches the SAS and SUDAAN output. Alternatively, you could use
the adjwaldf and adjwaldp options on the print command in
SUDAAN to reproduce the results given by default by Stata.
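The two Stata results can also be tied together numerically. The sketch below is not part of the original FAQ; it assumes that the adjusted statistic is the unadjusted Wald F multiplied by (d - k + 1)/d, where d is the design degrees of freedom and k is the number of terms tested, which is consistent with the numbers shown above. The scalar names are made up for illustration, and the stored results r(F), r(df) and e(df_r) are used as they are commonly documented for test and svy estimation.

```stata
* Reload the data and declare the survey design, as in the example above
use http://www.stata-press.com/data/r12/nhanes2.dta, clear
svyset psu [pw=finalwgt], strata(strata)
quietly svy: regress weight height age female

* Unadjusted Wald test, as reported by SAS and SUDAAN
test height age female, nosvyadjust
scalar F_unadj = r(F)
scalar k = r(df)
* Design df: number of PSUs minus number of strata
scalar d = e(df_r)

* Rescale to the adjusted Wald F that Stata reports by default
scalar F_adj = F_unadj * (d - k + 1) / d
display "adjusted F(" k ", " d - k + 1 ") = " F_adj
display "Prob > F = " Ftail(k, d - k + 1, F_adj)
```

If the assumption holds, the displayed statistic should agree with the F(3, 29) value at the top of the svy: regress output.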
The “why” and the degrees of freedom
A discussion of the adjusted Wald test is given on page 2184 of the Stata 12
Base Reference Manual (in the section for the test command). It cites the
1990 American Statistician article by Edward Korn and Barry Graubard entitled
“Simultaneous testing of regression coefficients with complex survey data:
Use of Bonferroni t statistics”. Basically, they argue that the adjusted
test statistic is more appropriate when more than a few terms are being
tested simultaneously (in other words, when the model has more predictors).
The test statistic (what the authors call the Wald procedure) has numerator
degrees of freedom of p, the number of predictors (excluding the intercept), and
denominator degrees of freedom of # of PSUs – # of strata – p + 1. In the
example above, we have 62 PSUs, 31 strata and 3 predictors. Hence, the
denominator degrees of freedom are calculated as 62 – 31 – 3 + 1 = 29. In
SAS and SUDAAN, you see notes indicating that the denominator degrees of freedom
equal 31, which is simply 62 – 31 = 31.
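The same arithmetic, written out as a worked equation. The symbols d, F_adj and F_unadj are introduced here for illustration, and the rescaling in the second line is an assumption about how the two statistics relate. It is not stated in the text above, but it reproduces the reported values.

```latex
\[
d = \#\text{PSUs} - \#\text{strata} = 62 - 31 = 31,
\qquad
d - p + 1 = 31 - 3 + 1 = 29
\]
\[
F_{\text{adj}} = \frac{d - p + 1}{d}\, F_{\text{unadj}}
             = \frac{29}{31} \times 1258.36 \approx 1177.18
\]
```

With only 3 predictors and 31 design degrees of freedom the adjustment is modest (a factor of 29/31); it grows as p approaches d.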
References
Korn, E. and Graubard, B. (1990). Simultaneous testing of regression
coefficients with complex survey data: Use of Bonferroni t
statistics. American Statistician, Vol. 44, No. 4, pages 270-276.
Stata 12 Base Reference Manual. College Station, TX: Stata Press.
Cite this article
stats writer (2024). Why doesn’t the test of the overall survey regression model in Stata match the results from SAS and SUDAAN?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/why-doesnt-the-test-of-the-overall-survey-regression-model-in-stata-match-the-results-from-sas-and-sudaan/
