Why doesn’t the test of the overall survey regression model in Stata match the results from SAS and SUDAAN?


Stata, SAS, and SUDAAN give the same coefficients and nearly identical standard errors for a survey regression, but by default they do not report the same test of the overall model. Stata reports an adjusted Wald F test, whereas SAS and SUDAAN report the unadjusted Wald F test, so both the F statistic and its denominator degrees of freedom differ. The example below shows the discrepancy and how to make the three packages agree.

Stata FAQ: Why doesn’t the test of the overall survey regression model in Stata match the results from SAS and SUDAAN?

Version info: Code for this page was tested in Stata 12.

NOTE:  We will use the NHANES II data as an example. 

The question

Let’s say that you ran an OLS regression model with survey data in Stata. 

use http://www.stata-press.com/data/r12/nhanes2.dta, clear

svyset psu [pw=finalwgt], strata(strata)

      pweight: finalwgt
          VCE: linearized
  Single unit: missing
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>

svy: regress weight height age female
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =        31                 Number of obs      =      10351
Number of PSUs     =        62                 Population size    =  117157513
                                               Design df          =         31
                                               F(   3,     29)    =    1177.18
                                               Prob > F           =     0.0000
                                               R-squared          =     0.2827

------------------------------------------------------------------------------
             |             Linearized
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |   .7405073    .027744    26.69   0.000     .6839229    .7970917
         age |   .1484546   .0116501    12.74   0.000      .124694    .1722153
      female |  -2.898197   .5888597    -4.92   0.000    -4.099184   -1.697209
       _cons |   -57.6088   4.955696   -11.62   0.000      -67.716   -47.50159
------------------------------------------------------------------------------

At the top of the output, you see the test of the overall regression model:
F(3, 29) = 1177.18, p < 0.0001. 

Next, you run the same model in SAS. 

proc surveyreg data = nhanes2;
cluster psu;
strata strata;
weight finalwgt;
model weight = height age female ;
run;

The SURVEYREG Procedure

Regression Analysis for Dependent Variable weight

            Data Summary

Number of Observations          10351
Sum of Weights              117157513
Weighted Mean of weight      71.90064
Weighted Sum of weight     8423699699

         Design Summary

Number of Strata              31
Number of Clusters            62

      Fit Statistics

R-square            0.2827
Root MSE           13.0725
Denominator DF          31

         Tests of Model Effects

Effect       Num DF    F Value    Pr > F

Model             3    1258.00    <.0001
Intercept         1     135.10    <.0001
height            1     712.19    <.0001
age               1     162.33    <.0001
female            1      24.22    <.0001

NOTE: The denominator degrees of freedom for the F tests is 31.

             Estimated Regression Coefficients

                             Standard
Parameter      Estimate         Error    t Value    Pr > |t|

Intercept    -57.608796    4.95641443     -11.62      <.0001
height         0.740507    0.02774807      26.69      <.0001
age            0.148455    0.01165183      12.74      <.0001
female        -2.898197    0.58894508      -4.92      <.0001

NOTE: The denominator degrees of freedom for the t tests is 31.

The results for the overall test of the regression model are reported as F(3,
31) = 1258.00, p < .0001.  Both the test statistic and denominator degrees
of freedom are different from your Stata output, so you decide to run the model
in SUDAAN.

proc regress data = nhanes2 filetype = sas design = wr;
weight finalwgt;
nest strata psu;
model weight = height age female;
run;

                                  S U D A A N
            Software for the Statistical Analysis of Correlated Data
          Copyright      Research Triangle Institute      October 2009
                                Release 10.0.1

DESIGN SUMMARY: Variances will be computed using the Taylor Linearization Method, Assuming a
With Replacement (WR) Design
    Sample Weight: FINALWGT
    Stratification Variables(s): STRATA
    Primary Sampling Unit: PSU

Number of observations read       :  10351    Weighted count:117157513
Observations used in the analysis :  10351    Weighted count:117157513
Denominator degrees of freedom    :     31

Maximum number of estimable parameters for the model is  4

File NHANES2 contains   62 Clusters
  62 clusters were used to fit the model
Maximum cluster size is 288 records
Minimum cluster size is  67 records

Weighted mean response is 71.900636

Multiple R-Square for the dependent variable WEIGHT: 0.282704
------------------------------------------------------------------------------------------------
Independent                                                                             P-value
  Variables and        Beta                      Lower 95%    Upper 95%                 T-Test
  Effects              Coeff.          SE Beta   Limit Beta   Limit Beta   T-Test B=0   B=0
------------------------------------------------------------------------------------------------
Intercept                  -57.61         4.96       -67.72       -47.50       -11.62     0.0000
HEIGHT                       0.74         0.03         0.68         0.80        26.69     0.0000
AGE                          0.15         0.01         0.12         0.17        12.74     0.0000
FEMALE                      -2.90         0.59        -4.10        -1.70        -4.92     0.0000
------------------------------------------------------------------------------------------------
-------------------------------------------------------

Contrast               Degrees
                       of                      P-value
                       Freedom        Wald F   Wald F
-------------------------------------------------------
OVERALL MODEL                 4     58649.64     0.0000
MODEL MINUS
  INTERCEPT                   3      1258.36     0.0000
INTERCEPT                     1       135.14     0.0000
HEIGHT                        1       712.39     0.0000
AGE                           1       162.38     0.0000
FEMALE                        1        24.22     0.0000
-------------------------------------------------------

The test of the overall model (the MODEL MINUS INTERCEPT row) is
F(3, 31) = 1258.36, p < 0.0001.  The test statistic is pretty close to the
SAS output, and the denominator degrees of freedom match the SAS output. 
What is going on?

The answer

By default, Stata reports an adjusted Wald F test in the output, while SAS
and SUDAAN do not.  To have Stata match the results given by SAS and
SUDAAN, you can use the nosvyadjust option on the test command. 
(We use the test command with all of the predictor variables in the model
to recreate the test of the overall regression shown at the top of the Stata output.)

svy: regress weight height age female
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =        31                 Number of obs      =      10351
Number of PSUs     =        62                 Population size    =  117157513
                                               Design df          =         31
                                               F(   3,     29)    =    1177.18
                                               Prob > F           =     0.0000
                                               R-squared          =     0.2827

------------------------------------------------------------------------------
             |             Linearized
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |   .7405073    .027744    26.69   0.000     .6839229    .7970917
         age |   .1484546   .0116501    12.74   0.000      .124694    .1722153
      female |  -2.898197   .5888597    -4.92   0.000    -4.099184   -1.697209
       _cons |   -57.6088   4.955696   -11.62   0.000      -67.716   -47.50159
------------------------------------------------------------------------------

test height age female

Adjusted Wald test

 ( 1)  height = 0
 ( 2)  age = 0
 ( 3)  female = 0

       F(  3,    29) = 1177.18
            Prob > F =    0.0000

The output from regress and test match.

test height age female, nosvyadjust

Unadjusted Wald test

 ( 1)  height = 0
 ( 2)  age = 0
 ( 3)  female = 0

       F(  3,    31) = 1258.36
            Prob > F =    0.0000

The output from test, nosvyadjust differs from the Stata output above
but matches the SUDAAN output and is very close to the SAS value of 1258.00. 
Alternatively, you could use the adjwaldf and adjwaldp options on the print
command in SUDAAN to reproduce the results given by default by Stata.
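
You can also check by hand that the two Stata tests differ only by a scale factor. The lines below are a minimal sketch, assuming the adjusted statistic equals the unadjusted one multiplied by (design df – number of tested terms + 1) / design df. The scalar names (F_unadj, num_df, design_df, adj_df, F_adj) are made up for this illustration; r(F) and r(df) are results stored by test, and e(df_r) is the design degrees of freedom stored by svy: regress.

```stata
* Refit the model, then request the unadjusted Wald test.
svy: regress weight height age female
test height age female, nosvyadjust

* Save the pieces: unadjusted F, number of tested terms, design df.
* scalar() is used below so the names cannot be mistaken for
* abbreviated variable names in the dataset.
scalar F_unadj   = r(F)
scalar num_df    = r(df)
scalar design_df = e(df_r)

* Rescale to the adjusted Wald F and its denominator df.
scalar adj_df = scalar(design_df) - scalar(num_df) + 1
scalar F_adj  = scalar(F_unadj) * scalar(adj_df) / scalar(design_df)

display "Adjusted F(" scalar(num_df) ", " scalar(adj_df) ") = " %9.2f scalar(F_adj)
display "Prob > F = " %6.4f Ftail(scalar(num_df), scalar(adj_df), scalar(F_adj))
```

If the assumption holds, the displayed statistic should agree with the F(3, 29) value at the top of the svy: regress output, up to rounding.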

The “why” and the degrees of freedom

A discussion of the adjusted Wald test is given on page 2184 of the Stata 12
Base Reference Manual (in the section for the -test- command).  This cites the
1990 American Statistician article by Edward Korn and Barry Graubard entitled
“Simultaneous testing of regression coefficients with complex survey data: 
Use of Bonferroni t statistics”.  Basically, they argue that this
test statistic is more appropriate when you have more than a few terms being
tested simultaneously (in other words, more predictors in the model).  
The test statistic (what the authors call the Wald procedure) has numerator
degrees of freedom p, the number of predictors (excluding the intercept), and
denominator degrees of freedom # of PSUs – # of strata – p + 1.  In the
example above, we have 62 PSUs, 31 strata and 3 predictors.  Hence, the
denominator degrees of freedom are calculated as 62 – 31 – 3 + 1 = 29.  In
SAS and SUDAAN, you see notes indicating that the denominator degrees of freedom
equal 31, which is simply # of PSUs – # of strata = 62 – 31 = 31. 
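
The link between the two reported F statistics can be written out as a short worked calculation. This is a sketch rather than a quotation from the manual: W (the Wald chi-square statistic) and d (the design degrees of freedom) are notation introduced here, and the distributions are the reference distributions used for the p-values.

```latex
\begin{align*}
d &= \#\text{PSUs} - \#\text{strata} = 62 - 31 = 31, \qquad p = 3 \\
F_{\text{unadj}} &= \frac{W}{p}, \qquad \text{referred to } F(p,\, d) = F(3,\, 31) \\
F_{\text{adj}} &= \frac{d - p + 1}{p\,d}\, W
               = \frac{d - p + 1}{d}\, F_{\text{unadj}},
               \qquad \text{referred to } F(p,\, d - p + 1) = F(3,\, 29) \\
F_{\text{adj}} &= \frac{29}{31} \times 1258.36 \approx 1177.18
\end{align*}
```

So the adjusted statistic is always somewhat smaller than the unadjusted one, and the gap grows as the number of tested terms p approaches the design degrees of freedom d.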

References

Korn, E. and Graubard, B.  (1990). Simultaneous testing of regression
coefficients with complex survey data:  Use of Bonferroni t
statistics.  American Statistician, Vol. 44, No. 4, pages 270-276.

Stata 12 Base Reference Manual. College Station, TX: Stata Press.

Cite this article

stats writer (2024). Why doesn’t the test of the overall survey regression model in Stata match the results from SAS and SUDAAN?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/why-doesnt-the-test-of-the-overall-survey-regression-model-in-stata-match-the-results-from-sas-and-sudaan/

stats writer. "Why doesn’t the test of the overall survey regression model in Stata match the results from SAS and SUDAAN?." PSYCHOLOGICAL SCALES, 1 Jul. 2024, https://scales.arabpsychology.com/stats/why-doesnt-the-test-of-the-overall-survey-regression-model-in-stata-match-the-results-from-sas-and-sudaan/.

