“How can I incorporate multiply imputed data sets into my analysis using SUDAAN?”

SUDAAN, a statistical software package designed for analyzing complex survey data, can analyze multiply imputed data sets directly. The analysis is run on each imputed data set, and the results are combined into a single estimate whose standard errors and confidence intervals account for the uncertainty introduced by imputation. SUDAAN does not use a separate combining procedure such as proc mianalyze, which belongs to SAS. Instead, the imputed data are identified with the mi_count option on the proc statement, the mi_files statement, or the mi_vars statement, as described below. Using multiply imputed data sets lets researchers handle missing data in large survey studies while still reflecting the imputation uncertainty in their findings.
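
The combining step mentioned above is usually written out with Rubin's rules. The block below is a sketch of those standard formulas, not a statement of SUDAAN's exact internals. The symbols are introduced here for illustration and do not come from the original FAQ: m is the number of imputed data sets, and \hat{Q}_i and \hat{U}_i are the estimate and its design-based variance from imputed data set i. How SUDAAN computes the degrees of freedom is not covered here and should be checked in the SUDAAN manual.

```latex
% Rubin's combining rules for m imputed data sets (symbols are illustrative)
\begin{align*}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m} \hat{Q}_i
  && \text{combined point estimate} \\
\bar{U} &= \frac{1}{m}\sum_{i=1}^{m} \hat{U}_i
  && \text{average within-imputation variance} \\
B &= \frac{1}{m-1}\sum_{i=1}^{m} \bigl(\hat{Q}_i - \bar{Q}\bigr)^2
  && \text{between-imputation variance} \\
T &= \bar{U} + \Bigl(1 + \frac{1}{m}\Bigr) B
  && \text{total variance; } \mathrm{SE} = \sqrt{T}
\end{align*}
```

The between-imputation term B is what makes the combined standard error larger than the standard error from any single imputed data set analyzed as if it were complete.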

 
One of the new features of SUDAAN 9 is its ability to use multiply imputed
data sets.  According to the SUDAAN 9 Language Manual (page 91), SUDAAN does
not accept the imputed data sets stacked into a single data set, which is the
form that SAS proc mi creates.  Rather, SUDAAN accepts multiply imputed data
in two forms:  either as individual imputed data sets, or as a single data
set in which the imputed versions of a variable are stored as separate
variables with different numeric suffixes.

For our examples below, we will use the NHANES III data.  The NHANES III
multiply imputed data sets can be found at http://www.cdc.gov/nchs/nhanes.htm
about halfway down the page.  Each of the imputed data sets must be sorted by
strata and PSU, just as a non-imputed data set must be.  You can use the macro
shown below (with minor changes for the names of the data sets) to sort the
data sets, or you can make multiple calls to proc sort.

Once the data sets are sorted, you need to add either the mi_count option to
the proc statement or an mi_files statement.  If the data sets are
sequentially numbered, such as nh3mi1, nh3mi2, etc., you can use the mi_count
option to indicate the number of imputed data sets.  On the data option,
specify the first of the imputed data sets.  Remember that the variables in
each of the imputed data sets need to be in the same order, of the same type,
etc.  SUDAAN will issue a warning if the number of cases differs between the
imputed data sets.
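
Because SUDAAN does not read the stacked file that proc mi produces, a stacked file has to be split into individual data sets first. The following is a minimal sketch of one way to do that in SAS. The data set name stacked, the output names imp1 through imp5, and the macro name split are hypothetical. _Imputation_ is the index variable that proc mi adds to its output.

```sas
/* Sketch only: "stacked", "imp1"-"imp5", and %split are hypothetical names. */
/* _Imputation_ is the imputation index that proc mi writes to its output.   */
%macro split(n);
  %local i;
  %do i = 1 %to &n;
    /* Keep only the rows that belong to imputation &i */
    data imp&i;
      set stacked;
      where _Imputation_ = &i;
    run;
  %end;
%mend split;

%split(5);
```

The resulting data sets are sequentially numbered, so after sorting each one by strata and PSU they would fit the mi_count form described above.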

NOTE:  The examples of the use of these options and statements in the
SUDAAN 9 Manual (page 91) show the use of quotes around a file path
specification.  This will work only for stand-alone SUDAAN, not the
SAS-callable version.

%MACRO srt(NUMBER);
PROC SORT DATA=nh3mi&NUMBER;
by sdpstra6 sdppsu6;
run;
%mend srt;

%srt(1);
%srt(2);
%srt(3);
%srt(4);
%srt(5);
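
As an alternative to calling the macro once per data set, the sort can be wrapped in a macro loop. This is a sketch that assumes the same data set names (nh3mi1 through nh3mi5) and sort variables as above. The macro name srt_all is made up for this illustration.

```sas
/* Sketch: sort nh3mi1 ... nh3mi&n by stratum and PSU with a single call. */
/* The macro name srt_all is hypothetical.                                */
%macro srt_all(n);
  %local i;
  %do i = 1 %to &n;
    proc sort data=nh3mi&i;
      by sdpstra6 sdppsu6;
    run;
  %end;
%mend srt_all;

%srt_all(5);
```

With the data sets sorted, the first example uses the mi_count option on the proc statement.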
proc descript data = NH3MI1 filetype = sas mi_count = 5 design = wr;
nest sdpstra6 sdppsu6 / missunit;
weight WTPFQX6 ;
var TCPMI;
setenv colwidth = 19;
setenv decwidth = 3;
print nsum wsum mean semean / nohead;
run;

Variance Estimation Method: Taylor Series (WR) Using Multiply Imputed Data
Results for Summary Over All Imputations
by: Variable, One.

------------------------------------------------------------
|                 |                  |                     |
| Variable        |                  | One                 |
|                 |                  | 1                   |
------------------------------------------------------------
|                 |                  |                     |
| Serum           | Sample Size      |           28012.000 |
| cholesterol     | Weighted Size    |       235771269.750 |
| (mg/dL)         | Mean             |             194.357 |
|                 | SE Mean          |               0.577 |
------------------------------------------------------------

In the example below, the mi_files statement is used instead of the mi_count
option.  As before, the first of the imputed data sets is listed on the data
option on the proc statement.  The rest of the files are listed on the
mi_files statement.

proc regress data = nh3mi1 filetype = sas design = wr;
nest sdpstra6 sdppsu6 / missunit;
weight WTPFQX6 ;
class HAN6SRMI ;
model BMPWSTMI =  HAM5MI HAN6SRMI HSSEX;
mi_files  nh3mi2 nh3mi3 nh3mi4 nh3mi5 ;
run;

Frequencies and Values for CLASS Variables
Results for Summary Over All Imputations
by: Beer/wine/liquor (recode).

--------------------------------------
Beer/wine/l-
  iquor
  (recode)          Frequency    Value
--------------------------------------
Ordered
  Position:
  1                 11230.000        1
Ordered
  Position:
  2                  5546.600        2
Ordered
  Position:
  3                  3273.400        3
--------------------------------------
Variance Estimation Method: Taylor Series (WR) Using Multiply Imputed Data
SE Method: Robust (Binder, 1983)
Working Correlations: Independent
Link Function: Identity
Response variable BMPWSTMI: Waist circumference (cm)
Results for Summary Over All Imputations
by: Independent Variables and Effects.

-------------------------------------------------------------------------------------
Independent
  Variables and        Beta                      Lower 95%    Upper 95%
  Effects              Coeff.          SE Beta   Limit Beta   Limit Beta   T-Test B=0
-------------------------------------------------------------------------------------
Intercept                   44.05         4.54        34.90        53.20         9.71
How tall are you
  without shoes-
  inchs                      0.74         0.06         0.62         0.86        12.41
Beer/wine/liquor
  (recode)
  1                          4.23         0.38         3.46         4.99        11.13
  2                          0.91         0.41         0.07         1.75         2.20
  3                          0.00         0.00         0.00         0.00          .
Sex                         -2.84         0.49        -3.83        -1.84        -5.75
-------------------------------------------------------------------------------------
----------------------------------------
Independent            P-value
  Variables and        T-Test     DDF
  Effects              B=0        Beta
----------------------------------------
Intercept                0.0000   43.516
How tall are you
  without shoes-
  inchs                  0.0000   43.551
Beer/wine/liquor
  (recode)
  1                      0.0000   45.861
  2                      0.0342   37.224
  3                       .       49.000
Sex                      0.0000   43.755
----------------------------------------
Variance Estimation Method: Taylor Series (WR) Using Multiply Imputed Data
SE Method: Robust (Binder, 1983)
Working Correlations: Independent
Link Function: Identity
Response variable BMPWSTMI: Waist circumference (cm)
Results for Summary Over All Imputations
by: Contrast.

-------------------------------------------------------

Contrast               Degrees
                       of                      P-value
                       Freedom        Wald F   Wald F
-------------------------------------------------------
OVERALL MODEL                 5     40326.90     0.0000
MODEL MINUS
  INTERCEPT                   4       182.83     0.0000
INTERCEPT                     .          .        .
HAM5MI                        1       153.97     0.0000
HAN6SRMI                      2        62.84     0.0000
HSSEX                         1        33.04     0.0000
-------------------------------------------------------

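In the two examples below, we show that you can use either method of
variance estimation with multiply imputed data:  Taylor series linearization
based on strata and PSUs (design = wr with a nest statement), or replicate
weights (design = brr with a repwgt statement).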

proc crosstabs data = NH3MI1 filetype = sas mi_count = 5 design = wr;
nest sdpstra6 sdppsu6 / missunit;
weight WTPFQX6 ;
subgroups  DMARETHN HAE7;
levels 2 2;
tables  DMARETHN*HAE7;
setenv colwidth = 12;
run;

Variance Estimation Method: Taylor Series (WR) Using Multiply Imputed Data
Results for Summary Over All Imputations
by: Race-ethnicity, Ever told had high cholesterol.

-----------------------------------------------------------------------------------
|                 |                  |                                            |
| Race-ethnicity  |                  | Ever told had high cholesterol             |
|                 |                  | Total        | 1            | 2            |
-----------------------------------------------------------------------------------
|                 |                  |              |              |              |
| Total           | Sample Size      |         7830 |         2548 |         5282 |
|                 | Weighted Size    |  94129643.85 |  31162209.08 |  62967434.77 |
|                 | Tot Percent      |       100.00 |        33.11 |        66.89 |
|                 | Col Percent      |       100.00 |       100.00 |       100.00 |
|                 | SE Col Percent   |         0.00 |         0.00 |         0.00 |
|                 | Row Percent      |       100.00 |        33.11 |        66.89 |
|                 | SE Row Percent   |         0.00 |         0.82 |         0.82 |
-----------------------------------------------------------------------------------
|                 |                  |              |              |              |
| 1               | Sample Size      |         5378 |         1856 |         3522 |
|                 | Weighted Size    |  84795202.29 |  28668926.11 |  56126276.18 |
|                 | Tot Percent      |        90.08 |        30.46 |        59.63 |
|                 | Col Percent      |        90.08 |        92.00 |        89.14 |
|                 | SE Col Percent   |         0.69 |         0.62 |         0.79 |
|                 | Row Percent      |       100.00 |        33.81 |        66.19 |
|                 | SE Row Percent   |         0.00 |         0.90 |         0.90 |
-----------------------------------------------------------------------------------
|                 |                  |              |              |              |
| 2               | Sample Size      |         2452 |          692 |         1760 |
|                 | Weighted Size    |   9334441.56 |   2493282.97 |   6841158.59 |
|                 | Tot Percent      |         9.92 |         2.65 |         7.27 |
|                 | Col Percent      |         9.92 |         8.00 |        10.86 |
|                 | SE Col Percent   |         0.69 |         0.62 |         0.79 |
|                 | Row Percent      |       100.00 |        26.71 |        73.29 |
|                 | SE Row Percent   |         0.00 |         1.00 |         1.00 |
-----------------------------------------------------------------------------------

proc crosstabs data = NH3MI1 filetype = sas mi_count = 5 design = brr;
repwgt WTPQRP1 - WTPQRP52 / adjfay = 1.7;
weight WTPFQX6 ;
subgroups  DMARETHN HAE7;
levels 2 2;
tables  DMARETHN*HAE7;
setenv colwidth = 12;
print nsum wsum totper colper secol rowper serow;
run;

Variance Estimation Method: BRR Using Multiply Imputed Data
Results for Summary Over All Imputations
by: Race-ethnicity, Ever told had high cholesterol.

-----------------------------------------------------------------------------------
|                 |                  |                                            |
| Race-ethnicity  |                  | Ever told had high cholesterol             |
|                 |                  | Total        | 1            | 2            |
-----------------------------------------------------------------------------------
|                 |                  |              |              |              |
| Total           | Sample Size      |         7830 |         2548 |         5282 |
|                 | Weighted Size    |  94129643.85 |  31162209.08 |  62967434.77 |
|                 | Tot Percent      |       100.00 |        33.11 |        66.89 |
|                 | Col Percent      |       100.00 |       100.00 |       100.00 |
|                 | SE Col Percent   |         0.00 |         0.00 |         0.00 |
|                 | Row Percent      |       100.00 |        33.11 |        66.89 |
|                 | SE Row Percent   |         0.00 |         0.64 |         0.64 |
-----------------------------------------------------------------------------------
|                 |                  |              |              |              |
| 1               | Sample Size      |         5378 |         1856 |         3522 |
|                 | Weighted Size    |  84795202.29 |  28668926.11 |  56126276.18 |
|                 | Tot Percent      |        90.08 |        30.46 |        59.63 |
|                 | Col Percent      |        90.08 |        92.00 |        89.14 |
|                 | SE Col Percent   |         0.23 |         0.35 |         0.29 |
|                 | Row Percent      |       100.00 |        33.81 |        66.19 |
|                 | SE Row Percent   |         0.00 |         0.70 |         0.70 |
-----------------------------------------------------------------------------------
|                 |                  |              |              |              |
| 2               | Sample Size      |         2452 |          692 |         1760 |
|                 | Weighted Size    |   9334441.56 |   2493282.97 |   6841158.59 |
|                 | Tot Percent      |         9.92 |         2.65 |         7.27 |
|                 | Col Percent      |         9.92 |         8.00 |        10.86 |
|                 | SE Col Percent   |         0.23 |         0.35 |         0.29 |
|                 | Row Percent      |       100.00 |        26.71 |        73.29 |
|                 | SE Row Percent   |         0.00 |         0.92 |         0.92 |
-----------------------------------------------------------------------------------

To illustrate the use of multiply imputed variables stored in a single data
file, we will create a small example data set and then use the mi_vars
statement.

data temp;
input x x1 x2 x3 y;
cards;
1 1 1 1 7
3 3 3 3 8
. 2 1 3 5
. 1 5 4 8
4 4 4 4 9
6 6 6 6 7
. 7 5 4 9
;
run;

proc regress data = temp filetype = sas design = wr;
weight _one_;
nest _one_;
model y = x;
mi_vars x1 x2 x3;
run;

Cite this article

stats writer (2024). “How can I incorporate multiply imputed data sets into my analysis using SUDAAN?” PSYCHOLOGICAL SCALES, 1 Jul. 2024. Retrieved from https://scales.arabpsychology.com/stats/how-can-i-incorporate-multiply-imputed-data-sets-into-my-analysis-using-sudaan/