To include categorical independent variables in a regression in SUDAAN, first identify which predictors are categorical and make sure they are coded with positive numeric values. Then run the regression with the survey weights and design information so that the variance estimates reflect the complex sample design. Categorical predictors are entered through the class statement or through the subgroup and levels statements, and SUDAAN expands each one into dummy variables measured against a reference category.
How can I use categorical independent variables in regression analyses in SUDAAN?
Using the class statement
The class statement is available in SUDAAN 9. It works the same way as
the class statement in SAS: you list categorical variables on it so that the
program does not treat them as continuous variables. Dummy variables (0/1
variables) do not need to be listed on the class statement. In the example
below, only racehpra is listed on the class statement, so srsex gets a single
slope coefficient. If you also include
srsex on the class statement, the results will exactly match those obtained
using the subgroup and levels statements. In this example, srsex is coded 1 = male and 2 = female, and
racehpra is coded 1 = Latino, 2 = Pacific Islander, 3 = AIAN, 4 = Asian,
5 = African American, 6 = White and 7 = Other.
NOTE: The class statement in SUDAAN uses dummy coding (0 1, or
what SAS calls reference coding). The class statement in some SAS
procedures (for example, proc logistic) uses effect coding (-1 1) by
default. For a three-level categorical variable, which needs two coded
columns, the reference category is coded 0 0 in SUDAAN and -1 -1 under
effect coding in SAS. This means that if you run the same analysis in both
SAS and SUDAAN using the class statement, the coefficients for the levels of
the categorical variable will not match. To get the results to match, use the param =
ref option on the class statement in SAS. You can tell which
kind of coding is being used by looking at the top of the output (in either
SUDAAN or SAS).
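The difference between the two coding schemes can be seen by building the coded columns by hand. The following is a minimal SAS data step sketch, not output from either program. The variable grp and the column names ref1, ref2, eff1 and eff2 are hypothetical. It assumes a three-level variable whose last level is the reference.

```sas
/* Hypothetical three-level variable grp; level 3 is the reference. */
data coding_demo;
  input grp;
  /* Reference (dummy) coding: the reference level is coded 0 0. */
  ref1 = (grp = 1);
  ref2 = (grp = 2);
  /* Effect coding: the reference level is coded -1 -1. */
  if grp = 3 then do;
    eff1 = -1;
    eff2 = -1;
  end;
  else do;
    eff1 = (grp = 1);
    eff2 = (grp = 2);
  end;
  datalines;
1
2
3
;
run;

/* One row per level, showing both sets of coded columns. */
proc print data=coding_demo noobs;
run;
```

The rows for levels 1 and 2 are identical under both schemes. Only the row for the reference level differs, and that difference is what changes the meaning of the intercept and the coefficients.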
proc regress data=chis filetype=sas design = jackknife;
weight rakedw0;
jackwgts rakedw1--rakedw80 / adjjack=1;
model ab1 = srsex racehpra;
class racehpra;
run;

Number of observations read : 55428 Weighted count: 23847415
Observations used in the analysis : 55383 Weighted count: 23829382
Denominator degrees of freedom : 80
Maximum number of estimable parameters for the model is 8
Weighted mean response is 2.502603
Multiple R-Square for the dependent variable AB1: 0.043591

Frequencies and Values for CLASS Variables
by: SRSEX.
-----------------------------------------
SRSEX                 Frequency    Value
-----------------------------------------
Ordered Position: 1       23002        1
Ordered Position: 2       32426        2
-----------------------------------------

Frequencies and Values for CLASS Variables
by: RACEHPRA.
-----------------------------------------
RACEHPRA              Frequency    Value
-----------------------------------------
Ordered Position: 1        9458        1
Ordered Position: 2         219        2
Ordered Position: 3         781        3
Ordered Position: 4        3956        4
Ordered Position: 5        2764        5
Ordered Position: 6       36729        6
Ordered Position: 7        1521        7
-----------------------------------------

Variance Estimation Method: Replicate Weight Jackknife
Working Correlations: Independent
Link Function: Identity
Response variable AB1: AB1
by: Independent Variables and Effects.
-------------------------------------------------------------------------------
Independent
Variables and                  Beta             Lower 95%  Upper 95%
Effects                      Coeff.    SE Beta Limit Beta Limit Beta T-Test B=0
-------------------------------------------------------------------------------
Intercept                  2.398056   0.047290   2.303947   2.492165  50.710087
SRSEX                      0.078808   0.011561   0.055801   0.101815   6.816660
RACEHPRA
  LATINO                   0.347818   0.041945   0.264346   0.431291   8.292332
  PACIFIC ISLANDER         0.000548   0.115201  -0.228710   0.229806   0.004759
  AIAN                     0.221383   0.071965   0.078169   0.364598   3.076281
  ASIAN                   -0.005809   0.043640  -0.092656   0.081038  -0.133110
  AFRICAN AMERICAN         0.103354   0.044022   0.015748   0.190960   2.347804
  WHITE                   -0.184855   0.041252  -0.266949  -0.102761  -4.481135
  OTH SINGL/MULTI RACE     0.000000   0.000000   0.000000   0.000000          .
-------------------------------------------------------------------------------

-----------------------------------
Independent                P-value
Variables and               T-Test
Effects                        B=0
-----------------------------------
Intercept                 0.000000
SRSEX                     0.000000
RACEHPRA
  LATINO                  0.000000
  PACIFIC ISLANDER        0.996215
  AIAN                    0.002869
  ASIAN                   0.894440
  AFRICAN AMERICAN        0.021355
  WHITE                   0.000024
  OTH SINGL/MULTI RACE           .
-----------------------------------

----------------------------------------------------------
Contrast                    Degrees
                                 of                P-value
                            Freedom      Wald F     Wald F
----------------------------------------------------------
OVERALL MODEL              8.000000  **********   0.000000
MODEL MINUS INTERCEPT      7.000000  178.070752   0.000000
INTERCEPT                         .           .          .
SRSEX                      1.000000   46.466848   0.000000
RACEHPRA                   6.000000  183.206577   0.000000
----------------------------------------------------------
Using the subgroup and levels statements
If you are using an earlier version of SUDAAN or if you want more control
over the handling of your categorical variables, you can use the subgroup
and levels statements. For each variable listed on the subgroup
statement, you need to give its number of levels (categories) in the same
position on the levels statement.
By default, the last category (i.e., the highest numbered category) is used
as the reference category when you have categorical predictors in a regression
model.
proc regress data=chis filetype=sas design = jackknife;
weight rakedw0;
jackwgts rakedw1--rakedw80 / adjjack=1;
model ab1 = srsex racehpra;
subgroup srsex racehpra;
levels 2 7;
run;

Number of observations read : 55428 Weighted count: 23847415
Observations used in the analysis : 55383 Weighted count: 23829382
Denominator degrees of freedom : 80
Maximum number of estimable parameters for the model is 8
Weighted mean response is 2.502603
Multiple R-Square for the dependent variable AB1: 0.043591

Variance Estimation Method: Replicate Weight Jackknife
Working Correlations: Independent
Link Function: Identity
Response variable AB1: AB1
by: Independent Variables and Effects.
-------------------------------------------------------------------------------
Independent
Variables and                  Beta             Lower 95%  Upper 95%
Effects                      Coeff.    SE Beta Limit Beta Limit Beta T-Test B=0
-------------------------------------------------------------------------------
Intercept                  2.555672   0.040901   2.474277   2.637067  62.484879
SRSEX
  MALE                    -0.078808   0.011561  -0.101815  -0.055801  -6.816660
  FEMALE                   0.000000   0.000000   0.000000   0.000000          .
RACEHPRA
  LATINO                   0.347818   0.041945   0.264346   0.431291   8.292332
  PACIFIC ISLANDER         0.000548   0.115201  -0.228710   0.229806   0.004759
  AIAN                     0.221383   0.071965   0.078169   0.364598   3.076281
  ASIAN                   -0.005809   0.043640  -0.092656   0.081038  -0.133110
  AFRICAN AMERICAN         0.103354   0.044022   0.015748   0.190960   2.347804
  WHITE                   -0.184855   0.041252  -0.266949  -0.102761  -4.481135
  OTH SINGL/MULTI RACE     0.000000   0.000000   0.000000   0.000000          .
-------------------------------------------------------------------------------

-----------------------------------
Independent                P-value
Variables and               T-Test
Effects                        B=0
-----------------------------------
Intercept                 0.000000
SRSEX
  MALE                    0.000000
  FEMALE                         .
RACEHPRA
  LATINO                  0.000000
  PACIFIC ISLANDER        0.996215
  AIAN                    0.002869
  ASIAN                   0.894440
  AFRICAN AMERICAN        0.021355
  WHITE                   0.000024
  OTH SINGL/MULTI RACE           .
-----------------------------------

----------------------------------------------------------
Contrast                    Degrees
                                 of                P-value
                            Freedom      Wald F     Wald F
----------------------------------------------------------
OVERALL MODEL              8.000000  **********   0.000000
MODEL MINUS INTERCEPT      7.000000  178.070752   0.000000
INTERCEPT                         .           .          .
SRSEX                      1.000000   46.466848   0.000000
RACEHPRA                   6.000000  183.206577   0.000000
----------------------------------------------------------
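The class run and the subgroup run report different intercepts and srsex coefficients of opposite sign, yet the R-square and the racehpra coefficients are identical. One way to reconcile them is sketched below as a small arithmetic check. It assumes that srsex entered the class run as a numeric variable coded 1/2, which its single coefficient suggests. The variable names in the data step are hypothetical, and the numbers are copied from the two output tables.

```sas
/* Arithmetic check only; no survey data are read. */
data _null_;
  b0_class = 2.398056;  /* intercept when srsex is a numeric 1/2 variable */
  b_srsex  = 0.078808;  /* slope for srsex in that run */

  /* Prediction for srsex = 2 (female), the reference level in the subgroup run */
  b0_subgroup = b0_class + 2 * b_srsex;

  /* Prediction for srsex = 1 (male), written both ways */
  male_class    = b0_class + 1 * b_srsex;
  male_subgroup = b0_subgroup - b_srsex;

  put b0_subgroup= male_class= male_subgroup=;
run;
```

The value of b0_subgroup works out to 2.555672, the intercept reported in the subgroup run, and the two expressions for males agree at 2.476864. The two runs therefore describe the same fitted model under different parameterizations.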
Changing the default reference category
To change the reference category, you can use the reflevel statement.
In this example, we have changed the reference category for both variables.
Also, we have used only the first four categories of the race variable,
racehpra. If you want to use only some of the categories, you can
recode the variable such that the categories that you want to use are the first
ones (e.g., coded 1, 2, 3, etc.) and then just give the number of desired
categories on the levels statement.
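A recode of this kind can be done in a SAS data step before calling SUDAAN. The following is a minimal sketch that assumes you want only Latino, Asian and White respondents. The dataset name chis_first3 and the variable name race_first are hypothetical, and the category codes are those given for racehpra earlier on this page.

```sas
/* Sketch: move the wanted racehpra categories to codes 1-3. */
data chis_first3;
  set chis;
  select (racehpra);
    when (1) race_first = 1;   /* Latino */
    when (4) race_first = 2;   /* Asian */
    when (6) race_first = 3;   /* White */
    when (2) race_first = 4;   /* Pacific Islander */
    when (3) race_first = 5;   /* AIAN */
    when (5) race_first = 6;   /* African American */
    when (7) race_first = 7;   /* Other */
    otherwise race_first = .;  /* anything else stays missing */
  end;
run;

/* Check the mapping before using race_first in SUDAAN. */
proc freq data=chis_first3;
  tables racehpra * race_first / missing list;
run;
```

Following the pattern of the example in this section, race_first would then be listed on the subgroup statement with 3 as its entry on the levels statement.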
proc regress data=chis filetype=sas design = jackknife;
weight rakedw0;
jackwgts rakedw1--rakedw80 / adjjack=1;
reflevel racehpra = 2 srsex = 1 ;
model ab1 = srsex racehpra;
subgroup srsex racehpra;
levels 2 4;
run;
------------------------------------------------------------------
Independent                                                P-value
Variables and                 Beta                          T-Test
Effects                     Coeff.   SE Beta  T-Test B=0       B=0
------------------------------------------------------------------
Intercept                     2.46      0.11       22.88    0.0000
SRSEX
  MALE                        0.00      0.00           .         .
  FEMALE                      0.11      0.03        4.51    0.0000
RACEHPRA
  LATINO                      0.35      0.11        3.27    0.0016
  PACIFIC ISLANDER            0.00      0.00           .         .
  AIAN                        0.22      0.12        1.91    0.0598
  ASIAN                      -0.01      0.11       -0.06    0.9562
------------------------------------------------------------------

--------------------------------------------------------
Contrast                   Degrees
                                of               P-value
                           Freedom      Wald F    Wald F
--------------------------------------------------------
OVERALL MODEL                    5    12285.21    0.0000
MODEL MINUS INTERCEPT            4       56.79    0.0000
INTERCEPT                        .           .         .
SRSEX                            1       20.36    0.0000
RACEHPRA                         3       71.68    0.0000
--------------------------------------------------------
Using only some of the categories in a categorical variable
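You can specify just some of the levels of a categorical variable by
listing only the desired levels on the catlevel statement. In the example
below, the variable is repeated on the var statement once for each level
wanted, and the catlevel statement gives the matching level in the same position.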
proc descript data=chis filetype=sas design = jackknife;
weight rakedw0;
jackwgts rakedw1--rakedw80 / adjjack=1;
var srsex racehpra racehpra racehpra;
catlevel 1 1 3 5;
setenv colwidth=12;
print nsum wsum total setotal;
run;

-----------------------------------------------------
|                  |                 |              |
| Variable         |                 | One          |
|                  |                 | 1            |
-----------------------------------------------------
|                  |                 |              |
| SRSEX: MALE      | Sample Size     |        55428 |
|                  | Weighted Size   |  23847415.32 |
|                  | Total           |  11631728.37 |
|                  | SE Total        |       515.26 |
-----------------------------------------------------
|                  |                 |              |
| RACEHPRA:        | Sample Size     |        55428 |
| LATINO           | Weighted Size   |  23847415.32 |
|                  | Total           |   5643945.79 |
|                  | SE Total        |     28469.00 |
-----------------------------------------------------
|                  |                 |              |
| RACEHPRA: AIAN   | Sample Size     |        55428 |
|                  | Weighted Size   |  23847415.32 |
|                  | Total           |     85146.30 |
|                  | SE Total        |      4008.83 |
-----------------------------------------------------
|                  |                 |              |
| RACEHPRA:        | Sample Size     |        55428 |
| AFRICAN          | Weighted Size   |  23847415.32 |
| AMERICAN         | Total           |   1387993.65 |
|                  | SE Total        |     11564.25 |
-----------------------------------------------------
The coding of categorical variables
The numerical values used for the codes of a categorical variable are very
important. Values of variables listed on the subgroup statement
must be positive; zero and negative values are treated as missing. This means that
you cannot list a 0/1 dummy variable on the subgroup statement. In
most regression analyses, this is not a problem; you just include the variable
in the model and not on the subgroup statement (just as you would not
include a dummy variable on a class statement in SAS). However, in
other procedures, such as proc descript, you may want to include a dummy
variable on the subgroup statement. In this case, you would want to
recode the variable either using the recode statement or in a SAS data
step.
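The data step route can be sketched as follows. This is a minimal example under stated assumptions: insured is a hypothetical 0/1 dummy variable that is not part of the examples above, and the names chis_recode and insured_sg are also hypothetical. The recode maps 1 to 1 and 0 to 2 so that both categories have positive codes.

```sas
/* Sketch: give a 0/1 dummy positive codes for the subgroup statement. */
data chis_recode;
  set chis;
  /* insured is a hypothetical 0/1 dummy; map 1 -> 1 and 0 -> 2 */
  if insured = 1 then insured_sg = 1;
  else if insured = 0 then insured_sg = 2;
  /* any other value, including missing, leaves insured_sg missing */
run;

/* Confirm that every original value landed where intended. */
proc freq data=chis_recode;
  tables insured * insured_sg / missing list;
run;
```

With this coding the original 0 category has the highest code, so it would be the default reference category if insured_sg were used as a predictor on the subgroup statement.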
Cite this article
stats writer (2024). How can I incorporate categorical independent variables into regression analyses using SUDAAN?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/how-can-i-incorporate-categorical-independent-variables-into-regression-analyses-using-sudaan/
stats writer. "How can I incorporate categorical independent variables into regression analyses using SUDAAN?." PSYCHOLOGICAL SCALES, 1 Jul. 2024, https://scales.arabpsychology.com/stats/how-can-i-incorporate-categorical-independent-variables-into-regression-analyses-using-sudaan/.
