How to Use PROC SURVEYSELECT in SAS (With Examples)

PROC SURVEYSELECT is a SAS procedure for drawing random samples from a dataset. It supports simple random sampling, stratified sampling, cluster sampling, and systematic sampling, and the sample size can be given either as a count or as a proportion of the population. It is commonly used to draw samples for survey research.


You can use PROC SURVEYSELECT to select a random sample from a dataset in SAS.

Here are three common ways to use this procedure in practice:

Example 1: Use PROC SURVEYSELECT to Select Simple Random Sample

proc surveyselect data=my_data
    out=my_sample
    method=srs    /*use simple random sampling*/
    n=5           /*select a total of 5 observations*/
    seed=1;       /*set seed to make this example reproducible*/
run;

This particular example selects 5 random observations from the entire dataset.

Example 2: Use PROC SURVEYSELECT to Select Stratified Random Sample

proc surveyselect data=my_data
    out=my_sample
    method=srs           /*use simple random sampling*/
    n=2                  /*select 2 observations from each stratum*/
    seed=1;              /*set seed to make this example reproducible*/
    strata grouping_var; /*specify variable to use for stratification*/
run;

This particular example selects 2 random observations from each unique stratum in the dataset.

The strata statement specifies the variable to use for stratification.
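
One practical detail worth noting: when a STRATA statement is used, PROC SURVEYSELECT expects the input dataset to be sorted by the stratification variable. A minimal sketch of that preliminary step, reusing the placeholder names my_data and grouping_var from the example above:

```sas
/*sort the input dataset by the stratification variable*/
/*(my_data and grouping_var are the placeholder names used above)*/
proc sort data=my_data;
    by grouping_var;
run;
```

If the dataset is already in sorted order, this step does no harm.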

Example 3: Use PROC SURVEYSELECT to Select Clustered Random Sample

proc surveyselect data=my_data
    out=my_sample
    n=2                   /*select 2 clusters*/
    seed=1;               /*set seed to make this example reproducible*/
    cluster grouping_var; /*specify variable to use for clustering*/
run;

This particular example selects 2 random clusters from the dataset and includes every observation from each cluster in the sample.

The cluster statement specifies the variable to use for clustering.

The following examples show how to use each method in practice with this SAS dataset, which contains information about basketball players on various teams:

/*create dataset*/
data my_data;
    input team $ points;
    datalines;
A 12
A 14
A 22
A 35
A 40
B 12
B 10
B 29
B 33
C 40
C 25
C 11
C 10
C 15
;
run;

/*view dataset*/
proc print data=my_data;
run;

Example 1: Use PROC SURVEYSELECT to Select Simple Random Sample

We can use the following syntax to select a simple random sample of 5 observations from the dataset:

proc surveyselect data=my_data
    out=my_sample
    method=srs    /*use simple random sampling*/
    n=5           /*select a total of 5 observations*/
    seed=1;       /*set seed to make this example reproducible*/
run;

/*view sample*/
proc print data=my_sample;
run;

The resulting sample contains 5 observations randomly chosen from the entire dataset.
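
If you would rather specify the sample as a proportion of the dataset than as a fixed count, the SAMPRATE= option can take the place of N=. The sketch below is a variation added for illustration, not part of the original example; the 25% rate and the output name my_rate_sample are arbitrary choices:

```sas
/*variation: sample about 25% of the observations instead of a fixed count*/
proc surveyselect data=my_data
    out=my_rate_sample   /*hypothetical output name*/
    method=srs           /*use simple random sampling*/
    samprate=0.25        /*select 25% of the observations*/
    seed=1;              /*set seed to make this example reproducible*/
run;

/*view sample*/
proc print data=my_rate_sample;
run;
```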

Example 2: Use PROC SURVEYSELECT to Select Stratified Random Sample

We can use the following syntax to perform stratified random sampling in which 2 observations are randomly chosen from each team to be included in the sample:

proc surveyselect data=my_data
    out=my_sample
    method=srs    /*use simple random sampling within strata*/
    n=2           /*select 2 observations from each stratum*/
    seed=1;       /*set seed to make this example reproducible*/
    strata team;  /*use team as the stratification variable*/
run;

/*view sample*/
proc print data=my_sample;
run;

The resulting sample contains 2 observations randomly chosen from each team.
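
One quick way to confirm the allocation is to count the sampled observations by team. This check is an addition for illustration and assumes the output dataset is named my_sample and contains the team variable, as above:

```sas
/*count the sampled observations in each team*/
proc freq data=my_sample;
    tables team / nocum nopercent;
run;
```

Each team should appear in the table with a frequency of 2.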

Example 3: Use PROC SURVEYSELECT to Select Clustered Random Sample

We can use the following syntax to perform clustered random sampling with the teams as clusters. Two teams are randomly selected, and every observation from those teams is included in the sample:

proc surveyselect data=my_data
    out=my_sample
    n=2           /*select a total of 2 clusters*/
    seed=1;       /*set seed to make this example reproducible*/
    cluster team; /*use team as the clustering variable*/
run;

/*view sample*/
proc print data=my_sample;
run;

This particular sample contains every observation from teams A and B, which were the two “clusters” randomly chosen.
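
PROC SURVEYSELECT also supports systematic sampling through METHOD=SYS, which selects observations at a fixed interval after a random start. The following sketch is an illustrative addition rather than one of the three examples above; the sample size of 4 and the output name my_sys_sample are arbitrary choices:

```sas
/*illustrative sketch: systematic sample of 4 observations*/
proc surveyselect data=my_data
    out=my_sys_sample   /*hypothetical output name*/
    method=sys          /*use systematic random sampling*/
    n=4                 /*select a total of 4 observations*/
    seed=1;             /*set seed to make this example reproducible*/
run;

/*view sample*/
proc print data=my_sys_sample;
run;
```

Because selection follows the order of the observations, the result depends on how my_data is sorted.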

Note: The complete documentation for PROC SURVEYSELECT is available in the SAS/STAT User's Guide.
