How to use PROC SORT with the KEEP= option to keep selected variables?

In PROC SORT, KEEP= is a data set option rather than a standalone statement. When it is attached to the OUT= data set, as in the examples below, it tells SAS which variables (columns) to write to the sorted output. It does not select observations (rows); to keep only certain rows, use a WHERE statement or the WHERE= data set option instead.

You can use PROC SORT with the KEEP= option in SAS to sort the rows in a dataset and keep only specific columns after sorting.

You can use the following basic syntax to do so:

proc sort data=my_data out=sorted_data (keep=var1 var2);
    by var2;
run;

This particular example sorts the rows in the dataset based on the values in the var2 column and then keeps only the var1 and var2 columns in sorted_data.
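KEEP= can also be placed on the input data set named in DATA=. In that case PROC SORT reads only the listed variables, which can reduce the amount of data being sorted on wide data sets; the catch is that every BY variable must be among the kept variables. A minimal sketch, assuming a small hypothetical data set named teams built from a few rows of the example below:

```sas
/* hypothetical small dataset built from a few rows of the example below */
data teams;
    input team $ points assists;
    datalines;
Mavs 113 22
Pacers 95 19
Cavs 100 34
;
run;

/* KEEP= on the input data set: PROC SORT reads only team and points,
   so the BY variable (points) must be one of the kept variables */
proc sort data=teams(keep=team points) out=sorted_teams;
    by points;
run;

proc print data=sorted_teams;
run;
```

Here sorted_teams ends up with the same two columns as it would with KEEP= on OUT=; the difference is only where the variables are filtered out.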

The following example shows how to use this syntax in practice.

Example: Use PROC SORT with KEEP= Option in SAS

Suppose we have the following dataset in SAS that contains information about various basketball teams:

/*create dataset*/
data my_data;
    input team $ points assists;
    datalines;
Mavs 113 22
Pacers 95 19
Cavs 100 34
Lakers 114 20
Heat 123 39
Kings 100 22
Raptors 105 11
Hawks 95 25
Magic 103 26
Spurs 119 29
;
run;

/*view dataset*/
proc print data=my_data;
run;

We could use the following syntax to sort the rows in the dataset based on the values in the points column:

/*sort rows in dataset based on values in points column*/
proc sort data=my_data out=sorted_data;
    by points;
run;

/*view sorted dataset*/
proc print data=sorted_data;
run;

Notice that the rows are now sorted in ascending order based on the values in the points column.

By default, SAS keeps all of the columns in the dataset after sorting.

However, you can use the KEEP= data set option to specify which columns to keep after sorting.
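If it is easier to name the columns to discard, the DROP= data set option is the mirror image of KEEP=. A minimal sketch, assuming a small hypothetical data set named teams built from a few rows of the dataset above; dropping assists leaves the same columns as keep=team points:

```sas
/* hypothetical small dataset built from a few rows of the dataset above */
data teams;
    input team $ points assists;
    datalines;
Mavs 113 22
Pacers 95 19
Cavs 100 34
;
run;

/* DROP= lists the columns to remove from the sorted output */
proc sort data=teams out=sorted_teams(drop=assists);
    by points;
run;

proc print data=sorted_teams;
run;
```

KEEP= tends to be the safer choice when the list of wanted columns is short, because newly added columns will not slip into the output unnoticed.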

For example, we can use the following syntax to sort the rows in the dataset based on the values in the points column and then only keep the team and points columns:

/*sort rows in dataset based on values in points column and only keep team and points*/
proc sort data=my_data out=sorted_data (keep=team points);
    by points;
run;

/*view sorted dataset*/
proc print data=sorted_data;
run;

Once again, the rows are sorted in ascending order based on the values in the points column, but this time the KEEP= option keeps only the team and points columns in sorted_data.
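KEEP= controls columns, not rows. To also keep only selected observations, one option is a WHERE statement inside PROC SORT, which filters rows as they are read, before sorting. A minimal sketch, assuming a small hypothetical data set named teams built from a few rows of the example and an illustrative cutoff of more than 100 points:

```sas
/* hypothetical small dataset built from a few rows of the example */
data teams;
    input team $ points assists;
    datalines;
Mavs 113 22
Pacers 95 19
Cavs 100 34
Lakers 114 20
;
run;

/* WHERE selects rows; KEEP= on OUT= selects columns */
proc sort data=teams out=sorted_teams(keep=team points);
    where points > 100;   /* illustrative cutoff: keep rows with more than 100 points */
    by points;
run;

proc print data=sorted_teams;
run;
```

With these four rows, only Mavs and Lakers pass the filter, and sorted_teams contains just the team and points columns.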
