How do I use the DATA step in SAS?

The DATA step is a powerful tool in SAS that allows you to read, manipulate, and write data. It consists of a series of statements that read raw data or an existing dataset, manipulate the values, and then write the results to a dataset. You can use the DATA step to filter rows, create new variables, keep or drop variables, and more. Sorting is not done in the DATA step itself; it is handled by a procedure such as PROC SORT.


You can use the DATA step in SAS to create datasets.

There are two common ways to use the DATA step:

1. Create a dataset from scratch.

2. Create a dataset from an existing dataset.

The following examples show how to use each method in practice.

Example 1: Use DATA Step to Create Dataset from Scratch

The following syntax shows how to use the DATA step to create a dataset with three variables:

/*create dataset*/
data my_data;
    input team $ position $ points;
    datalines;
A Guard 25
A Guard 20
A Guard 30
A Forward 25
A Forward 10
B Guard 10
B Guard 22
B Forward 30
B Forward 10
B Forward 10
B Forward 25
;
run;

/*view dataset*/
proc print data=my_data;
run;

Here is exactly what we did in this example:

First, we used data to name the dataset.

Then, we used input to specify the variable names ($ specifies a character variable).

Then, we used datalines to tell SAS that the upcoming lines represented values in the dataset.
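To check that SAS read team and position as character variables and points as a numeric variable, one option is PROC CONTENTS. The following is a minimal sketch that assumes my_data was created by the DATA step above and sits in the default WORK library:

```sas
/*list the variable names, types, and lengths stored in my_data*/
proc contents data=my_data;
run;
```

Note that with the input statement written as above, character variables get a default length of 8, so a value longer than eight characters would be truncated. Every value in this example fits within that limit.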

Example 2: Use DATA Step to Create Dataset from Existing Dataset

We can use the data step along with the set statement to create a dataset from another dataset that already exists.

For example, we can use the following syntax to create a new dataset called new_data that uses the variables from the dataset called my_data but drops the ‘points’ variable:

/*create new dataset that drops points from my_data*/
data new_data;
    set my_data;
    drop points;
run;

/*view dataset*/
proc print data=new_data;
run;

Here is exactly what we did in this example:

First, we used data to name the new dataset.

Then, we used set to specify the existing dataset to create the new dataset from.

Then, we used drop to drop the ‘points’ variable from the new dataset.

The end result is a new dataset that contains the same variables as the original dataset, except that the ‘points’ variable has been dropped.
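The same set-based pattern can also filter rows and create new variables. The sketch below assumes my_data from Example 1 exists. The dataset name guards and the variable name double_points are made up for illustration:

```sas
/*keep only the guards and add a new variable (names are illustrative)*/
data guards;
    set my_data;
    where position = 'Guard';      /*read only rows where position is Guard*/
    double_points = points * 2;    /*new numeric variable computed from points*/
run;

/*view dataset*/
proc print data=guards;
run;
```

Character comparisons in SAS are case-sensitive, so 'Guard' must match the stored value exactly.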
