SAS: Have you ever use (in=a) in Merge Statement?

SAS is a statistical software package used for data manipulation, visualization, and analytics. The “in=a” statement is used in a Merge statement to tell SAS to use observations from the data set that is listed first in the Merge statement. This statement allows one to compare observations between data sets and combine them into a single data set. This can be used to join data sets together or to update values in an existing data set.


When merging two datasets in SAS, you can use the IN statement to only return rows where a value exists in a particular dataset.

Here are a few common ways to use the IN statement in practice:

Method 1: Return Rows where Value Exists in First Dataset (in = a)

data final_data;
  merge data1 (in=a) data2;
  by ID;
  if a;
run;

This particular example merges the datasets called data1 and data2 and only returns the rows where a value exists in data1.

Method 2: Return Rows where Value Exists in Second Dataset (in = b)

data final_data;
  merge data1 data2 (in=b);
  by ID;
  if b;
run;

This particular example merges the datasets called data1 and data2 and only returns the rows where a value exists in data2.

Method 3: Return Rows where Value Exists in Both Datasets (in = a) and (in = b)

data final_data;
  merge data1 (in = a) data2 (in=b);
  by ID;
  if a and b;
run;

This particular example merges the datasets called data1 and data2 and only returns the rows where a value exists in both data1 and data2.

The following examples show how to use each method in practice with the following two datasets:

/*create first dataset*/
data data1;
    input ID Gender $;
    datalines;
1 Male
2 Male
3 Female
4 Male
5 Female
;
run;

title "data1";
proc print data = data1;

/*create second dataset*/
data data2;
    input ID Sales;
    datalines;
1 22
2 15
4 29
6 31
7 20
8 13
;
run;

title "data2";
proc print data = data2;

Example 1: Return All Rows

We can use the following merge statement without any IN statement to merge the two datasets based on the value in the ID column and return all rows from both datasets:

/*perform merge*/
data final_data;
  merge data1 data2;
  by ID;
run;

/*view results*/
title "final_data";
proc print data=final_data;

Notice that all rows from both datasets are returned, regardless if there are missing values due to an ID value not existing in both datasets.

Example 2: Return Rows where Value Exists in First Dataset (in = a)

We can use the following merge statement with (in = a) to merge the two datasets based on the value in the ID column and return only the rows where a value exists in the first dataset:

/*perform merge*/
data final_data;
  merge data1 (in = a) data2;
  by ID;
  if a;
run;

/*view results*/
title "final_data";
proc print data=final_data;

Notice that only the rows where a value exists in the first dataset are returned.

Example 3: Return Rows where Value Exists in Second Dataset (in = b)

We can use the following merge statement with (in = b) to merge the two datasets based on the value in the ID column and return only the rows where a value exists in the second dataset:

/*perform merge*/
data final_data;
  merge data1 data2 (in = b);
  by ID;
  if b;
run;

/*view results*/
title "final_data";
proc print data=final_data;

Notice that only the rows where a value exists in the second dataset are returned.

Example 4: Return Rows where Value Exists in Both Datasets (in = a) and (in = b)

We can use the following merge statement with (in = a) and (in = b) to merge the two datasets based on the value in the ID column and return only the rows where a value exists in both datasets:

/*perform merge*/
data final_data;
  merge data1 (in = a) data2 (in = b);
  by ID;
  if a and b;
run;

/*view results*/
title "final_data";
proc print data=final_data;

Notice that only the rows where a value exists in both datasets are returned.

Note: You can find the complete documentation for the SAS merge statement .

The following tutorials explain how to perform other common tasks in SAS:

x