How Do You Use PROC COMPARE in SAS? (With Examples)

PROC COMPARE in SAS compares two datasets and reports differences in variable names, labels, attributes, and data values. The two datasets do not need to have the same structure. For example, to check this year's version of a dataset against last year's, you can run PROC COMPARE on the two and review the report of differences it produces.


You can use PROC COMPARE in SAS to quickly identify the similarities and differences between two datasets.

This procedure uses the following basic syntax:

proc compare
    base=data1
    compare=data2;
run;

The following example shows how to use this procedure in practice.

Example: Using PROC COMPARE in SAS

Suppose we have the following two datasets in SAS:

/*create datasets*/
data data1;
    input team $ points rebounds;
    datalines;
A 25 10
B 18 4
C 18 7
D 24 12
E 27 11
;
run;

data data2;
    input team $ points;
    datalines;
A 25
B 18
F 27
G 21
H 20
;
run;

/*view datasets*/
proc print data=data1;
run;

proc print data=data2;
run;

We can use the following PROC COMPARE statement to find the similarities and differences between the two datasets:

/*compare the two datasets*/
proc compare
    base=data1
    compare=data2;
run;

This will produce three tables in the output:

Table 1: A Summary of Both Datasets

The first table shows a brief summary of each dataset, including:

1. The number of variables (NVar) and observations (NObs) in each dataset.

  • Data1 has 3 variables and 5 observations
  • Data2 has 2 variables and 5 observations

2. The number of variables in common between the two datasets.

  • Data1 and Data2 have 2 variables in common (team and points)
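If only this structural comparison is of interest, the value report can be skipped. The following is a minimal sketch, assuming the data1 and data2 datasets created above: the LISTVAR option lists variables found in only one dataset, and NOVALUES suppresses the value-comparison report.

/*compare structure only: list variables found in just one dataset*/
proc compare
    base=data1
    compare=data2
    listvar
    novalues;
run;

Here the rebounds variable should be listed as appearing in data1 but not in data2.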

Table 2: A Summary of the Number of Differences in Values

The second table summarizes the number of differences in values between the two datasets.

The most interesting part of this output is located at the end of the table where we can see a summary of differences between the variables:

  • The team variable has 3 observations with different values.
  • The points variable has 3 observations with different values. The max difference is 9.
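These counts can be cross-checked by hand. The sketch below is not part of PROC COMPARE. It pairs the two datasets by position with a one-to-one merge and computes the compare value minus the base value for points. The names check, base_points, comp_points, and diff are made up for illustration.

/*cross-check: pair observations by position and compute the difference*/
data check;
    merge data1(keep=points rename=(points=base_points))
          data2(keep=points rename=(points=comp_points));
    diff = comp_points - base_points;
run;

proc print data=check;
run;

Observations 3 through 5 have nonzero differences (9, -3, and -7), which agrees with the three differing observations and the maximum difference of 9.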

Table 3: The Actual Differences Between Observations

The third table shows the actual differences between the observations in the two datasets.

The first part of this table shows the differences in the team variable between the two datasets.

  • For example, in data1 the third observation has a value of C for team while in data2 the third observation has a value of F.

The second part of this table shows the differences in the points variable between the two datasets.

  • For example, in data1 the third observation has a value of 18 for points while in data2 the third observation has a value of 27. The difference between the two values is 9.
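To work with these differences in later steps rather than read them from the report, PROC COMPARE can write them to a dataset. The following is a sketch, assuming the same data1 and data2; the output dataset name diffs is arbitrary. OUTNOEQUAL keeps only observations where at least one value differs, OUTBASE and OUTCOMP write the original rows from each dataset, OUTDIF writes the differences, and NOPRINT suppresses the printed report.

/*write the differing observations to a dataset instead of a report*/
proc compare
    base=data1
    compare=data2
    out=diffs
    outnoequal
    outbase
    outcomp
    outdif
    noprint;
run;

proc print data=diffs;
run;

In the diffs dataset, the _TYPE_ variable marks each row as BASE, COMPARE, or DIF, and _OBS_ gives the observation number.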

These three tables summarize how the two datasets differ when observations are compared row by row.
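Because observations are paired by position by default, team C in data1 is compared with team F in data2. If rows should instead be matched on a key, the ID statement can be used. The following is a sketch, assuming team uniquely identifies a row and both datasets are sorted by team (they are in this example). The LISTALL option additionally lists observations and variables found in only one dataset.

/*match observations by team instead of by position*/
proc compare
    base=data1
    compare=data2
    listall;
    id team;
run;

With this version, only teams A and B are matched, and their points values agree. Teams C, D, and E are reported as present only in data1, and teams F, G, and H as present only in data2.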

Note that if you only want to compare the differences between the two datasets for one specific variable, you can use the following syntax:

/*compare the differences between the datasets only for 'points' variable*/
proc compare
    base=data1
    compare=data2;
    var points;
run;

This will produce the same three tables as earlier, but only the output for the points variable will be shown.

Note: You can find the complete documentation for PROC COMPARE in the Base SAS Procedures Guide.
