How can I utilize the SAS SELECT DISTINCT statement in PROC SQL to remove duplicate records in my dataset?

In SAS, adding the DISTINCT keyword to a SELECT statement in PROC SQL returns only unique rows. It can be applied to every column or to a chosen set of columns, so each combination of values appears only once in the result. The query does not modify the source dataset; it only controls which rows are returned. This is useful when duplicate records would otherwise skew analytical results or make the data harder to interpret.

SAS: Use SELECT DISTINCT in PROC SQL


You can use the SELECT DISTINCT statement within PROC SQL in SAS to select only unique rows from a dataset.

The following example shows how to use this statement in practice.

Example: Using SELECT DISTINCT in SAS

Suppose we have the following dataset in SAS that contains information about various basketball players:

/*create dataset*/
data my_data;
    input team $ position $ points;
    datalines;
A Guard 14
A Guard 14
A Guard 24
A Forward 13
A Forward 13
B Guard 22
B Guard 22
B Forward 34
C Forward 15
C Forward 18
;
run;

/*view dataset*/
proc print data=my_data;
run;

We can use the SELECT DISTINCT statement within PROC SQL to select all unique rows from the dataset:

/*select all unique rows*/
proc sql;
    select distinct *
    from my_data;
quit;

Note: The star ( * ) symbol after SELECT DISTINCT tells SAS to select all columns in the dataset.

Notice that only the unique rows are shown in the output: the 10 rows in the original dataset reduce to 7.

For example, two rows have a team value of A, a position value of Forward and a points value of 13, but only one of them is shown.
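
The query above only prints the unique rows; my_data itself is unchanged. To keep a de-duplicated copy, one option is to wrap the same query in CREATE TABLE ... AS. The following is a minimal sketch, where the output name my_data_unique is a hypothetical choice and not part of the original example:

```sas
/*save the unique rows to a new dataset (my_data_unique is an assumed name)*/
proc sql;
    create table my_data_unique as
    select distinct *
    from my_data;
quit;

/*view the de-duplicated dataset*/
proc print data=my_data_unique;
run;
```

With the sample data, my_data_unique should contain 7 rows, since three of the 10 original rows are exact duplicates of another row.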

Note that we can also specify which columns we’d like to select:

/*select all unique combinations of team and position*/
proc sql;
    select distinct team, position
    from my_data;
quit;

Notice that only the unique combinations of team and position are shown in the output. The sample data contains five such combinations.
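
If only the number of unique values is needed rather than the values themselves, DISTINCT can also be placed inside an aggregate function such as COUNT. A small sketch, assuming the same my_data dataset; the column alias n_teams is illustrative only:

```sas
/*count the number of unique teams (n_teams is an assumed alias)*/
proc sql;
    select count(distinct team) as n_teams
    from my_data;
quit;
```

For the sample data this should return 3, since the only team values are A, B and C.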

Cite this article

stats writer (2024). How can I utilize the SAS SELECT DISTINCT statement in PROC SQL to remove duplicate records in my dataset?. PSYCHOLOGICAL SCALES. Retrieved from https://scales.arabpsychology.com/stats/how-can-i-utilize-the-sas-select-distinct-statement-in-proc-sql-to-remove-duplicate-records-in-my-dataset/

stats writer. "How can I utilize the SAS SELECT DISTINCT statement in PROC SQL to remove duplicate records in my dataset?." PSYCHOLOGICAL SCALES, 23 Jun. 2024, https://scales.arabpsychology.com/stats/how-can-i-utilize-the-sas-select-distinct-statement-in-proc-sql-to-remove-duplicate-records-in-my-dataset/.