How to calculate Skewness & Kurtosis in SAS?

Skewness and kurtosis are measures of the shape of a data distribution. In SAS, you can calculate both with PROC MEANS or PROC UNIVARIATE, each of which can report them alongside other summary statistics. You can then use these values to assess the symmetry and tail weight of the distribution.


In statistics, skewness and kurtosis are two ways to measure the shape of a distribution.

Skewness measures the asymmetry of a distribution.

  • A negative skew indicates that the tail is on the left side of the distribution.
  • A positive skew indicates that the tail is on the right side of the distribution.
  • A value of zero indicates that there is no skewness in the distribution, as is the case for a perfectly symmetrical distribution.

Kurtosis measures whether a distribution is heavy-tailed or light-tailed relative to a normal distribution.

  • SAS reports excess kurtosis, so the kurtosis of a normal distribution is 0 (its unadjusted kurtosis is 3).
  • If a given distribution has a kurtosis less than 0, it is said to be platykurtic, which means it tends to produce fewer and less extreme outliers than the normal distribution.
  • If a given distribution has a kurtosis greater than 0, it is said to be leptokurtic, which means it tends to produce more outliers than the normal distribution.
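
For reference, the sample statistics that SAS reports under its default divisor setting (VARDEF=DF) can be written as below. The symbols are notation introduced here rather than taken from the SAS output: n is the number of nonmissing observations, \bar{x} is the sample mean, and s is the sample standard deviation. Skewness requires n > 2 and kurtosis requires n > 3.

```latex
% Sample skewness (bias-adjusted, VARDEF=DF)
\text{skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^{3}

% Sample excess kurtosis (bias-adjusted, VARDEF=DF)
\text{kurtosis} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^{4}
                  - \frac{3(n-1)^{2}}{(n-2)(n-3)}
```

The subtracted term in the kurtosis formula is what centers the statistic at 0 for normally distributed data.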

To calculate skewness and kurtosis for variables in SAS, you can specify the SKEWNESS and KURTOSIS statistic keywords in the PROC MEANS statement.

The following example shows how to use these keywords in practice.

Example: Calculating Skewness & Kurtosis in SAS

Suppose we have the following dataset in SAS that contains information about various basketball players:

/*create dataset*/
data my_data;
    input team $ points assists;
    datalines;
A 10 2
A 17 5
A 17 6
A 18 3
A 15 0
B 10 2
B 14 5
B 13 4
B 29 0
B 25 2
C 12 1
C 30 1
C 34 3
C 12 4
C 11 7
;
run;

/*view dataset*/
proc print data=my_data;
run;

We can use PROC MEANS with the SKEWNESS and KURTOSIS keywords to calculate the skewness and kurtosis for each numeric variable in the dataset:

/*calculate skewness and kurtosis for each numeric variable*/
proc means data=my_data SKEWNESS KURTOSIS;
run;

The table in the output displays the skewness and kurtosis values for each numeric variable in the dataset. These values can be interpreted as follows:

  • The points variable has a skewness of 1.009. Since this value is greater than 0, it means the tail is on the right side of the distribution.
  • The points variable has a kurtosis of -0.299. Since this value is less than 0, it means the distribution has slightly fewer and less extreme outliers than the normal distribution.
  • The assists variable has a skewness of 0.304. Since this value is greater than 0, it means the tail is on the right side of the distribution.
  • The assists variable has a kurtosis of -0.782. Since this value is less than 0, it means the distribution has fewer and less extreme outliers than the normal distribution.
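
If you want the same statistics for each team rather than for the dataset as a whole, one possible variation is sketched below. It is only an illustration: the CLASS and VAR statements and the MAXDEC= option are standard PROC MEANS features, but their use here, and the choice of three decimal places, are assumptions added to the original example.

```sas
/*sketch: skewness and kurtosis of points and assists within each team*/
proc means data=my_data n skewness kurtosis maxdec=3;
    class team;          /*one row of statistics per team*/
    var points assists;  /*limit the analysis to these variables*/
run;
```

Keep in mind that each team has only five observations in this dataset, so the group-level skewness and kurtosis estimates will be very noisy.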

To visualize the distribution of values for each numeric variable in the dataset, you can use PROC UNIVARIATE to create histograms for both the points and assists variables:

/*create histograms for points and assists variables*/
proc univariate data=my_data;
    var points assists;
    histogram points assists;
run;

This produces one histogram for the points variable and one for the assists variable.
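
To compare each histogram against a normal distribution visually, you could also overlay a fitted normal curve. The sketch below assumes the same my_data dataset and adds the NORMAL option to the HISTOGRAM statement. The overlay is an addition to the original example, not part of it.

```sas
/*sketch: histograms with a fitted normal density curve overlaid*/
proc univariate data=my_data;
    var points assists;
    histogram points assists / normal;
run;
```

A long right tail extending beyond the fitted curve is consistent with the positive skewness values reported above.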
