Table of Contents
Winsorizing data in Excel involves replacing extreme values in a data set with a more representative value. This is done by replacing values that are above a certain threshold with the value of the threshold, and similarly replacing values below a certain threshold with the value of the threshold. This technique can be used to reduce the influence of outliers on a data set and make the results more representative of the overall population.
To winsorize data means to set extreme outliers equal to a specified percentile of the data.
For example, a 90% winsorization sets all greater than the 95th percentile equal to the value at the 95th percentile and all observations less than the 5th percentile equal to the value at the 5th percentile.
This tutorial provides a step-by-step example of how to winsorize a dataset in Excel.
Step 1: Create the Data
First, we’ll create the following dataset:
Step 2: Calculate the Upper and Lower Percentiles
For this example, we’ll perform a 90% winsorization. This means we’ll set all values greater than the 95th percentile equal to the 95th percentile and all values less than the 5th percentile equal to the 5th percentile.
The following formulas show how to find the 5th and 95th percentiles:
The 5th percentile turns out to be 12.35 and the 95th percentile turns out to be 92.05.
Step 3: Winsorize the Data
Lastly, we’ll use the following formula to winsorize the data:
Note that we just copy and pasted the formula in cell F2 down to the remaining cells in column F.
In this case, the value 3 became changed to 12.35 and the value 98 became changed to 92.05.
Note that in this example we performed a 90% winsorization, but it’s possible to also perform an 80% winsorization, 95% winsorization, 99% winsorization, etc. by simply calculating different upper and lower percentiles.