How to calculate deciles in R


In statistics, deciles are numbers that split a dataset into ten groups of equal frequency.

The first decile is the point where 10% of all data values lie below it. The second decile is the point where 20% of all data values lie below it, and so on.

We can use the following syntax to calculate the deciles for a dataset in R:

quantile(data, probs = seq(.1, .9, by = .1))

The following example shows how to use this function in practice.

Example: Calculate Deciles in R

The following code shows how to create a fake dataset with 20 values and then calculate the values for the deciles of the dataset:

#create dataset
data <- c(56, 58, 64, 67, 68, 73, 78, 83, 84, 88,
          89, 90, 91, 92, 93, 93, 94, 95, 97, 99)

#calculate deciles of dataset
quantile(data, probs = seq(.1, .9, by = .1))

 10%  20%  30%  40%  50%  60%  70%  80%  90% 
63.4 67.8 76.5 83.6 88.5 90.4 92.3 93.2 95.2 

The way to interpret the deciles is as follows:

  • 10% of all data values lie below 63.4.
  • 20% of all data values lie below 67.8.
  • 30% of all data values lie below 76.5.
  • 40% of all data values lie below 83.6.
  • 50% of all data values lie below 88.5.
  • 60% of all data values lie below 90.4.
  • 70% of all data values lie below 92.3.
  • 80% of all data values lie below 93.2.
  • 90% of all data values lie below 95.2.

It’s worth noting that the fifth decile (the value at the 50th percentile) is equal to the median of the dataset.
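As a quick sanity check of the interpretation above, the sketch below recomputes the deciles and then measures the share of values that lie strictly below each one. It also compares the 50% value with the result of median(). The names deciles and share_below are only illustrative and do not appear in the original example.

```r
#recreate the dataset from the example above
data <- c(56, 58, 64, 67, 68, 73, 78, 83, 84, 88,
          89, 90, 91, 92, 93, 93, 94, 95, 97, 99)

#calculate deciles of dataset
deciles <- quantile(data, probs = seq(.1, .9, by = .1))

#share of data values that lie strictly below each decile
share_below <- sapply(deciles, function(d) mean(data < d))
share_below

#compare the 50% value with the median (TRUE if they match)
all.equal(median(data), unname(deciles["50%"]))
```

For this dataset the shares come out to 0.1, 0.2, and so on up to 0.9. With other datasets, particularly small ones or ones with many tied values, the shares may only be approximately equal to the nominal percentages.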

Example: Place Values into Deciles in R

To place each data value into a decile, we can use the ntile(x, ngroups) function from the dplyr package in R.

Here’s how to use this function for the dataset we created in the previous example:

library(dplyr) 

#create dataset
data <- data.frame(values=c(56, 58, 64, 67, 68, 73, 78, 83, 84, 88,
                            89, 90, 91, 92, 93, 93, 94, 95, 97, 99))

#place each value into a decile
data$decile <- ntile(data$values, 10)

#view data
data

   values decile
1      56      1
2      58      1
3      64      2
4      67      2
5      68      3
6      73      3
7      78      4
8      83      4
9      84      5
10     88      5
11     89      6
12     90      6
13     91      7
14     92      7
15     93      8
16     93      8
17     94      9
18     95      9
19     97     10
20     99     10

The way to interpret the output is as follows:

  • The data value 56 falls between the 0th and 10th percentiles, so it is in the first decile.
  • The data value 58 falls between the 0th and 10th percentiles, so it is in the first decile.
  • The data value 64 falls between the 10th and 20th percentiles, so it is in the second decile.
  • The data value 67 falls between the 10th and 20th percentiles, so it is in the second decile.
  • The data value 68 falls between the 20th and 30th percentiles, so it is in the third decile.
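The same assignment can also be sketched in base R, without dplyr, by cutting the values at the decile boundaries. This is an alternative approach rather than what ntile() does internally: it uses quantile() to build the breakpoints and cut() to assign each value to an interval. The names values, breaks, and decile are only illustrative.

```r
#recreate the values from the example above
values <- c(56, 58, 64, 67, 68, 73, 78, 83, 84, 88,
            89, 90, 91, 92, 93, 93, 94, 95, 97, 99)

#breakpoints at 0%, 10%, ..., 100%
breaks <- quantile(values, probs = seq(0, 1, by = .1))

#assign each value to the interval it falls in (1 through 10)
decile <- cut(values, breaks = breaks, include.lowest = TRUE, labels = FALSE)

#view values next to their deciles
data.frame(values, decile)
```

For this dataset the result matches the ntile() output shown above. The two approaches can differ on other data: ntile() splits by rank, so tied values may land in different groups, whereas cut() groups by value. cut() also stops with an error if two breakpoints are identical, which can happen when the data contain many repeated values.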
