How to Use the ntile() Function in dplyr

The ntile() function in dplyr divides a vector into n groups of (nearly) equal size, ordered from the smallest values to the largest. It takes two arguments: the input vector x and the number of groups n. It is useful for grouping data for further analysis, such as assigning observations to quartiles (n = 4) or deciles (n = 10), and it can be combined with other dplyr functions such as mutate() and group_by() for more complex data manipulation tasks.
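
As a quick illustration of the quartile use case, the sketch below assigns eight made-up exam scores (hypothetical data, not taken from the examples in this tutorial) to quartile groups with n = 4. Because ntile() groups observations by rank, each quartile receives the same number of observations here. This can differ from cutting the data at the cutoff values returned by quantile().

```r
library(dplyr)

# hypothetical exam scores, used only for illustration
scores <- c(55, 90, 72, 61, 84, 78, 95, 68)

# assign each score to a quartile group (1 = lowest 25%, 4 = highest 25%)
quartile <- ntile(scores, 4)
print(quartile)
# [1] 1 4 2 1 3 3 4 2

# for deciles, the same call would use n = 10 instead of n = 4
```

Each element of quartile corresponds to the score in the same position, so the lowest score (55) is in group 1 and the highest (95) is in group 4.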


You can use the ntile() function from the dplyr package in R to break up an input vector into n buckets.

This function uses the following basic syntax:

ntile(x, n)

where:

  • x: Input vector
  • n: Number of buckets

Note: If the length of x is not evenly divisible by n, the sizes of the buckets will differ by at most one.
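
To see this size rule in action, the sketch below splits ten values into three buckets. Since 10 is not divisible by 3, one bucket holds four elements and the other two hold three. Because ntile() assigns buckets by rank and enforces these sizes, tied values are not guaranteed to share a bucket: ties are broken by their order of appearance, as the second call shows with a hypothetical vector of identical values.

```r
library(dplyr)

# 10 elements into 3 buckets: bucket sizes must differ by at most one
sizes_demo <- ntile(1:10, 3)
print(sizes_demo)
# [1] 1 1 1 1 2 2 2 3 3 3

# count how many elements landed in each bucket
print(table(sizes_demo))

# four tied values into 2 buckets: the ties are split by position
ties_demo <- ntile(c(5, 5, 5, 5), 2)
print(ties_demo)
# [1] 1 1 2 2
```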

The following examples show how to use this function in practice.

Example 1: Use ntile() with a Vector

The following code shows how to use the ntile() function to break up a vector with 11 elements into 5 different buckets:

library(dplyr)

#create vector
x <- c(1, 3, 4, 6, 7, 8, 10, 13, 19, 22, 23)

#break up vector into 5 buckets
ntile(x, 5)

 [1] 1 1 1 2 2 3 3 4 4 5 5

From the output we can see that each element from the original vector has been placed into one of five buckets.

The smallest values are assigned to bucket 1 while the largest values are assigned to bucket 5.

For example:

  • The smallest values of 1, 3, and 4 are assigned to bucket 1.
  • The largest values of 22 and 23 are assigned to bucket 5.
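
One way to check which values landed in each bucket is to pass the bucket numbers to base R's split(). The sketch below does this for the vector from this example. It then applies ntile() to a short unsorted vector (hypothetical values) to show that the input does not need to be sorted: buckets are assigned by each element's rank, and the result keeps the original element order.

```r
library(dplyr)

x <- c(1, 3, 4, 6, 7, 8, 10, 13, 19, 22, 23)

# group the original values by their bucket number (list names are "1" to "5")
groups <- split(x, ntile(x, 5))
print(groups)

# unsorted input: buckets follow rank, output follows input order
y <- c(10, 1, 7, 4)
print(ntile(y, 2))
# [1] 2 1 2 1
```

The first list element contains 1, 3, and 4, and the last contains 22 and 23, matching the bullet points above.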

Example 2: Use ntile() with a Data Frame

Suppose we have the following data frame in R that shows the points scored by various basketball players:

#create data frame
df <- data.frame(player=LETTERS[1:9],
                 points=c(12, 19, 7, 22, 24, 28, 30, 19, 15))

#view data frame
df

  player points
1      A     12
2      B     19
3      C      7
4      D     22
5      E     24
6      F     28
7      G     30
8      H     19
9      I     15

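We can use the following code to create a new column that assigns each player to one of three buckets based on their points:

library(dplyr)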

#create new column that assigns players into buckets based on points
df$bucket <- ntile(df$points, 3)

#view updated data frame
df

  player points bucket
1      A     12      1
2      B     19      2
3      C      7      1
4      D     22      2
5      E     24      3
6      F     28      3
7      G     30      3
8      H     19      2
9      I     15      1

The new bucket column assigns a value between 1 and 3 to each player.

The players with the lowest points receive a value of 1 and the players with the highest points receive a value of 3.
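
The same column can also be created inside a dplyr pipeline with mutate(), which is the more typical style when ntile() is combined with other dplyr functions. The sketch below rebuilds the data frame from this example, adds the bucket column, and then uses group_by() and summarise() to report how many players fall in each bucket and the range of points each bucket covers. The column names n_players, min_points, and max_points are illustrative choices, not part of the original example.

```r
library(dplyr)

df <- data.frame(player = LETTERS[1:9],
                 points = c(12, 19, 7, 22, 24, 28, 30, 19, 15))

# add the bucket column inside a pipeline instead of with df$bucket <- ...
df <- df %>%
  mutate(bucket = ntile(points, 3))

# summarise each bucket: player count and the range of points it covers
bucket_summary <- df %>%
  group_by(bucket) %>%
  summarise(n_players = n(),
            min_points = min(points),
            max_points = max(points))

print(bucket_summary)
```

Each of the three buckets contains three players. Bucket 1 covers 7 to 15 points, bucket 2 covers 19 to 22, and bucket 3 covers 24 to 30.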
