How to Use the spread() Function in R (With Examples)

The spread() function in R reshapes data from long format to wide format. It takes three main arguments: the data frame, a key column, and a value column. The unique values in the key column become the names of new columns, and the value column supplies the values that fill them. For example, a data frame with columns “state”, “year”, and “value” can be transformed into a data frame with columns “state”, “2012”, “2013”, and “2014” by calling spread() with “year” as the key and “value” as the value.
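
As a minimal sketch of the state/year/value case described above, the following uses a small made-up data frame. The name sales, the state codes, and the numbers are all hypothetical and only serve to illustrate the call:

```r
library(tidyr)

# hypothetical long-format data: one row per state-year pair
sales <- data.frame(state=rep(c('CA', 'TX'), each=3),
                    year=rep(c(2012, 2013, 2014), times=2),
                    value=c(10, 12, 15, 7, 9, 11))

# year supplies the new column names, value fills them
wide <- spread(sales, key=year, value=value)

# view the column names of the reshaped data frame
names(wide)
```

The result has one row per state and one column for each year found in the original year column.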


The spread() function from the tidyr package can be used to “spread” a key-value pair across multiple columns.

This function uses the following basic syntax:

spread(data, key, value)

where:

  • data: Name of the data frame
  • key: Column whose values will become variable names
  • value: Column whose values will fill the new columns created from key
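
Beyond these three arguments, spread() also accepts an optional fill argument that controls what is placed in cells for which the original data has no row. The sketch below assumes a hypothetical data frame named scores in which player B has no assists row:

```r
library(tidyr)

# hypothetical data: player B has no 'assists' row
scores <- data.frame(player=c('A', 'A', 'B'),
                     stat=c('points', 'assists', 'points'),
                     amount=c(14, 6, 22))

# by default, missing player/stat combinations become NA
with_na <- spread(scores, key=stat, value=amount)

# fill=0 replaces those gaps with 0 instead
filled <- spread(scores, key=stat, value=amount, fill=0)
```

Whether 0 is a sensible replacement depends on the data. Leaving the default NA is often safer when a missing row does not actually mean zero.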

The following examples show how to use this function in practice.

Example 1: Spread Values Across Two Columns

Suppose we have the following data frame in R:

#create data frame
df <- data.frame(player=rep(c('A', 'B'), each=4),
                 year=rep(c(1, 1, 2, 2), times=2),
                 stat=rep(c('points', 'assists'), times=4),
                 amount=c(14, 6, 18, 7, 22, 9, 38, 4))

#view data frame
df

  player year    stat amount
1      A    1  points     14
2      A    1 assists      6
3      A    2  points     18
4      A    2 assists      7
5      B    1  points     22
6      B    1 assists      9
7      B    2  points     38
8      B    2 assists      4

We can use the spread() function to turn the values in the stat column into their own columns:

library(tidyr)

#spread stat column across multiple columns
spread(df, key=stat, value=amount)

  player year assists points
1      A    1       6     14
2      A    2       7     18
3      B    1       9     22
4      B    2       4     38

Example 2: Spread Values Across More Than Two Columns

Suppose we have the following data frame in R:

#create data frame
df2 <- data.frame(player=rep(c('A'), times=8),
                 year=rep(c(1, 2), each=4),
                 stat=rep(c('points', 'assists', 'steals', 'blocks'), times=2),
                 amount=c(14, 6, 2, 1, 29, 9, 3, 4))

#view data frame
df2

  player year    stat amount
1      A    1  points     14
2      A    1 assists      6
3      A    1  steals      2
4      A    1  blocks      1
5      A    2  points     29
6      A    2 assists      9
7      A    2  steals      3
8      A    2  blocks      4

We can use the spread() function to turn the four unique values in the stat column into four new columns:

library(tidyr)

#spread stat column across multiple columns
spread(df2, key=stat, value=amount)

  player year assists blocks points steals
1      A    1       6      1     14      2
2      A    2       9      4     29      3
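
Note that in tidyr 1.0.0 and later, spread() is marked as superseded in favor of pivot_wider(), although spread() still works. As a rough sketch, assuming a recent version of tidyr is installed, the same reshaping could be written as follows (the name wide2 is only illustrative):

```r
library(tidyr)

# same data frame as in Example 2
df2 <- data.frame(player=rep(c('A'), times=8),
                  year=rep(c(1, 2), each=4),
                  stat=rep(c('points', 'assists', 'steals', 'blocks'), times=2),
                  amount=c(14, 6, 2, 1, 29, 9, 3, 4))

# names_from plays the role of key, values_from the role of value
wide2 <- pivot_wider(df2, names_from=stat, values_from=amount)
```

The values are the same as in the spread() output, but the new columns may appear in a different order.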

The goal of the tidyr package is to create “tidy” data, which has the following characteristics:

  • Every column is a variable.
  • Every row is an observation.
  • Every cell is a single value.

The tidyr package uses four core functions to create tidy data:

1. The spread() function.

2. The function.

3. The function.

4. The function.

If you can master these four functions, you will be able to create “tidy” data from any data frame.
