When using the dplyr package in R, you can select random rows with the sample_n() and sample_frac() functions. The sample_frac() function takes the data frame to sample from and the fraction of rows to sample. By setting the fraction to a value between 0 and 1, you can select a random subset of the rows. For example, sample_frac(df, 0.5) randomly selects half of the rows from the data frame df.
You can use the following methods to select random rows from a data frame in R using functions from the dplyr package:
Method 1: Select Random Number of Rows
df %>% sample_n(5)
This function randomly selects 5 rows from the data frame.
Method 2: Select Random Fraction of Rows
df %>% sample_frac(.25)
This function randomly selects 25% of all rows from the data frame.
The following examples show how to use each method in practice with the following data frame in R:
#create data frame
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                 points=c(10, 10, 8, 6, 15, 15, 12, 12),
                 rebounds=c(8, 8, 4, 3, 10, 11, 7, 7))
#view data frame
df
  team points rebounds
1    A     10        8
2    B     10        8
3    C      8        4
4    D      6        3
5    E     15       10
6    F     15       11
7    G     12        7
8    H     12        7
Example 1: Select Random Number of Rows
We can use the following code to randomly select 5 rows from the data frame:
library(dplyr)
#randomly select 5 rows from data frame
df %>% sample_n(5)
  team points rebounds
1    F     15       11
2    A     10        8
3    D      6        3
4    G     12        7
5    B     10        8
Notice that five rows are randomly selected from the data frame.
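Because the selection is random, the rows returned will generally differ each time the code is run. If reproducible results are needed, one common approach is to set a seed with base R's set.seed() function before sampling. The following is a minimal sketch; the seed value of 1 and the names first_draw and second_draw are arbitrary choices for illustration:

```r
library(dplyr)

#recreate the data frame so this example runs on its own
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                 points=c(10, 10, 8, 6, 15, 15, 12, 12),
                 rebounds=c(8, 8, 4, 3, 10, 11, 7, 7))

#set the seed (the value 1 is arbitrary), then sample 5 rows
set.seed(1)
first_draw <- df %>% sample_n(5)

#reset to the same seed and sample again
set.seed(1)
second_draw <- df %>% sample_n(5)

#check whether both draws contain the same rows in the same order
identical(first_draw, second_draw)
```

With the same seed set before each call, both draws should contain the same five rows, so the final check should return TRUE.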
Example 2: Select Random Fraction of Rows
We can use the following code to randomly select 25% of all rows from the data frame:
library(dplyr)
#randomly select 25% of all rows from data frame
df %>% sample_frac(.25)
  team points rebounds
1    E     15       10
2    G     12        7
Since the original data frame has 8 total rows and 25% of 8 is equal to 2, two rows are randomly selected.
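The arithmetic can be checked directly by counting the rows in each sample. The following is a small sketch; the names quarter and half are only illustrative:

```r
library(dplyr)

#recreate the data frame so this example runs on its own
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                 points=c(10, 10, 8, 6, 15, 15, 12, 12),
                 rebounds=c(8, 8, 4, 3, 10, 11, 7, 7))

#sample 25% of the 8 rows and count the result
quarter <- df %>% sample_frac(.25)
nrow(quarter)

#sample 50% of the 8 rows and count the result
half <- df %>% sample_frac(.5)
nrow(half)

#by default rows are sampled without replacement, so no team appears twice
anyDuplicated(half$team)
```

The two row counts should be 2 and 4, and anyDuplicated() should return 0 because each team appears only once in the original data frame.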
Note: You can find the complete documentation for the sample_n() and sample_frac() functions in the dplyr documentation.
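In dplyr 1.0.0 and later, sample_n() and sample_frac() are marked as superseded in favor of slice_sample(), which handles both cases through its n and prop arguments. The following sketch shows what the two examples above might look like with slice_sample(), assuming a recent version of dplyr is installed; the names by_count and by_prop are only illustrative:

```r
library(dplyr)

#recreate the data frame so this example runs on its own
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                 points=c(10, 10, 8, 6, 15, 15, 12, 12),
                 rebounds=c(8, 8, 4, 3, 10, 11, 7, 7))

#counterpart of sample_n(5): randomly select 5 rows
by_count <- df %>% slice_sample(n = 5)
by_count

#counterpart of sample_frac(.25): randomly select 25% of all rows
by_prop <- df %>% slice_sample(prop = .25)
by_prop
```

The older functions still work, so existing code does not need to change.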