How do I sample rows with replacement in Pandas?

In pandas, you can sample rows with replacement by calling the DataFrame.sample() function with replace=True. You choose how many rows to draw with either n (a fixed number of rows) or frac (a fraction of the rows), and the function returns a new DataFrame containing the sampled rows. You can also pass weights so that some rows are more likely to be drawn than others.
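Here is a minimal sketch of the weights option. The code draws 1,000 rows with replacement and makes each row's chance of being picked proportional to its value in the points column. Using points as the weight is only an illustrative assumption: any column or list of non-negative weights works the same way, and pandas normalizes the weights, so they don't need to sum to 1.

```python
import pandas as pd

# Same basketball data used in the example below (two columns only)
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28]})

# Draw 1,000 rows with replacement; each row's probability of being drawn
# is proportional to its value in the 'points' column (illustrative choice)
weighted = df.sample(n=1000, replace=True, weights='points', random_state=0)

# How many times each team was drawn
counts = weighted['team'].value_counts()
print(counts)
```

Teams with more points, such as H, should show up noticeably more often than teams with fewer points, such as F.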


You can use the argument replace=True within the pandas sample() function to randomly sample rows in a DataFrame with replacement:

#randomly select 5 rows, allowing repeats
df.sample(n=5, replace=True)

By using replace=True, you allow the same row to be included in the sample multiple times.
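Because rows can repeat, a sample drawn with replacement can also be larger than the DataFrame itself. The sketch below uses a small three-row DataFrame (made up for illustration) to show that asking for 10 rows works with replace=True but raises a ValueError without it.

```python
import pandas as pd

# Hypothetical three-row DataFrame, used only for this illustration
df = pd.DataFrame({'team': ['A', 'B', 'C'],
                   'points': [18, 22, 19]})

# With replacement, n may exceed the number of rows
big_sample = df.sample(n=10, replace=True, random_state=1)
print(len(big_sample))  # 10

# Without replacement, asking for more rows than exist fails
raised = False
try:
    df.sample(n=10)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```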

The following example shows how to use this syntax in practice.

Example: Sample Rows with Replacement in Pandas

Suppose we have the following pandas DataFrame that contains information about various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})
                   
#view DataFrame
print(df)

  team  points  assists  rebounds
0    A      18        5        11
1    B      22        7         8
2    C      19        7        10
3    D      14        9         6
4    E      14       12         6
5    F      11        9         5
6    G      20        9         9
7    H      28        4        12

First, suppose we use the sample() function to randomly select six rows without replacement:

#randomly select 6 rows from DataFrame (without replacement)
df.sample(n=6, random_state=0)

  team  points  assists  rebounds
6    G      20        9         9
2    C      19        7        10
1    B      22        7         8
7    H      28        4        12
3    D      14        9         6
0    A      18        5        11

Notice that six rows have been selected from the DataFrame and none of the rows appear multiple times in the sample.

Note: The argument random_state=0 seeds the random number generator, so running this code again returns the same sample and the example is reproducible.
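To see what reproducible means in practice, here is a minimal sketch that draws a sample twice with the same seed and checks that the two results are identical. It reuses the team and points columns from the DataFrame above.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28]})

# Two calls with the same random_state return exactly the same rows
first = df.sample(n=6, replace=True, random_state=0)
second = df.sample(n=6, replace=True, random_state=0)
print(first.equals(second))  # True
```

Omitting random_state (or changing its value) will generally give a different sample on each run.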

Now suppose we use the argument replace=True to select a random sample of rows with replacement:

#randomly select 6 rows from DataFrame (with replacement)
df.sample(n=6, replace=True, random_state=0)

  team  points  assists  rebounds
4    E      14       12         6
7    H      28        4        12
5    F      11        9         5
0    A      18        5        11
3    D      14        9         6
3    D      14        9         6

Notice that the row with team “D” appears multiple times.

By using the argument replace=True, we allow the same row to appear in the sample multiple times.
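Repeated rows keep their original index labels, so the sample's index can contain duplicates. A minimal sketch for spotting those repeats, and for giving the sample a fresh 0 to n-1 index with the ignore_index argument (available in pandas 1.3 and later):

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28]})

sample = df.sample(n=6, replace=True, random_state=0)

# How many times each original row (by index label) was drawn
draw_counts = sample.index.value_counts()
print(draw_counts[draw_counts > 1])

# Boolean mask that is True for the second and later copies of a row
print(sample.index.duplicated())

# Same draw, but with the index reset so every label is unique
fresh = df.sample(n=6, replace=True, random_state=0, ignore_index=True)
print(fresh.index.is_unique)  # True
```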

We can also sample a fraction of the DataFrame's rows, rather than a fixed number, by using the frac argument.

For example, the following code selects 75% of the rows with replacement:

#randomly select 75% of rows (with replacement)
df.sample(frac=0.75, replace=True, random_state=0)

  team  points  assists  rebounds
4    E      14       12         6
7    H      28        4        12
5    F      11        9         5
0    A      18        5        11
3    D      14        9         6
3    D      14        9         6

Notice that 75% of the rows (6 out of 8) were included in the sample and that the row with team “D” appears twice.
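A minimal sketch of how frac behaves: pandas turns the fraction into a row count by multiplying it by the number of rows and rounding, and with replace=True the fraction may even exceed 1 to upsample the DataFrame. Without replacement, a frac above 1 raises a ValueError.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28]})

# 0.75 * 8 rows = 6 rows
print(len(df.sample(frac=0.75, replace=True, random_state=0)))  # 6

# With replacement, frac > 1 upsamples: 1.5 * 8 rows = 12 rows
upsampled = df.sample(frac=1.5, replace=True, random_state=0)
print(len(upsampled))  # 12

# Without replacement, frac > 1 is rejected
raised = False
try:
    df.sample(frac=1.5)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```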

Note: You can find the complete documentation for the sample() function in the pandas API reference under DataFrame.sample.
