How do I Shuffle Rows in a Pandas DataFrame?

To shuffle rows in a pandas DataFrame, the simplest approach is df.sample(frac=1), which returns all rows in random order. An alternative is to generate a random ordering of row positions with np.random.permutation(len(df)) and select the rows in that order with df.iloc. In either case, call reset_index(drop=True) afterward if you want the index renumbered from 0. Note that assigning a shuffled array directly to df.index only relabels the rows and does not change their order.
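A permutation-based shuffle can be sketched as follows. This is a minimal illustration, not the only way to do it: the DataFrame contents, the seed value, and the variable names (rng, perm, shuffled) are assumptions chosen for the example. The key point is that the permutation must be used to reorder rows by position (here with iloc); assigning it to df.index would only change the labels.

```python
import numpy as np
import pandas as pd

# small illustrative DataFrame (hypothetical data)
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D'],
                   'points': [77, 82, 86, 88]})

# seeded generator so the sketch is repeatable; the seed value is arbitrary
rng = np.random.default_rng(0)

# random ordering of the row positions 0..len(df)-1
perm = rng.permutation(len(df))

# reorder the rows by position, then renumber the index from 0
shuffled = df.iloc[perm].reset_index(drop=True)

print(shuffled)
```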


You can use the following syntax to randomly shuffle the rows in a pandas DataFrame:

#shuffle entire DataFrame
df.sample(frac=1)

#shuffle entire DataFrame and reset index
df.sample(frac=1).reset_index(drop=True)

Here’s what each piece of the code does:

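  • The sample() method returns a random sample of rows, drawn without replacement by default.
  • The frac argument specifies the fraction of rows to return. A frac value of 1 returns all rows, so the result contains every row exactly once, in random order.
  • The reset_index(drop=True) method replaces the index with the default values 0, 1, 2, and so on. Setting drop=True discards the old index instead of adding it as a new column.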
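To see what each piece contributes, here is a small sketch that runs the steps separately. The three-row DataFrame and the variable names (shuffled, dropped, kept) are assumptions made for illustration only.

```python
import pandas as pd

# small illustrative DataFrame (hypothetical data)
df = pd.DataFrame({'team': ['A', 'B', 'C'],
                   'points': [77, 82, 86]})

# frac=1 returns every row exactly once, in random order
shuffled = df.sample(frac=1)

# drop=True discards the old index labels
dropped = shuffled.reset_index(drop=True)

# drop=False (the default) keeps the old labels as a new 'index' column
kept = shuffled.reset_index()

print(dropped.columns.tolist())
print(kept.columns.tolist())
```

Keeping the old labels with the default drop=False can be handy if you later need to map shuffled rows back to their original positions.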

The following examples show how to use this syntax in practice.

Example 1: Shuffle Entire DataFrame

The following code shows how to shuffle all rows in a pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'points': [77, 82, 86, 88, 80, 95],
                   'rebounds': [19, 22, 15, 28, 33, 29]})

#view DataFrame
df

	team	points	rebounds
0	A	77	19
1	A	82	22
2	A	86	15
3	B	88	28
4	B	80	33
5	C	95	29

#shuffle all rows of DataFrame
df.sample(frac=1)

	team	points	rebounds
1	A	82	22
3	B	88	28
2	A	86	15
5	C	95	29
4	B	80	33
0	A	77	19

Notice that the rows are shuffled and each row retained its original index value.

Also note that sample() draws a new random ordering on every call, so the order of the rows will generally differ each time you run this code unless you pass a fixed random_state.
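If you need the same shuffled order on every run (for example, to make an analysis repeatable), sample() accepts a random_state argument. The sketch below assumes the same example data as above; the seed value 42 is an arbitrary choice.

```python
import pandas as pd

# same example data as above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'points': [77, 82, 86, 88, 80, 95],
                   'rebounds': [19, 22, 15, 28, 33, 29]})

# the same random_state produces the same row order on each call
first = df.sample(frac=1, random_state=42)
second = df.sample(frac=1, random_state=42)

print(first.equals(second))
```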

Example 2: Shuffle Entire DataFrame & Reset Index

The following code shows how to shuffle all rows in a pandas DataFrame and reset the index values:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'points': [77, 82, 86, 88, 80, 95],
                   'rebounds': [19, 22, 15, 28, 33, 29]})

#view DataFrame
df

	team	points	rebounds
0	A	77	19
1	A	82	22
2	A	86	15
3	B	88	28
4	B	80	33
5	C	95	29

#shuffle all rows of DataFrame and reset index
df.sample(frac=1).reset_index(drop=True)

	team	points	rebounds
0	A	77	19
1	C	95	29
2	A	82	22
3	B	88	28
4	A	86	15
5	B	80	33

Notice that the rows are shuffled and the index is also reset so that the first row has an index value of 0, the second row has an index value of 1, and so on.
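One detail worth keeping in mind: sample() and reset_index() both return new DataFrames and leave the original untouched. A minimal sketch of keeping the shuffled result, assuming the same example data (the names original_order and unchanged are illustrative):

```python
import pandas as pd

# same example data as above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'points': [77, 82, 86, 88, 80, 95],
                   'rebounds': [19, 22, 15, 28, 33, 29]})

original_order = df['points'].tolist()

# calling sample() without assigning the result leaves df unchanged
df.sample(frac=1).reset_index(drop=True)
unchanged = df['points'].tolist() == original_order

# assign the result back to keep the shuffled version
df = df.sample(frac=1).reset_index(drop=True)

print(unchanged)
```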
