How to Randomly Select Rows from a Pandas DataFrame
In Pandas, it is possible to select rows randomly from a DataFrame with different methods. Randomly selecting rows can be useful for tasks like sampling, testing or data exploration.
Creating Sample Pandas DataFrame
First, we will create a sample Pandas DataFrame that we will use further in our article.
import pandas as pd
d = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
'Age':[27, 24, 22, 32, 15],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd', '10th']}
df = pd.DataFrame(d)
df
Output

A random selection of rows from a DataFrame can be achieved in different ways. Below are the ways by which we can randomly select rows from a Pandas DataFrame:
- Using sample() Method
- Using parameter n
- Using frac parameter
- Using replace=True
- Using weights
- Using axis
- Using random_state
- Using NumPy
1. Using sample() method
In this example, we use the sample() method to randomly select a row from the Pandas DataFrame. sample() returns a random sample of items from an axis of the object, and the result has the same type as the caller. Called with no arguments, it returns a single row.
# Select one random row
dfs = df.sample()
print(dfs)
Output
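The row that comes back changes from run to run, so a fixed listing is not meaningful here. As a minimal sketch, assuming a two-column subset of the sample data above, the checks below illustrate what does stay constant: sample() with no arguments returns exactly one row, and the result has the same type as the caller.

```python
import pandas as pd

# Two columns of the article's sample data (subset chosen for brevity)
df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
})

row = df.sample()           # DataFrame in, one-row DataFrame out
value = df['Age'].sample()  # Series in, one-element Series out

print(type(row).__name__, row.shape)
print(type(value).__name__, value.shape)
```

The names row and value are only illustrative; the same call works on any DataFrame or Series.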

2. Using parameter n
We can specify the number of rows to select using the n parameter. Because the selection is random, each run may return different rows.
# Select 3 random rows
df.sample(n=3)
Output
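Which three rows appear depends on the run. A small sketch, assuming a subset of the sample data above, shows two properties that hold regardless: the sampled rows keep their original index labels, and by default no row is picked twice. The reset_index call at the end is an optional extra, not part of the original example.

```python
import pandas as pd

# Subset of the article's sample data
df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
})

picked = df.sample(n=3)

# Sampled rows keep the labels they had in df, and each label appears once
print(picked.index.is_unique)
print(picked.index.isin(df.index).all())

# Optional: renumber the sampled rows from 0 if the old labels are not needed
renumbered = picked.reset_index(drop=True)
print(list(renumbered.index))
```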

3. Using frac Parameter
The frac parameter selects a fraction of the rows instead of a fixed count. For example, with frac=0.5 the sample() method returns 50% of the rows, rounded to a whole number.
df.sample(frac=0.5)  # returns about 50% of the rows
Output
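The selected rows vary between runs, but the number of rows follows from frac and the length of the DataFrame. The sketch below, assuming a subset of the sample data, uses frac=0.4 (an illustrative value that gives a whole number on five rows) and frac=1, which returns every row in shuffled order.

```python
import pandas as pd

# Subset of the article's sample data (5 rows)
df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
})

two_fifths = df.sample(frac=0.4)  # 40% of 5 rows -> 2 rows
shuffled = df.sample(frac=1)      # all rows, in random order

print(len(two_fifths))
print(len(shuffled))
print(sorted(shuffled.index) == list(df.index))
```

Using frac=1 is a common way to shuffle a DataFrame without dropping any rows.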

4. Selecting Rows with Replacement (replace=True)
By default, the sample() method doesn’t allow selecting the same row more than once. However, we can allow this by setting replace=True.
df.sample(n=5, replace=True)
Output
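The exact rows depend on the run. One consequence of replacement that can be checked directly is that n may exceed the number of rows. The sketch below, assuming a subset of the sample data and an illustrative n=8, contrasts this with the default behaviour, where the same request raises a ValueError.

```python
import pandas as pd

# Subset of the article's sample data (5 rows)
df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
})

# With replacement, asking for more rows than exist is allowed;
# 8 draws from 5 rows must repeat at least one row
oversampled = df.sample(n=8, replace=True)
print(len(oversampled), oversampled.index.has_duplicates)

# Without replacement (the default), the same request fails
raised = False
try:
    df.sample(n=8)
except ValueError as err:
    raised = True
    print('ValueError:', err)
```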

5. Using Weights to Select Rows
We can assign weights to rows so that some rows are more likely to be selected than others. The weights parameter controls the probability of selecting each row. Weights that do not sum to 1 are normalized automatically.
test_weights = [0.2, 0.4, 0.2, 0.2, 0.4]
df.sample(n=3, weights=test_weights)
Output
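Weighted results are still random, so the rows differ between runs. To make the effect of weights visible, the sketch below uses illustrative weights (not the ones from the example above) in which two rows get weight 0 and can therefore never be selected. It also shows that a column name can be passed as weights, here using Age so that older people are more likely to be picked.

```python
import pandas as pd

# Subset of the article's sample data
df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
})

# Rows with weight 0 are never chosen, so only the last three can appear
zero_weights = [0, 0, 1, 1, 1]
picked = df.sample(n=3, weights=zero_weights)
print(sorted(picked['Name']))

# A column name can serve as weights: probability proportional to Age
by_age = df.sample(n=2, weights='Age')
print(len(by_age))
```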

6. Using axis Parameter for Column Sampling
The axis parameter accepts a number or a name: 0 or 'index' for rows (the default), 1 or 'columns' for columns. Setting axis=1 makes sample() pick columns instead of rows.
# Sample columns instead of rows
df.sample(axis=1)
Output
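The column that is chosen varies between runs. As a minimal sketch, assuming the sample data from the start of the article, the code below samples two columns with axis=1 and confirms that every row is kept while only the set of columns shrinks. The value n=2 is an illustrative choice.

```python
import pandas as pd

# The article's sample data
df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
    'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th'],
})

# axis=1 (or axis='columns') samples columns; all 5 rows are kept
cols = df.sample(n=2, axis=1)
print(cols.shape)
print(list(cols.columns))

# The axis name is equivalent to the number
cols_by_name = df.sample(n=2, axis='columns')
print(cols_by_name.shape)
```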

7. Using random_state for Reproducibility
Passing an integer to random_state seeds the random number generator, so the same call on the same DataFrame always returns the same rows. If random_state is left as None (the default), the selection changes from run to run.
df.sample(n=2, random_state=2)
Output
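With a fixed seed the result is repeatable, which can be verified directly. The sketch below, assuming a subset of the sample data, calls sample() twice with the same random_state and compares the results.

```python
import pandas as pd

# Subset of the article's sample data
df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
})

first = df.sample(n=2, random_state=2)
second = df.sample(n=2, random_state=2)

# Same seed, same DataFrame, same call: identical rows in identical order
print(first.equals(second))  # True
```

A fixed seed is useful in tests and tutorials, where readers should see the same rows as the author.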

8. Using NumPy for Random Selection
We can also use NumPy to randomly select rows based on their index. This approach allows us to control the number of rows to select and whether or not to allow replacement.
import numpy as np
indices = np.random.choice(df.index, size=4, replace=False)
df.loc[indices]
Output
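The rows selected this way also vary between runs. As a variation on the example above, the sketch below uses NumPy's Generator API (np.random.default_rng) with an arbitrary seed of 0 and picks row positions rather than index labels, which is an assumption made here so that the approach also works when index labels are not unique.

```python
import numpy as np
import pandas as pd

# Subset of the article's sample data
df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
})

# Seeded generator so the selection is repeatable (seed value is arbitrary)
rng = np.random.default_rng(seed=0)

# Draw 4 distinct row positions from 0..len(df)-1, then select by position
positions = rng.choice(len(df), size=4, replace=False)
subset = df.iloc[positions]

print(len(subset))
print(subset.index.is_unique)
```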
