Pandas DataFrame.where() - Python
The DataFrame.where() function replaces values in a DataFrame based on a condition. It keeps the original value where the condition is True and replaces it with something else (NaN by default, or a custom value) where the condition is False.
For example:
import pandas as pd
df = pd.DataFrame({'A': [1, -2, 3], 'B': [-1, 5, -6]})
res = df.where(df > 0)
print(res)
Output
     A    B
0  1.0  NaN
1  NaN  5.0
2  3.0  NaN
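Explanation: df.where(df > 0) keeps values where the condition is True (greater than 0) and replaces all others with NaN. Because NaN is a floating-point value, the integer columns are upcast to float, which is why 1 is printed as 1.0.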
Syntax
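DataFrame.where(cond, other=nan, *, inplace=False, axis=None, level=None)
Parameters:
Parameter | Description |
---|---|
cond | The condition to check: a boolean DataFrame, Series, or array-like, or a callable that returns one |
other | Replacement used where the condition is False: a scalar, Series, DataFrame, or callable (default is NaN) |
inplace | If True, modifies the DataFrame in place |
axis | The axis along which to align other (or cond) when it has fewer dimensions than the DataFrame |
level | Alignment level, used with a MultiIndex |
errors | Accepted only by older pandas releases; deprecated in the 1.x series and removed in pandas 2.0 |
try_cast | Accepted only by older pandas releases (tried to cast the result back to the original type); deprecated in the 1.x series and removed in pandas 2.0 |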
Returns: A DataFrame with replaced values (or None if inplace=True)
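The parameter descriptions are brief, so here is a minimal sketch of two behaviours that are easy to miss: passing callables for cond and other, and using axis to align a Series passed as other. The variable names (row_fill, col_fill, and so on) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, -2, 3], 'B': [-1, 5, -6]})

# cond and other may be callables; each receives the DataFrame itself.
# Here non-positive values are replaced with their absolute value.
flipped = df.where(lambda d: d > 0, other=lambda d: d.abs())
print(flipped)

# A Series passed as other needs an axis to say how it lines up.
# axis=0: one replacement per row label (0, 1, 2)
row_fill = pd.Series([10, 20, 30])
by_row = df.where(df > 0, other=row_fill, axis=0)
print(by_row)

# axis=1: one replacement per column label ('A', 'B')
col_fill = pd.Series({'A': 100, 'B': 200})
by_col = df.where(df > 0, other=col_fill, axis=1)
print(by_col)
```

With axis=0 the replacement for a cell comes from the entry in row_fill with the same row label; with axis=1 it comes from the entry in col_fill with the same column name.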
Examples
Example 1: This example replaces all non-positive values with 0 using the other parameter.
import pandas as pd
df = pd.DataFrame({'A': [1, -2, 3], 'B': [-1, 5, -6]})
res = df.where(df > 0, other=0)
print(res)
Output
   A  B
0  1  0
1  0  5
2  3  0
Explanation: where(df > 0, other=0) keeps values greater than 0 and replaces everything else (negatives and zeros) with 0. Since no NaN is introduced, the columns keep their integer dtype.
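For comparison, pandas also provides DataFrame.mask(), which replaces values where the condition is True rather than False. The sketch below, reusing the data from this example, shows that the two calls give the same result once the condition is inverted.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, -2, 3], 'B': [-1, 5, -6]})

# where() keeps values where the condition is True
via_where = df.where(df > 0, other=0)

# mask() replaces values where the condition is True,
# so the inverted condition (df <= 0) yields the same frame
via_mask = df.mask(df <= 0, other=0)

print(via_where.equals(via_mask))
```

Which one to use is mostly a readability choice: where() reads naturally when you describe what to keep, mask() when you describe what to replace.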
Example 2: This example modifies the DataFrame in place, replacing values ≤ 0 with NaN.
import pandas as pd
df = pd.DataFrame({'A': [2, -3, 4], 'B': [-5, 6, -7]})
df.where(df > 0, inplace=True)
print(df)
Output
     A    B
0  2.0  NaN
1  NaN  6.0
2  4.0  NaN
Explanation: inplace=True applies the change directly to df without needing to assign it to a new variable.
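To make the difference concrete, the following sketch contrasts the default behaviour with inplace=True. The names res, df2 and ret are illustrative, and other=0 is used in the in-place call only to keep the columns as integers.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, -3, 4], 'B': [-5, 6, -7]})

# Default (inplace=False): a new DataFrame is returned, df is left untouched
res = df.where(df > 0)
print(df)   # still holds the original negative values
print(res)  # non-positive values replaced with NaN

# inplace=True: df2 itself is modified and the call returns None
df2 = df.copy()
ret = df2.where(df2 > 0, other=0, inplace=True)
print(ret)
print(df2)
```

Because the in-place call returns None, writing df = df.where(..., inplace=True) would overwrite df with None; use either assignment or inplace=True, not both.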
Example 3: This example builds the condition from a single column (a boolean Series) instead of comparing the full DataFrame.
import pandas as pd
df = pd.DataFrame({'Score': [45, 85, 60, 30]})
condition = df['Score'] >= 50
res = df.where(condition, other='Fail')
print(res)
Output
  Score
0  Fail
1    85
2    60
3  Fail
Explanation: Only scores ≥ 50 are kept and the others are replaced with 'Fail'. Because the column now mixes integers and strings, its dtype becomes object.
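A Series condition is applied row by row, so on a DataFrame with several columns it affects every column of the rows where it is False. The sketch below assumes a hypothetical Name column added purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ana', 'Ben', 'Cy'], 'Score': [45, 85, 60]})

# Boolean Series indexed by row label: True for rows that pass
passed = df['Score'] >= 50

# Rows where the condition is False are blanked out in all columns
res = df.where(passed)
print(res)

# Dropping the all-NaN rows leaves only the passing rows
print(res.dropna(how='all'))
```

If the goal is only to filter rows, boolean indexing such as df[passed] is usually simpler; where() is useful when the result must keep the original shape.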