FIX: Set levels explicitly to plot all classes in distinct colors in DecisionBoundaryDisplay #33300
Repeating the visual check from #32867 (review)
Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import DecisionBoundaryDisplay

data = np.array(
    [
        [-1, -1],
        [-2, -1],
        [1, 1],
        [2, 1],
        [2, 2],
        [3, 2],
        [3, 3],
        [4, 3],
        [4, 4],
        [5, 4],
        [5, 5],
    ]
)
# target = np.asarray([str(i) for i in range(11)])
target = np.arange(11)
clf = LogisticRegression().fit(data, target)
cmap = "gist_rainbow"
_, axes = plt.subplots(nrows=2, ncols=3, figsize=(12, 8), constrained_layout=True)
for plot_method_idx, plot_method in enumerate(["contourf", "contour"]):
    for response_method_idx, response_method in enumerate(
        ["predict_proba", "decision_function", "predict"]
    ):
        ax = axes[plot_method_idx, response_method_idx]
        display = DecisionBoundaryDisplay.from_estimator(
            clf,
            data,
            multiclass_colors=cmap,
            response_method=response_method,
            plot_method=plot_method,
            ax=ax,
            alpha=0.5,
        )
        ax.scatter(
            data[:, 0],
            data[:, 1],
            c=target.astype(int),
            edgecolors="black",
            cmap=cmap,
        )
        if isinstance(display.surface_, list):
            levels = len(display.surface_[0].levels)
        else:
            levels = len(display.surface_.levels)
        ax.set_title(
            f"plot_method={plot_method}\nresponse_method={response_method}\nlevels={levels}"
        )
I can confirm that this works in my use-case and fixes the mentioned issue. Thank you very much!
# if hasattr(disp.surface_, "levels"):
#     assert len(disp.surface_.levels) >= disp.n_classes

@pytest.mark.parametrize("y", [np.arange(6), [str(i) for i in np.arange(6)]])
The title of #32866 implies that we need more than 7 distinct classes to reproduce the original problem: have you checked that this version of the test would actually fail on main?
Yes, this test now covers what I noticed here #32867 (comment), which is also why I removed the two lines above this test:
Even with 7 or fewer classes, the default levels don't necessarily match the classes, because they are created as an evenly spaced array.
So we don't need to check the length of levels, but rather that the values actually correspond to our classes (or to the classes extended by -0.5 and +0.5, as in the contourf case, though I have to admit that I'm not sure how @leweex95 figured that out).
I haven't found a way to check which colors are actually visible in the plot (without actually looking at the plots), so checking the exact match of the levels is the best approach I could think of.
(I'm getting the faint suspicion that these plotting functions were not specifically designed with the multiclass-classification use case in mind.)
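To make the point about levels concrete, here is a minimal matplotlib-only sketch. It is not the code from this PR: the toy grid, the variable names and the exact form of the half-step band edges are assumptions made for illustration. It compares the levels matplotlib chooses on its own with levels set explicitly from the class values.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Toy stand-in for a "predict" response: integer class labels 0..5 on a grid.
# (Illustrative data, not the data used in the PR.)
n_classes = 6
xx, yy = np.meshgrid(np.linspace(0, 1, 60), np.linspace(0, 1, 60))
z = np.minimum((xx * n_classes).astype(int), n_classes - 1)

fig, (ax_default, ax_contour, ax_contourf) = plt.subplots(ncols=3)

# Default: matplotlib picks evenly spaced "nice" levels by itself.
default_levels = ax_default.contourf(xx, yy, z).levels

# contour: one line level per class value.
classes = np.arange(n_classes)
line_levels = ax_contour.contour(xx, yy, z, levels=classes).levels

# contourf: n_classes + 1 band edges, so every class falls into its own band.
band_edges = np.arange(n_classes + 1) - 0.5
fill_levels = ax_contourf.contourf(xx, yy, z, levels=band_edges).levels

print("default :", default_levels)
print("contour :", line_levels)
print("contourf:", fill_levels)
plt.close(fig)
```

With explicit levels, the number of filled bands equals the number of classes by construction, which is the property the test above checks through an exact match of the levels.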
The multiclass support was indeed incrementally added and we weren't careful enough in our review and testing process.
I hope this didn't come across wrong; I was referring to the matplotlib functions in the first place. The levels parameter is not as intuitive as it seems, and the docs are not very extensive on it. I also couldn't find any multiclass-classification examples using contour in the matplotlib examples, so this was very hard to detect.
The levels parameter is not as intuitive as it seems, and the docs are not very extensive on it.
Agreed. Digging into the code, you can see that the automatic determination of levels is quite complex (I opened an issue about it: matplotlib/matplotlib#30996).
I haven't found a way to check which colors are actually visible
I think you can add something like this to your test:
if plot_method == "contour":
    colours = [collection.get_edgecolor() for collection in disp.ax_.collections]
elif plot_method == "contourf":
    colours = [collection.get_facecolor() for collection in disp.ax_.collections]
This will give you the RGBA values of each level.
It's a bit of a maze, but contour/contourf return a QuadContourSet, whose base class ContourSet in turn has the bases ContourLabeler and Collection.
Should be a quick PR to just add this to the end of the test. You can then match it with tab10, the default cmap.
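A rough, matplotlib-only sketch of that kind of colour check follows. The helper name level_colors and the toy data are hypothetical, and the branch for older matplotlib releases (where the contour set is not itself a Collection) is an assumption about version differences rather than something taken from this PR. It only counts distinct RGBA rows and does not assert which tab10 entries are picked.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import Collection


def level_colors(contour_set, filled):
    """Return one RGBA row per contour band/line (hypothetical helper)."""
    getter = "get_facecolor" if filled else "get_edgecolor"
    if isinstance(contour_set, Collection):
        # Newer matplotlib: the contour set is itself a single Collection.
        return np.asarray(getattr(contour_set, getter)())
    # Older matplotlib (assumption): one collection per level.
    return np.vstack([getattr(c, getter)() for c in contour_set.collections])


# Toy class-label surface with 4 classes (illustrative data only).
n_classes = 4
xx, yy = np.meshgrid(np.linspace(0, 1, 40), np.linspace(0, 1, 40))
z = np.minimum((xx * n_classes).astype(int), n_classes - 1)

fig, ax = plt.subplots()
band_edges = np.arange(n_classes + 1) - 0.5  # one filled band per class
cs = ax.contourf(xx, yy, z, levels=band_edges, cmap="tab10")

colors = level_colors(cs, filled=True)
n_distinct = len(np.unique(colors.round(6), axis=0))
print(colors.shape, n_distinct)
plt.close(fig)
```

Counting distinct rows checks that every class band gets its own colour; matching against specific tab10 entries would additionally depend on how the colours are assigned in DecisionBoundaryDisplay, which this sketch does not model.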
sklearn/inspection/_plot/tests/test_boundary_decision_display.py
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Thank you everyone!


Reference Issues/PRs
Fixes #32866 (supersedes #32867)
What does this implement/fix? Explain your changes.
Continuing the proposed fix from #32867, this sets the levels parameter explicitly to the unique target values for contour and extends them for contourf with response_method=predict (the other contourf cases are handled differently, by plotting every class on a separate surface, so they're not affected by this bug). This ensures that all classes (and class boundaries, respectively) are displayed in different colors.
Note that in #33015 we noticed that this can also occur when n_classes < 7. This happens because the default values for the levels don't necessarily correspond to the target classes. I used the data from the example in #33015 for the non-regression test here.
@ogrisel @lucyleeow @ThexXTURBOXx @leweex95
AI usage disclosure
I used AI assistance for:
Any other comments?