Skip to content

ENH Add zero division handling to cohen_kappa_score#31172

Merged
StefanieSenger merged 41 commits intoscikit-learn:mainfrom
StefanieSenger:undefined_cohen_kappa_score
Jan 28, 2026
Merged

ENH Add zero division handling to cohen_kappa_score#31172
StefanieSenger merged 41 commits intoscikit-learn:mainfrom
StefanieSenger:undefined_cohen_kappa_score

Conversation

@StefanieSenger
Copy link
Member

Reference Issues/PRs

towards #29048

What does this implement/fix? Explain your changes.

This PR adds a replace_undefined_by param to cohen_kappa_score to deal with cases of division by zero.
Also adds tests.

@github-actions
Copy link

github-actions bot commented Apr 10, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: ed62103. Link to the linter CI: here

Comment on lines 894 to 898
msg = (
"`y2` does not contain any label that is also both present in `y1` and in "
"`labels`. cohen_kappa_score is undefined and set to the value defined in "
"the `replace_undefined_by` param, which defaults to 0.0."
)
Copy link
Member Author

@StefanieSenger StefanieSenger Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my thinking, this warning message also covers cases where both y1 and y2 are empty (as in "test case: empty inputs"). At least it would motivate people to inspect their data and then they might find out if their data is empty.

However, we could also branch here and make a separate warning message for empty inputs.

"`y1` and `y2` only have one label in common that is also in `labels`. "
"cohen_kappa_score is undefined and set to the value defined in the "
"`replace_undefined_by` param, which defaults to 0.0."
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my thinking, this message also fits the cases when y1 and y2 only deal with one label (as in "test case: both inputs only have one label").

@StefanieSenger StefanieSenger added this to the 1.7 milestone Apr 10, 2025
Comment on lines 903 to 908
mgs_changing_default = (
"The default return value of `cohen_kappa_score` in case of a division "
"by zero has been deprecated in 1.7 and will be changed to 0.0 in version "
"1.9. Set `replace_undefined_by=0.0` to use the new default and to silence "
"this Warning."
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would pick 0.0 as a future default (instead of -1.0 which is the worst score), because it is the least expressive of the scores, representing matching labels by chance.

If users would use cohen_kappa_score as part of their custom metric, that calculates the mean over several cohen_kappa_scores, 0.0 would be a neutral element like the "ignore" option that we have talked about in this comment: #29048 (comment)

Copy link
Member

@virchan virchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @StefanieSenger!

Overall, I think the replace_undefined_by parameter behaves as expected, as discussed in #29048 (comment).

I just have a few minor suggestions—otherwise, LGTM!

Comment on lines 929 to 930
@pytest.mark.parametrize("replace_undefined_by", [0.0, np.nan])
def test_cohen_kappa_zero_division(replace_undefined_by):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test function looks good to me overall—my comment is just a minor nitpick:

Suggested change
@pytest.mark.parametrize("replace_undefined_by", [0.0, np.nan])
def test_cohen_kappa_zero_division(replace_undefined_by):
@pytest.mark.parametrize(
"test_case",
[
([], [], None, None, None),
([1] * 5 + [2] * 5, [3] * 10, [1, 2], None, None),
([3] * 10, [3] * 10, None, None, None),
([1] * 5 + [2] * 5, [3] * 10, [1, 2], "linear", None),
],
)
@pytest.mark.parametrize("replace_undefined_by", [0.0, np.nan])
def test_cohen_kappa_zero_division(replace_undefined_by):

This way, we could also simplify the unpacking slightly:

    y1, y2, labels, weights, sample_weight = test_case
    y_1, y2 = np.array(y1), np.array(y2)

and only need one assert:

    assert check_equal(
        cohen_kappa_score(
            y1,
            y2,
            labels=labels,
            weights=weights,
            replace_undefined_by=replace_undefined_by,
        ),
        replace_undefined_by,
    )

Totally optional though—happy for you to resolve this if you’d prefer to keep it as is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good suggestion.

I have made this test leaner, I couldn't do without commenting every single test case though. 🤷 To help the next person looking at this see why these test cases trigger a division by zero.

Copy link
Member Author

@StefanieSenger StefanieSenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your review, @virchan!
That was pretty helpful and I have applied all your suggestions.

Copy link
Member

@virchan virchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @StefanieSenger!

@adrinjalali, would you like to have a look?

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

@StefanieSenger
Copy link
Member Author

Thank you for reviewing, @adrinjalali! I have addressed your suggestions. Would you have another look?

@adrinjalali
Copy link
Member

The only thing now here that I don't find natural is that the default value of the parameter would become 0 after the deprecation cycle. I think the default value should be np.nan to make sure users actually notice it's undefined, and if they want to make it something else, they can.

Having 0 as the default value for undefined seems a bit odd to me.

Not sure what @glemaitre thinks about this.

@StefanieSenger
Copy link
Member Author

I'd be happy for reviews, maybe @adrinjalali would like to have a look?

@adrinjalali adrinjalali added this to Labs Dec 10, 2025
@adrinjalali adrinjalali moved this to Todo in Labs Dec 10, 2025
@StefanieSenger StefanieSenger self-assigned this Dec 11, 2025
@StefanieSenger StefanieSenger moved this from Todo to In progress in Labs Dec 11, 2025
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit lost as which comments are relevant from the review history @StefanieSenger , please let me know which discussions are open.

@StefanieSenger
Copy link
Member Author

StefanieSenger commented Dec 15, 2025

I'm a bit lost as which comments are relevant from the review history @StefanieSenger , please let me know which discussions are open.

Only this one (#31172 (comment)), @adrinjalali.

Edit: I answered to your and @jeremiedbb's suggestions here. Please let me know what you think.

@adrinjalali adrinjalali moved this from In progress to In progress - High Priority in Labs Dec 15, 2025
@StefanieSenger
Copy link
Member Author

After pulling from main, the tests fail. I will investigate, but on another day. Maybe a newly added test.

@StefanieSenger
Copy link
Member Author

After pulling from main, the tests fail. I will investigate, but on another day. Maybe a newly added test.

Yes, it came from #32549, where a new handling for empty inputs was added. I have removed the obsolete test case from here.

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks !

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor point to improve the warning message, otherwise LGTM. Feel free to merge if CI is green after fixing the message.

msg_zero_division = (
"`y1`, `y2` and `labels` have only one label in common. "
"`cohen_kappa_score` is undefined and set to the value defined by the "
"`replace_undefined_by` param, which defaults to `np.nan`."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

StefanieSenger and others added 2 commits January 28, 2026 11:26
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
@StefanieSenger
Copy link
Member Author

Thanks reviewing, @virchan, @jeremiedbb and @adrinjalali. I'm happy about the result.
I've enabled auto-merge.

@StefanieSenger StefanieSenger enabled auto-merge (squash) January 28, 2026 10:32
@StefanieSenger StefanieSenger merged commit be7ec61 into scikit-learn:main Jan 28, 2026
37 checks passed
@github-project-automation github-project-automation bot moved this from In progress - High Priority to Done in Labs Jan 28, 2026
@StefanieSenger StefanieSenger deleted the undefined_cohen_kappa_score branch January 28, 2026 11:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

6 participants