Fix allow_duplicates edge case bug in append processor#134319
Merged
joegallo merged 6 commits intoelastic:mainfrom Sep 9, 2025
Merged
Fix allow_duplicates edge case bug in append processor#134319joegallo merged 6 commits intoelastic:mainfrom
joegallo merged 6 commits intoelastic:mainfrom
Conversation
allow_duplicates: false should work the same whether there is an existing list (with some values), an empty list, or no list at all.
Collaborator
|
Pinging @elastic/es-data-management (Team:Data Management) |
Collaborator
|
Hi @joegallo, I've created a changelog YAML for you. |
samxbr
approved these changes
Sep 8, 2025
Contributor
samxbr
left a comment
There was a problem hiding this comment.
LGTM, I would agree that this is a bug.
Member
|
I also agree this is a bug, but can you describe the issue in the PR body so that it's searchable? |
seanzatzdev
approved these changes
Sep 8, 2025
This was referenced Sep 9, 2025
Collaborator
💔 Backport failed
You can use sqren/backport to manually backport by running |
joegallo
added a commit
to joegallo/elasticsearch
that referenced
this pull request
Sep 9, 2025
elasticsearchmachine
pushed a commit
that referenced
this pull request
Sep 9, 2025
joegallo
added a commit
to joegallo/elasticsearch
that referenced
this pull request
Sep 9, 2025
joegallo
added a commit
to joegallo/elasticsearch
that referenced
this pull request
Sep 9, 2025
This was referenced Sep 9, 2025
joegallo
added a commit
to joegallo/elasticsearch
that referenced
this pull request
Sep 9, 2025
Contributor
Author
|
All the backport PRs are up, so I'm dropping |
elasticsearchmachine
pushed a commit
that referenced
this pull request
Sep 9, 2025
#134377 is an automatically generated backport PR that was merged without CI actually having run. It turns out that there are changes in my newly added tests there that aren't compatible with the 9.1 branch. That would have shaken out if CI had actually run, but it didn't, so I'm fixing it in post. This is tangentially related to #134319.
elasticsearchmachine
pushed a commit
that referenced
this pull request
Sep 9, 2025
elasticsearchmachine
pushed a commit
that referenced
this pull request
Sep 9, 2025
rjernst
pushed a commit
to rjernst/elasticsearch
that referenced
this pull request
Sep 9, 2025
elasticsearchmachine
pushed a commit
that referenced
this pull request
Sep 9, 2025
Kubik42
pushed a commit
to Kubik42/elasticsearch
that referenced
this pull request
Sep 9, 2025
sarog
pushed a commit
to portsbuild/elasticsearch
that referenced
this pull request
Sep 11, 2025
sarog
pushed a commit
to portsbuild/elasticsearch
that referenced
this pull request
Sep 19, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR fixes a discrepancy in the way the
appendprocessor handles the"allow_duplicates": falseoption (note: reminder that the default for theappendprocessor is that"allow_duplicates"istrue).Specifically, the handling of the 'data to be appended' varies according to value of the 'data to be appended to'. To wit, if there are duplicates in the data to be appended (on its own, regardless of the data to be appended to) then:
null-ish), duplicates in the data to be appended will NOT be removed (<-- this is the edge case / bug).The code is pretty understandably written as if the case we're handling is for example when the data to be appended to is
["foo", "bar"]and the data to be appended is["bar", "baz"](or["bar", "baz", "bar"]), but it mistakenly shortcuts in the case of a non-existent/nulldata to be appended to and just copies the entirety of the data to be appended which means that it correctly handles["bar", "baz"]but not["bar", "baz", "bar"](and so duplicate items sneak through even thoughallow_duplicatesis set tofalse).Anyway, this PR unifies those cases so that duplicates are processed the same way regardless of the presence/absence of the data to be appended to list.
I discovered this odd behavior in the
appendprocessor'sallow_duplicateshandling while reviewing #105718, and it seemed important to handle it separately from that PR: first, to call it out as a bug, and second, (since it's a bug) to put up a PR that fixes it and can be backported to all the currently-maintained branches.Personally, I think this PR is best reviewed one commit at a time (the better to see the expected behavior, and then the fix) but you can do it how you'd like.
Oh, and one more thing... I'm of the opinion that this is 'just' a bug (and one that warrants fixing), but in full https://www.hyrumslaw.com fashion, you could argue with me that actually this is a breaking change.