
Add an ignoreMissing parameter to IngestDocument's removeField method#125232

Merged
joegallo merged 3 commits into elastic:main from joegallo:ingest-document-remove-field-ignore-missing
Mar 19, 2025

Conversation

@joegallo
Contributor

And use it in the remove processor.

Related to #123891. This is also a follow-up to #124322 and #125051 (earlier nearby PRs that laid the groundwork for this change).

Prior to this change, the remove processor had to traverse the document tree twice for each field that we wanted to remove: once to check whether the field existed (in the hasField call), and once to actually remove the field (in the removeField call). This was necessary because removeField would throw an exception if the field didn't exist, so the call had to be guarded. By adding an ignoreMissing parameter to removeField, we can drop the hasField guard and just specify that we don't care if the field doesn't exist (well, assuming ignore_missing has been set to true on the processor itself, which it typically is in the wild).
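To make the two-traversals-versus-one point concrete, here is a minimal sketch. It is not the real IngestDocument code: the class `RemoveFieldSketch`, the helper `parentOf`, and the wrappers `removeGuarded` and `removeDirect` are hypothetical names, and the document is modeled as plain nested maps addressed by dotted paths. Only the idea of a `removeField` that takes an `ignoreMissing` flag comes from this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a document: nested maps addressed by dotted paths.
// This is an illustration only, NOT the real IngestDocument class.
final class RemoveFieldSketch {

    // Walks to the map that should hold the last path element.
    // Returns null if an intermediate element is missing or is not a map.
    @SuppressWarnings("unchecked")
    private static Map<String, Object> parentOf(Map<String, Object> doc, String[] parts) {
        Map<String, Object> current = doc;
        for (int i = 0; i < parts.length - 1; i++) {
            Object next = current.get(parts[i]);
            if (!(next instanceof Map)) {
                return null;
            }
            current = (Map<String, Object>) next;
        }
        return current;
    }

    // One full traversal, only to answer "is it there?".
    static boolean hasField(Map<String, Object> doc, String path) {
        String[] parts = path.split("\\.");
        Map<String, Object> parent = parentOf(doc, parts);
        return parent != null && parent.containsKey(parts[parts.length - 1]);
    }

    // One full traversal that removes the field; a missing field is either
    // silently ignored or reported, depending on ignoreMissing.
    static void removeField(Map<String, Object> doc, String path, boolean ignoreMissing) {
        String[] parts = path.split("\\.");
        Map<String, Object> parent = parentOf(doc, parts);
        String leaf = parts[parts.length - 1];
        if (parent != null && parent.containsKey(leaf)) {
            parent.remove(leaf);
        } else if (ignoreMissing == false) {
            throw new IllegalArgumentException("field [" + path + "] not present");
        }
    }

    // Before: the guard plus the removal means two traversals per field.
    static void removeGuarded(Map<String, Object> doc, String path) {
        if (hasField(doc, path)) {
            removeField(doc, path, false);
        }
    }

    // After: a single traversal, with the missing case handled inside removeField.
    static void removeDirect(Map<String, Object> doc, String path) {
        removeField(doc, path, true);
    }

    public static void main(String[] args) {
        Map<String, Object> user = new HashMap<>();
        user.put("name", "example");
        Map<String, Object> doc = new HashMap<>();
        doc.put("user", user);

        removeDirect(doc, "user.missing"); // no exception, nothing removed
        removeDirect(doc, "user.name");    // removed in one pass
        System.out.println(hasField(doc, "user.name"));
    }
}
```

Both wrappers leave the document in the same state; the difference is only how many times the path is walked, which is where the speedup in this sketch would come from.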

I'm labeling this as a >refactoring since there's no user-visible change in behavior; I'm just twiddling the code a bit so that it happens to be faster. On that note, this speeds up the remove processor by about 30%: I'm seeing it take 290 microseconds per document rather than 413 on main (for comparison, prior to #120573 the same benchmark was taking 681 microseconds per document).
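As a quick check of the arithmetic, using only the three per-document timings quoted above:

$$
1 - \frac{290}{413} \approx 0.298, \qquad 1 - \frac{290}{681} \approx 0.574
$$

That is roughly a 30% reduction relative to main, and roughly 57% relative to the state before #120573, on this one benchmark.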

@joegallo joegallo added :Distributed/Ingest Node Execution or management of Ingest Pipelines >refactoring Team:Data Management (obsolete) DO NOT USE. This team no longer exists. v8.19.0 v9.1.0 labels Mar 19, 2025
@joegallo joegallo requested a review from parkertimmins March 19, 2025 15:12
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@joegallo joegallo added the auto-backport Automatically create backport pull requests when merged label Mar 19, 2025
@joegallo joegallo merged commit e210ea8 into elastic:main Mar 19, 2025
17 checks passed
@joegallo joegallo deleted the ingest-document-remove-field-ignore-missing branch March 19, 2025 20:55
@elasticsearchmachine
Collaborator

💚 Backport successful

Status Branch Result
8.x
@joegallo
Contributor Author

[Screenshot: nightly benchmarks, 2025-03-24]

Here's a screenshot from the nightly benchmarks -- there's a very nice decrease in the time spent in remove processors due to #120573, but the additional contribution from this PR also sticks out. Overall we're spending about 60% less time in remove processors during this benchmark as a result of these two PRs. Not bad!

