Optimize IngestCtxMap construction #120833
Conversation
Hi @joegallo, I've created a changelog YAML for you.
Pinging @elastic/es-data-management (Team:Data Management) |
Honestly, shallowly rehashing the map is misleading at best -- the maps that back an ingest document are typically deeply nested, so a shallow rehash at the top-level doesn't do much for you.
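To make the "shallow rehash doesn't do much" point concrete, here is a minimal, self-contained sketch (not Elasticsearch code) showing that `new HashMap<>(source)` copies only the top-level entries, so nested maps are still shared with the original:

```java
import java.util.HashMap;
import java.util.Map;

public class ShallowCopyDemo {
    public static void main(String[] args) {
        Map<String, Object> nested = new HashMap<>();
        nested.put("city", "Berlin");

        Map<String, Object> source = new HashMap<>();
        source.put("user", nested);

        // Shallow copy: the top-level map is new, but nested values are shared.
        Map<String, Object> copy = new HashMap<>(source);

        // Mutating the nested map through the copy is visible in the original.
        @SuppressWarnings("unchecked")
        Map<String, Object> user = (Map<String, Object>) copy.get("user");
        user.put("city", "Paris");
        System.out.println(nested.get("city")); // prints "Paris"
    }
}
```

So the copy only isolates the first layer; any deeply nested structure is untouched.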
Force-pushed from f3bd49c to 1cde31e
```diff
     Map<String, Object> source
 ) {
-    super(new HashMap<>(source), new IngestDocMetadata(index, id, version, routing, versionType, timestamp));
+    super(source, new IngestDocMetadata(index, id, version, routing, versionType, timestamp));
```
There's no way this can burn us in any existing scripts, is there? I don't think they can ever call this constructor, so it couldn't be a problem. Just trying to think through potential problems.
It's the ctx of a script processor, so by the time the script processor has a handle on it, the map will already have been copied once (at the very, very beginning, in IngestService.newIngestDocument). It's not something you can call from a script processor.
I spent a fair amount of time reading through the various constructors and copy constructors of the IngestDocument and its friends, and I really do think I've covered all the bases.
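A hypothetical stand-in sketch (not the actual IngestService code; `Doc` and `newDoc` are invented names) of the "copy once at the entry point" pattern being described, where the single defensive copy happens up front so downstream constructors can take the map as-is:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the real IngestDocument, just to show the shape
// of the pattern; the real class is considerably more involved.
class Doc {
    final Map<String, Object> source;

    Doc(Map<String, Object> source) {
        this.source = source; // trusts the caller; no defensive copy here
    }
}

public class CopyOnceDemo {
    // The entry point makes the one and only copy, so every later
    // constructor in the chain can accept the map without re-copying.
    static Doc newDoc(Map<String, Object> incoming) {
        return new Doc(new HashMap<>(incoming));
    }

    public static void main(String[] args) {
        Map<String, Object> incoming = Map.of("field", "value"); // immutable
        Doc doc = newDoc(incoming);
        doc.source.put("added", true); // safe: doc owns a mutable copy
        System.out.println(doc.source.get("added")); // prints "true"
    }
}
```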
But philosophically, if it's a bug not to rehash, then what about the bug of rehashing only one layer deep? These maps are almost always deeply nested, so if this code had a job to do, it never actually did it.
This change is 99% tests, and 1% removing a call to `new HashMap<>(...)` that shallowly copied an incoming map into a new map. It's certainly been convenient to be able to write tests with `Map.of(...)` for the source of the document, but in reality the cost of re-hashing at scale at runtime is not zero (though it is small), and the idea that rehashing in this way protects us from much of anything seems mistaken, especially since it's only a shallow rehash.
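A small illustration (standalone, not Elasticsearch code) of the `Map.of(...)` convenience mentioned above: `Map.of` produces an immutable map, so once the defensive copy is gone, any code path that mutates the source would need the caller to supply a mutable map explicitly:

```java
import java.util.HashMap;
import java.util.Map;

public class ImmutableSourceDemo {
    public static void main(String[] args) {
        // Map.of(...) is immutable; convenient in tests, but without the
        // defensive new HashMap<>(...) copy, mutating it throws.
        Map<String, Object> immutableSource = Map.of("field", "value");
        try {
            immutableSource.put("other", 1);
        } catch (UnsupportedOperationException e) {
            System.out.println("immutable: cannot mutate Map.of source");
        }

        // Wrapping in a mutable map up front keeps tests working:
        Map<String, Object> mutableSource = new HashMap<>(immutableSource);
        mutableSource.put("other", 1); // fine
        System.out.println(mutableSource.size()); // prints 2
    }
}
```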