Respect flexible field access pattern in geoip and ip_location processors#138728
Respect flexible field access pattern in geoip and ip_location processors#138728joegallo merged 19 commits intoelastic:mainfrom
Conversation
|
Pinging @elastic/es-data-management (Team:Data Management) |
…-geoip-iplocation
…sh1293/elasticsearch into flash1293/flattened-geoip-iplocation
|
I extended this a bit - it's now also flattening the |
dakrone
left a comment
There was a problem hiding this comment.
LGTM, I left a couple of comments about typos, but nothing major. We do need to update the documentation to document how the processor behaves in the flexible field access pattern however.
| String key = entry.getKey(); | ||
| Object value = entry.getValue(); | ||
|
|
||
| // Convert location from map {lat, lon} to array [lat, lon] in flexible mode |
There was a problem hiding this comment.
Minor typo in the doc
| // Convert location from map {lat, lon} to array [lat, lon] in flexible mode | |
| // Convert location from map {lat, lon} to array [lon, lat] in flexible mode |
| /** | ||
| * Writes GeoIP data to the document. In flexible field access mode, writes individual dotted fields | ||
| * (e.g., "my.field.city", "my.field.country") instead of a single nested object. The "location" field | ||
| * is written as an array [lat, lon] for better compatibility with geo_point fields in flexible mode. |
There was a problem hiding this comment.
| * is written as an array [lat, lon] for better compatibility with geo_point fields in flexible mode. | |
| * is written as an array [lon, lat] for better compatibility with geo_point fields in flexible mode. |
| assertThat(locationArray.get(0), notNullValue()); // longitude | ||
| assertThat(locationArray.get(1), notNullValue()); // latitude |
There was a problem hiding this comment.
Is there a reason you used notNullValue here instead of the actual resolved longitude and latitude (to ensure they are in the right order)?
|
Thanks for the review @dakrone , adjusted |
| @@ -134,7 +134,7 @@ public IngestDocument execute(IngestDocument document) throws IOException { | |||
| continue; | |||
| } | |||
| if (firstOnly) { | |||
There was a problem hiding this comment.
Github isn't letting me put this comment where I want to. 😠
I think you still have to handle the dataList case -- see document.setFieldValue(targetField, dataList); below.
It's also possible that you totally discussed this already and I haven't read the appropriate description/comments, I just jumped straight into the code.
There was a problem hiding this comment.
we didn't, I added the multi-value case. Let me know whether this behavior makes sense
There was a problem hiding this comment.
|
I added a tidiness commit, I don't think it's controversial, but we can revert it if I'm wrong. 🤷 |
| Double lat = (Double) locationMap.get("lat"); | ||
| Double lon = (Double) locationMap.get("lon"); | ||
| if (lat != null && lon != null) { | ||
| value = List.of(lon, lat); // GeoJSON order: [lon, lat] |
There was a problem hiding this comment.
This is kind of an annoying thing, but this will end up resulting in the targetField.location field value being immutable -- if you look at the non-FLEXIBLE side of the house, everything is HashMaps all the way down, so that users can turn around and fuss with these things afterwards (with other ordinary processors or a script processor or whatever).
I don't necessarily think a customer is going to take advantage of this, but if somebody tried, I imagine they'd be annoyed that we've taken it away from them.
Perhaps use an ArrayList instead? You lose the nice constructor call, though, but that's nothing a quick static method couldn't fix.
There was a problem hiding this comment.
switched to arraylist
| value = List.of(lon, lat); // GeoJSON order: [lon, lat] | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
I think it might make sense to assert here that the value is not an instanceof Map. My reasoning is that although all the current databases currently return flat maps except for the location, some future PR could add a new map-valued key and then this code would (as written) very quietly start doing the wrong thing.
It seems like it would be less trappy for us to alert that future developer at dev-time as soon as they go to run some tests of their new code.
joegallo
left a comment
There was a problem hiding this comment.
Unless anything else jumps into my mind while I'm brewing tea or whatever, I think I've hit all my requests for changes in my previous comments.
Overall this PR looks pretty dang good, but I think we can make it just a little better in a few places, and I do think the dataList issue is a real one.
…-geoip-iplocation
…sh1293/elasticsearch into flash1293/flattened-geoip-iplocation
|
Thanks @joegallo , addressed your points, could you take another look? |
| Double lon = (Double) locationMap.get("lon"); | ||
| if (lat != null && lon != null) { | ||
| return newMutableLocationList(lon, lat); // GeoJSON order: [lon, lat] | ||
| } |
There was a problem hiding this comment.
So let's say the lat or lon (or both) are null. Are you okay with what happens then?
I'm on the fence about what to do about it -- part of me thinks that the null checks are making this worse, and you should just always go down the return newMutableLocationList(...) path.
There was a problem hiding this comment.
Yeah, not sure, that's probably very edge-casey
There was a problem hiding this comment.
I get that it's an edge case, but it seems like the point of the flexible access pattern is not to emit nested maps, and yet that's what the code would do if either of the things were null (at least by my read of things).
Even if it's never going to come up in practice, the code is capturing our intent of what would happen if it did, so should it:
- emit a map? (current behavior), seems contrary to the spirit of the flexible access pattern
- emit a list with one or more null elements?
- throw an exception?
There was a problem hiding this comment.
Agreed, let's go with the null elements, seems like the best behavior
|
Hi @flash1293, I've created a changelog YAML for you. |
There was a problem hiding this comment.
LGTM! Given that it's a weekend and a holiday, I'll smash that merge button for you on Monday morning assuming I don't hear otherwise from you before then. I had some trouble with the backport machinery on a different recent PR, so if the backport doesn't go through automatically, I'll handle that then, too.
💚 Backport successful
|
Currently in flexible field access mode, the geoip processor writes such an object:
results in
This is problematic because we go from dotted to subobject, which means that in subsequent processing steps, e.g. my.field.country can't be accessed anymore.
To align things, in flexible field access mode the geoip processor now writes the individual parts as individual properties:
The same applies to the ip_location processor.