Skip to content

Respect flexible field access pattern in geoip and ip_location processors#138728

Merged
joegallo merged 19 commits intoelastic:mainfrom
flash1293:flash1293/flattened-geoip-iplocation
Dec 29, 2025
Merged

Respect flexible field access pattern in geoip and ip_location processors#138728
joegallo merged 19 commits intoelastic:mainfrom
flash1293:flash1293/flattened-geoip-iplocation

Conversation

@flash1293
Copy link
Contributor

@flash1293 flash1293 commented Nov 27, 2025

Currently in flexible field access mode, the geoip processor writes such an object:

{
  "geoip": {
    "field": "ip",
    "target_get": "my.field"
  }
}

results in

{
  "my.field": {
     "city": "...",
     "country": "...",
     "location": {
        "lat": "...",
        "lon": "..."
     }
  }
}

This is problematic because we go from dotted to subobject, which means that in subsequent processing steps, e.g. my.field.country can't be accessed anymore.

To align things, in flexible field access mode the geoip processor now writes the individual parts as individual properties:

{
  "my.field.city: "...",
  "my.field.country": "...",
   "my.field.location": [..., ...]
}

The same applies to the ip_location processor.

@elasticsearchmachine elasticsearchmachine added v9.3.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Nov 27, 2025
@flash1293 flash1293 changed the title Flattened geo Nov 27, 2025
@flash1293 flash1293 marked this pull request as ready for review November 27, 2025 16:29
@flash1293 flash1293 requested a review from masseyke November 27, 2025 16:30
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Nov 27, 2025
@masseyke masseyke added the :Distributed/Ingest Node Execution or management of Ingest Pipelines label Dec 1, 2025
@elasticsearchmachine elasticsearchmachine added Team:Data Management (obsolete) DO NOT USE. This team no longer exists. and removed needs:triage Requires assignment of a team area label labels Dec 1, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@dakrone dakrone requested a review from jbaiera December 4, 2025 17:18
@flash1293
Copy link
Contributor Author

I extended this a bit - it's now also flattening the location geo point - this helps in some mapping constellations we have today and is also more consistent since it flattens out everything and doesn't introduce inaccessible sub-objects in any place.

Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I left a couple of comments about typos, but nothing major. We do need to update the documentation to document how the processor behaves in the flexible field access pattern however.

String key = entry.getKey();
Object value = entry.getValue();

// Convert location from map {lat, lon} to array [lat, lon] in flexible mode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor typo in the doc

Suggested change
// Convert location from map {lat, lon} to array [lat, lon] in flexible mode
// Convert location from map {lat, lon} to array [lon, lat] in flexible mode
/**
* Writes GeoIP data to the document. In flexible field access mode, writes individual dotted fields
* (e.g., "my.field.city", "my.field.country") instead of a single nested object. The "location" field
* is written as an array [lat, lon] for better compatibility with geo_point fields in flexible mode.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* is written as an array [lat, lon] for better compatibility with geo_point fields in flexible mode.
* is written as an array [lon, lat] for better compatibility with geo_point fields in flexible mode.
Comment on lines 649 to 650
assertThat(locationArray.get(0), notNullValue()); // longitude
assertThat(locationArray.get(1), notNullValue()); // latitude
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you used notNullValue here instead of the actual resolved longitude and latitude (to ensure they are in the right order)?

@flash1293
Copy link
Contributor Author

Thanks for the review @dakrone , adjusted

@@ -134,7 +134,7 @@ public IngestDocument execute(IngestDocument document) throws IOException {
continue;
}
if (firstOnly) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github isn't letting me put this comment where I want to. 😠

I think you still have to handle the dataList case -- see document.setFieldValue(targetField, dataList); below.

It's also possible that you totally discussed this already and I haven't read the appropriate description/comments, I just jumped straight into the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we didn't, I added the multi-value case. Let me know whether this behavior makes sense

Copy link
Contributor

@joegallo joegallo Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++, I'm fine with the code of the handling of the dataList case. Nice work!

I think you should probably have somebody more familiar with FLEXIBLE mode review the semantics of the way you're handling the dataList case, though, too. @dakrone would be fine, or @jbaiera, for example.

@joegallo
Copy link
Contributor

I added a tidiness commit, I don't think it's controversial, but we can revert it if I'm wrong. 🤷

Double lat = (Double) locationMap.get("lat");
Double lon = (Double) locationMap.get("lon");
if (lat != null && lon != null) {
value = List.of(lon, lat); // GeoJSON order: [lon, lat]
Copy link
Contributor

@joegallo joegallo Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of an annoying thing, but this will end up resulting in the targetField.location field value being immutable -- if you look at the non-FLEXIBLE side of the house, everything is HashMaps all the way down, so that users can turn around and fuss with these things afterwards (with other ordinary processors or a script processor or whatever).

I don't necessarily think a customer is going to take advantage of this, but if somebody tried, I imagine they'd be annoyed that we've taken it away from them.

Perhaps use an ArrayList instead? You lose the nice constructor call, though, but that's nothing a quick static method couldn't fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switched to arraylist

value = List.of(lon, lat); // GeoJSON order: [lon, lat]
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might make sense to assert here that the value is not an instanceof Map. My reasoning is that although all the current databases currently return flat maps except for the location, some future PR could add a new map-valued key and then this code would (as written) very quietly start doing the wrong thing.

It seems like it would be less trappy for us to alert that future developer at dev-time as soon as they go to run some tests of their new code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, added

Copy link
Contributor

@joegallo joegallo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless anything else jumps into my mind while I'm brewing tea or whatever, I think I've hit all my requests for changes in my previous comments.

Overall this PR looks pretty dang good, but I think we can make it just a little better in a few places, and I do think the dataList issue is a real one.

@flash1293
Copy link
Contributor Author

Thanks @joegallo , addressed your points, could you take another look?

Double lon = (Double) locationMap.get("lon");
if (lat != null && lon != null) {
return newMutableLocationList(lon, lat); // GeoJSON order: [lon, lat]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So let's say the lat or lon (or both) are null. Are you okay with what happens then?

I'm on the fence about what to do about it -- part of me thinks that the null checks are making this worse, and you should just always go down the return newMutableLocationList(...) path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, not sure, that's probably very edge-casey

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that it's an edge case, but it seems like the point of the flexible access pattern is not to emit nested maps, and yet that's what the code would do if either of the things were null (at least by my read of things).

Even if it's never going to come up in practice, the code is capturing our intent of what would happen if it did, so should it:

  • emit a map? (current behavior), seems contrary to the spirit of the flexible access pattern
  • emit a list with one or more null elements?
  • throw an exception?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, let's go with the null elements, seems like the best behavior

@joegallo joegallo added >bug auto-backport Automatically create backport pull requests when merged v9.3.0 labels Dec 27, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @flash1293, I've created a changelog YAML for you.

Copy link
Contributor

@joegallo joegallo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Given that it's a weekend and a holiday, I'll smash that merge button for you on Monday morning assuming I don't hear otherwise from you before then. I had some trouble with the backport machinery on a different recent PR, so if the backport doesn't go through automatically, I'll handle that then, too.

@joegallo joegallo merged commit f2e547b into elastic:main Dec 29, 2025
35 checks passed
flash1293 added a commit to flash1293/elasticsearch that referenced this pull request Dec 29, 2025
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
9.3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >bug :Distributed/Ingest Node Execution or management of Ingest Pipelines external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Data Management (obsolete) DO NOT USE. This team no longer exists. v9.3.0 v9.4.0

5 participants