Add Lucene segment-level fields stats#111123
Conversation
8126d91 to
6d02fc3
Compare
DaveCTurner
left a comment
There was a problem hiding this comment.
Looking forward to having these stats available! I left a couple of comments.
| builder.field(Fields.TOTAL_SEGMENT_FIELDS, totalSegments); | ||
| builder.field(Fields.AVERAGE_FIELDS_PER_SEGMENT, totalSegments == 0 ? 0 : totalSegmentFields / totalSegments); |
There was a problem hiding this comment.
Having the average per node is nice but could we also include the raw counts in the output so we can aggregate this across nodes more easily?
There was a problem hiding this comment.
Thanks David. Yes, we send these raw stats between nodes and return only the average in the rest response, as including the number of segments in the mappings section doesn't make sense.
There was a problem hiding this comment.
including the number of segments in the mappings section doesn't make sense
Although I see what you mean, I disagree that this is a good reason to omit it. The number of segments might not be available elsewhere (hence why we've added it to NodeMappingStats) or at least there may be some variance between the number used in this computation and the one exposed elsewhere. Let's have the raw figures here as well as the average please.
server/src/main/java/org/elasticsearch/index/mapper/NodeMappingStats.java
Outdated
Show resolved
Hide resolved
…gStats.java Co-authored-by: David Turner <david.turner@elastic.co>
|
Hi @dnhatn, I've created a changelog YAML for you. |
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
DaveCTurner
left a comment
There was a problem hiding this comment.
Overall change looks ok but I'd rather see the raw values in the output as well as the computed average (see this comment).
andreidan
left a comment
There was a problem hiding this comment.
Thanks for working on this Nhat. Very nice to have these stats available !
Should we also update the docs (node-stats.asciidoc) ?
LGTM, pending David's suggestion to expose the raw values as well
|
@DaveCTurner @andreidan Thanks for reviewing :). |
@andreidan ++ I updated it in 9eb89c9. |
The `search_shards` API is not available in serverless. This PR replaces its usage in the newly added test with the `search` API with profiling. Relates #111123
Previously, we returned the number of segments and the total number of fields in those segments in NodeMappingStats (see #111123). However, the total number of fields returned in that PR might be very inaccurate for indices having large mappings but only a small number of actual fields. This change returns a more accurate total number of fields using the Lucene FieldInfos from those segments. Since we need to acquire a searcher to compute this stats, we opt to compute it after a shard is refreshed and cache the result. Relates #111123
Previously, we returned the number of segments and the total number of fields in those segments in NodeMappingStats (see elastic#111123). However, the total number of fields returned in that PR might be very inaccurate for indices having large mappings but only a small number of actual fields. This change returns a more accurate total number of fields using the Lucene FieldInfos from those segments. Since we need to acquire a searcher to compute this stats, we opt to compute it after a shard is refreshed and cache the result. Relates elastic#111123
This change returns the total number of fields at the segment level, allowing for a more accurate estimate of the memory used by Lucene. The new estimate is expected to be closer to the actual memory usage than the current estimate using the index-level field count, due to the non-trivial overhead incurred by each Lucene segment. Two new fields are introduced: total_segment_fields, which represents the total number of fields at the segment level, and average_fields_per_segment. The overhead per field in segments with fewer fields is larger than in segments with many fields.