Conditionally force sequential reading in LuceneSyntheticSourceChangesSnapshot #128473
Merged
martijnvg merged 4 commits into elastic:main from …gesSnapshot_forceSequentialReader on May 27, 2025
Conversation
Change LuceneSyntheticSourceChangesSnapshot to force sequential stored field reading when index.codec is best_compression.

In CCR benchmarks I see that sometimes we spend a lot of time decompressing the same stored field block over and over again when the doc ids are not dense. It is likely that when a seqno range is requested, the corresponding doc id list contains gaps. For example:

[34, 35, 36, 37, 38, 39, 40, 313, 314, 315, 316, 317, 595, 596, 597, 598, 599, 600, 601, 898, 899, 900, 901, 902, 903]

Or a much longer list that is mostly consecutive with a handful of small gaps (the full example is in the description below). I think it makes sense to do sequential reading in these cases, given that many of the doc ids form consecutive ranges.
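To make the cost concrete, here is a small self-contained sketch (my illustration, not code from this change; the fixed docs-per-block value and the one-decompression-per-lookup model are simplifying assumptions) counting how many block decompressions each access pattern performs on the first example:

```java
// Illustration only: count stored field block decompressions for a sorted
// doc id list. Assumes a fixed number of docs per compressed block, that a
// non-sequential reader re-decompresses the block on every lookup, and that
// a sequential reader keeps the current block until it scans past it.
class SequentialReadCost {
    static int blockDecompressions(int[] sortedDocIds, int docsPerBlock, boolean sequential) {
        int count = 0;
        int lastBlock = -1;
        for (int docId : sortedDocIds) {
            int block = docId / docsPerBlock;
            if (sequential == false) {
                count++;                // one decompression per requested doc id
            } else if (block != lastBlock) {
                count++;                // each block decompressed at most once
            }
            lastBlock = block;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] docIds = { 34, 35, 36, 37, 38, 39, 40, 313, 314, 315, 316, 317,
                         595, 596, 597, 598, 599, 600, 601, 898, 899, 900, 901, 902, 903 };
        System.out.println(blockDecompressions(docIds, 512, false)); // 25
        System.out.println(blockDecompressions(docIds, 512, true));  // 2
    }
}
```

With best_compression the blocks are large, so the difference between the two counts corresponds to the wasted CPU time described above.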
Member
Author
Benchmarking this change using the elastic/logs track with CCR enabled shows a good improvement: https://esbench-metrics.kb.us-east-2.aws.elastic-cloud.com:9243/app/dashboards#/view/6c61280a-bd58-4c3a-8591-f56ec221c4f2?_g=h@c67153a
Collaborator
Hi @martijnvg, I've created a changelog YAML for you.
Collaborator
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Collaborator
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)
kkrik-es approved these changes on May 27, 2025
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request on May 27, 2025
Conditionally force sequential reading in LuceneSyntheticSourceChangesSnapshot (elastic#128473)

Change LuceneSyntheticSourceChangesSnapshot to force sequential stored field reading when index.codec is best_compression. In CCR benchmarks I see that relatively often we spend a lot of time decompressing the same stored field block over and over again when the doc ids are not dense. It is likely that when a seqno range is requested, the corresponding doc id list contains gaps. However, most doc ids are in consecutive runs, so not reading sequentially harms performance. The reason we currently don't load sequentially is the logic in `StoredFieldLoader#hasSequentialDocs(...)`, which requires all requested doc ids to be consecutive (no gaps allowed). In the case of `LuceneSyntheticSourceChangesSnapshot` with stored field best compression that is too conservative: in practice we end up decompressing a stored field block for every doc id for which we need to synthesize source during recovery. I think it makes sense to do sequential reading in this case, given that it is very likely that many of the requested doc id ranges will be monotonically increasing. Note that the requested doc ids are always sorted in ascending order (this happens in `LuceneSyntheticSourceChangesSnapshot#transformScoreDocsToRecords(...)`).
Collaborator
💚 Backport successful
elasticsearchmachine pushed a commit that referenced this pull request on May 27, 2025
Conditionally force sequential reading in LuceneSyntheticSourceChangesSnapshot (#128473) (#128505)

Change LuceneSyntheticSourceChangesSnapshot to force sequential stored field reading when index.codec is best_compression. In CCR benchmarks I see that relatively often we spend a lot of time decompressing the same stored field block over and over again when the doc ids are not dense. It is likely that when a seqno range is requested, the corresponding doc id list contains gaps. However, most doc ids are in consecutive runs, so not reading sequentially harms performance. The reason we currently don't load sequentially is the logic in `StoredFieldLoader#hasSequentialDocs(...)`, which requires all requested doc ids to be consecutive (no gaps allowed). In the case of `LuceneSyntheticSourceChangesSnapshot` with stored field best compression that is too conservative: in practice we end up decompressing a stored field block for every doc id for which we need to synthesize source during recovery. I think it makes sense to do sequential reading in this case, given that it is very likely that many of the requested doc id ranges will be monotonically increasing. Note that the requested doc ids are always sorted in ascending order (this happens in `LuceneSyntheticSourceChangesSnapshot#transformScoreDocsToRecords(...)`).
Change LuceneSyntheticSourceChangesSnapshot to force sequential stored field reading when index.codec is best_compression.
In CCR benchmarks I see that relatively often we spend a lot of time decompressing the same stored field block over and over again when the doc ids are not dense. It is likely that when a seqno range is requested, the corresponding doc id list contains gaps. However, most doc ids are in consecutive runs, so not reading sequentially harms performance. The reason we currently don't load sequentially is the logic in `StoredFieldLoader#hasSequentialDocs(...)`, which requires all requested doc ids to be consecutive (no gaps allowed). In the case of `LuceneSyntheticSourceChangesSnapshot` with stored field best compression that is too conservative.

For example: [34, 35, 36, 37, 38, 39, 40, 313, 314, 315, 316, 317, 595, 596, 597, 598, 599, 600, 601, 898, 899, 900, 901, 902, 903]
Or: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532]
(note the gaps, e.g. after 63, 204, 343, and 471)
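For reference, the "no gaps allowed" condition can be sketched like this (a simplified rendering of what the description says `StoredFieldLoader#hasSequentialDocs(...)` requires, not the actual implementation):

```java
// Simplified sketch of the strict density check described above: a sorted
// doc id array only counts as sequential when it is fully dense, i.e. it
// spans exactly docIds.length consecutive ids with no gaps.
static boolean hasSequentialDocs(int[] docIds) {
    return docIds.length > 0
        && docIds[docIds.length - 1] - docIds[0] == docIds.length - 1;
}
```

The long example above fails this check because of four small gaps, even though roughly 93% of the ids in its range are present.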
In these cases we decompress an entire stored field block for each doc id. This shows up as heavy stored-field decompression in the CPU flame graph (full flame graph: baseline3_cpu_profile_10.html.zip).
I think it makes sense to do sequential reading in this case, given that it is very likely that many of the requested doc id ranges will be monotonically increasing. Note that the requested doc ids are always sorted in ascending order (this happens in `LuceneSyntheticSourceChangesSnapshot#transformScoreDocsToRecords(...)`).
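To quantify "many of the requested doc id ranges will be monotonically increasing", a helper along these lines (hypothetical, not part of this PR) measures how consecutive a sorted doc id list is:

```java
// Illustration only: fraction of adjacent doc id pairs that are consecutive.
// For the long example above this is ~0.99 (4 gap transitions out of 532),
// which is why a sequential reader wins even though the strict
// hasSequentialDocs(...) check rejects the list outright.
static double consecutiveFraction(int[] sortedDocIds) {
    if (sortedDocIds.length < 2) {
        return 1.0;
    }
    int consecutive = 0;
    for (int i = 1; i < sortedDocIds.length; i++) {
        if (sortedDocIds[i] == sortedDocIds[i - 1] + 1) {
            consecutive++;
        }
    }
    return (double) consecutive / (sortedDocIds.length - 1);
}
```

With lists this dense, a sequential reader decompresses each best_compression block at most once per pass, while a random-access reader pays the decompression cost per requested doc id.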