Panama vector implementation of codePointCount #140693
parkertimmins merged 12 commits into elastic:main
|
Hi @parkertimmins, I've created a changelog YAML for you. |
|
Here are the results from the attached benchmark. (Edit: these were measured without the short-string fallback added in the most recent commit.) There's some speedup for longer strings, but some slowdown for shorter strings. Perhaps we should use Lucene's UnicodeUtil if the length is below some threshold. |
|
Reran the benchmarks, but with fallback to Lucene's version if byte length is below 16: |
|
The latest variant with the fallback looks much better to me; it seems like a good improvement. |
|
Hi @parkertimmins, I've updated the changelog YAML for you. |
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
|
Looks like it's worth using UnicodeUtil's scalar logic for short strings, SWAR for medium, and SIMD for long: This is based on which method had the highest throughput averaged over the ASCII and Unicode workloads. It's not very scientific, since the results are just from my machine, but it should be faster than the existing alternatives. These cases do come with a penalty of some additional branches, but the final results look fine. Unlike the final results, the code used to produce the following results does not branch on the length, i.e. each run always uses only the scalar, SWAR, or SIMD logic. |
|
And the final results, where SIMD falls back to SWAR if below 54, and scalar if below 12: |
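The length-based selection described above can be sketched roughly as follows. This is only an illustration, not the code from the PR: the class, enum, constant, and method names are hypothetical, the thresholds are assumed to be byte lengths, and it is an assumption that short inputs still take the scalar path when SIMD is unavailable.

```java
// Hypothetical sketch of the three-tier selection; names are illustrative only.
final class CodePointCountTiers {
    enum Tier { SCALAR, SWAR, SIMD }

    // Thresholds quoted in the comment above, assumed to be in bytes.
    static final int SCALAR_BELOW = 12;
    static final int SWAR_BELOW = 54;

    // Picks which implementation would handle an input of the given byte length.
    static Tier choose(int byteLength, boolean simdAvailable) {
        if (byteLength < SCALAR_BELOW) {
            return Tier.SCALAR; // short strings: plain scalar loop
        }
        if (byteLength < SWAR_BELOW || simdAvailable == false) {
            return Tier.SWAR;   // medium strings, or no SIMD support
        }
        return Tier.SIMD;       // long strings with SIMD available
    }
}
```

The two comparisons are the "additional branches" mentioned earlier; the benchmarks suggest their cost is small relative to the gain from picking the better method.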
Add a Panama SIMD implementation of codePointCount. Keep the SWAR version from #140388 as a fallback if SIMD is not available. This results in a very large speedup on long strings, for example those over 100 bytes. Lucene's UnicodeUtil.codePointCount remains faster for small strings, so continue to use that version if the byte length is below a threshold.
Fixes #140567
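
For readers unfamiliar with the SWAR fallback mentioned in the description, here is a minimal sketch of the general idea, assuming valid UTF-8 input. It is not the code from #140388 or from Lucene: the class and method names are hypothetical, and the scalar loop is a simple reference rather than UnicodeUtil's actual logic. The count is the number of bytes that are not continuation bytes (bit pattern `10xxxxxx`), and the SWAR loop checks eight bytes per `long`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical sketch; assumes the input is valid UTF-8.
final class Utf8CodePointSketch {
    // Reads 8 bytes of a byte[] as one long; byte order is irrelevant for counting.
    private static final VarHandle LONG_VIEW =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    private static final long LOW_BITS = 0x0101010101010101L;

    // Reference version: every byte that is not a continuation byte starts a code point.
    static int scalarCount(byte[] utf8, int offset, int length) {
        int count = 0;
        for (int i = offset; i < offset + length; i++) {
            if ((utf8[i] & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    // SWAR version: process 8 bytes at a time, then finish the tail with the scalar loop.
    static int swarCount(byte[] utf8, int offset, int length) {
        int end = offset + length;
        int i = offset;
        int continuations = 0;
        for (; i + Long.BYTES <= end; i += Long.BYTES) {
            long word = (long) LONG_VIEW.get(utf8, i);
            // Per byte: low bit is set when bit 7 is 1 and bit 6 is 0 (a continuation byte).
            long mask = (word >>> 7) & ~(word >>> 6) & LOW_BITS;
            continuations += Long.bitCount(mask);
        }
        int processed = i - offset;
        return (processed - continuations) + scalarCount(utf8, i, end - i);
    }
}
```

A call such as `Utf8CodePointSketch.swarCount(bytes, 0, bytes.length)` should agree with `String.codePointCount` on the decoded string for well-formed input. A SIMD version applies the same per-byte test across a whole vector register instead of a single `long`.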