Skip to content

[SIMD] Use fixed width native types for better Java interoperability#138429

Merged
ldematte merged 7 commits intoelastic:mainfrom
ldematte:simd/fixed-width-native-types
Nov 24, 2025
Merged

[SIMD] Use fixed width native types for better Java interoperability#138429
ldematte merged 7 commits intoelastic:mainfrom
ldematte:simd/fixed-width-native-types

Conversation

@ldematte
Copy link
Contributor

@ldematte ldematte commented Nov 21, 2025

The C and C++ standards do not prescribe a fixed size for integral and floating-point primitive data types (e.g. int or float); those depends on the platform (compiler + OS + architecture).
Java, on the other end, has a fixed and precise width for each primitive type. Getting those to match is important for interoperability between Java and the native libraries via FFI.

The newest C/C++ standards define new types with explicit width, e.g. int8_t or float32_t; this PR updates the function definitions of the simdvec native library to use these types.

Relates to: #139057

@elasticsearchmachine elasticsearchmachine added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.3.0 labels Nov 21, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, yes, yes. Thank you.

@ChrisHegarty
Copy link
Contributor

I assume that we'll bump the native lib version in this PR too.

@ldematte
Copy link
Contributor Author

I assume that we'll bump the native lib version in this PR too

I'm torn; yes, I think it's worth it. At least, it will help us testing it going through CI.

@ldematte
Copy link
Contributor Author

ldematte commented Nov 21, 2025

float32_t is part of C++23, but not of C23 unfortunately; clang on ARM seems to not care and defines it anyway, but on x64 does not.
On the other hand, we should be OK with float: while it's true that float, double, and long double, have very lenient requirements, those are actually OK in the case of float - it is basically constrained to be 32 bits. See C23 Standard - E [Implementation limits]
I've reverted the float32_t changes for now, but I'll keep investigating for a bit before surrendering :)

@ldematte ldematte added the test-arm Pull Requests that should be tested against arm agents label Nov 24, 2025
@ldematte ldematte requested a review from a team as a code owner November 24, 2025 09:18
@ldematte
Copy link
Contributor Author

OK, I have re-introduced the fixed 32-bit float type, but this time it's our own type defined in the best possible way:

  • std::float32_t if we are compiling C++ code with a C++23 standard compile
  • _Float32 if we are compiling C code with a C23 standard compile
  • otherwise float, but we check it is 32-bit wide
@ldematte
Copy link
Contributor Author

I've tested this with a variety of compilers: clang19, clang17, gcc12 and gcc13 (supporting the 23 standards and not), and it seems to work well. We use clang18/17 atm, and that is well covered.

@ldematte
Copy link
Contributor Author

@ChrisHegarty let me know if still LGTM - I have addressed the float situation and bumped versions.

Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SLGTM

@ldematte ldematte merged commit eba1e2a into elastic:main Nov 24, 2025
40 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Nov 24, 2025
…-json

* upstream/main: (247 commits)
  Mute org.elasticsearch.xpack.inference.integration.SemanticTextIndexOptionsIT testValidateIndexOptionsWithBasicLicense elastic#138513
  Mute org.elasticsearch.xpack.esql.heap_attack.HeapAttackLookupJoinIT testLookupExplosionBigString elastic#138510
  This shouldn't be zero (elastic#138501)
  sum of empty histogram is now null (elastic#138378)
  Test ES|QL bfloat16 support (elastic#138499)
  Fix exception handling in S3 `compareAndExchangeRegister` (elastic#138488)
  Mute org.elasticsearch.xpack.exponentialhistogram.ExponentialHistogramFieldMapperTests testFormattedDocValues elastic#138504
  Mute org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT test {yaml=ingest_geoip/60_ip_location_databases/Test adding, getting, and removing ip location databases} elastic#138502
  ESQL: Refactor HeapAttackIT (elastic#138432)
  [Inference API] Add ElasticInferenceServiceDenseTextEmbeddingsServiceSettings to InferenceNamedWriteablesProvider (elastic#138484)
  Store split indices (elastic#138396)
  ES|QL Update CHUNK to support chunking_settings as optional argument (elastic#138123)
  Extract common blob-update logic in `S3HttpHandler` (elastic#138490)
  Cleanup esql request building api (elastic#138398)
  Round sum and avg in exponential_histogram CSV tests (elastic#138472)
  ESQL: load exponential_histogram total count as double instead of long (elastic#138417)
  [SIMD] Use fixed width native types for better Java interoperability (elastic#138429)
  Do not use Min or Max as Top's surrogate when there is an outputField (elastic#138380)
  ES|QL: Fix generative tests (elastic#138478)
  Mute org.elasticsearch.xpack.inference.integration.AuthorizationTaskExecutorIT testCreatesEisChatCompletion_DoesNotRemoveEndpointWhenNoLongerAuthorized elastic#138480
  ...
afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Nov 26, 2025
…lastic#138429)

The C and C++ standards do not prescribe a fixed size for integral and floating-point primitive data types (e.g. int or float); those depends on the platform (compiler + OS + architecture).
Java, on the other end, has a fixed and precise width for each primitive type. Getting those to match is important for interoperability between Java and the native libraries via FFI.

The newest C/C++ standards define new types with explicit width, e.g. int8_t or float32_t; this PR updates the function definitions of the simdvec native library to use these types.
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Nov 26, 2025
…lastic#138429)

The C and C++ standards do not prescribe a fixed size for integral and floating-point primitive data types (e.g. int or float); those depends on the platform (compiler + OS + architecture).
Java, on the other end, has a fixed and precise width for each primitive type. Getting those to match is important for interoperability between Java and the native libraries via FFI.

The newest C/C++ standards define new types with explicit width, e.g. int8_t or float32_t; this PR updates the function definitions of the simdvec native library to use these types.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>refactoring :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch test-arm Pull Requests that should be tested against arm agents v9.3.0

3 participants