Retrieve routing hash from synthetic id for translog operations by tlrx · Pull Request #140221 · elastic/elasticsearch

tlrx · 2026-01-06T16:08:22Z

In a TSDB datastream shard that uses synthetic id, the operations that are read from the translog have no routing set. They should have a routing computed from the routing hash stored as a suffix in the synthetic id.

This change fixes how the routing hash is retrieved and also adds a test.

Relates ES-13606

elasticsearchmachine · 2026-01-06T16:08:47Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

elasticsearchmachine · 2026-01-06T16:08:47Z

Hi @tlrx, I've created a changelog YAML for you.

tlrx · 2026-01-06T16:11:50Z

...a-streams/src/internalClusterTest/java/org/elasticsearch/datastreams/TSDBSyntheticIdsIT.java

+                            assertThat(luceneSnapshot.totalOperations(), equalTo(docsIdsBySeqNo.size()));
+                            // TODO Once ES-13603 is implemented, change this to also check operations (and maybe tombstone doc too?)
+                            if (docsIdsBySeqNo.isEmpty() == false) {
+                                expectThrows(NullPointerException.class, luceneSnapshot::next);


This test will also be useful for ES-13603, which shows up here as an expected failure: the stored _id field is not materialized yet, so an NPE is thrown.

fcofdez

LGTM. Subtle stuff... thanks for fixing this!

...a-streams/src/internalClusterTest/java/org/elasticsearch/datastreams/TSDBSyntheticIdsIT.java

fcofdez · 2026-01-07T16:05:47Z

server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesRoutingHashFieldMapper.java

-                );
+                if (context.indexSettings().useTimeSeriesSyntheticId()) {
+                    int hash = TsidExtractingIdFieldMapper.extractRoutingHashFromSyntheticId(Uid.encodeId(context.sourceToParse().id()));
+                    routingHash = encode(hash);


I understand that this got really complex, but I find it a bit confusing that we're using two encode methods for the routing hash (synthetic and the regular one using the Uid.encode), not much that we can do... :(

Just to be sure we share the same understanding:

Base64 encoding is used to encode/decode the routing hash integer as a routing String key in several places (mostly when routing operations or parsing documents from source like here)

Base64 encoding is also used to encode the (regular or synthetic) id from byte[] array to String and vice-versa

Uid.encode and Uid.decode are used to encode the (regular or synthetic) String id into the binary data stored in Lucene (as byte[] array too in BytesRef)

Here we need to extract the routing hash integer value back from the (regular or synthetic) String id. In order to have only one "extract routing hash from id" method I reused TsidExtractingIdFieldMapper.extractRoutingHashFromSyntheticId, but it means that I have to Uid.encode the String back to its Lucene representation.

Maybe it would be more readable if I only read the last 4 bytes of the id (like regular ids do)?

That's my understanding too, but I would prefer to keep it as it is, so that we centralize the knowledge of how we encode the _id.

Me too. I added a comment.
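The three encoding layers discussed in this thread can be sketched in plain Java. This is a simplified illustration, not the Elasticsearch code: the class and method names are hypothetical, the little-endian byte order is an assumption, and a plain Base64 decode stands in for Uid.encodeId (which handles more cases than this). The only part taken from the PR is that the routing hash is stored as a suffix of the synthetic id.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

// Simplified sketch, NOT the Elasticsearch implementation: names, byte order and
// the exact id layout are assumptions made for illustration.
final class RoutingHashSketch {

    // Routing hash int -> routing String key (URL-safe Base64, no padding).
    static String encodeHash(int hash) {
        byte[] bytes = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(hash).array();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Routing String key -> routing hash int.
    static int decodeHash(String routing) {
        byte[] bytes = Base64.getUrlDecoder().decode(routing);
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // String id -> binary id. A plain Base64 decode stands in for Uid.encodeId here.
    static byte[] toBinaryId(String id) {
        return Base64.getUrlDecoder().decode(id);
    }

    // Binary id -> routing hash, assuming the hash occupies the last 4 bytes.
    static int extractHashFromSuffix(byte[] binaryId) {
        return ByteBuffer.wrap(binaryId, binaryId.length - 4, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // Builds a made-up String id: 8 filler bytes followed by the 4-byte hash suffix.
    static String buildExampleId(int hash) {
        byte[] filler = {1, 2, 3, 4, 5, 6, 7, 8};
        byte[] binaryId = ByteBuffer.allocate(filler.length + 4)
            .order(ByteOrder.LITTLE_ENDIAN)
            .put(filler)
            .putInt(hash)
            .array();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(binaryId);
    }

    public static void main(String[] args) {
        int hash = 0x1234ABCD;
        String id = buildExampleId(hash);
        // String id -> binary id -> hash -> routing key -> hash again
        int extracted = extractHashFromSuffix(toBinaryId(id));
        String routing = encodeHash(extracted);
        System.out.println(extracted == hash);
        System.out.println(decodeHash(routing) == hash);
    }
}
```

In this sketch the String id has to be turned back into bytes before the suffix can be read, which mirrors why the change calls Uid.encodeId before extractRoutingHashFromSyntheticId.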

fcofdez · 2026-01-07T16:13:18Z

...a-streams/src/internalClusterTest/java/org/elasticsearch/datastreams/TSDBSyntheticIdsIT.java

+                                        "Lucene document [" + expectedDocId + "] has wrong value for _ts_routing_hash field",
+                                        luceneDocument.getField(TimeSeriesRoutingHashFieldMapper.NAME).binaryValue(),
+                                        equalTo(
+                                            Uid.encodeId(


Maybe we can extract this to a helper method, feels like inception ⏳ haha

Done in 519be37

fcofdez · 2026-01-07T16:14:23Z

...a-streams/src/internalClusterTest/java/org/elasticsearch/datastreams/TSDBSyntheticIdsIT.java

+
+                            Translog.Operation operation;
+                            while ((operation = translogSnapshot.next()) != null) {
+                                if (operation instanceof Translog.Index index) {


nit: we could use the Translog.Operation#opType and a switch instead?

Done in 519be37
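For readers unfamiliar with the suggestion, dispatching on an operation type with a switch instead of an instanceof chain could look roughly like the sketch below. Every type here (Type, Operation, Index, Delete, NoOp) is a hypothetical stand-in written for this example and is not the real Translog API; the actual change is in commit 519be37.

```java
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not the real Translog classes.
final class OpTypeSwitchSketch {

    enum Type { INDEX, DELETE, NO_OP }

    interface Operation {
        Type opType();
    }

    record Index(String id, String routing) implements Operation {
        public Type opType() { return Type.INDEX; }
    }

    record Delete(String id) implements Operation {
        public Type opType() { return Type.DELETE; }
    }

    record NoOp(String reason) implements Operation {
        public Type opType() { return Type.NO_OP; }
    }

    // One switch over the enum replaces a chain of instanceof checks; the compiler
    // verifies that every enum constant is handled.
    static String describe(Operation operation) {
        return switch (operation.opType()) {
            case INDEX -> {
                Index index = (Index) operation;
                yield "index " + index.id() + " routing=" + index.routing();
            }
            case DELETE -> "delete " + ((Delete) operation).id();
            case NO_OP -> "no-op: " + ((NoOp) operation).reason();
        };
    }

    public static void main(String[] args) {
        List<Operation> ops = List.of(new Index("doc-1", "abc"), new Delete("doc-2"), new NoOp("test"));
        for (Operation op : ops) {
            System.out.println(describe(op));
        }
    }
}
```

A switch expression over an enum has to be exhaustive, so adding a new operation type later would produce a compile error here rather than a silently skipped branch.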

fcofdez · 2026-01-07T16:25:34Z

server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesRoutingHashFieldMapper.java

-                    Arrays.copyOf(Base64.getUrlDecoder().decode(context.sourceToParse().id()), 4)
-                );
+                if (context.indexSettings().useTimeSeriesSyntheticId()) {
+                    int hash = TsidExtractingIdFieldMapper.extractRoutingHashFromSyntheticId(Uid.encodeId(context.sourceToParse().id()));


shouldn't we decode the base64 here?

I don't think so. The String id is already encoded as Base64, and we encode it using Uid.encodeId to get the binary data that would be indexed in Lucene and that extractRoutingHashFromSyntheticId expects.

See my other comment here, I get how it can be confusing.

🤦 I was fooled by the encodeId name, which, looking at the implementation, does decode the Base64.

fcofdez · 2026-01-07T16:38:07Z

server/src/main/java/org/elasticsearch/index/mapper/ParsedDocument.java

@@ -107,7 +107,7 @@ public static ParsedDocument deleteTombstone(
            // _id terms over tombstones also work as if a regular _id field was present.


I've noticed that we could avoid the Uid.encode in:

document.add(IdFieldMapper.syntheticIdField(id));

Since we have the uid already?

Done in 8a2c585

tlrx · 2026-01-08T11:48:46Z

Thanks Francisco!

tlrx assigned fcofdez and burqen Jan 6, 2026

tlrx added the >bug, :StorageEngine/TSDB (You know, for Metrics), and v9.4.0 labels Jan 6, 2026

elasticsearchmachine added the Team:StorageEngine label Jan 6, 2026

Update docs/changelog/140221.yaml

0c9eee0

tlrx commented Jan 6, 2026

tlrx unassigned fcofdez and burqen Jan 6, 2026

tlrx requested review from burqen and fcofdez January 6, 2026 16:26

Merge branch 'main' into 2026/01/06/ES-13603-test

6b0fa38

fcofdez approved these changes Jan 7, 2026

fcofdez reviewed Jan 7, 2026

tlrx added 4 commits January 8, 2026 10:58

uid

8a2c585

uid

519be37

Merge branch 'main' into 2026/01/06/ES-13603-test

4d2b748

Merge remote-tracking branch 'refs/remotes/tlrx/2026/01/06/ES-13603-test' into 2026/01/06/ES-13603-test

c701727

tlrx merged commit b1146ee into elastic:main Jan 8, 2026
35 checks passed

tlrx deleted the 2026/01/06/ES-13603-test branch January 8, 2026 11:48

tlrx mentioned this pull request Jan 21, 2026

Enable synthetic _id randomly in TSDBPassthroughIndexingIT #138812

Merged

tlrx merged 7 commits into elastic:main from tlrx:2026/01/06/ES-13603-test

4 participants
