Allow fast blob-cache introspection by shard-id #138282

henningandersen merged 8 commits into elastic:main
Conversation
By changing the key-mapping to be two-layer, shard-id, then region-key, we allow introspection by shard-id to be much faster, exemplified in this PR through both removeFromCache and forceEvict(shardId, predicate) options being added. This can be utilized for more purposes like pushing down data for shards as it ages or moves out of the node - or removing data entirely when an index is deleted. It may also remove the need for the async force-evict mechanism.
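The two-layer mapping described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the class and method names besides `forceEvict`/`keyMapping` are invented, and `ShardId` is simplified to a `String`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Sketch: first layer keyed by shard id, second by region key, so
// per-shard eviction is a single outer lookup instead of a scan of
// every region key in the cache.
class ShardScopedCache<K, V> {
    private final ConcurrentHashMap<String, ConcurrentHashMap<K, V>> keyMapping =
        new ConcurrentHashMap<>();

    void put(String shardId, K regionKey, V entry) {
        keyMapping.computeIfAbsent(shardId, s -> new ConcurrentHashMap<>()).put(regionKey, entry);
    }

    // forceEvict(shardId, predicate): only this shard's inner map is touched.
    int forceEvict(String shardId, Predicate<K> predicate) {
        ConcurrentHashMap<K, V> inner = keyMapping.get(shardId);
        if (inner == null) {
            return 0;
        }
        int[] evicted = { 0 };
        inner.keySet().removeIf(regionKey -> {
            if (predicate.test(regionKey)) {
                evicted[0]++;
                return true;
            }
            return false;
        });
        return evicted[0];
    }
}
```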
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

Hi @henningandersen, I've created a changelog YAML for you.
andreidan left a comment
Thanks for working on this Henning. This generally LGTM with a couple concerns.
I left a small suggestion, and I want to mention that this has a memory-usage implication: for 100K shards we'll now create 100K CHMs.
The impact of the empty CHMs in this case is a few tens of MiB, so perhaps nothing to be too worried about. However, I'm a bit concerned about how they grow internally: the load factor is 0.75 I believe, and they grow in powers of 2. Say we have 24 entries for shard0 with capacity 32; when we add one more entry we'll have 25 entries and capacity 64 (i.e. 39 empty slots). This could add a few more tens of MiB of memory usage if we're unlucky across 100K shards. Large shards will have many regions, so these maps could be quite large.
Shall we test this in a many shards environment? Or is my math a bit off here?
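The resize arithmetic behind this concern can be sketched with a back-of-envelope model of `ConcurrentHashMap`'s default 0.75 load factor and power-of-two table sizes (this is an approximation for estimating waste, not CHM's exact internal algorithm):

```java
// Approximate the table capacity CHM ends up with for a given entry count,
// assuming the default initial table size of 16 and load factor 0.75.
public class ChmWaste {
    static int capacityFor(int entries) {
        int cap = 16; // assumed default initial table size
        while (entries > (int) (cap * 0.75f)) {
            cap <<= 1; // tables grow in powers of two
        }
        return cap;
    }

    public static void main(String[] args) {
        // 24 entries fit in capacity 32 (threshold 32 * 0.75 = 24) ...
        System.out.println(capacityFor(24)); // 32
        // ... but a 25th entry forces a resize to 64, leaving 39 empty slots.
        System.out.println(capacityFor(25)); // 64
    }
}
```

This reproduces the 24-entries/capacity-32 vs. 25-entries/capacity-64 example from the comment above.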
```java
ConcurrentHashMap<RegionKey<KeyType>, LFUCacheEntry> map = keyMapping.get(shard);
if (map != null) {
    boolean removed = map.remove(entry.chunk.regionKey, entry);
    if (map.isEmpty()) {
```
there's a race condition here I think?
i.e. the map might not be empty by the time we move past this if-check
However, the computeIfPresent already does the "is empty" check. Perhaps this method can be simplified to something like:
```java
private boolean removeKeyMappingForEntry(LFUCacheEntry entry, ShardId shard) {
    boolean[] removed = { false };
    keyMapping.computeIfPresent(shard, (k, innerMap) -> {
        removed[0] = innerMap.remove(entry.chunk.regionKey, entry);
        return innerMap.isEmpty() ? null : innerMap;
    });
    return removed[0];
}
```
This way we're accessing the outer map just once and avoiding a race condition on the isEmpty call.
I think I prefer the original code proposed in the PR: it checks that no entry was concurrently inserted into the map inside the computeIfPresent remapping function, and it has no side effects if the remapping function is called/reapplied multiple times.
I do think there is a race with the get method, which adds entries to the map outside the CHM lock.
Let me see if I can repair it.
I was considering a ConcurrentSkipListMap approach too, but worry slightly about the lookup being more complex.
My intuition was that it is OK; our shard-level overhead is much bigger. I suppose once we get to lazy shards we could see that change, but I think the waste in the CHMs is not too bad: if we have 25 entries, we spend much more RAM on the actual cache entries (~40 bytes each) than on the waste here (more like 8 bytes).

I think the power-of-2 and load-factor behavior happens at any scale. While we may be unlucky for one shard that hits the worst case, we could also be unlucky with the original single map if it just crosses a 2x boundary, leading to a similar amount of waste overall. With the new approach, the worst-case waste should show up much less often, since it is very unlikely that 1000 shards will all exhibit the worst case, whereas a node growing to just above a 2x boundary seems more likely. So I'd think that if we take out the constant overhead (your prior sentence), this 2x/load-factor overhead would be smaller at the p99 with the new approach.
```java
    map.computeIfAbsent(key2, function);
    return map;
});
return inner.get(key2);
```
Can't another thread remove key2 from the inner map between the compute(..) and inner.get(key2) instructions?
If so, computeIfAbsent might return a null value; I think this could cause NPEs later?
Thanks, you are absolutely right, fixed in fe1c2f9
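One way to close this kind of race (a hypothetical sketch, not necessarily what fe1c2f9 actually does) is to capture the inner value while still inside the outer map's mapping function, instead of re-reading it with a separate `inner.get(key2)` afterwards:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: a two-layer map where the value is computed and captured inside
// the outer compute, so a concurrent removal of the inner map cannot make
// the caller observe null.
class TwoLayerCache<K1, K2, V> {
    private final ConcurrentHashMap<K1, ConcurrentHashMap<K2, V>> keyMapping =
        new ConcurrentHashMap<>();

    V getOrCompute(K1 key1, K2 key2, Function<K2, V> function) {
        Object[] result = new Object[1];
        keyMapping.compute(key1, (k, inner) -> {
            if (inner == null) {
                inner = new ConcurrentHashMap<>();
            }
            // Captured under the outer map's bin lock: a concurrent
            // computeIfPresent-based removal of this inner map serializes
            // on the same lock, so it cannot interleave here.
            result[0] = inner.computeIfAbsent(key2, function);
            return inner;
        });
        @SuppressWarnings("unchecked")
        V value = (V) result[0];
        return value;
    }
}
```

The trade-off is that all operations on the same outer key now serialize on the outer map's bin lock, which may matter for hot shards.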
```java
        }
    }

    void forEach(BiConsumer<Key2, Value> consumer) {
```
can we make this private?
We cannot, it is used in forceEvict here (an unchanged line).
Thanks Tanguy and Andrei!
Stateless changes for elastic#138282