@zhaizhibo
Contributor

Descriptions of the changes in this PR:

Motivation

The current synchronous read-ahead implementation in BookKeeper has several limitations:

  1. Inefficient for sequential reads: reads within a ledger are serialized, so synchronous read-ahead provides no benefit for sequential read patterns.
  2. Read amplification: when the read cache hit ratio falls below (n-1)/n (where n is the read-ahead size), each cache miss triggers a full batch read from disk, so more entries are read from disk than are requested and I/O bandwidth is wasted.
  3. RocksDB bottleneck: every cache miss requires a RocksDB lookup for the entry location, which can become a performance bottleneck.
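
To make the (n-1)/n threshold in item 2 concrete, here is a minimal sketch of the arithmetic, assuming every cache miss reads exactly n entries from disk and every hit reads none. The class and method names are hypothetical, and n = 8 is only an illustrative value, not a BookKeeper default.

```java
// Illustrative model only: names and the value n = 8 are assumptions, not BookKeeper code.
final class ReadAmplification {

    /** Expected entries read from disk per requested entry, if each miss reads a full batch. */
    public static double diskEntriesPerRequest(double hitRatio, int readAheadSize) {
        return (1.0 - hitRatio) * readAheadSize;
    }

    /** Hit ratio at which disk reads equal requested reads: (1 - h) * n = 1, so h = (n - 1) / n. */
    public static double breakEvenHitRatio(int readAheadSize) {
        return (readAheadSize - 1) / (double) readAheadSize;
    }

    public static void main(String[] args) {
        int n = 8; // hypothetical read-ahead size
        System.out.println(breakEvenHitRatio(n));          // 0.875
        System.out.println(diskEntriesPerRequest(0.5, n)); // 4.0
    }
}
```

Under this model, a hit ratio of 0.5 with n = 8 means four entries are read from disk for every entry actually requested; only above a 0.875 hit ratio does batching read less from disk than it serves.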

Changes

This PR proposes the following improvements to the cache-miss read path:

  1. Rely on the OS page cache for read-ahead instead of application-level batching.
  2. When reading an entry, additionally read the next 20 bytes (containing the next entry's ledgerId and entryId) and cache that entry's location to reduce RocksDB queries.
  3. This particularly benefits sequential read patterns (common in Pulsar).
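
The idea in item 2 can be sketched roughly as follows. This is not the code in this PR: it uses an in-memory ByteBuffer in place of an entry log file, and it assumes a record layout of a 4-byte size followed by an 8-byte ledgerId, an 8-byte entryId, and the payload, which gives the 20 bytes mentioned above. The class name, the hint map, and the use of the record start as the cached offset are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Sketch under stated assumptions; not the BookKeeper implementation.
final class NextLocationHintSketch {
    // Assumed header: 4-byte size + 8-byte ledgerId + 8-byte entryId = 20 bytes.
    static final int HEADER_BYTES = 4 + 8 + 8;

    // Hypothetical hint cache: "ledgerId:entryId" -> offset of that record in the log.
    final Map<String, Long> nextLocation = new HashMap<>();

    static String key(long ledgerId, long entryId) {
        return ledgerId + ":" + entryId;
    }

    /** Appends one record; the size field covers ledgerId, entryId and payload. */
    static void append(ByteBuffer log, long ledgerId, long entryId, byte[] payload) {
        log.putInt(16 + payload.length).putLong(ledgerId).putLong(entryId).put(payload);
    }

    /** Reads the entry at offset and, if 20 more bytes exist, records where the next entry lives. */
    byte[] readEntry(ByteBuffer log, int offset, int logEnd) {
        int size = log.getInt(offset);
        byte[] entry = new byte[size];
        ByteBuffer view = log.duplicate(); // independent position, shared content
        view.position(offset + 4);
        view.get(entry);

        int next = offset + 4 + size;
        if (next + HEADER_BYTES <= logEnd) {
            long nextLedgerId = log.getLong(next + 4);
            long nextEntryId = log.getLong(next + 12);
            // A later read of this entry can use the hint instead of an index lookup.
            nextLocation.put(key(nextLedgerId, nextEntryId), (long) next);
        }
        return entry;
    }

    public static void main(String[] args) {
        ByteBuffer log = ByteBuffer.allocate(256);
        append(log, 7L, 0L, new byte[] {1});
        append(log, 7L, 1L, new byte[] {2, 3});
        NextLocationHintSketch reader = new NextLocationHintSketch();
        reader.readEntry(log, 0, log.position());
        System.out.println(reader.nextLocation); // {7:1=21}
    }
}
```

In this sketch a sequential reader that has just read entry 0 of ledger 7 already knows where entry 1 starts, so the following read would not need a location lookup. A real implementation would also have to handle entries from different ledgers being interleaved in the same log and hints that become stale.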
@zhaizhibo
Contributor Author

related #3085
