...#1481

Open
sbamopoulos wants to merge 1 commit into MouseLand:main from sbamopoulos:fix/mmap-new-labeling-npy

@sbamopoulos

Summary

Avoid loading the full new_labeling.npy array into memory on every Dask tile during the distributed relabeling step.

Problem

After stitching, distributed_eval saves a large new_labeling vector to disk and relabels the unstitched segmentation with:

dask.array.map_blocks(
    lambda block: np.load(new_labeling_path)[block],
    ...
)

np.load without mmap_mode reads the entire .npy file into memory, and the lambda calls it once per block, so every worker holds a full copy of the array while it processes a tile. On whole-slide runs this array can be multi-GB, so peak RAM during relabeling scales with the number of concurrent workers and can cause OOM or severe swapping.

Using mmap_mode='r' memory-maps the file so each block only reads the pages it needs.
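
A minimal sketch of the memory-mapped lookup, using plain NumPy on a toy array rather than the actual distributed_eval code. The helper name `relabel_block`, the temporary file, and the sample values are illustrative assumptions and are not taken from this PR:

```python
import os
import tempfile

import numpy as np


def relabel_block(block, new_labeling_path):
    """Map old label IDs in `block` to new IDs via a memory-mapped lookup vector.

    Hypothetical helper for illustration; not the function used in the PR.
    """
    # mmap_mode="r" opens the .npy file read-only without reading it all into RAM.
    new_labeling = np.load(new_labeling_path, mmap_mode="r")
    # Fancy indexing copies only the looked-up entries into a regular in-memory array.
    return np.asarray(new_labeling[block])


# Toy stand-in for the saved vector: old label i maps to new_labeling[i].
tmp_dir = tempfile.mkdtemp()
new_labeling_path = os.path.join(tmp_dir, "new_labeling.npy")
np.save(new_labeling_path, np.array([0, 1, 1, 2, 2, 3], dtype=np.uint32))

# Toy stand-in for one unstitched segmentation tile.
block = np.array([[0, 2], [5, 4]])
relabeled = relabel_block(block, new_labeling_path)

# The memory-mapped lookup gives the same result as loading the full array.
expected = np.load(new_labeling_path)[block]
assert np.array_equal(relabeled, expected)
```

The result has the same shape as the block, so a function like this could be passed to map_blocks in place of the lambda above, assuming the remaining arguments stay unchanged.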

np.load without mmap_mode copies the full relabeling vector into each
Dask worker for every tile. mmap_mode='r' reads only the pages needed
per block, avoiding multi-GB RAM spikes on large whole-slide runs.

Co-authored-by: Cursor <cursoragent@cursor.com>
