fix(node): gate GET /ipfs/{cid} on reachable allowed-set, not deny-set (#126)#133
Conversation
Gitlawb#126) The IPFS visibility gate used withheld_blob_oids (a deny-set enumerating only reachable blobs), so a dangling/unreachable blob was absent from the set and served in cleartext to anonymous callers. Flip to an allowed-set (allowed_blob_set_for_caller) that enumerates reachable blobs the caller may read: a dangling blob has no path, is never in the set, and 404s.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe PR changes ChangesFail-closed allowed-set gating for GET /ipfs/{cid}
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-node/src/api/ipfs.rs`:
- Around line 120-164: The path-scoped branch in the IPFS read flow is doing the
expensive allowed-blob walk before confirming the CID exists in the repo, which
causes unnecessary full-history git walks on misses. In the `read_object` path
inside `ipfs.rs`, move the `store::read_object` existence/type check ahead of
`allowed_blob_set_for_caller` so you only call the walk after a blob hit for
that repo. Keep the existing `path_scoped`, `allowed_memo`, and
`allowed_blob_set_for_caller` logic, but gate the spawn_blocking walk on a
confirmed object match to avoid anonymous random-CID misses triggering repo-wide
scans.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fbacedd5-03bf-4d1d-aaf5-6be3c677b051
📒 Files selected for processing (3)
crates/gitlawb-node/src/api/ipfs.rscrates/gitlawb-node/src/git/visibility_pack.rscrates/gitlawb-node/src/test_support.rs
Move store::read_object before the allowed_blob_set_for_caller spawn_blocking call so random-CID spray against repos with path-scoped rules cannot trigger full-history git walks on repos that don't carry the object.
beardthelion
left a comment
There was a problem hiding this comment.
The allow-set flip is the right fix for #126 and it is fail-closed. I built the merged state and ran it: dangling and walk-error cases 404 for every caller class (anon, listed reader, owner), replicable_blob_set is behavior-identical after delegating to the new helper, and the existence-before-walk reorder addresses the earlier perf note. One mechanical blocker before this can merge, plus one divergence worth its own issue (already filed).
Findings
-
[P3] Run
cargo fmt,ipfs.rsfailsrustfmt --check(blocks the format gate)
crates/gitlawb-node/src/api/ipfs.rs:145
Theallowed_blob_set_for_caller(...)call (line ~145) and thex-content-cidHeaderValue::from_str(...)line (~180) are not rustfmt-formatted, so the format check fails.cargo fmtfixes both. This is the only change required to merge. -
[P2] Tree/commit CIDs still disclose a withheld subtree's structure (follow-up, not a blocker here)
crates/gitlawb-node/src/api/ipfs.rs:139
Under a path-scoped rule the gate applies only toobj_type == "blob", so a tree/commit/tag object is served unconditionally and its raw body hands an anonymous caller every child filename and OID of a withheld subtree. This is pre-existing (the old deny-set never held trees either) and explicitly out of #126's scope, so it should not block this PR. Butget_tree(api/repos.rs:444, the N3 fix) deliberately protects exactly this, so the two read surfaces over the same data disagree. Tracked separately in #135. -
[P3] Remove the now-dead
withheld_blob_oidsor note why it stays
crates/gitlawb-node/src/git/visibility_pack.rs:232
After this PR nothing in production callswithheld_blob_oids; only its own tests reference it. Because it ispub, the compiler and clippy will not flag it. Either delete it (and its tests) or add a line on why it is retained.
Notes (non-blocking): the existence-check-before-walk reorder means an existing-but-withheld CID pays the walk while an absent CID returns immediately, so latency weakly signals object existence; both still 404, and this is the perf win from the earlier review, so I would accept it. Test-wise, the KTD3 tree-serve is asserted only by status (not body contents), and there is no get_by_cid test for a listed-reader serve or an allowed-at-both-paths blob.
Nice work on the fail-closed tests. The dangling-blob coverage across all three caller classes is exactly right. Once cargo fmt is run, this is good to go from my side (note the fork CI still needs a maintainer to approve the workflow run).
…tion ✓ P3 blocker fixed: cargo fmt applied — the format gate will pass. ✓ P3 cleanup resolved: withheld_blob_oids is still used by replication code in repos.rs, so it stays. • P2 follow-up: Tree/commit disclosure tracked in Gitlawb#135 — out of scope here.
jatmn
left a comment
There was a problem hiding this comment.
Thanks for the update. I rechecked the changed paths and found one issue that still needs to be addressed.
Findings
- [P2] Avoid reading unauthorized blob bodies before the allowed-set gate
crates/gitlawb-node/src/api/ipfs.rs:125
The reordered path now callsstore::read_objectbefore checking whether a path-scoped blob is inallowed_blob_set_for_caller, andread_objectdoes bothgit cat-file -tandgit cat-file <type>before returning. That means a caller who knows a withheld or dangling blob CID can make the node materialize the full unauthorized blob body into memory just to discard it and return 404. Please split the existence/type probe from the content read, run the allowed-set check after confirming the object is a blob, and only call the body-reading path once the blob is authorized to be served.
beardthelion
left a comment
There was a problem hiding this comment.
The split-probe reorder closes the P2. get_by_cid now type-probes with cat-file -t, runs the allowed-set check, and only reads the object body once that check passes, so a withheld or dangling blob is never materialized for a caller who can't reach it. I built the merged state and walked the caller matrix: anon, non-reader, listed-reader, and owner all behave; dangling and walk-error fail closed (404) for every class; public copies still serve.
@kevincodex1 LGTM.
jatmn
left a comment
There was a problem hiding this comment.
I found one gate issue that needs to be addressed before this is ready.
Findings
- [P2] Run rustfmt so the required format check passes
crates/gitlawb-node/src/git/store.rs:286
The current head still failscargo fmt --all -- --check; rustfmt wants to split the newobject_typereturn expression across multiple lines. The PR Checks workflow runs that exact command before clippy, and the contribution guide says CI rejects formatting failures, so this will block thefmt + clippyjob even though the focused visibility tests pass. Please runcargo fmt --alland push the formatted result.
beardthelion
left a comment
There was a problem hiding this comment.
Re-reviewed on 03ba714. The rustfmt blocker is resolved: the store.rs change is a pure line-wrap of the same expression, and fmt + clippy is green. The gate is unchanged from the head I approved, get_by_cid still gates on the reachable allowed-set with the object_type existence probe ahead of the walk, and the dangling-blob path still fails closed on the merged state. Merge from main is clean, CI green.
@kevincodex1 LGTM.
jatmn
left a comment
There was a problem hiding this comment.
Thanks for the update. I rechecked the previously discussed paths and do not see any remaining actionable issues from my side.
@kevincodex1 LGTM
Closes #126.
The
GET /ipfs/{cid}visibility gate usedwithheld_blob_oids(a deny-set enumerating only reachable blobs), so a dangling/unreachable blob — absent from the set — was served in cleartext to anonymous callers. Flipped to an allowed-set (allowed_blob_set_for_caller) that enumerates reachable blobs the caller may read: a dangling blob has no path, is never in the set, and 404s.Changes:
visibility_pack.rs: addedallowed_blob_set_for_caller, the caller-aware allowed-set.replicable_blob_setnow delegates to it.ipfs.rs:get_by_cidgates on the allowed-set instead of the deny-set; a blob passes only under path-scoped rules and if its oid is in the reachable allowed-set.test_support.rs + visibility_pack.rs: unit and integration tests cover anon, listed reader, and owner against dangling blobs.All 32 unit tests and 5 integration tests pass.
Summary by CodeRabbit
GET /ipfs/{cid}so blob content is served only when it’s reachable and explicitly permitted by repository visibility rules, particularly under path-scoped rules.