Skip to content

fix(node): gate GET /ipfs/{cid} on reachable allowed-set, not deny-set (#126)#133

Merged
kevincodex1 merged 7 commits into
Gitlawb:mainfrom
Gravirei:main
Jul 2, 2026
Merged

fix(node): gate GET /ipfs/{cid} on reachable allowed-set, not deny-set (#126)#133
kevincodex1 merged 7 commits into
Gitlawb:mainfrom
Gravirei:main

Conversation

@Gravirei

@Gravirei Gravirei commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Closes #126.

The GET /ipfs/{cid} visibility gate used withheld_blob_oids (a deny-set enumerating only reachable blobs), so a dangling/unreachable blob — absent from the set — was served in cleartext to anonymous callers. Flipped to an allowed-set (allowed_blob_set_for_caller) that enumerates reachable blobs the caller may read: a dangling blob has no path, is never in the set, and 404s.

Changes:

  • visibility_pack.rs: added allowed_blob_set_for_caller, the caller-aware allowed-set. replicable_blob_set now delegates to it.
  • ipfs.rs: get_by_cid gates on the allowed-set instead of the deny-set; a blob passes only under path-scoped rules and if its oid is in the reachable allowed-set.
  • test_support.rs + visibility_pack.rs: unit and integration tests cover anon, listed reader, and owner against dangling blobs.

All 32 unit tests and 5 integration tests pass.

Summary by CodeRabbit

  • Bug Fixes
    • Strengthened GET /ipfs/{cid} so blob content is served only when it’s reachable and explicitly permitted by repository visibility rules, particularly under path-scoped rules.
    • Unreachable/dangling blobs are now excluded and return an opaque “not found” response (not the underlying content), even when walk/authorization evaluation fails.
  • Tests
    • Added regression coverage verifying dangling blobs fail closed for anonymous and owner-signed requests under path-based visibility rules.
Gitlawb#126)

The IPFS visibility gate used withheld_blob_oids (a deny-set enumerating
only reachable blobs), so a dangling/unreachable blob was absent from the
set and served in cleartext to anonymous callers. Flip to an allowed-set
(allowed_blob_set_for_caller) that enumerates reachable blobs the caller
may read: a dangling blob has no path, is never in the set, and 404s.
@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The PR changes /ipfs/{cid} path-scoped blob authorization to use a reachable allowed set, adds shared helpers for allowed-set and object-content reads, and adds unit and integration coverage for dangling blob fail-closed behavior.

Changes

Fail-closed allowed-set gating for GET /ipfs/{cid}

Layer / File(s) Summary
Allowed blob set helper
crates/gitlawb-node/src/git/visibility_pack.rs
allowed_blob_set_for_caller computes reachable allowed blob OIDs for a caller, and replicable_blob_set delegates anonymous blob selection to it.
Object type and content reads
crates/gitlawb-node/src/git/store.rs
object_type, read_object_content, and the updated read_object split git object lookup from raw content reads.
IPFS allowed-set gate
crates/gitlawb-node/src/api/ipfs.rs
The handler imports the allowed-set helper, updates comments and memo naming, reads object existence before path-scoped gating, computes a per-repo allowed set for path-scoped rules, and skips repos when the walk or join fails before checking blob membership.
Dangling blob coverage
crates/gitlawb-node/src/git/visibility_pack.rs, crates/gitlawb-node/src/test_support.rs
Adds unit and integration tests that assert dangling blobs are excluded from the allowed set and return 404 under path-scoped rules.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Gitlawb/node#25: Introduces the visibility primitives that this PR reuses for caller-scoped blob authorization.
  • Gitlawb/node#28: Also computes per-caller blob OID sets from visibility_check over reachable paths in visibility_pack.rs.
  • Gitlawb/node#128: Updates the same /ipfs/{cid} path-scoped authorization flow and its blob-set handling.

Suggested labels

sev:medium, kind:security, subsystem:visibility, subsystem:api

Suggested reviewers

  • kevincodex1
  • jatmn

Poem

🐇 A dangling blob tried a sneaky little glide,
But the allowed set said, “Nope — you stay outside.”
The gate checked paths, then shut with a tidy 404,
And secret bytes stayed hidden at the door.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main fix: switching GET /ipfs/{cid} from a deny-set to a reachable allowed-set.
Description check ✅ Passed It covers the bug, motivation, concrete changes, linked issue, and test results; only some optional template sections are omitted.
Linked Issues check ✅ Passed The changes implement the requested fail-closed allowed-set gate and add tests for dangling/unreachable blobs, matching #126.
Out of Scope Changes check ✅ Passed The diffs stay focused on the IPFS authorization fix, helper refactor, and related tests with no clear unrelated churn.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-node/src/api/ipfs.rs`:
- Around line 120-164: The path-scoped branch in the IPFS read flow is doing the
expensive allowed-blob walk before confirming the CID exists in the repo, which
causes unnecessary full-history git walks on misses. In the `read_object` path
inside `ipfs.rs`, move the `store::read_object` existence/type check ahead of
`allowed_blob_set_for_caller` so you only call the walk after a blob hit for
that repo. Keep the existing `path_scoped`, `allowed_memo`, and
`allowed_blob_set_for_caller` logic, but gate the spawn_blocking walk on a
confirmed object match to avoid anonymous random-CID misses triggering repo-wide
scans.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fbacedd5-03bf-4d1d-aaf5-6be3c677b051

📥 Commits

Reviewing files that changed from the base of the PR and between 0190fbb and 4506480.

📒 Files selected for processing (3)
  • crates/gitlawb-node/src/api/ipfs.rs
  • crates/gitlawb-node/src/git/visibility_pack.rs
  • crates/gitlawb-node/src/test_support.rs
Comment thread crates/gitlawb-node/src/api/ipfs.rs Outdated
Move store::read_object before the allowed_blob_set_for_caller
spawn_blocking call so random-CID spray against repos with
path-scoped rules cannot trigger full-history git walks on
repos that don't carry the object.

@beardthelion beardthelion left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allow-set flip is the right fix for #126 and it is fail-closed. I built the merged state and ran it: dangling and walk-error cases 404 for every caller class (anon, listed reader, owner), replicable_blob_set is behavior-identical after delegating to the new helper, and the existence-before-walk reorder addresses the earlier perf note. One mechanical blocker before this can merge, plus one divergence worth its own issue (already filed).

Findings

  • [P3] Run cargo fmt, ipfs.rs fails rustfmt --check (blocks the format gate)
    crates/gitlawb-node/src/api/ipfs.rs:145
    The allowed_blob_set_for_caller(...) call (line ~145) and the x-content-cid HeaderValue::from_str(...) line (~180) are not rustfmt-formatted, so the format check fails. cargo fmt fixes both. This is the only change required to merge.

  • [P2] Tree/commit CIDs still disclose a withheld subtree's structure (follow-up, not a blocker here)
    crates/gitlawb-node/src/api/ipfs.rs:139
    Under a path-scoped rule the gate applies only to obj_type == "blob", so a tree/commit/tag object is served unconditionally and its raw body hands an anonymous caller every child filename and OID of a withheld subtree. This is pre-existing (the old deny-set never held trees either) and explicitly out of #126's scope, so it should not block this PR. But get_tree (api/repos.rs:444, the N3 fix) deliberately protects exactly this, so the two read surfaces over the same data disagree. Tracked separately in #135.

  • [P3] Remove the now-dead withheld_blob_oids or note why it stays
    crates/gitlawb-node/src/git/visibility_pack.rs:232
    After this PR nothing in production calls withheld_blob_oids; only its own tests reference it. Because it is pub, the compiler and clippy will not flag it. Either delete it (and its tests) or add a line on why it is retained.

Notes (non-blocking): the existence-check-before-walk reorder means an existing-but-withheld CID pays the walk while an absent CID returns immediately, so latency weakly signals object existence; both still 404, and this is the perf win from the earlier review, so I would accept it. Test-wise, the KTD3 tree-serve is asserted only by status (not body contents), and there is no get_by_cid test for a listed-reader serve or an allowed-at-both-paths blob.

Nice work on the fail-closed tests. The dangling-blob coverage across all three caller classes is exactly right. Once cargo fmt is run, this is good to go from my side (note the fork CI still needs a maintainer to approve the workflow run).

…tion

✓ P3 blocker fixed: cargo fmt applied — the format gate will pass.

  ✓ P3 cleanup resolved: withheld_blob_oids is still used by replication code in repos.rs, so it stays.

  • P2 follow-up: Tree/commit disclosure tracked in Gitlawb#135 — out of scope here.

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I rechecked the changed paths and found one issue that still needs to be addressed.

Findings

  • [P2] Avoid reading unauthorized blob bodies before the allowed-set gate
    crates/gitlawb-node/src/api/ipfs.rs:125
    The reordered path now calls store::read_object before checking whether a path-scoped blob is in allowed_blob_set_for_caller, and read_object does both git cat-file -t and git cat-file <type> before returning. That means a caller who knows a withheld or dangling blob CID can make the node materialize the full unauthorized blob body into memory just to discard it and return 404. Please split the existence/type probe from the content read, run the allowed-set check after confirming the object is a blob, and only call the body-reading path once the blob is authorized to be served.
@Gravirei Gravirei requested review from beardthelion and jatmn June 30, 2026 20:25

@beardthelion beardthelion left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The split-probe reorder closes the P2. get_by_cid now type-probes with cat-file -t, runs the allowed-set check, and only reads the object body once that check passes, so a withheld or dangling blob is never materialized for a caller who can't reach it. I built the merged state and walked the caller matrix: anon, non-reader, listed-reader, and owner all behave; dangling and walk-error fail closed (404) for every class; public copies still serve.

@kevincodex1 LGTM.

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one gate issue that needs to be addressed before this is ready.

Findings

  • [P2] Run rustfmt so the required format check passes
    crates/gitlawb-node/src/git/store.rs:286
    The current head still fails cargo fmt --all -- --check; rustfmt wants to split the new object_type return expression across multiple lines. The PR Checks workflow runs that exact command before clippy, and the contribution guide says CI rejects formatting failures, so this will block the fmt + clippy job even though the focused visibility tests pass. Please run cargo fmt --all and push the formatted result.
@Gravirei Gravirei requested review from beardthelion and jatmn July 1, 2026 17:40

@beardthelion beardthelion left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed on 03ba714. The rustfmt blocker is resolved: the store.rs change is a pure line-wrap of the same expression, and fmt + clippy is green. The gate is unchanged from the head I approved, get_by_cid still gates on the reachable allowed-set with the object_type existence probe ahead of the walk, and the dangling-blob path still fails closed on the merged state. Merge from main is clean, CI green.

@kevincodex1 LGTM.

@jatmn jatmn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I rechecked the previously discussed paths and do not see any remaining actionable issues from my side.

@kevincodex1 LGTM

@kevincodex1 kevincodex1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good . thanks

@kevincodex1 kevincodex1 merged commit 466a550 into Gitlawb:main Jul 2, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

crate:node gitlawb-node — the serving node and REST API kind:bug Defect fix — wrong or unsafe behavior

4 participants