Implement a proper gliner client, fix bugs #352

Merged
urchade merged 1 commit into main from feature/client on Apr 24, 2026

@Ingvarstep

Collaborator

Summary

Rewrites GLiNERClient as a pure HTTP client, drops the Ray dependency from the client path, fixes a torch.compile-related packing bug in the span-relex model, cleans up GLiNERFactory.shutdown(), and substantially expands the serving docs (RelEx section + client usage).

Details

gliner/serve/client.py — HTTP-only client

  • Replaces the ray / serve.get_deployment_handle based client with a stdlib-only HTTP wrapper built on urllib.request.
  • New GLiNERClientError exception type surfaces HTTP and network failures.
  • New constructor signature: GLiNERClient(base_url, route_prefix, timeout, max_concurrency) — no longer needs a Ray cluster or deployment handle.
  • predict(...) and predict_async(...):
    • A str input returns a dict and a list input returns a list (unchanged from the previous client).
    • Batch calls fan out one HTTP request per text concurrently (threads for sync, asyncio.to_thread + asyncio.gather for async) so that the server-side @serve.batch can coalesce them into batched forward passes. A sequential loop would send one request at a time, so the server would never see more than one pending text to batch.
    • max_concurrency bounds the in-flight request count.
  • _build_payload(...) centralizes request shape; optional fields (relations, threshold, relation_threshold) are only included when set.
  • get_client(...) updated to mirror the new constructor.
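For readers migrating, here is a minimal stdlib-only sketch of the fan-out pattern described above. It is not the code in gliner/serve/client.py: the JSON key names `text` and `labels`, the names `ClientError`, `build_payload`, `http_post`, `predict` and `predict_async`, and any endpoint URL are hypothetical. `send` is injected so the batching logic can be exercised without a running server.

```python
import asyncio
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional, Union

Payload = Dict[str, Any]
Send = Callable[[Payload], Payload]


class ClientError(Exception):
    """Hypothetical stand-in for GLiNERClientError: wraps HTTP and network failures."""


def build_payload(text: str, labels: List[str],
                  relations: Optional[List[str]] = None,
                  threshold: Optional[float] = None,
                  relation_threshold: Optional[float] = None) -> Payload:
    # "text" and "labels" are assumed key names; optional keys appear only when set.
    payload: Payload = {"text": text, "labels": labels}
    optional = {"relations": relations, "threshold": threshold,
                "relation_threshold": relation_threshold}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def http_post(url: str, payload: Payload, timeout: float = 30.0) -> Payload:
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except OSError as exc:  # HTTPError, URLError and socket timeouts all subclass OSError
        raise ClientError(f"POST {url} failed: {exc}") from exc


def predict(texts: Union[str, List[str]], labels: List[str], send: Send,
            max_concurrency: int = 8, **options: Any) -> Union[Payload, List[Payload]]:
    if isinstance(texts, str):
        return send(build_payload(texts, labels, **options))  # str in -> dict out
    payloads = [build_payload(t, labels, **options) for t in texts]
    # One request per text, all in flight at once (up to max_concurrency), so the
    # server's @serve.batch sees them together; map() keeps results in input order.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(send, payloads))


async def predict_async(texts: Union[str, List[str]], labels: List[str], send: Send,
                        max_concurrency: int = 8, **options: Any) -> Union[Payload, List[Payload]]:
    if isinstance(texts, str):
        return await asyncio.to_thread(send, build_payload(texts, labels, **options))
    limit = asyncio.Semaphore(max_concurrency)  # bounds in-flight requests

    async def one(text: str) -> Payload:
        async with limit:
            # The blocking HTTP call runs in a worker thread; gather keeps input order.
            return await asyncio.to_thread(send, build_payload(text, labels, **options))

    return list(await asyncio.gather(*(one(t) for t in texts)))
```

In real use, `send` would be something like `functools.partial(http_post, "http://localhost:8000/gliner")` (a hypothetical URL). The semaphore in the async path plays the same role as `max_workers` in the sync path: both cap how many requests are in flight at once.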

gliner/modeling/base.py — simpler packing in UniEncoderSpanRelexModel._pack

  • Drops the torch.compiler.is_compiling() branch that previously avoided a GPU→CPU sync in eager mode at the cost of skipping packing.
  • Always computes lengths = rep_mask.sum(dim=-1) and max_len = lengths.max().item(), so eager execution now benefits from packing too.
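The packing step can be pictured with plain lists standing in for tensors. This is an assumption-laden sketch rather than `UniEncoderSpanRelexModel._pack`: the function name `pack_to_max_len` is hypothetical, and it assumes right-padded masks and that packing means trimming every row to the longest real length. The comments note which torch expression from the description each line mirrors.

```python
from typing import List, Sequence, Tuple

Rows = List[List[List[float]]]  # [batch][position][hidden], standing in for a tensor


def pack_to_max_len(reps: Rows, rep_mask: Sequence[Sequence[int]]) -> Tuple[Rows, int]:
    """Hypothetical list-based illustration; assumes right-padded masks (1 = real, 0 = pad)."""
    # torch: lengths = rep_mask.sum(dim=-1)  -> real positions per row
    lengths = [sum(row) for row in rep_mask]
    # torch: max_len = lengths.max().item()  -> the GPU->CPU sync mentioned above
    max_len = max(lengths, default=0)
    # Columns past max_len are padding in every row, so they can be dropped and
    # later layers run over max_len positions instead of the full padded width.
    packed = [row[:max_len] for row in reps]
    return packed, max_len
```

With a batch padded to width 4 whose longest real row has 2 positions, everything downstream only processes 2 positions per row.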

gliner/serve/server.py — cleaner shutdown

  • GLiNERFactory.shutdown() now calls ray.shutdown() after serve.shutdown() when Ray is initialized.
  • Prevents noisy ServeController ... killed by ray.kill retry warnings in the raylet log on process exit. The method remains idempotent.
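The shutdown ordering can be sketched as a guarded helper. This is an illustration, not `GLiNERFactory.shutdown()` itself: the function name is hypothetical, and this version skips both calls when Ray is absent or not initialized, which is what makes repeated calls safe here.

```python
def shutdown_serve_then_ray() -> None:
    """Hypothetical helper mirroring the described order: Serve first, then Ray."""
    try:
        import ray
        from ray import serve
    except ImportError:
        return  # Ray (or Ray Serve) not installed: nothing to tear down
    if not ray.is_initialized():
        return  # already shut down or never started: a repeated call is a no-op
    serve.shutdown()  # stop Serve deployments and the controller first
    ray.shutdown()    # then tear down Ray itself, which the PR says avoids the exit-time warnings
```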

docs/serving.md

  • Expands the GLiNERClient section: clarifies it’s a pure HTTP client (no Ray import), documents the custom constructor args, and explains the batching semantics with an example.
  • Adds a full Relation Extraction section covering RelEx server startup (--model knowledgator/gliner-relex-large-v1.0), client usage (sync + batched), in-process GLiNERFactory usage, a curl example, and the response shape.
  • Documents new and changed CLI defaults: --batch-wait-timeout-ms 50 → 10, --target-memory-fraction 0.8 → 0.9, plus the new --memory-overhead-factor (default 1.3).
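As a quick reference, here is a hypothetical argparse mirror of the documented flags and their defaults. It is not the actual server CLI: only the four flag names, the example model ID, and the default values come from the docs change, while the parser itself, the float types, and treating --model as required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical mirror of documented serve flags")
    parser.add_argument("--model", required=True,
                        help="e.g. knowledgator/gliner-relex-large-v1.0 for RelEx")
    # Shorter wait: batches close sooner, trading some batch size for latency.
    parser.add_argument("--batch-wait-timeout-ms", type=float, default=10.0,
                        help="previously 50")
    parser.add_argument("--target-memory-fraction", type=float, default=0.9,
                        help="previously 0.8")
    parser.add_argument("--memory-overhead-factor", type=float, default=1.3)
    return parser
```

For example, `build_parser().parse_args(["--model", "knowledgator/gliner-relex-large-v1.0"])` yields the new defaults for every other flag.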

.gitignore

  • Adds labels_trie.cpp (generated artifact) to the ignore list.

Notable behavior changes

  • Client no longer needs Ray. Any previous code constructing GLiNERClient(deployment_name=..., ray_address=...) must migrate to GLiNERClient(base_url=..., route_prefix=...).
  • Eager mode now packs variable-length inputs in UniEncoderSpanRelexModel, incurring a GPU→CPU sync but reducing downstream compute on masked positions.
  • GLiNERFactory context manager exit now fully tears down Ray, not just Serve.
urchade merged commit e052cc0 into main on Apr 24, 2026