Implement a proper gliner client, fix bugs by Ingvarstep · Pull Request #352 · urchade/GLiNER

Ingvarstep · 2026-04-24T13:52:28Z

Summary

Rewrites GLiNERClient as a pure HTTP client, drops the Ray dependency from the client path, fixes a torch.compile-related packing bug in the span-relex model, cleans up GLiNERFactory.shutdown(), and substantially expands the serving docs (RelEx section + client usage).

Details

`gliner/serve/client.py` — HTTP-only client

Replaces the ray / serve.get_deployment_handle based client with a stdlib-only HTTP wrapper built on urllib.request.
New GLiNERClientError exception type surfaces HTTP and network failures.
New constructor signature: GLiNERClient(base_url, route_prefix, timeout, max_concurrency) — no longer needs a Ray cluster or deployment handle.
predict(...) and predict_async(...):
- str in → dict out; list in → list out (preserved).
- Batch calls fan out one HTTP request per text concurrently (threads for sync, asyncio.to_thread + asyncio.gather for async) so the server-side @serve.batch coalesces them into a single forward pass. A sequential loop would serialize on the wire and defeat batching.
- max_concurrency bounds the in-flight request count.
_build_payload(...) centralizes request shape; optional fields (relations, threshold, relation_threshold) are only included when set.
get_client(...) updated to mirror the new constructor.
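The fan-out behaviour described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `gliner/serve/client.py`: the payload keys `text` and `labels`, the `ClientError` class, the injected `send` callable, and the URL in the usage note are all hypothetical. Only the optional field names (`relations`, `threshold`, `relation_threshold`) and the concurrency primitives come from the description.

```python
import asyncio
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


class ClientError(RuntimeError):
    """Hypothetical stand-in for GLiNERClientError."""


def build_payload(text, labels, relations=None, threshold=None, relation_threshold=None):
    # "text" and "labels" are assumed key names; optional fields are added only when set.
    payload = {"text": text, "labels": labels}
    optional = {
        "relations": relations,
        "threshold": threshold,
        "relation_threshold": relation_threshold,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def post_json(url, payload, timeout=30.0):
    # One POST per text; HTTP, network, and JSON decoding failures become ClientError.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError) as exc:  # URLError and HTTPError subclass OSError
        raise ClientError(f"request to {url} failed: {exc}") from exc


def predict(texts, labels, send, max_concurrency=8, **options):
    # str in -> dict out; list in -> list out, in input order.
    single = isinstance(texts, str)
    payloads = [build_payload(t, labels, **options) for t in ([texts] if single else texts)]
    if not payloads:
        return []
    # The pool size bounds the number of in-flight requests.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        results = list(pool.map(send, payloads))
    return results[0] if single else results


async def predict_async(texts, labels, send, max_concurrency=8, **options):
    single = isinstance(texts, str)
    gate = asyncio.Semaphore(max_concurrency)  # bounds in-flight requests

    async def one(text):
        async with gate:
            return await asyncio.to_thread(send, build_payload(text, labels, **options))

    results = await asyncio.gather(*(one(t) for t in ([texts] if single else texts)))
    return results[0] if single else list(results)
```

Against a running server, `send` could be something like `lambda p: post_json("http://localhost:8000/gliner", p)`, where the host, port, and route are placeholders. Injecting `send` keeps the fan-out logic testable without a server.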

`gliner/modeling/base.py` — simpler packing in `UniEncoderSpanRelexModel._pack`

Drops the torch.compiler.is_compiling() branch that previously avoided a GPU→CPU sync in eager mode at the cost of skipping packing.
Always computes lengths = rep_mask.sum(dim=-1) and max_len = lengths.max().item(), so eager execution now benefits from packing too.
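The PR text does not show the body of `_pack`, so the following is only a guess at what packing means here: move the kept positions of each row to the front and trim every row to the longest kept length. It uses plain Python lists to stay dependency-free; the comments point to the tensor expressions named above. In the tensor version, `.item()` is the step that forces the GPU→CPU sync.

```python
def pack(rows, rep_mask, pad=0):
    """Left-pack the kept items of each row and trim to the longest kept length.

    Illustrative only: `pack` and `pad` are hypothetical names, not the model's API.
    """
    # Tensor analogue: lengths = rep_mask.sum(dim=-1)
    lengths = [sum(1 for keep in mask_row if keep) for mask_row in rep_mask]
    # Tensor analogue: max_len = lengths.max().item()
    max_len = max(lengths, default=0)

    packed = []
    for row, mask_row, n in zip(rows, rep_mask, lengths):
        kept = [x for x, keep in zip(row, mask_row) if keep]
        packed.append(kept + [pad] * (max_len - n))  # right-pad shorter rows
    return packed, lengths


rows = [[1, 2, 3, 4], [5, 6, 7, 8]]
rep_mask = [[1, 0, 1, 0], [0, 0, 0, 1]]
packed, lengths = pack(rows, rep_mask)
```

With these inputs the packed width drops from 4 to 2, which is the saving on masked positions that packing is meant to provide.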

`gliner/serve/server.py` — cleaner shutdown

GLiNERFactory.shutdown() now calls ray.shutdown() after serve.shutdown() when Ray is initialized.
Prevents noisy ServeController ... killed by ray.kill retry warnings in the raylet log on process exit. Still idempotent.
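The shutdown order can be sketched as follows. This is a minimal illustration, not the actual `GLiNERFactory.shutdown()` body: the `serve` and `ray` modules are passed in as arguments so the snippet runs without Ray installed, and guarding both calls behind `ray.is_initialized()` is an assumption about how idempotence is achieved.

```python
def shutdown(serve, ray):
    """Tear down Serve first, then Ray; do nothing if Ray is not running.

    `serve` and `ray` are expected to behave like `ray.serve` and `ray`
    (only `serve.shutdown()`, `ray.is_initialized()`, and `ray.shutdown()` are used).
    """
    if not ray.is_initialized():
        return False  # already torn down, so repeated calls are safe
    serve.shutdown()  # stop Serve deployments and the controller first
    ray.shutdown()    # then disconnect from and stop the local Ray runtime
    return True
```

In real use the call would be `shutdown(serve, ray)` after `import ray` and `from ray import serve`.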

`docs/serving.md`

Expands the GLiNERClient section: clarifies it’s a pure HTTP client (no Ray import), documents the custom constructor args, and explains the batching semantics with an example.
Adds a full Relation Extraction section covering RelEx server startup (--model knowledgator/gliner-relex-large-v1.0), client usage (sync + batched), in-process GLiNERFactory usage, a curl example, and the response shape.
Documents new/changed CLI defaults: --batch-wait-timeout-ms 50 → 10, --target-memory-fraction 0.8 → 0.9, plus --memory-overhead-factor 1.3.

`.gitignore`

Adds labels_trie.cpp (generated artifact) to the ignore list.

Notable behavior changes

Client no longer needs Ray. Any previous code constructing GLiNERClient(deployment_name=..., ray_address=...) must migrate to GLiNERClient(base_url=..., route_prefix=...).
Eager mode now packs variable-length inputs in UniEncoderSpanRelexModel, incurring a GPU→CPU sync but reducing downstream compute on masked positions.
GLiNERFactory context manager exit now fully tears down Ray, not just Serve.

implement a proper gliner client, fix bugs

a5a1ac3

urchade merged commit e052cc0 into main Apr 24, 2026

urchade merged 1 commit into main from feature/client
