Implement a proper gliner client, fix bugs #352
Merged
Summary
Rewrites `GLiNERClient` as a pure HTTP client, drops the Ray dependency from the client path, fixes a `torch.compile`-related packing bug in the span-relex model, cleans up `GLiNERFactory.shutdown()`, and substantially expands the serving docs (RelEx section + client usage).

Details
`gliner/serve/client.py`: HTTP-only client
- Replaces the `ray`/`serve.get_deployment_handle`-based client with a stdlib-only HTTP wrapper built on `urllib.request`.
- A `GLiNERClientError` exception type surfaces HTTP and network failures.
- `GLiNERClient(base_url, route_prefix, timeout, max_concurrency)` no longer needs a Ray cluster or deployment handle.
- `predict(...)` and `predict_async(...)`: `str` in → `dict` out; `list` in → `list` out (preserved).
- List inputs are sent as concurrent requests (`asyncio.to_thread` + `asyncio.gather` for async) so the server-side `@serve.batch` coalesces them into a single forward pass. A sequential loop would serialize on the wire and defeat batching. `max_concurrency` bounds the in-flight request count.
- `_build_payload(...)` centralizes request shape; optional fields (`relations`, `threshold`, `relation_threshold`) are only included when set.
- `get_client(...)` updated to mirror the new constructor.

`gliner/modeling/base.py`: simpler packing in `UniEncoderSpanRelexModel._pack`
- Removes the `torch.compiler.is_compiling()` branch that previously avoided a GPU→CPU sync in eager mode at the cost of skipping packing.
- Always computes `lengths = rep_mask.sum(dim=-1)` and `max_len = lengths.max().item()`, so eager execution now benefits from packing too.

`gliner/serve/server.py`: cleaner shutdown
- `GLiNERFactory.shutdown()` now calls `ray.shutdown()` after `serve.shutdown()` when Ray is initialized.
- Removes the `ServeController ... killed by ray.kill` retry warnings in the raylet log on process exit. Still idempotent.

`docs/serving.md`
- `GLiNERClient` section: clarifies it's a pure HTTP client (no Ray import), documents the custom constructor args, and explains the batching semantics with an example.
- RelEx section: server launch (`--model knowledgator/gliner-relex-large-v1.0`), client usage (sync + batched), in-process `GLiNERFactory` usage, a `curl` example, and the response shape.
- Flag values: `--batch-wait-timeout-ms` 50 → 10, `--target-memory-fraction` 0.8 → 0.9, plus `--memory-overhead-factor 1.3`.

`.gitignore`
- Adds `labels_trie.cpp` (generated artifact) to the ignore list.

Notable behavior changes
- Callers of `GLiNERClient(deployment_name=..., ray_address=...)` must migrate to `GLiNERClient(base_url=..., route_prefix=...)`.
- Packing now always runs in `UniEncoderSpanRelexModel`, incurring a GPU→CPU sync but reducing downstream compute on masked positions.
- `GLiNERFactory` context manager exit now fully tears down Ray, not just Serve.
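
For reviewers who want to see the client-side batching idea in isolation, here is a minimal sketch of the fan-out described for `gliner/serve/client.py`. It is not the code in this PR. `build_payload`, `predict_async`, `fake_send`, and the `text`/`labels` payload keys are hypothetical names chosen for illustration; only the optional field names (`relations`, `threshold`, `relation_threshold`), the `str` → `dict` / `list` → `list` contract, and the use of `asyncio.to_thread` + `asyncio.gather` come from the description above. The blocking HTTP call is replaced by a stand-in function so the sketch runs without a server (Python 3.9+).

```python
import asyncio
from typing import Any, Callable, Dict, List, Optional, Union

Payload = Dict[str, Any]


def build_payload(
    text: str,
    labels: List[str],
    relations: Optional[List[str]] = None,
    threshold: Optional[float] = None,
    relation_threshold: Optional[float] = None,
) -> Payload:
    """Build one request body; optional fields are added only when set.

    The "text" and "labels" keys are assumptions made for this sketch.
    """
    payload: Payload = {"text": text, "labels": labels}
    optional = {
        "relations": relations,
        "threshold": threshold,
        "relation_threshold": relation_threshold,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


async def predict_async(
    send: Callable[[Payload], Dict[str, Any]],
    texts: Union[str, List[str]],
    labels: List[str],
    max_concurrency: int = 4,
    **options: Any,
) -> Union[Dict[str, Any], List[Dict[str, Any]]]:
    """Send one request per text concurrently, bounded by max_concurrency."""
    single = isinstance(texts, str)
    batch = [texts] if single else list(texts)
    gate = asyncio.Semaphore(max_concurrency)  # caps in-flight requests

    async def one(text: str) -> Dict[str, Any]:
        async with gate:
            # `send` is blocking (e.g. an HTTP POST), so run it in a thread.
            return await asyncio.to_thread(
                send, build_payload(text, labels, **options)
            )

    # gather returns results in input order, so list in -> list out lines up.
    results = await asyncio.gather(*(one(t) for t in batch))
    return results[0] if single else list(results)


def fake_send(payload: Payload) -> Dict[str, Any]:
    """Stand-in for the real blocking HTTP call; echoes what it received."""
    return {"echo": payload["text"], "keys": sorted(payload)}


if __name__ == "__main__":
    print(asyncio.run(predict_async(fake_send, ["a", "b"], ["person"])))
```

Because the requests are in flight at the same time, a server that batches within a short wait window has the chance to group them; a plain `for` loop over `send` would deliver them one at a time.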