Pinned
-
fast-rl-grpo (Public)
Optimizing RL training with GRPO. This repo implements the RL pipeline logic and optimizes it with different techniques, including custom kernels, reducing overhead mainly in the GRPO step. Features det…
Python 1
-
Rainference (Public)
Self-hosted LLM inference stack with an OpenAI-compatible API. Deploy LLaMA models on your own bare-metal Kubernetes cluster with built-in billing, analytics, and a management dashboard.
-
Diffusion.cu (Public)
A high-performance diffusion model built from scratch with core components optimized in CUDA for speed. It features custom Conv2d, GroupNorm, and attention using Flash Attention for ef…
Python 1
-
llm-kernel-patch (Public)
Patches the LLM part of the transformer library with an LLM-specific CUDA kernel.
Cuda 1
-
llm-d-inference-scheduler (Public)
Forked from llm-d/llm-d-inference-scheduler
Inference scheduler for llm-d
Go