sagar0x0

Pinned

  1. fast-rl-grpo Public

    Optimizing RL training with GRPO. This repo implements the RL pipeline logic and optimizes it with different techniques, including custom kernels, mainly to reduce overhead in the GRPO step. Features det…

    Python 1

  2. Rainference Public

    Self-hosted LLM inference stack with an OpenAI-compatible API. Deploy LLaMA models on your own bare-metal Kubernetes cluster with built-in billing, analytics, and a management dashboard.

    Python 2 1

  3. Diffusion.cu Public

    Diffusion.cu: a high-performance diffusion model built from scratch, with core components optimized in CUDA for speed. It features custom Conv2d, GroupNorm, and attention using Flash Attention for ef…

    Python 1

  4. llm-kernel-patch Public

    Patches the LLM parts of the transformer library with LLM-specific CUDA kernels.

    Cuda 1

  5. VectorSum-kernel Public

    Vector sum GPU kernel optimization in CUDA

    Python 1

  6. llm-d-inference-scheduler Public

    Forked from llm-d/llm-d-inference-scheduler

    Inference scheduler for llm-d

    Go