sagar0x0

Sagar Gupta sagar0x0

GPU brr..

2 followers · 26 following

Pinned

fast-rl-grpo Public

Optimizing RL training with GRPO. This repo implements the RL pipeline logic and optimizes it with different techniques, including custom kernels, mainly to reduce overhead in the GRPO step. Features det…

Python 1
Rainference Public

Self-hosted LLM inference stack with an OpenAI-compatible API. Deploy LLaMA models on your own bare-metal Kubernetes cluster with built-in billing, analytics, and a management dashboard.

Python 2 1
Diffusion.cu Public

Diffusion.cu: a high-performance diffusion model built from scratch, with core components optimized in CUDA for speed. It features custom Conv2d, GroupNorm, and attention using Flash Attention for ef…

Python 1
llm-kernel-patch Public

Patches the LLM parts of the transformer library with LLM-specific CUDA kernels.

Cuda 1
VectorSum-kernel Public

Vector sum GPU kernel optimization in CUDA

Python 1
llm-d-inference-scheduler Public

Forked from llm-d/llm-d-inference-scheduler

Inference scheduler for llm-d

Go