Hi there 👋

I'm BBuf (Xiaoyu Zhang), a Core Developer of SGLang, currently working at SkyworkAI.

I focus on LLM inference optimization, CUDA kernel engineering, and AI infrastructure: writing high-performance GPU code and pushing the boundaries of large-model serving.

πŸ“ I share technical deep-dives on my WeChat public account GiantPandaCV and ηŸ₯δΉŽδΈ“ζ .


🚀 Open Source Contributions

Core Developer

sglang

SGLang is a high-performance serving framework for LLMs and multimodal models, powering 400,000+ GPUs worldwide. It is trusted by xAI, NVIDIA, AMD, Google Cloud, Microsoft Azure, and many more.
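As a quick taste of what serving with SGLang looks like: a minimal client sketch against SGLang's OpenAI-compatible HTTP API (this assumes a server already launched locally on the default port 30000; the model name is a placeholder, not a recommendation).

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "Qwen/Qwen2.5-7B-Instruct") -> dict:
    """Build a payload for the OpenAI-compatible /v1/chat/completions endpoint.

    The model name here is a hypothetical placeholder; use whatever model
    the server was launched with.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }


def query_sglang(prompt: str, base_url: str = "http://localhost:30000") -> str:
    """Send one chat request to a running SGLang server and return the reply text."""
    data = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the endpoint is OpenAI-compatible, the official `openai` client pointed at `base_url` works just as well; the stdlib-only version above simply avoids extra dependencies.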

cache-dit

Cache-DiT is a PyTorch-native inference acceleration framework for Diffusion Transformer (DiT) models. It supports FLUX, HunyuanVideo, WAN2.1, Qwen-Image, and 70+ other models with training-free caching and hybrid parallelism.


📚 Learning Notes & Research

how-to-optim-algorithm-in-cuda

CUDA optimization notes covering kernels, CUTLASS/CuTe, Triton, CUDA-MODE course, LLM inference/training optimization (SGLang, vLLM, MoE, Flash Attention, etc.), and PyTorch internals.


📊 GitHub Stats

BBuf's github stats

Pinned Repositories

  1. tvm_mlir_learn (Python, 2.7k stars, 370 forks)

     A collection of compiler learning resources.

  2. how-to-optim-algorithm-in-cuda (CUDA, 3k stars, 272 forks)

     How to optimize common algorithms in CUDA.

  3. sgl-project/sglang (Python, 26.9k stars, 5.7k forks)

     SGLang is a high-performance serving framework for large language models and multimodal models.

  4. vipshop/cache-dit (Python, 1.2k stars, 70 forks)

     A PyTorch-native inference engine with caching, parallelism, and quantization for Diffusion Transformers.