
Organizations

@thu-nics @thu-ml


Pinned

  1. thu-ml/SageAttention (Public)

     [ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.

     CUDA · 2.8k stars · 274 forks

  2. thu-ml/SpargeAttn (Public)

     [ICML2025] SpargeAttention: a training-free sparse attention that accelerates inference for any model.

     CUDA · 796 stars · 67 forks

  3. SPH_Project (Public)

     An SPH realization of fluid simulation, featuring large-scale simulation, rigid-fluid coupling, and high-viscosity fluids.

     Python · 195 stars · 16 forks

  4. mit-han-lab/llm-awq (Public)

     [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

     Python · 3.4k stars · 283 forks

  5. thu-nics/MoA (Public)

     [CoLM'25] The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression".

     Python · 151 stars · 8 forks

  6. mit-han-lab/omniserve (Public)

     [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

     C++ · 786 stars · 54 forks
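Several of the pinned projects above (SageAttention, AWQ, QServe) center on quantization. As a minimal NumPy sketch of the shared idea, not any of these projects' actual kernels, the snippet below quantizes Q and K to INT8 with symmetric per-tensor scales, computes the score matmul in integer arithmetic, and dequantizes with the recorded scales; all function names are illustrative.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor INT8 quantization: map max |x| to 127.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def quantized_attention(Q, K, V):
    # Toy attention: INT8 Q·K^T accumulated in int32, V kept in float.
    qQ, sQ = quantize_int8(Q)
    qK, sK = quantize_int8(K)
    # Dequantize the integer product with the product of the two scales.
    scores = (qQ.astype(np.int32) @ qK.astype(np.int32).T) * (sQ * sK)
    scores = scores / np.sqrt(Q.shape[-1])
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

The output stays close to full-precision attention because softmax is fairly tolerant of small score perturbations, which is the intuition these projects exploit at much larger scale with hardware INT8/INT4 kernels.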