maawad/README.md

👋 Hi there, I'm Muhammad Awad

I'm a Senior Member of Technical Staff at AMD Research, where I design, architect, and tech-lead system-level software, libraries, and runtimes for next-generation computing. I lead cross-organizational teams and coordinate initiatives across ML frameworks, kernel engineering, distributed systems, and research teams.

Before joining AMD, I completed my Ph.D. in Electrical and Computer Engineering at UC Davis, advised by John Owens. My research there focused on building dynamic concurrent data structures on GPUs, such as B-trees, dynamic graphs, multiversioned trees, and high-performance hash tables.


💼 Current Focus @ AMD

At AMD, I lead and contribute to projects spanning:

  • AI-powered tools for GPU performance and productivity
  • Libraries and runtimes for heterogeneous and distributed systems
  • Programming models for AMD GPUs and Ryzen™ AI NPUs

Notable projects:

  • Iris: Designed and architected a Triton-based multi-GPU programming framework from scratch. As tech lead of a cross-organizational team, demonstrated significant speedups in production LLM workloads.
  • 🧠 IntelliKit: Created and architected an open-source LLM-ready profiling toolkit for AMD CPUs and GPUs. Serving as tooling lead for AMD's company-wide ML for performance engineering initiative.
  • 🧠 IntelliPerf: Created and led development of an LLM-powered autonomous GPU performance engineering framework that profiles, diagnoses, and optimizes kernels end-to-end.
  • βš™οΈ IRON: Contributor to IRON, a low-level development stack for AMD Ryzenβ„’ AI NPUs with Python APIs and MLIR compiler passes.

🔬 Academic Research

I'm broadly interested in parallel computing, concurrent data structures, performance analysis, and low-level GPU programming. As a Ph.D. student, I designed and built several GPU-native data structures, including B-trees, dynamic graphs, multiversioned trees, and hash tables.


📫 Connect


Pinned

  1. ROCm/iris (Public)

    AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming

    Python · 191 stars · 42 forks

  2. owensgroup/BGHT (Public)

    BGHT: High-performance static GPU hash tables.

    C++ · 73 stars · 9 forks

  3. owensgroup/MVGpuBTree (Public)

    GPU B-Tree with support for versioning (snapshots).

    C++ · 51 stars · 6 forks

  4. gunrock/gunrock (Public)

    Programmable CUDA/C++ GPU Graph Analytics

    C++ · 1.1k stars · 223 forks

  5. owensgroup/GpuBTree (Public)

    Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019

    Cuda · 57 stars · 13 forks

  6. PTX_BCHT (Public)

    Bucketed Cuckoo hash set written in PTX and JIT-compiled.

    C++ · 1 star · 1 fork