We developed TorchBench, a novel benchmark suite to study the performance of the PyTorch software stack. Unlike existing benchmark suites, TorchBench includes many representative models that cover a large PyTorch API surface. TorchBench can comprehensively characterize the performance of the PyTorch software stack, guiding performance optimization across models, the PyTorch framework, and GPU libraries. We show two practical use cases built on TorchBench. (1) We profile TorchBench to identify GPU performance inefficiencies in PyTorch. We fixed many performance bugs and upstreamed the patches to the official PyTorch repository. (2) We integrate TorchBench into the PyTorch continuous integration system. This lets us identify performance regressions in multiple daily code check-ins and keep performance bugs out of the PyTorch repository.
We implemented GVProf, the first value profiler that locates value redundancy problems in applications running on GPU-based clusters. Our experiments show that GVProf incurs acceptable overhead and scales to large executions. GVProf provides useful insights to guide performance optimization. Under the guidance of GVProf, we optimized several HPC and machine learning workloads.
DrGPU is a top-down profiler for GPU applications. More specifically, it is a trace analyzer for CUDA kernels that identifies performance bottlenecks and suggests optimizations.
2023 Best Paper Finalist, ICPE
2022 Distinguished Artifact Award, ASPLOS
2021 Runner-Up, A-HUG Cloud HPC Hackathon
2021 Summer Graduate Merit Award, NCSU
2015 First Prize, China Undergraduate Mathematical Contest in Modeling
Conference Observer: ICPP 2020
External Conference Reviewer: IPDPS 2023, ICPE 2023
Artifact Evaluation Committee: PPoPP 2021, PPoPP 2022, PPoPP 2023, MICRO 2023, ASPLOS 2024, CGO 2024, PPoPP 2024
Web Chair: LCTES 2021
Journal Reviewer: TECS 2021