Zhen ZHENG
Title
Cited by
Year
DAPPLE: A pipelined data parallel approach for training large models
S Fan, Y Rong, C Meng, Z Cao, S Wang, Z Zheng, C Wu, G Long, J Yang, ...
Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of …, 2021
Cited by: 112, Year: 2021
Understanding and bridging the gaps in current GNN performance optimizations
K Huang, J Zhai, Z Zheng, Y Yi, X Shen
Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of …, 2021
Cited by: 54, Year: 2021
Refactoring and optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight supercomputer
H Fu, J Liao, W Xue, L Wang, D Chen, L Gu, J Xu, N Ding, X Wang, C He, ...
SC'16: Proceedings of the International Conference for High Performance …, 2016
Cited by: 39, Year: 2016
VersaPipe: a versatile programming framework for pipelined computing on GPU
Z Zheng, C Oh, J Zhai, X Shen, Y Yi, W Chen
Proceedings of the 50th Annual IEEE/ACM International Symposium on …, 2017
Cited by: 32, Year: 2017
AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures
Z Zheng, X Yang, P Zhao, G Long, K Zhu, F Zhu, W Zhao, X Liu, J Yang, ...
Proceedings of the 27th ACM International Conference on Architectural …, 2022
Cited by: 28, Year: 2022
FusionStitching: boosting memory intensive computations for deep learning workloads
Z Zheng, P Zhao, G Long, F Zhu, K Zhu, W Zhao, L Diao, J Yang, W Lin
arXiv preprint arXiv:2009.10924, 2020
Cited by: 27, Year: 2020
Whale: Efficient giant model training over heterogeneous GPUs
X Jia, L Jiang, A Wang, W Xiao, Z Shi, J Zhang, X Li, L Chen, Y Li, ...
2022 USENIX Annual Technical Conference (USENIX ATC 22), 673-688, 2022
Cited by: 21, Year: 2022
Optimizing distributed training deployment in heterogeneous GPU clusters
X Yi, S Zhang, Z Luo, G Long, L Diao, C Wu, Z Zheng, J Yang, W Lin
Proceedings of the 16th International Conference on emerging Networking …, 2020
Cited by: 17, Year: 2020
DISC: A dynamic shape compiler for machine learning workloads
K Zhu, WY Zhao, Z Zheng, TY Guo, PZ Zhao, JJ Bai, J Yang, XY Liu, ...
Proceedings of the 1st Workshop on Machine Learning and Systems, 89-95, 2021
Cited by: 15, Year: 2021
Exploring deep reuse in Winograd CNN inference
R Wu, F Zhang, Z Zheng, X Du, X Shen
Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of …, 2021
Cited by: 12, Year: 2021
DREW: Efficient Winograd CNN inference with deep reuse
R Wu, F Zhang, J Guan, Z Zheng, X Du, X Shen
Proceedings of the ACM Web Conference 2022, 1807-1816, 2022
Cited by: 11, Year: 2022
GoPipe: a granularity-oblivious programming framework for pipelined stencil executions on GPU
C Oh, Z Zheng, X Shen, J Zhai, Y Yi
Proceedings of the ACM International Conference on Parallel Architectures …, 2020
Cited by: 10, Year: 2020
HiWayLib: A software framework for enabling high performance communications for heterogeneous pipeline computations
Z Zheng, C Oh, J Zhai, X Shen, Y Yi, W Chen
Proceedings of the Twenty-Fourth International Conference on Architectural …, 2019
Cited by: 9, Year: 2019
Auto-MAP: A DQN framework for exploring distributed execution plans for DNN workloads
S Wang, Y Rong, S Fan, Z Zheng, LS Diao, G Long, J Yang, X Liu, W Lin
arXiv preprint arXiv:2007.04069, 2020
Cited by: 7, Year: 2020
Whale: Scaling deep learning model training to the trillions
X Jia, L Jiang, A Wang, J Zhang, X Li, W Xiao, Y Li, Z Zheng, X Liu, W Lin
arXiv preprint arXiv:2011.09208, 2020
Cited by: 5, Year: 2020
Optimizing DNN Compilation for Distributed Training With Joint OP and Tensor Fusion
X Yi, S Zhang, L Diao, C Wu, Z Zheng, S Fan, S Wang, J Yang, W Lin
IEEE Transactions on Parallel and Distributed Systems 33 (12), 4694-4706, 2022
Cited by: 2, Year: 2022
Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform
S Zhang, L Diao, S Wang, Z Cao, Y Gu, C Si, Z Shi, Z Zheng, C Wu, W Lin
arXiv preprint arXiv:2302.08141, 2023
Cited by: 1, Year: 2023
BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach
Z Zheng, Z Pan, D Wang, K Zhu, W Zhao, T Guo, X Qiu, M Sun, J Bai, ...
Proceedings of the ACM on Management of Data 1 (3), 1-29, 2023
2023
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
H Xia, Z Zheng, Y Li, D Zhuang, Z Zhou, X Qiu, Y Li, W Lin, SL Song
arXiv preprint arXiv:2309.10285, 2023
2023
Expanding the Edge: Enabling Efficient Winograd CNN Inference With Deep Reuse on Edge Device
F Zhang, R Wu, J Guan, Z Zheng, X Guo, X Zhang, X Du, X Shen
IEEE Transactions on Knowledge and Data Engineering, 2023
2023