Cheng Li

Cited by

	All	Since 2019
Citations	1573	1155
h-index	16	16
i10-index	22	22

Citations per year

Year	Citations
2014	16
2015	39
2016	98
2017	112
2018	130
2019	145
2020	175
2021	171
2022	147
2023	255
2024	261

Public access

10 articles available, 1 article not available (based on funding mandates)

Co-authors

Wen-mei W. Hwu	Senior Distinguished Research Scientist, NVIDIA; Professor and Sanders-AMD Chair of Electrical and	Verified email at illinois.edu
Abdul Dakkak	Modular	Verified email at modular.com
Jinjun Xiong	University at Buffalo	Verified email at buffalo.edu
Michael Laurenzano	Clinc, Inc.	Verified email at clinc.com
Trevor Mudge	Bredt Family Professor of Engineering, University of Michigan	Verified email at eecs.umich.edu
Ronald Dreslinski	University of Michigan	Verified email at umich.edu
Jason Mars	Professor of Computer Science and Engineering, University of Michigan	Verified email at umich.edu
Lingjia Tang	University of Michigan	Verified email at umich.edu
Johann Hauswald	Stanford, University of Michigan	Verified email at umich.edu
Yunqi Zhang	Meta, Inc	Verified email at umich.edu
Vinicius Petrucci	Micron Technology	Verified email at micron.com
Yiping Kang	University of Michigan	Verified email at umich.edu
John P Hayes	Professor of EECS, University of Michigan	Verified email at umich.edu
Armin Alaghi	University of Washington	Verified email at cs.washington.edu
Quan Chen	Professor, Shanghai Jiao Tong University	Verified email at sjtu.edu.cn
Carl Pearson	Sandia National Labs	Verified email at sandia.gov
Isaac Gelado	NVIDIA	Verified email at gelado.org
Li-Wen Chang	Research Scientist, ByteDance	Verified email at bytedance.com
Izzat El Hajj	American University of Beirut	Verified email at aub.edu.lb
Juan Gómez Luna	NVIDIA	Verified email at nvidia.com

Cheng Li

Microsoft

Verified email at microsoft.com

AI, Deep Learning, Machine Learning, GPU, Parallel Computing


Title	Authors	Venue	Cited by	Year
Sirius: An open end-to-end voice and vision personal assistant and its implications for future warehouse scale computers	J Hauswald, MA Laurenzano, Y Zhang, C Li, A Rovinski, A Khurana, ...	Proceedings of the Twentieth International Conference on Architectural …, 2015	338	2015
Stochastic circuits for real-time image-processing applications	A Alaghi, C Li, JP Hayes	Proceedings of the 50th Annual Design Automation Conference, 1-6, 2013	315	2013
DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers	J Hauswald, Y Kang, MA Laurenzano, Q Chen, C Li, T Mudge, ...	ACM SIGARCH Computer Architecture News 43 (3S), 27-40, 2015	199	2015
DeepSpeed-Inference: enabling efficient inference of transformer models at unprecedented scale	RY Aminabadi, S Rajbhandari, AA Awan, C Li, D Li, E Zheng, O Ruwase, ...	SC22: International Conference for High Performance Computing, Networking …, 2022	189	2022
Accelerating reduction and scan using tensor core units	A Dakkak, C Li, J Xiong, I Gelado, W Hwu	Proceedings of the ACM International Conference on Supercomputing, 46-57, 2019	95	2019
KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism	I El Hajj, J Gómez-Luna, C Li, LW Chang, D Milojicic, W Hwu	2016 49th Annual IEEE/ACM International Symposium on Microarchitecture …, 2016	44	2016
Evaluating characteristics of CUDA communication primitives on high-bandwidth interconnects	C Pearson, A Dakkak, S Hashash, C Li, IH Chung, J Xiong, WM Hwu	Proceedings of the 2019 ACM/SPEC International Conference on Performance …, 2019	38	2019
A comprehensive study on post-training quantization for large language models	Z Yao, C Li, X Wu, S Youn, Y He	arXiv preprint arXiv:2303.08302, 2023	33	2023
XSP: Across-stack profiling and analysis of machine learning models on GPUs	C Li, A Dakkak, J Xiong, W Wei, L Xu, W Hwu	2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2020	32*	2020
Designing future warehouse-scale computers for Sirius, an end-to-end voice and vision personal assistant	J Hauswald, MA Laurenzano, Y Zhang, H Yang, Y Kang, C Li, A Rovinski, ...	ACM Transactions on Computer Systems (TOCS) 34 (1), 1-32, 2016	32	2016
TrIMS: Transparent and isolated model sharing for low latency deep learning inference in function-as-a-service	A Dakkak, C Li, SG De Gonzalo, J Xiong, W Hwu	2019 IEEE 12th International Conference on Cloud Computing (CLOUD), 372-382, 2019	31	2019
ZeroQuant-V2: Exploring post-training quantization in LLMs from comprehensive study to low rank compensation	Z Yao, X Wu, C Li, S Youn, Y He	arXiv preprint arXiv:2303.08302, 2023	26	2023
AI Matrix: A deep learning benchmark for Alibaba data centers	W Zhang, W Wei, L Xu, L Jin, C Li	arXiv preprint arXiv:1909.10562, 2019	22	2019
Understanding INT4 quantization for transformer models: Latency speedup, composability, and failure cases	X Wu, C Li, RY Aminabadi, Z Yao, Y He	arXiv preprint arXiv:2301.12017, 2023	19	2023
Frustrated with replicating claims of a shared model? A solution	A Dakkak, C Li, J Xiong, WM Hwu	arXiv preprint arXiv:1811.09737, 2018	16*	2018
Matrix factorization on GPUs with memory optimization and approximate computing	W Tan, S Chang, L Fong, C Li, Z Wang, L Cao	Proceedings of the 47th International Conference on Parallel Processing, 1-10, 2018	16	2018
Visual Domain Adaptation with Manifold Embedded Distribution Alignment	Y Wang, W Feng, Y Chen, H Yu, M Huang, PS Yu	ACM, 402-410, 2018	15	2018
Random-LTD: Random and layerwise token dropping brings efficient training for large-scale transformers	Z Yao, X Wu, C Li, C Holmes, M Zhang, C Li, Y He	arXiv preprint arXiv:2211.11586, 2022	13	2022
MPress: Democratizing billion-scale model training on multi-GPU servers via memory-saving inter-operator parallelism	Q Zhou, H Wang, X Yu, C Li, Y Bai, F Yan, Y Xu	2023 IEEE International Symposium on High-Performance Computer Architecture …, 2023	12	2023
Understanding INT4 quantization for language models: latency speedup, composability, and failure cases	X Wu, C Li, RY Aminabadi, Z Yao, Y He	International Conference on Machine Learning, 37524-37539, 2023	11	2023
