Xiuhong Li
Verified email at pku.edu.cn
Title
Cited by
Year
Enabling coordinated register allocation and thread-level parallelism optimization for GPUs
X Xie, Y Liang, X Li, Y Wu, G Sun, T Wang, D Fan
Proceedings of the 48th International Symposium on Microarchitecture, 395-406, 2015
Cited by: 82, Year: 2015
TGPA: Tile-grained pipeline architecture for low latency CNN inference
X Wei, Y Liang, X Li, CH Yu, P Zhang, J Cong
2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 1-8, 2018
Cited by: 78, Year: 2018
A coordinated tiling and batching framework for efficient GEMM on GPUs
X Li, Y Liang, S Yan, L Jia, Y Li
Proceedings of the 24th symposium on principles and practice of parallel …, 2019
Cited by: 59, Year: 2019
AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction
S Zheng, R Chen, A Wei, Y Jin, Q Han, L Lu, B Wu, X Li, S Yan, Y Liang
Proceedings of the 49th Annual International Symposium on Computer …, 2022
Cited by: 42, Year: 2022
FlashDecoding++: Faster large language model inference on GPUs
K Hong, G Dai, J Xu, Q Mao, X Li, J Liu, K Chen, H Dong, Y Wang
arXiv preprint arXiv:2311.01282, 2023
Cited by: 21, Year: 2023
Enabling efficient fast convolution algorithms on GPUs via MegaKernels
L Jia, Y Liang, X Li, L Lu, S Yan
IEEE Transactions on Computers 69 (7), 986-997, 2020
Cited by: 20, Year: 2020
Performance-centric register file design for GPUs using racetrack memory
S Wang, Y Liang, C Zhang, X Xie, G Sun, Y Liu, Y Wang, X Li
2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), 25-30, 2016
Cited by: 20, Year: 2016
CRAT: Enabling coordinated register allocation and thread-level parallelism optimization for GPUs
X Xie, Y Liang, X Li, Y Wu, G Sun, T Wang, D Fan
IEEE Transactions on Computers 67 (6), 890-897, 2017
Cited by: 17, Year: 2017
Efficient kernel management on GPUs
X Li, Y Liang
2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), 85-90, 2016
Cited by: 16, Year: 2016
Chimera: An analytical optimizing framework for effective compute-intensive operators fusion
S Zheng, S Chen, P Song, R Chen, X Li, S Yan, D Lin, J Leng, Y Liang
2023 IEEE International Symposium on High-Performance Computer Architecture …, 2023
Cited by: 14, Year: 2023
A survey on efficient inference for large language models
Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou, L Wang, Z Yuan, X Li, ...
arXiv preprint arXiv:2404.14294, 2024
Cited by: 13, Year: 2024
cuMBIR: An efficient framework for low-dose x-ray CT image reconstruction on GPUs
X Li, Y Liang, W Zhang, T Liu, H Li, G Luo, M Jiang
Proceedings of the 2018 International Conference on Supercomputing, 184-194, 2018
Cited by: 13, Year: 2018
Efficient kernel management on GPUs
Y Liang, X Li
ACM Transactions on Embedded Computing Systems (TECS) 16 (4), 1-24, 2017
Cited by: 13, Year: 2017
Exploring cache bypassing and partitioning for multi-tasking on GPUs
Y Liang, X Li, X Xie
2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 9-16, 2017
Cited by: 12, Year: 2017
Neoflow: A flexible framework for enabling efficient compilation for high performance DNN training
S Zheng, R Chen, Y Jin, A Wei, B Wu, X Li, S Yan, Y Liang
IEEE Transactions on Parallel and Distributed Systems 33 (11), 3220-3232, 2021
Cited by: 11, Year: 2021
CuLDA: solving large-scale LDA Problems on GPUs
X Xie, Y Liang, X Li, W Tan
Proceedings of the 28th International Symposium on High-Performance Parallel …, 2019
Cited by: 8, Year: 2019
CuLDA_CGS: Solving large-scale LDA problems on GPUs
X Xie, Y Liang, X Li, W Tan
Proceedings of the 24th Symposium on Principles and Practice of Parallel …, 2019
Cited by: 6, Year: 2019
Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication Partitioning
C Chen, X Li, Q Zhu, J Duan, P Sun, X Zhang, C Yang
Proceedings of the 29th ACM International Conference on Architectural …, 2024
Cited by: 2, Year: 2024
Theoretical linear convergence of deep unfolding network for block-sparse signal recovery
R Fu, Y Liu, X Li
Third International Conference on Computer Science and Communication …, 2022
Cited by: 2, Year: 2022
FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics
K Hong, G Dai, J Xu, Q Mao, X Li, J Liu, Y Dong, Y Wang
Proceedings of Machine Learning and Systems 6, 148-161, 2024
Cited by: 1, Year: 2024