Yu Emma Wang
Gemini: a family of highly capable multimodal models
G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ...
arXiv preprint arXiv:2312.11805, 2023
Cited by 1351
GLaM: Efficient scaling of language models with mixture-of-experts
N Du, Y Huang, AM Dai, S Tong, D Lepikhin, Y Xu, M Krikun, Y Zhou, ...
International Conference on Machine Learning, 5547-5569, 2022
Cited by 462
A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms
YE Wang, GY Wei, D Brooks
the 3rd MLSys Conference, 2020
Cited by 68*
Predicting New Workload or CPU Performance by Analyzing Public Datasets
YE Wang, V Lee, GY Wei, D Brooks
ACM Transactions on Architecture and Code Optimization 15 (4), 53, 2019
Cited by 35
Exploiting parallelism opportunities with deep learning frameworks
YE Wang, CJ Wu, X Wang, K Hazelwood, D Brooks
ACM Transactions on Architecture and Code Optimization (TACO) 18 (1), 1-23, 2020
Cited by 32
A flexible approach to autotuning multi-pass machine learning compilers
PM Phothilimthana, A Sabne, N Sarda, KS Murthy, Y Zhou, ...
2021 30th International Conference on Parallel Architectures and Compilation …, 2021
Cited by 27
Gemma 2: Improving open language models at a practical size
G Team, M Riviere, S Pathak, PG Sessa, C Hardin, S Bhupatiraju, ...
arXiv preprint arXiv:2408.00118, 2024
Cited by 18
Exploring the limits of Concurrency in ML Training on Google TPUs
S Kumar, Y Wang, C Young, J Bradbury, N Kumar, D Chen, A Swing
Proceedings of Machine Learning and Systems 3, 81-92, 2021
Cited by 18
AutoDistill: an end-to-end framework to explore and distill hardware-efficient language models
X Zhang, Z Zhou, D Chen, YE Wang
arXiv preprint arXiv:2201.08539, 2022
Cited by 10
Demystifying Bayesian Inference Workloads
YE Wang, Y Zhu, GG Ko, B Reagen, GY Wei, D Brooks
IEEE International Symposium on Performance Analysis of Systems and Software, 2019
Cited by 10
A Learning Algorithm for Bayesian Networks and Its Efficient Implementation on GPUs
Y Wang, W Qian, S Zhang, X Liang, B Yuan
IEEE Transactions on Parallel and Distributed Systems 27 (1), 17-30, 2016
Cited by 10
Exploring the limits of Concurrency in ML Training on Google TPUs
S Kumar, J Bradbury, C Young, YE Wang, A Levskaya, B Hechtman, ...
arXiv preprint arXiv:2011.03641, 2020
Cited by 8
Exploring hardware profile-guided green datacenter scheduling
W Tang, Y Wang, H Liu, T Zhang, C Li, X Liang
2015 44th International Conference on Parallel Processing, 11-20, 2015
Cited by 5
Augmenting Hessians with Inter-Layer Dependencies for Mixed-Precision Post-Training Quantization
CJS Schaefer, N Lambert-Shirzad, X Zhang, C Chou, T Jablin, J Li, E Guo, ...
arXiv preprint arXiv:2306.04879, 2023
Cited by 2
Mixed precision post training quantization of neural networks with sensitivity guided search
CJS Schaefer, E Guo, C Stanton, X Zhang, T Jablin, N Lambert-Shirzad, ...
arXiv preprint arXiv:2302.01382, 2023
Cited by 1
Workload scheduling using queues with different priorities
Y Wang, TB Jablin, CK Stanton
US Patent US20240118920A1, 2024
Deploying optimization profiles for compiling computer programs in data centers
Y Wang, D Chen, PM Phothilimthana
US Patent US20240118875A1, 2024
Caching compilation outputs using optimization profiles
H Kim, X Yu, Y Wang, PM Phothilimthana
US Patent WO2023234952A1, 2023
Hadamard Domain Training with Integers for Class Incremental Quantized Learning
M Schiemer, CJS Schaefer, JP Vap, MJ Horeni, YE Wang, J Ye, S Joshi
arXiv preprint arXiv:2310.03675, 2023
Sparsely Activated Language Models are Efficient In-Context Learners
A Yu, A Dai, C Cui, DD Lepikhin, E Wang, K Meier-Hellstern, K Webster, ...
2022