GluonCV and GluonNLP: Deep learning in computer vision and natural language processing J Guo, H He, T He, L Lausen, M Li, H Lin, X Shi, C Wang, J Xie, S Zha, ... The Journal of Machine Learning Research 21 (1), 845-851, 2020 | 170 | 2020 |
Communication-efficient distributed blockwise momentum SGD with error-feedback S Zheng, Z Huang, J Kwok Advances in Neural Information Processing Systems 32, 2019 | 87 | 2019 |
Fast-and-Light Stochastic ADMM S Zheng, JT Kwok IJCAI, 2407-2413, 2016 | 56 | 2016 |
Asynchronous Distributed Semi-Stochastic Gradient Optimization R Zhang, S Zheng, JT Kwok AAAI, 2323-2329, 2016 | 43* | 2016 |
CSER: Communication-efficient SGD with error reset C Xie, S Zheng, S Koyejo, I Gupta, M Li, H Lin Advances in Neural Information Processing Systems 33, 12593-12603, 2020 | 22 | 2020 |
Accelerated large batch optimization of BERT pretraining in 54 minutes S Zheng, H Lin, S Zha, M Li arXiv preprint arXiv:2006.13484, 2020 | 14 | 2020 |
Follow the moving leader in deep learning S Zheng, JT Kwok International Conference on Machine Learning, 4110-4119, 2017 | 14 | 2017 |
Stochastic variance-reduced ADMM S Zheng, JT Kwok arXiv preprint arXiv:1604.07070, 2016 | 13 | 2016 |
Lightweight Stochastic Optimization for Minimizing Finite Sums with Infinite Data S Zheng, JT Kwok International Conference on Machine Learning, 5932-5940, 2018 | 8 | 2018 |
Partial and asymmetric contrastive learning for out-of-distribution detection in long-tailed recognition H Wang, A Zhang, Y Zhu, S Zheng, M Li, AJ Smola, Z Wang International Conference on Machine Learning, 23446-23458, 2022 | 5 | 2022 |
Alexa teacher model: Pretraining and distilling multi-billion-parameter encoders for natural language understanding systems J FitzGerald, S Ananthakrishnan, K Arkoudas, D Bernardi, A Bhagia, ... | 5 | 2022 |
Compressed communication for distributed training: Adaptive methods and system Y Zhong, C Xie, S Zheng, H Lin arXiv preprint arXiv:2105.07829, 2021 | 5 | 2021 |
Removing batch normalization boosts adversarial training H Wang, A Zhang, S Zheng, X Shi, M Li, Z Wang International Conference on Machine Learning, 23433-23445, 2022 | 3 | 2022 |
MiCS: near-linear scaling for training gigantic model on public cloud Z Zhang, S Zheng, Y Wang, J Chiu, G Karypis, T Chilimbi, M Li, X Jin arXiv preprint arXiv:2205.00119, 2022 | 2 | 2022 |
SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning MS Bari, A Zhang, S Zheng, X Shi, Y Zhu, S Joty, M Li arXiv preprint arXiv:2212.10929, 2022 | 1 | 2022 |
Context, language modeling, and multimodal data in finance S Das, C Goggins, J He, G Karypis, S Krishnamurthy, M Mahajan, ... The Journal of Financial Data Science 3 (3), 52-66, 2021 | 1 | 2021 |
Blockwise Adaptivity: Faster Training and Better Generalization in Deep Learning S Zheng, JT Kwok arXiv preprint arXiv:1905.09899, 2019 | 1 | 2019 |
Stochastic Optimization for Machine Learning S Zheng https://szhengac.github.io/papers/pqe.pdf, 2017 | 1 | 2017 |
Fast nonsmooth regularized risk minimization with continuation S Zheng, R Zhang, JT Kwok AAAI, 2393-2399, 2016 | 1* | 2016 |
SMILE: Scaling Mixture-of-Experts with Efficient Bi-level Routing C He, S Zheng, A Zhang, G Karypis, T Chilimbi, M Soltanolkotabi, ... arXiv preprint arXiv:2212.05191, 2022 | | 2022 |