Large batch training of convolutional networks Y You, I Gitman, B Ginsburg arXiv preprint arXiv:1708.03888, 2017 | 1278* | 2017 |
Understanding the role of momentum in stochastic gradient methods I Gitman, H Lang, P Zhang, L Xiao Advances in Neural Information Processing Systems 32, 2019 | 113 | 2019 |
Comparison of batch normalization and weight normalization algorithms for the large-scale image classification I Gitman, B Ginsburg arXiv preprint arXiv:1709.08145, 2017 | 78 | 2017 |
Mixed-precision training for NLP and speech recognition with OpenSeq2Seq O Kuchaiev, B Ginsburg, I Gitman, V Lavrukhin, J Li, H Nguyen, C Case, ... arXiv preprint arXiv:1805.10387, 2018 | 51 | 2018 |
OpenSeq2Seq: extensible toolkit for distributed and mixed precision training of sequence-to-sequence models O Kuchaiev, B Ginsburg, I Gitman, V Lavrukhin, C Case, P Micikevicius Proceedings of Workshop for NLP Open Source Software (NLP-OSS), 41-46, 2018 | 43 | 2018 |
OpenMathInstruct-1: A 1.8 million math instruction tuning dataset S Toshniwal, I Moshkov, S Narenthiran, D Gitman, F Jia, I Gitman arXiv preprint arXiv:2402.10176, 2024 | 42 | 2024 |
Nemotron-4 340B Technical Report B Adler, N Agarwal, A Aithal, DH Anh, P Bhattacharya, A Brundyn, ... arXiv preprint arXiv:2406.11704, 2024 | 37 | 2024 |
Large batch training of convolutional networks with layer-wise adaptive rate scaling B Ginsburg, I Gitman, Y You | 21 | 2018 |
Novel prediction techniques based on clusterwise linear regression I Gitman, J Chen, E Lei, A Dubrawski arXiv preprint arXiv:1804.10742, 2018 | 14 | 2018 |
Scaling SGD batch size to 32K for ImageNet training Y You, I Gitman, B Ginsburg arXiv preprint arXiv:1708.03888, 2017 | 9 | 2017 |
Confidence-based ensembles of end-to-end speech recognition models I Gitman, V Lavrukhin, A Laptev, B Ginsburg arXiv preprint arXiv:2306.15824, 2023 | 5 | 2023 |
Convergence analysis of gradient descent algorithms with proportional updates I Gitman, D Dilipkumar, B Parr arXiv preprint arXiv:1801.03137, 2018 | 5 | 2018 |
OpenMathInstruct-2: Accelerating AI for math with massive open-source instruction data S Toshniwal, W Du, I Moshkov, B Kisacanin, A Ayrapetyan, I Gitman arXiv preprint arXiv:2410.01560, 2024 | 2 | 2024 |
Weighted finite state transducer frameworks for conversational AI systems and applications A Laptev, V Bataev, I Gitman, B Ginsburg US Patent App. 18/355,653, 2024 | | 2024 |
Weighted finite state transducer frameworks for conversational AI systems and applications A Laptev, V Bataev, I Gitman, B Ginsburg US Patent App. 18/355,646, 2024 | | 2024 |
Powerful and Extensible WFST Framework for RNN-Transducer Losses A Laptev, V Bataev, I Gitman, B Ginsburg ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | | 2023 |
Canonical Least Squares Clustering on Sparse Medical Data I Gitman, J Chen, A Dubrawski | | |