QuartzNet: Deep automatic speech recognition with 1D time-channel separable convolutions S Kriman, S Beliaev, B Ginsburg, J Huang, O Kuchaiev, V Lavrukhin, ... ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 354 | 2020 |
Jasper: An end-to-end convolutional neural acoustic model J Li, V Lavrukhin, B Ginsburg, R Leary, O Kuchaiev, JM Cohen, H Nguyen, ... arXiv preprint arXiv:1904.03288, 2019 | 296 | 2019 |
NeMo: a toolkit for building AI applications using neural modules O Kuchaiev, J Li, H Nguyen, O Hrinchuk, R Leary, B Ginsburg, S Kriman, ... arXiv preprint arXiv:1909.09577, 2019 | 284 | 2019 |
Stochastic gradient methods with layer-wise adaptive moments for training of deep networks B Ginsburg, P Castonguay, O Hrinchuk, O Kuchaiev, V Lavrukhin, R Leary, ... arXiv preprint arXiv:1905.11286, 2019 | 109 | 2019 |
Hi-Fi multi-speaker English TTS dataset E Bakhturina, V Lavrukhin, B Ginsburg, Y Zhang arXiv preprint arXiv:2104.01497, 2021 | 99 | 2021 |
Citrinet: Closing the gap between non-autoregressive and autoregressive end-to-end models for automatic speech recognition S Majumdar, J Balam, O Hrinchuk, V Lavrukhin, V Noroozi, B Ginsburg arXiv preprint arXiv:2104.01721, 2021 | 75 | 2021 |
Training neural speech recognition systems with synthetic speech augmentation J Li, R Gadde, B Ginsburg, V Lavrukhin arXiv preprint arXiv:1811.00707, 2018 | 73 | 2018 |
Mixed-precision training for NLP and speech recognition with OpenSeq2Seq O Kuchaiev, B Ginsburg, I Gitman, V Lavrukhin, J Li, H Nguyen, C Case, ... arXiv preprint arXiv:1805.10387, 2018 | 51 | 2018 |
SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition PK O'Neill, V Lavrukhin, S Majumdar, V Noroozi, Y Zhang, O Kuchaiev, ... arXiv preprint arXiv:2104.02014, 2021 | 44 | 2021 |
SpeakerNet: 1D depth-wise separable convolutional network for text-independent speaker recognition and verification NR Koluguri, J Li, V Lavrukhin, B Ginsburg arXiv preprint arXiv:2010.12653, 2020 | 43 | 2020 |
OpenSeq2Seq: extensible toolkit for distributed and mixed precision training of sequence-to-sequence models O Kuchaiev, B Ginsburg, I Gitman, V Lavrukhin, C Case, P Micikevicius Proceedings of Workshop for NLP Open Source Software (NLP-OSS), 41-46, 2018 | 43 | 2018 |
Cross-language transfer learning, continuous learning, and domain adaptation for end-to-end automatic speech recognition J Huang, O Kuchaiev, P O'Neill, V Lavrukhin, J Li, A Flores, G Kucsko, ... arXiv preprint arXiv:2005.04290, 2020 | 29 | 2020 |
Cross-language transfer learning and domain adaptation for end-to-end automatic speech recognition J Luo, J Wang, N Cheng, E Xiao, J Xiao, G Kucsko, P O'Neill, J Balam, ... 2021 IEEE International Conference on Multimedia and Expo (ICME), 1-6, 2021 | 20 | 2021 |
Conformer-based target-speaker automatic speech recognition for single-channel audio Y Zhang, KC Puvvada, V Lavrukhin, B Ginsburg ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 18 | 2023 |
Training deep networks with stochastic gradient normalized by layerwise adaptive second moments B Ginsburg, P Castonguay, O Hrinchuk, O Kuchaiev, V Lavrukhin, R Leary, ... | 10 | 2019 |
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator V Bataev, R Korostik, E Shabalin, V Lavrukhin, B Ginsburg arXiv preprint arXiv:2302.14036, 2023 | 9 | 2023 |
Improving noise robustness of an end-to-end neural model for automatic speech recognition J Balam, J Huang, V Lavrukhin, S Deng, S Majumdar, B Ginsburg arXiv preprint arXiv:2010.12715, 2020 | 8 | 2020 |
LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models A Meister, M Novikov, N Karpov, E Bakhturina, V Lavrukhin, B Ginsburg 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-7, 2023 | 7 | 2023 |
A toolbox for construction and analysis of speech datasets E Bakhturina, V Lavrukhin, B Ginsburg arXiv preprint arXiv:2104.04896, 2021 | 7 | 2021 |
Less is more: Accurate speech recognition & translation without web-scale data KC Puvvada, P Żelasko, H Huang, O Hrinchuk, NR Koluguri, K Dhawan, ... arXiv preprint arXiv:2406.19674, 2024 | 6 | 2024 |