Title | Authors | Venue | Cited by | Year |
Transformer transducer: A streamable speech recognition model with transformer encoders and RNN-T loss | Q Zhang, H Lu, H Sak, A Tripathi, E McDermott, S Koo, S Kumar | ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 469 | 2020 |
Gemini: a family of highly capable multimodal models | G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... | arXiv preprint arXiv:2312.11805, 2023 | 467 | 2023 |
Learning character-level compositionality with visual features | F Liu, H Lu, C Lo, G Neubig | arXiv preprint arXiv:1704.04859, 2017 | 76 | 2017 |
Handling homographs in neural machine translation | F Liu, H Lu, G Neubig | arXiv preprint arXiv:1708.06510, 2017 | 67 | 2017 |
Turn-to-diarize: Online speaker diarization constrained by transformer transducer speaker turn detection | W Xia, H Lu, Q Wang, A Tripathi, Y Huang, IL Moreno, H Sak | ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 46 | 2022 |
Monotonic recurrent neural network transducer and decoding strategies | A Tripathi, H Lu, H Sak, H Soltau | 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU …, 2019 | 46 | 2019 |
Transformer transducer: One model unifying streaming and non-streaming speech recognition | A Tripathi, J Kim, Q Zhang, H Lu, H Sak | arXiv preprint arXiv:2010.03192, 2020 | 40 | 2020 |
End-to-end multi-talker overlapping speech recognition | A Tripathi, H Lu, H Sak | ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 35 | 2020 |
Multilingual Speech Recognition with Self-Attention Structured Parameterization | Y Zhu, P Haghani, A Tripathi, B Ramabhadran, B Farris, H Xu, H Lu, ... | INTERSPEECH, 4741-4745, 2020 | 27 | 2020 |
Reducing streaming ASR model delay with self alignment | J Kim, H Lu, A Tripathi, Q Zhang, H Sak | arXiv preprint arXiv:2105.05005, 2021 | 19 | 2021 |
Contrastive siamese network for semi-supervised speech recognition | S Khorram, J Kim, A Tripathi, H Lu, Q Zhang, H Sak | ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 12 | 2022 |
An event reconstruction tool for conflict monitoring using social media | J Liang, D Fan, H Lu, P Huang, J Chen, L Jiang, A Hauptmann | Proceedings of the AAAI Conference on Artificial Intelligence 31 (1), 2017 | 11 | 2017 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | M Reid, N Savinov, D Teplyashin, D Lepikhin, T Lillicrap, J Alayrac, ... | arXiv preprint arXiv:2403.05530, 2024 | 8 | 2024 |
Augmenting transformer-transducer based speaker change detection with token-level training loss | G Zhao, Q Wang, H Lu, Y Huang, IL Moreno | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 8 | 2023 |
Highly efficient real-time streaming and fully on-device speaker diarization with multi-stage clustering | Q Wang, Y Huang, H Lu, G Zhao, IL Moreno | arXiv preprint arXiv:2210.13690, 2022 | 6 | 2022 |
Videos from the 2013 Boston Marathon: An event reconstruction dataset for synchronization and localization | J Chen, J Liang, H Lu, SI Yu, AG Hauptmann | Carnegie Mellon University, 2016 | 5 | 2016 |
Transformer transducer: One model unifying streaming and non-streaming speech recognition | A Tripathi, H Sak, H Lu, Q Zhang, JY Kim | US Patent 11,741,947, 2023 | 3 | 2023 |
Contrastive Siamese network for semi-supervised speech recognition | JY Kim, S Khorram, H Sak, A Tripathi, H Lu, Q Zhang | US Patent 11,961,515, 2024 | 2 | 2024 |
End-to-end multi-talker overlapping speech recognition | A Tripathi, H Lu, H Sak | US Patent 11,521,595, 2022 | 2 | 2022 |
USM-SCD: Multilingual speaker change detection based on large pretrained foundation models | G Zhao, Y Wang, J Pelecanos, Y Zhang, H Liao, Y Huang, H Lu, Q Wang | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 1 | 2024 |