Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context M Reid, N Savinov, D Teplyashin, D Lepikhin, T Lillicrap, J Alayrac, ... arXiv preprint arXiv:2403.05530, 2024 | 358 | 2024 |
Universal phone recognition with a multilingual allophone system X Li, S Dalmia, J Li, M Lee, P Littell, J Yao, A Anastasopoulos, ... ICASSP 2020, 2020 | 131 | 2020 |
Adversarial music: Real world audio adversary against wake-word detection system J Li, S Qu, X Li, J Szurley, JZ Kolter, F Metze NeurIPS 2019, 2019 | 82 | 2019 |
Machine Listening for Heart Status Monitoring: Introducing and Benchmarking HSS–the Heart Sounds Shenzhen Corpus BS Fengquan Dong, Kun Qian, Ren Zhao, Alice Baird, Xinjian Li, Zhenyu Dai ... IEEE Journal of Biomedical and Health Informatics, 1-13, 2019 | 47* | 2019 |
Towards Zero-shot Learning for Automatic Phonemic Transcription X Li, S Dalmia, DR Mortensen, J Li, AW Black, F Metze AAAI 2020, 2020 | 33 | 2020 |
Reproducing Whisper-style training using an open-source toolkit and publicly available data Y Peng, J Tian, B Yan, D Berrebbi, X Chang, X Li, J Shi, S Arora, W Chen, ... 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-8, 2023 | 30 | 2023 |
Multilingual Speech Recognition with Corpus Relatedness Sampling X Li, S Dalmia, AW Black, F Metze 20th Annual Conference of the International Speech Communication Association …, 2019 | 25 | 2019 |
ASR2K: Speech Recognition for Around 2000 Languages without Audio X Li, F Metze, DR Mortensen, AW Black, S Watanabe Interspeech 2022, 2022 | 23 | 2022 |
Zero-shot learning for grapheme to phoneme conversion with language ensemble X Li, F Metze, DR Mortensen, S Watanabe, AW Black Findings of the Association for Computational Linguistics: ACL 2022, 2106-2115, 2022 | 19 | 2022 |
Domain robust feature extraction for rapid low resource ASR development S Dalmia, X Li, F Metze, AW Black 2018 IEEE Spoken Language Technology Workshop (SLT), 258-265, 2018 | 19 | 2018 |
Textless Direct Speech-to-Speech Translation with Discrete Speech Representation X Li, Y Jia, CC Chiu ICASSP 2023, 2022 | 17 | 2022 |
Acoustics based intent recognition using discovered phonetic units for low resource languages A Gupta, X Li, SK Rallabandi, AW Black ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 15 | 2021 |
A summary of the first workshop on language technology for language documentation and revitalization G Neubig, S Rijhwani, A Palmer, J MacKenzie, H Cruz, X Li, M Lee, ... arXiv preprint arXiv:2004.13203, 2020 | 15 | 2020 |
Hierarchical Phone Recognition with Compositional Phonetics X Li, J Li, F Metze, AW Black Proc. Interspeech 2021, 2461-2465, 2021 | 14 | 2021 |
AlloVera: a multilingual allophone database DR Mortensen, X Li, P Littell, A Michaud, S Rijhwani, A Anastasopoulos, ... LREC 2020, 2020 | 14 | 2020 |
Multilingual phonetic dataset for low resource speech recognition X Li, DR Mortensen, F Metze, AW Black ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 13 | 2021 |
Learning to speak from text: Zero-shot multilingual text-to-speech with unsupervised text pretraining T Saeki, S Maiti, X Li, S Watanabe, S Takamichi, H Saruwatari arXiv preprint arXiv:2301.12596, 2023 | 12 | 2023 |
Towards Context-Aware End-to-End Code-Switching Speech Recognition Z Qiu, Y Li, X Li, F Metze, WM Campbell, AA AI Interspeech 2020, 2020 | 12 | 2020 |
The ARIEL-CMU systems for LoReHLT18 A Chaudhary, S Dalmia, J Hu, X Li, A Matthews, AO Muis, N Otani, ... arXiv preprint arXiv:1902.08899, 2019 | 10 | 2019 |
On Prosody Modeling for ASR+TTS based Voice Conversion WC Huang, T Hayashi, X Li, S Watanabe, T Toda ASRU 2021, 2021 | 9 | 2021 |