Han Lu
Verified email at google.com
Title
Cited by
Year
Transformer transducer: A streamable speech recognition model with transformer encoders and rnn-t loss
Q Zhang, H Lu, H Sak, A Tripathi, E McDermott, S Koo, S Kumar
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020
Cited by: 469, Year: 2020
Gemini: a family of highly capable multimodal models
G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ...
arXiv preprint arXiv:2312.11805, 2023
Cited by: 467, Year: 2023
Learning character-level compositionality with visual features
F Liu, H Lu, C Lo, G Neubig
arXiv preprint arXiv:1704.04859, 2017
Cited by: 76, Year: 2017
Handling homographs in neural machine translation
F Liu, H Lu, G Neubig
arXiv preprint arXiv:1708.06510, 2017
Cited by: 67, Year: 2017
Turn-to-diarize: Online speaker diarization constrained by transformer transducer speaker turn detection
W Xia, H Lu, Q Wang, A Tripathi, Y Huang, IL Moreno, H Sak
ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022
Cited by: 46, Year: 2022
Monotonic recurrent neural network transducer and decoding strategies
A Tripathi, H Lu, H Sak, H Soltau
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU …, 2019
Cited by: 46, Year: 2019
Transformer transducer: One model unifying streaming and non-streaming speech recognition
A Tripathi, J Kim, Q Zhang, H Lu, H Sak
arXiv preprint arXiv:2010.03192, 2020
Cited by: 40, Year: 2020
End-to-end multi-talker overlapping speech recognition
A Tripathi, H Lu, H Sak
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020
Cited by: 35, Year: 2020
Multilingual Speech Recognition with Self-Attention Structured Parameterization.
Y Zhu, P Haghani, A Tripathi, B Ramabhadran, B Farris, H Xu, H Lu, ...
INTERSPEECH, 4741-4745, 2020
Cited by: 27, Year: 2020
Reducing streaming ASR model delay with self alignment
J Kim, H Lu, A Tripathi, Q Zhang, H Sak
arXiv preprint arXiv:2105.05005, 2021
Cited by: 19, Year: 2021
Contrastive siamese network for semi-supervised speech recognition
S Khorram, J Kim, A Tripathi, H Lu, Q Zhang, H Sak
ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022
Cited by: 12, Year: 2022
An event reconstruction tool for conflict monitoring using social media
J Liang, D Fan, H Lu, P Huang, J Chen, L Jiang, A Hauptmann
Proceedings of the AAAI Conference on Artificial Intelligence 31 (1), 2017
Cited by: 11, Year: 2017
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
M Reid, N Savinov, D Teplyashin, D Lepikhin, T Lillicrap, J Alayrac, ...
arXiv preprint arXiv:2403.05530, 2024
Cited by: 8, Year: 2024
Augmenting transformer-transducer based speaker change detection with token-level training loss
G Zhao, Q Wang, H Lu, Y Huang, IL Moreno
ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023
Cited by: 8, Year: 2023
Highly efficient real-time streaming and fully on-device speaker diarization with multi-stage clustering
Q Wang, Y Huang, H Lu, G Zhao, IL Moreno
arXiv preprint arXiv:2210.13690, 2022
Cited by: 6, Year: 2022
Videos from the 2013 boston marathon: An event reconstruction dataset for synchronization and localization
J Chen, J Liang, H Lu, SI Yu, AG Hauptmann
Carnegie Mellon University, 2016
Cited by: 5, Year: 2016
Transformer transducer: one model unifying streaming and non-streaming speech recognition
A Tripathi, H Sak, H Lu, Q Zhang, JY Kim
US Patent 11,741,947, 2023
Cited by: 3, Year: 2023
Contrastive Siamese network for semi-supervised speech recognition
JY Kim, S Khorram, H Sak, A Tripathi, H Lu, Q Zhang
US Patent 11,961,515, 2024
Cited by: 2, Year: 2024
End-to-end multi-talker overlapping speech recognition
A Tripathi, H Lu, H Sak
US Patent 11,521,595, 2022
Cited by: 2, Year: 2022
USM-SCD: Multilingual speaker change detection based on large pretrained foundation models
G Zhao, Y Wang, J Pelecanos, Y Zhang, H Liao, Y Huang, H Lu, Q Wang
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by: 1, Year: 2024