GPipe: Efficient training of giant neural networks using pipeline parallelism Y Huang, Y Cheng, A Bapna, O Firat, D Chen, M Chen, HJ Lee, J Ngiam, ... Advances in neural information processing systems 32, 2019 | 1089 | 2019 |
PaLM: Scaling language modeling with pathways A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ... arXiv preprint arXiv:2204.02311, 2022 | 947 | 2022 |
Theano: A Python framework for fast computation of mathematical expressions R Al-Rfou, G Alain, A Almahairi, C Angermueller, D Bahdanau, N Ballas, ... arXiv e-prints, arXiv:1605.02688, 2016 | 884 | 2016 |
Multi-way, multilingual neural machine translation with a shared attention mechanism O Firat, K Cho, Y Bengio arXiv preprint arXiv:1601.01073, 2016 | 623 | 2016 |
XTREME: A massively multilingual multi-task benchmark for evaluating cross-lingual generalisation J Hu, S Ruder, A Siddhant, G Neubig, O Firat, M Johnson International Conference on Machine Learning, 4411-4421, 2020 | 622 | 2020 |
On using monolingual corpora in neural machine translation C Gulcehre, O Firat, K Xu, K Cho, L Barrault, HC Lin, F Bougares, ... arXiv preprint arXiv:1503.03535, 2015 | 546 | 2015 |
The best of both worlds: Combining recent advances in neural machine translation MX Chen, O Firat, A Bapna, M Johnson, W Macherey, G Foster, L Jones, ... arXiv preprint arXiv:1804.09849, 2018 | 465 | 2018 |
Massively multilingual neural machine translation R Aharoni, M Johnson, O Firat arXiv preprint arXiv:1903.00089, 2019 | 437 | 2019 |
Nematus: a toolkit for neural machine translation R Sennrich, O Firat, K Cho, A Birch, B Haddow, J Hitschler, ... arXiv preprint arXiv:1703.04357, 2017 | 418 | 2017 |
GShard: Scaling giant models with conditional computation and automatic sharding D Lepikhin, HJ Lee, Y Xu, D Chen, O Firat, Y Huang, M Krikun, N Shazeer, ... arXiv preprint arXiv:2006.16668, 2020 | 405 | 2020 |
Massively multilingual neural machine translation in the wild: Findings and challenges N Arivazhagan, A Bapna, O Firat, D Lepikhin, M Johnson, M Krikun, ... arXiv preprint arXiv:1907.05019, 2019 | 290 | 2019 |
Simple, scalable adaptation for neural machine translation A Bapna, N Arivazhagan, O Firat arXiv preprint arXiv:1909.08478, 2019 | 268 | 2019 |
Zero-resource translation with multi-lingual neural machine translation O Firat, B Sankaran, Y Al-Onaizan, FTY Vural, K Cho arXiv preprint arXiv:1606.04164, 2016 | 234 | 2016 |
Theano: A Python framework for fast computation of mathematical expressions TTD Team, R Al-Rfou, G Alain, A Almahairi, C Angermueller, D Bahdanau, ... arXiv preprint arXiv:1605.02688, 2016 | 193 | 2016 |
Lingvo: a modular and scalable framework for sequence-to-sequence modeling J Shen, P Nguyen, Y Wu, Z Chen, MX Chen, Y Jia, A Kannan, T Sainath, ... arXiv preprint arXiv:1902.08295, 2019 | 172 | 2019 |
Montreal neural machine translation systems for WMT’15 S Jean, O Firat, K Cho, R Memisevic, Y Bengio Proceedings of the tenth workshop on statistical machine translation, 134-140, 2015 | 167 | 2015 |
Does neural machine translation benefit from larger context? S Jean, S Lauly, O Firat, K Cho arXiv preprint arXiv:1704.05135, 2017 | 137 | 2017 |
GLaM: Efficient scaling of language models with mixture-of-experts N Du, Y Huang, AM Dai, S Tong, D Lepikhin, Y Xu, M Krikun, Y Zhou, ... International Conference on Machine Learning, 5547-5569, 2022 | 123 | 2022 |
Revisiting character-based neural machine translation with capacity and compression C Cherry, G Foster, A Bapna, O Firat, W Macherey arXiv preprint arXiv:1808.09943, 2018 | 107 | 2018 |
On integrating a language model into neural machine translation C Gulcehre, O Firat, K Xu, K Cho, Y Bengio Computer Speech & Language 45, 137-148, 2017 | 106 | 2017 |