Sharan Narang
Research Engineer, Meta AI
Verified email at meta.com
Title · Cited by · Year
Exploring the limits of transfer learning with a unified text-to-text transformer
C Raffel, N Shazeer, A Roberts, K Lee, S Narang, M Matena, Y Zhou, W Li, ...
The Journal of Machine Learning Research 21 (1), 5485-5551, 2020
Cited by 11557 · 2020
Deep speech 2: End-to-end speech recognition in English and Mandarin
D Amodei, S Ananthanarayanan, R Anubhai, J Bai, E Battenberg, C Case, ...
International conference on machine learning, 173-182, 2016
Cited by 3412 · 2016
PaLM: Scaling language modeling with pathways
A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ...
arXiv preprint arXiv:2204.02311, 2022
Cited by 2112 · 2022
Mixed precision training
P Micikevicius, S Narang, J Alben, G Diamos, E Elsen, D Garcia, ...
arXiv preprint arXiv:1710.03740, 2017
Cited by 1483 · 2017
Llama 2: Open foundation and fine-tuned chat models
H Touvron, L Martin, K Stone, P Albert, A Almahairi, Y Babaei, ...
arXiv preprint arXiv:2307.09288, 2023
Cited by 902 · 2023
Deep voice 3: Scaling text-to-speech with convolutional sequence learning
W Ping, K Peng, A Gibiansky, SO Arik, A Kannan, S Narang, J Raiman, ...
arXiv preprint arXiv:1710.07654, 2017
Cited by 795* · 2017
Scaling instruction-finetuned language models
HW Chung, L Hou, S Longpre, B Zoph, Y Tay, W Fedus, Y Li, X Wang, ...
arXiv preprint arXiv:2210.11416, 2022
Cited by 789 · 2022
Deep learning scaling is predictable, empirically
J Hestness, S Narang, N Ardalani, G Diamos, H Jun, H Kianinejad, ...
arXiv preprint arXiv:1712.00409, 2017
Cited by 517 · 2017
Self-consistency improves chain of thought reasoning in language models
X Wang, J Wei, D Schuurmans, Q Le, E Chi, S Narang, A Chowdhery, ...
arXiv preprint arXiv:2203.11171, 2022
Cited by 353 · 2022
Exploring sparsity in recurrent neural networks
S Narang, E Elsen, G Diamos, S Sengupta
arXiv preprint arXiv:1704.05119, 2017
Cited by 328 · 2017
DSD: Regularizing deep neural networks with dense-sparse-dense training flow
S Han, J Pool, S Narang, H Mao, S Tang, E Elsen, B Catanzaro, J Tran, ...
Cited by 303* · 2016
Exploring the limits of transfer learning with a unified text-to-text transformer
A Roberts, C Raffel, K Lee, M Matena, N Shazeer, PJ Liu, S Narang, W Li, ...
Cited by 230 · 2019
ByT5: Towards a token-free future with pre-trained byte-to-byte models
L Xue, A Barua, N Constant, R Al-Rfou, S Narang, M Kale, A Roberts, ...
Transactions of the Association for Computational Linguistics 10, 291-306, 2022
Cited by 220 · 2022
Block-sparse recurrent neural networks
S Narang, E Undersander, G Diamos
arXiv preprint arXiv:1711.02782, 2017
Cited by 137 · 2017
WT5?! Training text-to-text models to explain their predictions
S Narang, C Raffel, K Lee, A Roberts, N Fiedel, K Malkan
arXiv preprint arXiv:2004.14546, 2020
Cited by 136 · 2020
Scaling up models and data with t5x and seqio
A Roberts, HW Chung, A Levskaya, G Mishra, J Bradbury, D Andor, ...
arXiv preprint arXiv:2203.17189, 2022
Cited by 82 · 2022
Do transformer modifications transfer across implementations and applications?
S Narang, HW Chung, Y Tay, W Fedus, T Fevry, M Matena, K Malkan, ...
arXiv preprint arXiv:2102.11972, 2021
Cited by 73 · 2021
Scale efficiently: Insights from pre-training and fine-tuning transformers
Y Tay, M Dehghani, J Rao, W Fedus, S Abnar, HW Chung, S Narang, ...
arXiv preprint arXiv:2109.10686, 2021
Cited by 70 · 2021
Scaling instruction-finetuned language models
HW Chung, L Hou, S Longpre, B Zoph, Y Tay, W Fedus, E Li, X Wang, ...
arXiv preprint arXiv:2210.11416
Cited by 51
Scaling laws vs model architectures: How does inductive bias influence scaling?
Y Tay, M Dehghani, S Abnar, HW Chung, W Fedus, J Rao, S Narang, ...
arXiv preprint arXiv:2207.10551, 2022
Cited by 34 · 2022
Articles 1–20