| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Efficient transformers: A survey | Y Tay, M Dehghani, D Bahri, D Metzler | ACM Computing Surveys 55(6), 1-28 | 648 | 2022 |
| Berkeley advanced reconstruction toolbox | M Uecker, F Ong, JI Tamir, D Bahri, P Virtue, JY Cheng, T Zhang, M Lustig | Proc. Intl. Soc. Mag. Reson. Med 23(2486) | 371 | 2015 |
| Long range arena: A benchmark for efficient transformers | Y Tay, M Dehghani, S Abnar, Y Shen, D Bahri, P Pham, J Rao, L Yang, ... | arXiv preprint arXiv:2011.04006 | 279 | 2020 |
| Synthesizer: Rethinking self-attention for transformer models | Y Tay, D Bahri, D Metzler, DC Juan, Z Zhao, C Zheng | International Conference on Machine Learning, 10183-10192 | 231 | 2021 |
| Sparse Sinkhorn attention | Y Tay, D Bahri, L Yang, D Metzler, DC Juan | International Conference on Machine Learning, 9438-9447 | 190 | 2020 |
| ExT5: Towards extreme multi-task scaling for transfer learning | V Aribandi, Y Tay, T Schuster, J Rao, HS Zheng, SV Mehta, H Zhuang, ... | arXiv preprint arXiv:2111.10952 | 92 | 2021 |
| Charformer: Fast character transformers via gradient-based subword tokenization | Y Tay, VQ Tran, S Ruder, J Gupta, HW Chung, D Bahri, Z Qin, ... | arXiv preprint arXiv:2106.12672 | 67 | 2021 |
| Unifying language learning paradigms | Y Tay, M Dehghani, VQ Tran, X Garcia, D Bahri, T Schuster, HS Zheng, ... | arXiv preprint arXiv:2205.05131 | 59 | 2022 |
| Are pre-trained convolutions better than pre-trained transformers? | Y Tay, M Dehghani, J Gupta, D Bahri, V Aribandi, Z Qin, D Metzler | arXiv preprint arXiv:2105.03322 | 55 | 2021 |
| Transformer memory as a differentiable search index | Y Tay, V Tran, M Dehghani, J Ni, D Bahri, H Mehta, Z Qin, K Hui, Z Zhao, ... | Advances in Neural Information Processing Systems 35, 21831-21843 | 54 | 2022 |
| Deep k-NN for noisy labels | D Bahri, H Jiang, M Gupta | International Conference on Machine Learning, 540-550 | 46 | 2020 |
| Rethinking search: Making domain experts out of dilettantes | D Metzler, Y Tay, D Bahri, M Najork | ACM SIGIR Forum 55(1), 1-27 | 44 | 2021 |
| SCARF: Self-supervised contrastive learning using random feature corruption | D Bahri, H Jiang, Y Tay, D Metzler | arXiv preprint arXiv:2106.15147 | 41 | 2021 |
| Sharpness-aware minimization improves language model generalization | D Bahri, H Mobahi, Y Tay | arXiv preprint arXiv:2110.08529 | 36 | 2021 |
| StructFormer: Joint unsupervised induction of dependency and constituency structure from masked language modeling | Y Shen, Y Tay, C Zheng, D Bahri, D Metzler, A Courville | arXiv preprint arXiv:2012.00857 | 27 | 2020 |
| HyperGrid transformers: Towards a single model for multiple tasks | Y Tay, Z Zhao, D Bahri, D Metzler, DC Juan | | 25 | 2021 |
| OmniNet: Omnidirectional representations from transformers | Y Tay, M Dehghani, V Aribandi, J Gupta, PM Pham, Z Qin, D Bahri, ... | International Conference on Machine Learning, 10193-10202 | 24 | 2021 |
| Diminishing returns shape constraints for interpretability and regularization | M Gupta, D Bahri, A Cotter, K Canini | Advances in Neural Information Processing Systems 31 | 24 | 2018 |
| Confident adaptive language modeling | T Schuster, A Fisch, J Gupta, M Dehghani, D Bahri, V Tran, Y Tay, ... | Advances in Neural Information Processing Systems 35, 17456-17472 | 20 | 2022 |
| Encased cantilevers for low-noise force and mass sensing in liquids | D Ziegler, A Klaassen, D Bahri, D Chmielewski, A Nievergelt, F Mugele, ... | 2014 IEEE 27th International Conference on Micro Electro Mechanical Systems … | 14 | 2014 |