Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. H Liu, D Tam, M Muqeeth, J Mohta, T Huang, M Bansal, CA Raffel. Advances in Neural Information Processing Systems 35, 1950–1965, 2022. Cited by 618.
BLiMP: The benchmark of linguistic minimal pairs for English. A Warstadt, A Parrish, H Liu, A Mohananey, W Peng, SF Wang, et al. Transactions of the Association for Computational Linguistics 8, 377–392, 2020. Cited by 393.
Intermediate-task transfer learning with pretrained models for natural language understanding: When and why does it work? Y Pruksachatkun, J Phang, H Liu, PM Htut, X Zhang, RY Pang, C Vania, et al. arXiv preprint arXiv:2005.00628, 2020. Cited by 294.
Learning which features matter: RoBERTa acquires a preference for linguistic generalizations (eventually). A Warstadt, Y Zhang, HS Li, H Liu, SR Bowman. arXiv preprint arXiv:2010.05358, 2020. Cited by 133.
Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs. A Warstadt et al. arXiv preprint arXiv:1909.02597, 2019. Cited by 130.
jiant: A software toolkit for research on general-purpose text understanding models. Y Pruksachatkun, P Yeres, H Liu, J Phang, PM Htut, A Wang, I Tenney, et al. arXiv preprint arXiv:2003.02249, 2020. Cited by 92*.
English intermediate-task training improves zero-shot cross-lingual transfer too. J Phang, I Calixto, PM Htut, Y Pruksachatkun, H Liu, C Vania, K Kann, et al. arXiv preprint arXiv:2005.13013, 2020. Cited by 69.
Counterfactually-augmented SNLI training data does not yield better generalization than unaugmented data. W Huang, H Liu, SR Bowman. arXiv preprint arXiv:2010.04762, 2020. Cited by 40.
Comparing test sets with item response theory. C Vania, PM Htut, W Huang, D Mungra, RY Pang, J Phang, H Liu, K Cho, et al. arXiv preprint arXiv:2106.00840, 2021. Cited by 32.
Soft merging of experts with adaptive routing. M Muqeeth, H Liu, C Raffel. arXiv preprint arXiv:2306.03745, 2023. Cited by 21.
Fine-tuned transformers show clusters of similar representations across layers. J Phang, H Liu, SR Bowman. arXiv preprint arXiv:2109.08406, 2021. Cited by 17.
Precise task formalization matters in Winograd schema evaluations. H Liu, W Huang, DA Mungra, SR Bowman. arXiv preprint arXiv:2010.04043, 2020. Cited by 15.
Learning to route among specialized experts for zero-shot generalization. M Muqeeth, H Liu, Y Liu, C Raffel. arXiv preprint arXiv:2402.05859, 2024. Cited by 12.
Git-Theta: A Git extension for collaborative development of machine learning models. N Kandpal, B Lester, M Muqeeth, A Mascarenhas, M Evans, V Baskaran, et al. International Conference on Machine Learning, 15708–15719, 2023. Cited by 7.
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models. B Pan, Y Shen, H Liu, M Mishra, G Zhang, A Oliva, C Raffel, R Panda. arXiv preprint arXiv:2404.05567, 2024. Cited by 6.
MEMD: A diversity-promoting learning framework for short-text conversation. M Zou, X Li, H Liu, ZH Deng. Proceedings of the 27th International Conference on Computational Linguistics (COLING), 2018. Cited by 4.
Retrieving Relevant and Diverse Image from Social Media Images. X Chen, H Liu, ZH Deng, Y Yang. MediaEval, 2015. Cited by 3.
A survey on model MoErging: Recycling and routing among specialized experts for collaborative learning. P Yadav, C Raffel, M Muqeeth, L Caccia, H Liu, T Chen, M Bansal, et al. arXiv preprint arXiv:2408.07057, 2024. Cited by 2.
Models with conditional computation learn suboptimal solutions. M Muqeeth, H Liu, C Raffel. I Can't Believe It's Not Better Workshop: Understanding Deep Learning Through Empirical Falsification (NeurIPS), 2022. Cited by 2.