End-to-end learning of visual representations from uncurated instructional videos A Miech, JB Alayrac, L Smaira, I Laptev, J Sivic, A Zisserman Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 813 | 2020 |
Self-supervised multimodal versatile networks JB Alayrac, A Recasens, R Schneider, R Arandjelović, J Ramapuram, ... Advances in neural information processing systems 33, 25-37, 2020 | 428 | 2020 |
A short note on the Kinetics-700-2020 human action dataset L Smaira, J Carreira, E Noland, E Clancy, A Wu, A Zisserman arXiv preprint arXiv:2010.10864, 2020 | 175 | 2020 |
TAP-Vid: A benchmark for tracking any point in a video C Doersch, A Gupta, L Markeeva, A Recasens, L Smaira, Y Aytar, ... Advances in Neural Information Processing Systems 35, 13610-13626, 2022 | 128 | 2022 |
Towards learning universal audio representations L Wang, P Luc, Y Wu, A Recasens, L Smaira, A Brock, A Jaegle, ... ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 76 | 2022 |
Perception test: A diagnostic benchmark for multimodal video models V Patraucean, L Smaira, A Gupta, A Recasens, L Markeeva, D Banarse, ... Advances in Neural Information Processing Systems 36, 2024 | 61 | 2024 |
Visual grounding in video for unsupervised word translation GA Sigurdsson, JB Alayrac, A Nematzadeh, L Smaira, M Malinowski, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020 | 56 | 2020 |
Zorro: the masked multimodal transformer A Recasens, J Lin, J Carreira, D Jaegle, L Wang, J Alayrac, P Luc, ... arXiv preprint arXiv:2301.09595, 2023 | 22 | 2023 |
A short note on the Kinetics-700-2020 human action dataset L Smaira, J Carreira, E Noland, E Clancy, A Wu, A Zisserman arXiv preprint arXiv:2010.10864, 2020 | 15 | 2020 |
Perception test: A diagnostic benchmark for multimodal video models V Pătrăucean, L Smaira, A Gupta, AR Continente, L Markeeva, D Banarse, ... arXiv preprint arXiv:2305.13786, 2023 | 9 | 2023 |
Human-agent cooperation in bridge bidding E Lockhart, N Burch, N Bard, S Borgeaud, T Eccles, L Smaira, R Smith arXiv preprint arXiv:2011.14124, 2020 | 7 | 2020 |
Recognizing multimodal entailment C Ilharco, A Shirazi, A Gopalan, A Nagrani, B Bratanic, C Bregler, C Funk, ... Proceedings of the 59th Annual Meeting of the Association for Computational …, 2021 | 1 | 2021 |
Recognizing Multimodal Entailment (tutorial at ACL 2021) AH Shirazi, A Gopalan, A Nagrani, CI Magalhaes, F Ferreira, GF Barcik, ... | | 2021 |
Supplementary Material for TAP-Vid: A Benchmark for Tracking Any Point in a Video C Doersch, A Gupta, L Markeeva, A Recasens, L Smaira, Y Aytar, ... | | |