Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... arXiv preprint arXiv:2312.11805, 2023 | 2146 | 2023 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context G Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... arXiv preprint arXiv:2403.05530, 2024 | 671* | 2024 |
Self-supervised multimodal versatile networks JB Alayrac, A Recasens, R Schneider, R Arandjelović, J Ramapuram, ... Advances in neural information processing systems 33, 25-37, 2020 | 425 | 2020 |
Gaze360: Physically unconstrained gaze estimation in the wild P Kellnhofer, A Recasens, S Stent, W Matusik, A Torralba Proceedings of the IEEE/CVF international conference on computer vision …, 2019 | 398 | 2019 |
Where are they looking? A Recasens Continente, A Khosla, C Vondrick, A Torralba Neural Information Processing Systems Foundation, 2015 | 300* | 2015 |
Context based emotion recognition using emotic dataset R Kosti, JM Alvarez, A Recasens, A Lapedriza IEEE transactions on pattern analysis and machine intelligence 42 (11), 2755 …, 2019 | 261 | 2019 |
Emotion recognition in context R Kosti, JM Alvarez, A Recasens, A Lapedriza Proceedings of the IEEE conference on computer vision and pattern …, 2017 | 248 | 2017 |
Jointly discovering visual objects and spoken words from raw sensory input D Harwath, A Recasens, D Surís, G Chuang, A Torralba, J Glass Proceedings of the European conference on computer vision (ECCV), 649-665, 2018 | 245 | 2018 |
Where should saliency models look next? Z Bylinskii, A Recasens, A Borji, A Oliva, A Torralba, F Durand Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The …, 2016 | 184 | 2016 |
Learning to zoom: a saliency-based sampling layer for neural networks A Recasens, P Kellnhofer, S Stent, W Matusik, A Torralba Proceedings of the European conference on computer vision (ECCV), 51-66, 2018 | 169 | 2018 |
Broaden your views for self-supervised video learning A Recasens, P Luc, JB Alayrac, L Wang, F Strub, C Tallec, M Malinowski, ... Proceedings of the IEEE/CVF international conference on computer vision …, 2021 | 141 | 2021 |
Tap-vid: A benchmark for tracking any point in a video C Doersch, A Gupta, L Markeeva, A Recasens, L Smaira, Y Aytar, ... Advances in Neural Information Processing Systems 35, 13610-13626, 2022 | 121 | 2022 |
Game Plan: What AI can do for Football, and What Football can do for AI K Tuyls, S Omidshafiei, P Muller, Z Wang, J Connor, D Hennes, I Graham, ... Journal of Artificial Intelligence Research 71, 41-88, 2021 | 110 | 2021 |
Following gaze in video A Recasens, C Vondrick, A Khosla, A Torralba Proceedings of the IEEE International Conference on Computer Vision, 1435-1443, 2017 | 107 | 2017 |
Emotic: Emotions in context dataset R Kosti, JM Alvarez, A Recasens, A Lapedriza Proceedings of the IEEE conference on computer vision and pattern …, 2017 | 83 | 2017 |
Towards learning universal audio representations L Wang, P Luc, Y Wu, A Recasens, L Smaira, A Brock, A Jaegle, ... ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 76 | 2022 |
Perception test: A diagnostic benchmark for multimodal video models V Patraucean, L Smaira, A Gupta, A Recasens, L Markeeva, D Banarse, ... Advances in Neural Information Processing Systems 36, 2024 | 53 | 2024 |
Multimodal self-supervised learning of general audio representations L Wang, P Luc, A Recasens, JB Alayrac, A Oord arXiv preprint arXiv:2104.12807, 2021 | 50 | 2021 |
Understanding infographics through textual and visual tag prediction Z Bylinskii, S Alsheikh, S Madan, A Recasens, K Zhong, H Pfister, ... arXiv preprint arXiv:1709.09215, 2017 | 41 | 2017 |
Synthetically trained icon proposals for parsing and summarizing infographics S Madan, Z Bylinskii, M Tancik, A Recasens, K Zhong, S Alsheikh, ... arXiv preprint arXiv:1807.10441, 2018 | 27 | 2018 |