Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context G Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... arXiv preprint arXiv:2403.05530, 2024 | 724 | 2024 |
Just ask: Learning to answer questions from millions of narrated videos A Yang, A Miech, J Sivic, I Laptev, C Schmid Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 303 | 2021 |
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models A Yang, A Miech, J Sivic, I Laptev, C Schmid Advances in Neural Information Processing Systems 35, 124-141, 2022 | 215 | 2022 |
Vid2Seq: Large-scale pretraining of a visual language model for dense video captioning A Yang, A Nagrani, PH Seo, A Miech, J Pont-Tuset, I Laptev, J Sivic, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 211 | 2023 |
NAS evaluation is frustratingly hard A Yang, PM Esperança, FM Carlucci International Conference on Learning Representations, 2020 | 209 | 2020 |
TubeDETR: Spatio-Temporal Video Grounding with Transformers A Yang, A Miech, J Sivic, I Laptev, C Schmid Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 101 | 2022 |
Learning to Answer Visual Questions from Web Videos A Yang, A Miech, J Sivic, I Laptev, C Schmid IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022 | 39 | 2022 |
MANAS: multi-agent neural architecture search V Lopes, FM Carlucci, P Esperanca, M Singh, A Yang, V Gabillon, H Xu, ... Machine Learning, 1-24, 2023 | 33* | 2023 |
CoVR: Learning composed video retrieval from web video captions L Ventura, A Yang, C Schmid, G Varol Proceedings of the AAAI Conference on Artificial Intelligence 38 (6), 5270-5279, 2024 | 30 | 2024 |
VidChapters-7M: Video Chapters at Scale A Yang, A Nagrani, I Laptev, J Sivic, C Schmid Advances in Neural Information Processing Systems 36, 2023 | 22 | 2023 |
Just ask: Learning to answer questions from millions of narrated videos A Yang, A Miech, J Sivic, I Laptev, C Schmid 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 1666-1677, 2020 | 8 | 2020 |
CoVR-2: Automatic Data Construction for Composed Video Retrieval L Ventura, A Yang, C Schmid, G Varol IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024 | | 2024 |
Learning Visual Language Models for Video Understanding A Yang Ecole Normale Superieure de Paris-ENS Paris, 2023 | | 2023 |
VidChapters-7M: Video Chapters at Scale Supplementary Material A Yang, A Nagrani, I Laptev, J Sivic, C Schmid | | |
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models Supplementary Material A Yang, A Miech, J Sivic, I Laptev, C Schmid | | |
TubeDETR: Spatio-Temporal Video Grounding with Transformers Supplementary Material A Yang, A Miech, J Sivic, I Laptev, C Schmid | | |
Just Ask: Learning to Answer Questions from Millions of Narrated Videos Supplementary Material A Yang, A Miech, J Sivic, I Laptev, C Schmid | | |