SPHINX: The joint mixing of weights, tasks, and visual embeddings for multi-modal large language models Z Lin*, C Liu*, R Zhang*, P Gao*, L Qiu*, H Xiao, H Qiu, C Lin, W Shao, ... arXiv preprint arXiv:2311.07575, 2023 | 60 | 2023 |
CALIP: Zero-shot enhancement of CLIP with parameter-free attention Z Guo*, R Zhang*, L Qiu*, X Ma, X Miao, X He, B Cui Proceedings of the AAAI Conference on Artificial Intelligence 37 (1), 746-754, 2023 | 54 | 2023 |
VT-CLIP: Enhancing vision-language models with visual-guided texts L Qiu, R Zhang, Z Guo, Z Zeng, Z Guo, Y Li, G Zhang arXiv preprint arXiv:2112.02399, 2021 | 36 | 2021 |
Joint-MAE: 2D-3D joint masked autoencoders for 3D point cloud pre-training Z Guo, R Zhang, L Qiu, X Li, PA Heng arXiv preprint arXiv:2302.14007, 2023 | 25 | 2023 |
HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models S Ning*, L Qiu*, Y Liu, X He Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 22 | 2023 |
A challenger to GPT-4V? Early explorations of Gemini in visual expertise C Fu, R Zhang, H Lin, Z Wang, T Gao, Y Luo, Y Huang, Z Zhang, L Qiu, ... arXiv preprint arXiv:2312.12436, 2023 | 20 | 2023 |
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models P Gao*, R Zhang*, C Liu*, L Qiu*, S Huang*, W Lin*, S Zhao, S Geng, ... ICML 24, 2024 | 11 | 2024 |
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training L Qiu*, S Ning*, X He AAAI 24, 2024 | | 2024 |