Temporal enhanced training of multi-view 3D object detector via historical object prediction Z Zong, D Jiang, G Song, Z Xue, J Su, H Li, Y Liu Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 20 | 2023 |
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? R Zhang, D Jiang, Y Zhang, H Lin, Z Guo, P Qiu, A Zhou, P Lu, KW Chang, ... arXiv preprint arXiv:2403.14624, 2024 | 5 | 2024 |
MoVA: Adapting Mixture of Vision Experts to Multimodal Context Z Zong, B Ma, D Shen, G Song, H Shao, D Jiang, H Li, Y Liu arXiv preprint arXiv:2404.13046, 2024 | | 2024 |
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching D Jiang, G Song, X Wu, R Zhang, D Shen, Z Zong, Y Liu, H Li arXiv preprint arXiv:2404.03653, 2024 | | 2024 |