Yiwu Zhong
Title
Cited by
Year
Grounded language-image pre-training
LH Li, P Zhang, H Zhang, J Yang, C Li, Y Zhong, L Wang, L Yuan, ...
Computer Vision and Pattern Recognition (CVPR), 2022
Cited by: 1075; Year: 2022
RegionCLIP: Region-based Language-Image Pretraining
Y Zhong, J Yang, P Zhang, C Li, N Codella, LH Li, L Zhou, X Dai, L Yuan, ...
Computer Vision and Pattern Recognition (CVPR), 2022
Cited by: 544; Year: 2022
Comprehensive Image Captioning via Scene Graph Decomposition
Y Zhong, L Wang, J Chen, D Yu, Y Li
European Conference on Computer Vision (ECCV), 2020
Cited by: 142; Year: 2020
Learning to Generate Scene Graph from Natural Language Supervision
Y Zhong, J Shi, J Yang, C Xu, Y Li
International Conference on Computer Vision (ICCV), 2021
Cited by: 77; Year: 2021
GPT-4V in wonderland: Large multimodal models for zero-shot smartphone GUI navigation
A Yan, Z Yang, W Zhu, K Lin, L Li, J Wang, J Yang, Y Zhong, J McAuley, ...
arXiv preprint arXiv:2311.07562, 2023
Cited by: 73; Year: 2023
Learning Concise and Descriptive Attributes for Visual Recognition
A Yan*, Y Wang*, Y Zhong*, C Dong, Z He, Y Lu, W Wang, J Shang, ...
International Conference on Computer Vision (ICCV), 2023
Cited by: 58; Year: 2023
A Simple Baseline for Weakly-Supervised Scene Graph Generation
J Shi, Y Zhong, N Xu, Y Li, C Xu
International Conference on Computer Vision (ICCV), 2021
Cited by: 33; Year: 2021
Robust and interpretable medical image classifiers via concept bottleneck models
A Yan, Y Wang, Y Zhong, Z He, P Karypis, Z Wang, C Dong, A Gentili, ...
arXiv preprint arXiv:2310.03182, 2023
Cited by: 27; Year: 2023
Learning Procedure-Aware Video Representation From Instructional Videos and Their Narrations
Y Zhong, L Yu, Y Bai, S Li, X Yan, Y Li
Computer Vision and Pattern Recognition (CVPR), 2023
Cited by: 23; Year: 2023
Towards learning a generalist model for embodied navigation
D Zheng, S Huang, L Zhao, Y Zhong, L Wang
Computer Vision and Pattern Recognition (CVPR), 2024
Cited by: 17; Year: 2024
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
M Cai, R Tan, J Zhang, B Zou, K Zhang, F Yao, F Zhu, J Gu, Y Zhong, ...
arXiv preprint arXiv:2410.10818, 2024
Cited by: 3; Year: 2024
Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods
C Qu, Y Zhong, C Liu, G Xu, D Peng, F Guo, L Jin
Computer Vision and Pattern Recognition (CVPR), 2024
Cited by: 3; Year: 2024
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
Y Zhong, Z Liu, Y Li, L Wang
arXiv preprint arXiv:2412.03248, 2024
Year: 2024
Enhancing Temporal Modeling of Video LLMs via Time Gating
ZY Hu, Y Zhong, S Huang, MR Lyu, L Wang
Empirical Methods in Natural Language Processing (EMNLP) Findings, 2024
Year: 2024
Beyond Embeddings: The Promise of Visual Table in Visual Reasoning
Y Zhong, ZY Hu, MR Lyu, L Wang
Empirical Methods in Natural Language Processing (EMNLP), 2024
Year: 2024
Learning Visual Knowledge from Natural Language Supervision
Y Zhong
The University of Wisconsin-Madison, 2023
Year: 2023