Bang Yang
Verified email at pku.edu.cn
Title
Cited by
Year
Non-Autoregressive Coarse-to-Fine Video Captioning
B Yang, Y Zou, F Liu, C Zhang
In Proceedings of AAAI 2021, 2021
Cited by: 78*, Year: 2021
O2NA: An object-oriented non-autoregressive approach for controllable video captioning
F Liu, X Ren, X Wu, B Yang, S Ge, Y Zou, X Sun
In Findings of ACL 2021, 2021
Cited by: 31, Year: 2021
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter
B Yang, T Zhang, Y Zou
In Proceedings of PRCV 2022 (Oral), 2022
Cited by: 22*, Year: 2022
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
Y Li, B Yang, X Cheng, Z Zhu, H Li, Y Zou
In Proceedings of ICCV 2023, 2023
Cited by: 13, Year: 2023
Adaptive curriculum learning for video captioning
S Li, B Yang, Y Zou
IEEE Access 10, 31751-31759, 2022
Cited by: 12, Year: 2022
Retrieve, reason, and refine: Generating accurate and faithful patient instructions
F Liu*, B Yang*, C You, X Wu, S Ge, Z Liu, X Sun, Y Yang, D Clifton
In Proceedings of NeurIPS 2022, 2022
Cited by: 10, Year: 2022
A medical multimodal large language model for future pandemics
F Liu, T Zhu, X Wu, B Yang, C You, C Wang, L Lu, Z Liu, Y Zheng, X Sun, ...
NPJ Digital Medicine 6 (1), 226, 2023
Cited by: 8, Year: 2023
Concept-aware video captioning: Describing videos with effective prior information
B Yang, M Cao, Y Zou
IEEE Transactions on Image Processing, 2023
Cited by: 6, Year: 2023
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation
B Yang, F Liu, Y Zou, X Wu, Y Wang, DA Clifton
arXiv preprint arXiv:2303.06458, 2023
Cited by: 5, Year: 2023
Graph-in-graph network for automatic gene ontology description generation
F Liu, B Yang, C You, X Wu, S Ge, A Woicik, S Wang
In Proceedings of KDD 2022 (Oral), 2022
Cited by: 5, Year: 2022
PCLmed at ImageCLEFmedical 2023: Customizing General-Purpose Foundation Models for Medical Report Generation
B Yang, A Raza, Y Zou, T Zhang
In Proceedings of CLEF 2023, 2023
Cited by: 4*, Year: 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
B Yang, F Liu, X Wu, Y Wang, Y Wang, Y Zou
In Proceedings of ACL 2023, 2023
Cited by: 4, Year: 2023
WorldGPT: a Sora-inspired video AI agent as Rich world models from text and image inputs
D Yang, L Hu, Y Tian, Z Li, C Kelly, B Yang, C Yang, Y Zou
arXiv preprint arXiv:2403.07944, 2024
Cited by: 3, Year: 2024
Visual oriented encoder: Integrating multimodal and multi-scale contexts for video captioning
B Yang, Y Zou
In Proceedings of ICPR 2020, 188-195, 2021
Cited by: 3, Year: 2021
VisionGPT: Vision-language understanding agent using generalized multimodal framework
C Kelly, L Hu, B Yang, Y Tian, D Yang, C Yang, Z Huang, Z Li, J Hu, Y Zou
arXiv preprint arXiv:2403.09027, 2024
Cited by: 2, Year: 2024
Consensus-Guided Keyword Targeting for Video Captioning
P Ji, B Yang, T Zhang, Y Zou
In Proceedings of PRCV 2022, 2022
Cited by: 2, Year: 2022
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding
C Kelly, L Hu, J Hu, Y Tian, D Yang, B Yang, C Yang, Z Li, Z Huang, Y Zou
arXiv preprint arXiv:2403.09530, 2024
Cited by: 1, Year: 2024
Improving Medical Report Generation with Adapter Tuning and Knowledge Enhancement in Vision-Language Foundation Models
S Wu, B Yang, Z Ye, H Wang, H Zheng, T Zhang
arXiv preprint arXiv:2312.03970, 2023
Cited by: 1, Year: 2023
UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework
C Kelly, L Hu, C Yang, Y Tian, D Yang, B Yang, Z Huang, Z Li, Y Zou
arXiv preprint arXiv:2311.10125, 2023
Cited by: 1, Year: 2023
Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels
B Yang, F Liu, Z Li, Q Yin, C You, B Yin, Y Zou
In Findings of ACL 2023, 2023
Cited by: 1, Year: 2023