Wanrong Zhu
Adobe Research
Verified email at adobe.com
Title · Cited by · Year
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
A Awadalla, I Gao, J Gardner, J Hessel, Y Hanafy, W Zhu, K Marathe, ...
arXiv preprint arXiv:2308.01390, 2023
393* · 2023
Large Language Models are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning
X Wang, W Zhu, WY Wang
NeurIPS 2023, 2023
141* · 2023
Multimodal C4: An Open, Billion-Scale Corpus of Images Interleaved with Text
W Zhu, J Hessel, A Awadalla, SY Gadre, J Dodge, A Fang, Y Yu, ...
NeurIPS 2023 - Dataset and Benchmark Track, 2023
122 · 2023
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
W Feng, W Zhu, T Fu, V Jampani, A Akula, X He, S Basu, XE Wang, ...
NeurIPS 2023, 2023
116 · 2023
Text Infilling
W Zhu, Z Hu, E Xing
arXiv preprint arXiv:1901.00158, 2019
89 · 2019
Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation
Z Hu, H Shi, B Tan, W Wang, Z Yang, T Zhao, J He, L Qin, D Wang, X Ma, ...
ACL 2019: System Demonstration, 159–164, 2019
66 · 2019
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
A Yan, Z Yang, W Zhu, K Lin, L Li, J Wang, J Yang, Y Zhong, J McAuley, ...
arXiv preprint arXiv:2311.07562, 2023
59 · 2023
Diagnosing Vision-and-Language Navigation: What Really Matters
W Zhu, Y Qi, P Narayana, K Sone, S Basu, XE Wang, Q Wu, M Eckstein, ...
NAACL 2022, 5981–5993, 2021
43 · 2021
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Y Bitton, H Bansal, J Hessel, R Shao, W Zhu, A Awadalla, J Gardner, ...
NeurIPS 2023 - Dataset and Benchmark Track, 2023
42 · 2023
VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View
R Schumann, W Zhu, W Feng, TJ Fu, S Riezler, WY Wang
AAAI 2024, 2023
37 · 2023
End-to-end Dense Video Captioning as Sequence Generation
W Zhu, B Pang, A Thapliyal, WY Wang, R Soricut
COLING 2022, 5651–5665, 2022
34 · 2022
Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation
W Zhu, XE Wang, TJ Fu, A Yan, P Narayana, K Sone, S Basu, WY Wang
EACL 2021, 1207–1221, 2020
32 · 2020
Multimodal Procedural Planning via Dual Text-Image Prompting
Y Lu, P Lu, Z Chen, W Zhu, XE Wang, WY Wang
Findings of EMNLP 2024, 2023
31 · 2023
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation
W Zhu, A Yan, Y Lu, W Xu, XE Wang, M Eckstein, WY Wang
Findings of EACL 2023, 78–92, 2022
31 · 2022
Neuro-Symbolic Causal Language Planning with Commonsense Prompting
Y Lu, W Feng, W Zhu, W Xu, XE Wang, M Eckstein, WY Wang
ICLR 2023, 2022
31* · 2022
Imagination-Augmented Natural Language Understanding
Y Lu, W Zhu, XE Wang, M Eckstein, WY Wang
NAACL 2022, 4392–4402, 2022
29 · 2022
ImaginE: An Imagination-based Automatic Evaluation Metric for Natural Language Generation
W Zhu, XE Wang, A Yan, M Eckstein, WY Wang
Findings of EACL 2023, 93–105, 2021
10 · 2021
Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations
W Zhu, XE Wang, P Narayana, K Sone, S Basu, WY Wang
EMNLP 2020, 8806–8811, 2020
9 · 2020
CLIP Also Understands Text: Prompting CLIP for Phrase Understanding
A Yan, J Li, W Zhu, Y Lu, WY Wang, J McAuley
arXiv preprint arXiv:2210.05836, 2022
6 · 2022
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
A Yan, Z Yang, J Wu, W Zhu, J Yang, L Li, K Lin, J Wang, J McAuley, ...
arXiv preprint arXiv:2404.16375, 2024
4 · 2024