Wanrong Zhu
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
A Awadalla, I Gao, J Gardner, J Hessel, Y Hanafy, W Zhu, K Marathe, ...
arXiv preprint arXiv:2308.01390, 2023
Large Language Models are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning
X Wang, W Zhu, WY Wang
NeurIPS 2023, 2023
Multimodal C4: An Open, Billion-Scale Corpus of Images Interleaved with Text
W Zhu, J Hessel, A Awadalla, SY Gadre, J Dodge, A Fang, Y Yu, ...
NeurIPS 2023 - Dataset and Benchmark Track, 2023
Text Infilling
W Zhu, Z Hu, E Xing
arXiv preprint arXiv:1901.00158, 2019
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
W Feng, W Zhu, T Fu, V Jampani, A Akula, X He, S Basu, XE Wang, ...
NeurIPS 2023, 2023
Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation
Z Hu, H Shi, B Tan, W Wang, Z Yang, T Zhao, J He, L Qin, D Wang, X Ma, ...
ACL 2019: System Demonstration, 159–164, 2019
Diagnosing Vision-and-Language Navigation: What Really Matters
W Zhu, Y Qi, P Narayana, K Sone, S Basu, XE Wang, Q Wu, M Eckstein, ...
NAACL 2022, 5981–5993, 2021
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
A Yan, Z Yang, W Zhu, K Lin, L Li, J Wang, J Yang, Y Zhong, J McAuley, ...
arXiv preprint arXiv:2311.07562, 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Y Bitton, H Bansal, J Hessel, R Shao, W Zhu, A Awadalla, J Gardner, ...
NeurIPS 2023 - Dataset and Benchmark Track, 2023
End-to-end Dense Video Captioning as Sequence Generation
W Zhu, B Pang, A Thapliyal, WY Wang, R Soricut
COLING 2022, 5651–5665, 2022
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation
W Zhu, A Yan, Y Lu, W Xu, XE Wang, M Eckstein, WY Wang
Findings of EACL 2023, 78–92, 2022
Multimodal Procedural Planning via Dual Text-Image Prompting
Y Lu, P Lu, Z Chen, W Zhu, XE Wang, WY Wang
arXiv preprint arXiv:2305.01795, 2023
Neuro-Symbolic Causal Language Planning with Commonsense Prompting
Y Lu, W Feng, W Zhu, W Xu, XE Wang, M Eckstein, WY Wang
ICLR 2023, 2022
Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation
W Zhu, XE Wang, TJ Fu, A Yan, P Narayana, K Sone, S Basu, WY Wang
EACL 2021, 1207–1221, 2020
Imagination-Augmented Natural Language Understanding
Y Lu, W Zhu, XE Wang, M Eckstein, WY Wang
NAACL 2022, 4392–4402, 2022
VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View
R Schumann, W Zhu, W Feng, TJ Fu, S Riezler, WY Wang
AAAI 2024, 2023
ImaginE: An Imagination-based Automatic Evaluation Metric for Natural Language Generation
W Zhu, XE Wang, A Yan, M Eckstein, WY Wang
Findings of EACL 2023, 93–105, 2021
Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations
W Zhu, XE Wang, P Narayana, K Sone, S Basu, WY Wang
EMNLP 2020, 8806–8811, 2020
CLIP Also Understands Text: Prompting CLIP for Phrase Understanding
A Yan, J Li, W Zhu, Y Lu, WY Wang, J McAuley
arXiv preprint arXiv:2210.05836, 2022
Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation
W Zhu, X Wang, Y Lu, TJ Fu, XE Wang, M Eckstein, WY Wang
EMNLP 2023, 2023