Pixel-BERT: Aligning image pixels with text by deep multi-modal transformers Z Huang, Z Zeng, B Liu, D Fu, J Fu arXiv preprint arXiv:2004.00849, 2020 | 361 | 2020 |
Seeing out of the box: End-to-end pre-training for vision-language representation learning Z Huang, Z Zeng, Y Huang, B Liu, D Fu, J Fu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021 | 215 | 2021 |
Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection S Liu, Z Zeng, T Ren, F Li, H Zhang, J Yang, C Li, J Yang, H Su, J Zhu, ... arXiv preprint arXiv:2303.05499, 2023 | 187 | 2023 |
WSOD2: Learning bottom-up and top-down objectness distillation for weakly-supervised object detection Z Zeng, B Liu, J Fu, H Chao, L Zhang Proceedings of the IEEE/CVF international conference on computer vision …, 2019 | 142 | 2019 |
Active contrastive learning of audio-visual video representations S Ma, Z Zeng, D McDuff, Y Song arXiv preprint arXiv:2009.09805, 2020 | 85 | 2020 |
Mind the discriminability: Asymmetric adversarial domain adaptation J Yang, H Zou, Y Zhou, Z Zeng, L Xie Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23 …, 2020 | 46 | 2020 |
GarbageNet: a unified learning framework for robust garbage classification J Yang, Z Zeng, K Wang, H Zou, L Xie IEEE Transactions on Artificial Intelligence 2 (4), 372-380, 2021 | 33 | 2021 |
Contrastive learning of global-local video representations S Ma, Z Zeng, D McDuff, Y Song arXiv preprint arXiv:2104.05418, 2021 | 26 | 2021 |
SMP Challenge: An overview of social media prediction challenge 2019 B Wu, WH Cheng, P Liu, B Liu, Z Zeng, J Luo Proceedings of the 27th ACM International Conference on Multimedia, 2667-2671, 2019 | 26 | 2019 |
Suppressing mislabeled data via grouping and self-attention X Peng, K Wang, Z Zeng, Q Li, J Yang, Y Qiao Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23 …, 2020 | 22 | 2020 |
Reference-based defect detection network Z Zeng, B Liu, J Fu, H Chao IEEE Transactions on Image Processing 30, 6637-6647, 2021 | 20 | 2021 |
Contrastive learning of global and local video representations Z Zeng, D McDuff, Y Song Advances in Neural Information Processing Systems 34, 7025-7040, 2021 | 17 | 2021 |
ActivityNet 2019 task 3: Exploring contexts for dense captioning events in videos S Chen, Y Song, Y Zhao, Q Jin, Z Zeng, B Liu, J Fu, A Hauptmann arXiv preprint arXiv:1907.05092, 2019 | 12 | 2019 |
Tencent-MVSE: A large-scale benchmark dataset for multi-modal video similarity evaluation Z Zeng, Y Luo, Z Liu, F Rao, D Li, W Guo, Z Wen Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 9 | 2022 |
Pixel-BERT: Aligning image pixels with text by deep multi-modal transformers. arXiv 2020 Z Huang, Z Zeng, B Liu, D Fu, J Fu arXiv preprint arXiv:2004.00849, 2020 | 9 | 2020 |
Learning rich image region representation for visual question answering B Liu, Z Huang, Z Zeng, Z Chen, J Fu arXiv preprint arXiv:1910.13077, 2019 | 9 | 2019 |
Multiple transfer learning and multi-label balanced training strategies for facial AU detection in the wild S Ji, K Wang, X Peng, J Yang, Z Zeng, Y Qiao Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020 | 8 | 2020 |
CLIP4Caption++: Multi-CLIP for video caption M Tang, Z Wang, Z Zeng, F Rao, D Li arXiv preprint arXiv:2110.05204, 2021 | 6 | 2021 |
Detection Transformer with Stable Matching S Liu, T Ren, J Chen, Z Zeng, H Zhang, F Li, H Li, J Huang, H Su, J Zhu, ... arXiv preprint arXiv:2304.04742, 2023 | 5 | 2023 |
Be Specific, Be Clear: Bridging Machine and Human Captions by Scene-Guided Transformer Y Huang, Z Zeng, Y Lu Proceedings of the 2021 Workshop on Multi-Modal Pre-Training for Multimedia …, 2021 | 5 | 2021 |