Xize Cheng（成曦泽）

Cited by

	All	Since 2020
Citations	493	493
h-index	13	13
i10-index	18	18

Citations per year: 2023 (48), 2024 (238), 2025 (205)

Public access

9 articles available, 1 article not available

Based on funding mandates

Co-authors

Zhou Zhao - Zhejiang University - Verified email at zju.edu.cn
Rongjie Huang - FAIR, Zhejiang University - Verified email at zju.edu.cn
Zehan Wang - Zhejiang University - Verified email at zju.edu.cn
Minghui Fang - Zhejiang University - Verified email at zju.edu.cn
Shengpeng Ji - Zhejiang University - Verified email at zju.edu.cn
Haifeng Huang - Zhejiang University - Verified email at zju.edu.cn
Ziyue Jiang - Zhejiang University - Verified email at zju.edu.cn
Luping Liu (刘路平) - The University of Hong Kong - Verified email at connect.hku.hk
Zhenhui Ye (叶振辉) - Zhejiang University - Verified email at zju.edu.cn
Huadai Liu (刘华岱) - Zhejiang University - Verified email at zju.edu.cn
Yi Ren (任意) - Research Scientist, TikTok - Verified email at bytedance.com

Xize Cheng（成曦泽）

Zhejiang University

Verified email at zju.edu.cn

Audio-Visual Processing, Sound Separation, Spoken Dialogue System


Title	Cited by	Year
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers - H Huang, Y Chen, Z Wang, R Huang, R Xu, T Wang, L Liu, X Cheng, ... - arXiv preprint arXiv:2312.08168, 2023	62	2023
Wavtokenizer: an efficient acoustic discrete codec tokenizer for audio language modeling - S Ji, Z Jiang, W Wang, Y Chen, M Fang, J Zuo, Q Yang, X Cheng, Z Wang, ... - arXiv preprint arXiv:2408.16532, 2024	43	2024
Connecting multi-modal contrastive representations - Z Wang, Y Zhao, H Huang, J Liu, A Yin, L Tang, L Li, Y Wang, Z Zhang, ... - Advances in Neural Information Processing Systems 36, 22099-22114, 2023	41	2023
Mixspeech: Cross-modality self-learning with audio-visual stream mixup for visual speech translation and recognition - X Cheng, T Jin, R Huang, L Li, W Lin, Z Wang, Y Wang, H Liu, A Yin, ... - Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023	25	2023
Wavchat: A survey of spoken dialogue models - S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang, Z Jiang, L Zhou, S Liu, ... - arXiv preprint arXiv:2411.13577, 2024	23	2024
3drp-net: 3d relative position-aware network for 3d visual grounding - Z Wang, H Huang, Y Zhao, L Li, X Cheng, Y Zhu, A Yin, Z Zhao - arXiv preprint arXiv:2307.13363, 2023	22	2023
Opensr: Open-modality speech recognition via maintaining multi-modality alignment - X Cheng, T Jin, L Li, W Lin, X Duan, Z Zhao - arXiv preprint arXiv:2306.06410, 2023	22	2023
Distilling coarse-to-fine semantic matching knowledge for weakly supervised 3d visual grounding - Z Wang, H Huang, Y Zhao, L Li, X Cheng, Y Zhu, A Yin, Z Zhao - Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023	22	2023
Tavt: Towards transferable audio-visual text generation - W Lin, T Jin, W Pan, L Li, X Cheng, Y Wang, Z Zhao - Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023	18	2023
Av-transpeech: Audio-visual robust speech-to-speech translation - R Huang, H Liu, X Cheng, Y Ren, L Li, Z Ye, J He, L Zhang, J Liu, X Yin, ... - arXiv preprint arXiv:2305.15403, 2023	18	2023
Exploring group video captioning with efficient relational approximation - W Lin, T Jin, Y Wang, W Pan, L Li, X Cheng, Z Zhao - Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023	16	2023
Freebind: Free lunch in unified multimodal space via knowledge fusion - Z Wang, Z Zhang, X Cheng, R Huang, L Liu, Z Ye, H Huang, Y Zhao, T Jin, ... - arXiv preprint arXiv:2405.04883, 2024	13	2024
Rethinking missing modality learning from a decoding perspective - T Jin, X Cheng, L Li, W Lin, Y Wang, Z Zhao - Proceedings of the 31st ACM International Conference on Multimedia, 4431-4439, 2023	13	2023
Omnibind: Large-scale omni multimodal representation via binding spaces - Z Wang, Z Zhang, H Zhang, L Liu, R Huang, X Cheng, H Zhao, Z Zhao - arXiv preprint arXiv:2407.11895, 2024	11	2024
Audiolcm: Text-to-audio generation with latent consistency models - H Liu, R Huang, Y Liu, H Cao, J Wang, X Cheng, S Zheng, Z Zhao - arXiv preprint arXiv:2406.00356, 2024	11	2024
Extending multi-modal contrastive representations - Z Zhang, Z Wang, L Liu, R Huang, X Cheng, Z Ye, H Liu, H Huang, ... - Advances in Neural Information Processing Systems 37, 91880-91903, 2024	10	2024
Controlspeech: Towards simultaneous zero-shot speaker cloning and zero-shot language style control with decoupled codec - S Ji, J Zuo, W Wang, M Fang, S Zheng, Q Chen, Z Jiang, H Huang, ... - arXiv preprint arXiv:2406.01205, 2024	10	2024
Transface: Unit-based audio-visual speech synthesizer for talking head translation - X Cheng, R Huang, L Li, T Jin, Z Wang, A Yin, M Li, X Duan, Z Zhao - arXiv preprint arXiv:2312.15197, 2023	10	2023
Weakly-supervised spoken video grounding via semantic interaction learning - Y Wang, W Lin, S Zhang, T Jin, L Li, X Cheng, Z Zhao - Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023	8	2023
Boosting Speech Recognition Robustness to Modality-Distortion with Contrast-Augmented Prompts - D Fu, X Cheng, X Yang, W Hanting, Z Zhao, T Jin - Proceedings of the 32nd ACM International Conference on Multimedia, 3838-3847, 2024	7	2024
