Wei Xiong
Verified email at illinois.edu - Homepage
Title · Cited by · Year
RAFT: Reward ranked finetuning for generative foundation model alignment
H Dong, W Xiong, D Goyal, R Pan, S Diao, J Zhang, K Shum, T Zhang
TMLR, 2023
Cited by 124 · 2023
GEC: A unified framework for interactive decision making in MDP, POMDP, and beyond
H Zhong, W Xiong, S Zheng, L Wang, Z Wang, Z Yang, T Zhang
arXiv preprint arXiv:2211.01962, 2022
Cited by 39* · 2022
LMFlow: An extensible toolkit for finetuning and inference of large foundation models
S Diao, R Pan, H Dong, KS Shum, J Zhang, W Xiong, T Zhang
NAACL 2024, 2023
Cited by 37 · 2023
Nearly minimax optimal offline reinforcement learning with linear function approximation: Single-agent MDP and Markov game
W Xiong, H Zhong, C Shi, C Shen, L Wang, T Zhang
ICLR 2023, 2022
Cited by 37 · 2022
Decentralized multi-player multi-armed bandits with no collision information
C Shi, W Xiong, C Shen, J Yang
AISTATS, 2020
Cited by 37 · 2020
Pessimistic minimax value iteration: Provably efficient equilibrium learning from offline datasets
H Zhong, W Xiong, J Tan, L Wang, T Zhang, Z Wang, Z Yang
ICML 2022, 2022
Cited by 34 · 2022
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
W Xiong, H Zhong, C Shi, C Shen, T Zhang
ICML 2022, 2022
Cited by 22 · 2022
Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization
C Shi, W Xiong, C Shen, J Yang
NeurIPS 2021, 2021
Cited by 19 · 2021
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
C Ye, W Xiong, Q Gu, T Zhang
ICML 2023, 2022
Cited by 17 · 2022
Distributional reinforcement learning for multi-dimensional reward functions
P Zhang, X Chen, L Zhao, W Xiong, T Qin, TY Liu
NeurIPS 2021, 2021
Cited by 17 · 2021
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
Z Liu, M Lu, W Xiong, H Zhong, H Hu, S Zhang, S Zheng, Z Yang, Z Wang
NeurIPS 2023, 2023
Cited by 16* · 2023
Mitigating the Alignment Tax of RLHF
Y Lin, H Lin, W Xiong, S Diao, J Liu, J Zhang, R Pan, H Wang, W Hu, ...
arXiv preprint arXiv:2309.06256, 2023
Cited by 14* · 2023
PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction
H Ye, W Xiong, T Zhang
arXiv preprint arXiv:2012.15010, 2020
Cited by 13 · 2020
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
W Xiong, H Dong, C Ye, Z Wang, H Zhong, H Ji, N Jiang, T Zhang
ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation …, 2023
Cited by 12* · 2023
(Almost) Free Incentivized Exploration from Decentralized Learning Agents
C Shi, H Xu, W Xiong, C Shen
NeurIPS 2021, 2021
Cited by 5* · 2021
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
H Wang, Y Lin, W Xiong, R Yang, S Diao, S Qiu, H Zhao, T Zhang
arXiv preprint arXiv:2402.18571, 2024
Cited by 1 · 2024
A theoretical analysis of Nash learning from human feedback under general KL-regularized preference
C Ye, W Xiong, Y Zhang, N Jiang, T Zhang
arXiv preprint arXiv:2402.07314, 2024
Cited by 1 · 2024
Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources
C Shi, W Xiong, C Shen, J Yang
ICML 2023, 2023
Cited by 1 · 2023
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
R Pi, T Han, W Xiong, J Zhang, R Liu, R Pan, T Zhang
arXiv preprint arXiv:2403.08730, 2024
2024
Reward Teaching for Federated Multi-armed Bandits
C Shi, W Xiong, C Shen, J Yang
IEEE International Symposium on Information Theory (ISIT 2023), 2023
2023