Wei Xiong
Verified email at illinois.edu - Homepage
Title · Cited by · Year
RAFT: Reward ranked finetuning for generative foundation model alignment
H Dong, W Xiong, D Goyal, R Pan, S Diao, J Zhang, K Shum, T Zhang
TMLR, 2023
Cited by 124 · 2023
GEC: A unified framework for interactive decision making in MDP, POMDP, and beyond
H Zhong, W Xiong, S Zheng, L Wang, Z Wang, Z Yang, T Zhang
arXiv preprint arXiv:2211.01962, 2022
Cited by 39* · 2022
LMFlow: An extensible toolkit for finetuning and inference of large foundation models
S Diao, R Pan, H Dong, KS Shum, J Zhang, W Xiong, T Zhang
NAACL 2024, 2023
Cited by 37 · 2023
Nearly minimax optimal offline reinforcement learning with linear function approximation: Single-agent MDP and Markov game
W Xiong, H Zhong, C Shi, C Shen, L Wang, T Zhang
ICLR 2023, 2022
Cited by 37 · 2022
Decentralized multi-player multi-armed bandits with no collision information
C Shi, W Xiong, C Shen, J Yang
AISTATS, 2020
Cited by 37 · 2020
Pessimistic minimax value iteration: Provably efficient equilibrium learning from offline datasets
H Zhong, W Xiong, J Tan, L Wang, T Zhang, Z Wang, Z Yang
ICML 2022, 2022
Cited by 34 · 2022
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
W Xiong, H Zhong, C Shi, C Shen, T Zhang
ICML 2022, 2022
Cited by 22 · 2022
Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization
C Shi, W Xiong, C Shen, J Yang
NeurIPS 2021, 2021
Cited by 19 · 2021
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
C Ye, W Xiong, Q Gu, T Zhang
ICML 2023, 2022
Cited by 17 · 2022
Distributional reinforcement learning for multi-dimensional reward functions
P Zhang, X Chen, L Zhao, W Xiong, T Qin, TY Liu
NeurIPS 2021, 2021
Cited by 17 · 2021
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
Z Liu, M Lu, W Xiong, H Zhong, H Hu, S Zhang, S Zheng, Z Yang, Z Wang
NeurIPS 2023, 2023
Cited by 16* · 2023
Mitigating the Alignment Tax of RLHF
Y Lin, H Lin, W Xiong, S Diao, J Liu, J Zhang, R Pan, H Wang, W Hu, ...
arXiv preprint arXiv:2309.06256, 2023
Cited by 14* · 2023
PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction
H Ye, W Xiong, T Zhang
arXiv preprint arXiv:2012.15010, 2020
Cited by 13 · 2020
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
W Xiong, H Dong, C Ye, Z Wang, H Zhong, H Ji, N Jiang, T Zhang
ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation …, 2023
Cited by 12* · 2023
(Almost) Free Incentivized Exploration from Decentralized Learning Agents
C Shi, H Xu, W Xiong, C Shen
NeurIPS 2021, 2021
Cited by 5* · 2021
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
H Wang, Y Lin, W Xiong, R Yang, S Diao, S Qiu, H Zhao, T Zhang
arXiv preprint arXiv:2402.18571, 2024
Cited by 1 · 2024
A theoretical analysis of Nash learning from human feedback under general KL-regularized preference
C Ye, W Xiong, Y Zhang, N Jiang, T Zhang
arXiv preprint arXiv:2402.07314, 2024
Cited by 1 · 2024
Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources
C Shi, W Xiong, C Shen, J Yang
ICML 2023, 2023
Cited by 1 · 2023
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
R Pi, T Han, W Xiong, J Zhang, R Liu, R Pan, T Zhang
arXiv preprint arXiv:2403.08730, 2024
2024
Reward Teaching for Federated Multi-armed Bandits
C Shi, W Xiong, C Shen, J Yang
IEEE International Symposium on Information Theory (ISIT 2023), 2023
2023