GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
by Shih-Yang Liu, Xin Dong, Ximing Lu, Shizhe Diao, Peter Belcak, Mingjie Liu, Min-Hung Chen, Hongxu Yin, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Yejin Choi, Jan Kautz, Pavlo Molchanov
Jan 9, 2026 • 16:51
Multi-reward Reinforcement Learning • Group Relative Policy Optimization (GRPO) • Group reward-Decoupled Normalization Policy Optimization (GDPO) • Reward Normalization Issues