GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

by Shih-Yang Liu, Xin Dong, Ximing Lu, Shizhe Diao, Peter Belcak, Mingjie Liu, Min-Hung Chen, Hongxu Yin, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Yejin Choi, Jan Kautz, Pavlo Molchanov

Jan 9, 2026 · 16:51

Multi-reward Reinforcement Learning, Group Relative Policy Optimization (GRPO), Group reward-Decoupled Normalization Policy Optimization (GDPO), Reward Normalization Issues
