Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
Status: Under review, 2025
Recommended citation: C. Yao, Y. Chen, Y. Sun, Y. Chen, W. Zhang, X. Pan, Y. Li, and B. Ding, “Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends,” arXiv:2509.24203.
