On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
W. Zhang, Y. Xie, Y. Sun, Y. Chen, G. Wang, Y. Li, B. Ding, and J. Zhou, “On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting,” arxiv: 2508.11408.