2025

[Conference] Simple Policy Optimization
Zhengpeng Xie, Qiang Zhang, Fan Yang, Marco Hutter, and Renjing Xu
In Forty-second International Conference on Machine Learning, 2025

[Preprint] Representation Convergence: Mutual Distillation is Secretly a Form of Regularization
Zhengpeng Xie, Jiahang Cao, Qiang Zhang, Jianxiong Zhang, Changwei Wang, and Renjing Xu
2025

[Conference] Zeroth-Order Optimization is Secretly Single-Step Policy Optimization
Junbin Qiu, Zhengpeng Xie, Xiangda Yan, Yongjie Yang, and Yao Shu
In Tiny Titans: The next wave of On-Device Learning for Foundational Models (TTODLer-FM), 2025