Sharpen's Blogs
HOME
ARCHIVES
CATEGORIES
ABOUT
LINKS
GITHUB
BLOG
CSDN
10 Tags
12 Categories
50 Posts
Reinforcement Learning
2025 (6 posts)
GRPO trainer: Training Reasoning Models
GRPO Implementation Explained
GRPO & DAPO Paper Walkthrough
GRPO-trainer-HF: A Text Compression Task with Length Rewards
DPO trainer - by trl
Direct Preference Optimization (DPO)