DeepSeek_R1 Paper
- File size: 1.05 MB
- Platform: Windows
- Format: PDF
- Download cost: free

- Description
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Abstract
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.