DeepSeek_R1 Paper

  • File size: 1.05MB
  • Platform: Windows
  • Development tool: PDF
  • Download cost: free
  • Description
DeepSeek language model

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Abstract

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.

