DeepSeek 发布新的推测解码方法 DSpark

MR_Teen

46人浏览 · 2026-06-28 10:20:45

MR_Teen · 2026-06-28 10:20:45 发布

DeepSeek 发布新的推测解码方法 DSpark

DeepSeek刚刚发布了DSpark for V4 Flash & Pro，这是一种新的推测解码方法，可将吞吐量提高51%至400%！DS还表明，DSpark对于Gemma和Qwen等其他模型也表现良好!

注意：DeepSeek-V4-Pro-DSpark并非新模型，而是在原有版本的基础上增加了一个推测性解码模块。文件夹中提供了一个最小推理示例inference。更多详情，请参阅：https://github.com/deepseek-ai/DeepSpec
mtp对比

介绍

我们推出DeepSeek-V4系列的预览版，其中包括两个强大的混合专家 (MoE) 语言模型——DeepSeek -V4-Pro（1.6T 参数，已激活 49B）和DeepSeek-V4-Flash （284B 参数，已激活 13B）——两者均支持一百万个标记的上下文长度。

DeepSeek-V4系列在架构和优化方面进行了多项关键升级：

混合注意力架构：我们设计了一种混合注意力机制，结合了压缩稀疏注意力（CSA）和高度压缩注意力（HCA），以显著提高长上下文效率。在 100 万个词元的上下文设置下，与 DeepSeek-V3.2 相比，DeepSeek-V4-Pro 仅需27% 的单词元推理浮点运算次数和10% 的键值缓存。
流形约束超连接（mHC）：我们引入 mHC 来加强传统的残差连接，增强跨层信号传播的稳定性，同时保持模型的表达能力。
Muon优化器：我们采用Muon优化器以实现更快的收敛速度和更高的训练稳定性。
我们使用超过32T 个多样化且高质量的词元对两个模型进行预训练，随后进行全面的后训练流程。后训练流程采用两阶段范式：首先，通过 SFT 和 GRPO 强化学习独立培养领域特定专家；其次，通过策略内蒸馏进行统一模型整合，将不同领域的专家技能整合到一个模型中。

DeepSeek-V4-Pro-Max是 DeepSeek-V4-Pro 的最高推理模式，显著提升了开源模型的知识能力，稳居目前最佳开源模型之列。它在编码基准测试中取得了顶尖性能，并在推理和智能体任务方面显著缩小了与领先的闭源模型之间的差距。同时，DeepSeek-V4-Flash-Max在拥有更大的推理预算时，其推理性能与 Pro 版本相当，但由于其参数规模较小，在纯知识任务和最复杂的智能体工作流程方面自然略逊一筹。
DFlash

Model Downloads

Model	#Total Params	#Activated Params	Context Length	Precision	Download
DeepSeek-V4-Flash-Base	284B	13B	1M	FP8 Mixed	HuggingFace \| ModelScope
DeepSeek-V4-Flash	284B	13B	1M	FP4 + FP8 Mixed*	HuggingFace \| ModelScope
DeepSeek-V4-Pro-Base	1.6T	49B	1M	FP8 Mixed	HuggingFace \| ModelScope
DeepSeek-V4-Pro	1.6T	49B	1M	FP4 + FP8 Mixed*	HuggingFace \| ModelScope

* FP4 + FP8 Mixed: MoE expert parameters use FP4 precision; most other parameters use FP8.

Evaluation Results — Base Model

Category	Benchmark (Metric)	# Shots	DeepSeek-V3.2-Base	DeepSeek-V4-Flash-Base	DeepSeek-V4-Pro-Base
Architecture	Architecture	-	MoE	MoE	MoE
Params	# Activated Params	-	37B	13B	49B
Params	# Total Params	-	671B	284B	1.6T
World Knowledge	AGIEval (EM)	0-shot	80.1	82.6	83.1
World Knowledge	MMLU (EM)	5-shot	87.8	88.7	90.1
World Knowledge	MMLU-Redux (EM)	5-shot	87.5	89.4	90.8
World Knowledge	MMLU-Pro (EM)	5-shot	65.5	68.3	73.5
World Knowledge	MMMLU (EM)	5-shot	87.9	88.8	90.3
World Knowledge	C-Eval (EM)	5-shot	90.4	92.1	93.1
World Knowledge	CMMLU (EM)	5-shot	88.9	90.4	90.8
World Knowledge	MultiLoKo (EM)	5-shot	38.7	42.2	51.1
World Knowledge	Simple-QA verified (EM)	25-shot	28.3	30.1	55.2
World Knowledge	SuperGPQA (EM)	5-shot	45.0	46.5	53.9
World Knowledge	FACTS Parametric (EM)	25-shot	27.1	33.9	62.6
World Knowledge	TriviaQA (EM)	5-shot	83.3	82.8	85.6
Language & Reasoning	BBH (EM)	3-shot	87.6	86.9	87.5
Language & Reasoning	DROP (F1)	1-shot	88.2	88.6	88.7
Language & Reasoning	HellaSwag (EM)	0-shot	86.4	85.7	88.0
Language & Reasoning	WinoGrande (EM)	0-shot	78.9	79.5	81.5
Language & Reasoning	CLUEWSC (EM)	5-shot	83.5	82.2	85.2
Code & Math	BigCodeBench (Pass@1)	3-shot	63.9	56.8	59.2
Code & Math	HumanEval (Pass@1)	0-shot	62.8	69.5	76.8
Code & Math	GSM8K (EM)	8-shot	91.1	90.8	92.6
Code & Math	MATH (EM)	4-shot	60.5	57.4	64.5
Code & Math	MGSM (EM)	8-shot	81.3	85.7	84.4
Code & Math	CMath (EM)	3-shot	92.6	93.6	90.9
Long Context	LongBench-V2 (EM)	1-shot	40.2	44.7	51.5

Reasoning Mode

DeepSeek-V4-Pro 和 DeepSeek-V4-Flash 都支持三种推理强度模式：

Reasoning Mode	Characteristics	Typical Use Cases	Response Format
Non-think	Fast, intuitive responses	Routine daily tasks, low-risk decisions	`summary`
Think High	Conscious logical analysis, slower but more accurate	Complex problem-solving, planning	`<think> thinking summary`
Think Max	Push reasoning to its fullest extent	Exploring the boundary of model reasoning capability	Special system prompt + `<think> thinking summary`

DeepSeek-V4-Pro-Max vs Frontier Models

Category	Benchmark (Metric)	Opus-4.6 Max	GPT-5.4 xHigh	Gemini-3.1-Pro High	K2.6 Thinking	GLM-5.1 Thinking	DS-V4-Pro Max
Knowledge & Reasoning	MMLU-Pro (EM)	89.1	87.5	91.0	87.1	86.0	87.5
Knowledge & Reasoning	SimpleQA-Verified (Pass@1)	46.2	45.3	75.6	36.9	38.1	57.9
Knowledge & Reasoning	Chinese-SimpleQA (Pass@1)	76.4	76.8	85.9	75.9	75.0	84.4
Knowledge & Reasoning	GPQA Diamond (Pass@1)	91.3	93.0	94.3	90.5	86.2	90.1
Knowledge & Reasoning	HLE (Pass@1)	40.0	39.8	44.4	36.4	34.7	37.7
Knowledge & Reasoning	LiveCodeBench (Pass@1)	88.8	-	91.7	89.6	-	93.5
Knowledge & Reasoning	Codeforces (Rating)	-	3168	3052	-	-	3206
Knowledge & Reasoning	HMMT 2026 Feb (Pass@1)	96.2	97.7	94.7	92.7	89.4	95.2
Knowledge & Reasoning	IMOAnswerBench (Pass@1)	75.3	91.4	81.0	86.0	83.8	89.8
Knowledge & Reasoning	Apex (Pass@1)	34.5	54.1	60.9	24.0	11.5	38.3
Knowledge & Reasoning	Apex Shortlist (Pass@1)	85.9	78.1	89.1	75.5	72.4	90.2
Long Context	MRCR 1M (MMR)	92.9	-	76.3	-	-	83.5
Long Context	CorpusQA 1M (ACC)	71.7	-	53.8	-	-	62.0
Agentic	Terminal Bench 2.0 (Acc)	65.4	75.1	68.5	66.7	63.5	67.9
Agentic	SWE Verified (Resolved)	80.8	-	80.6	80.2	-	80.6
Agentic	SWE Pro (Resolved)	57.3	57.7	54.2	58.6	58.4	55.4
Agentic	SWE Multilingual (Resolved)	77.5	-	-	76.7	73.3	76.2
Agentic	BrowseComp (Pass@1)	83.7	82.7	85.9	83.2	79.3	83.4
Agentic	HLE w/ tools (Pass@1)	53.1	52.0	51.6	54.0	50.4	48.2
Agentic	GDPval-AA (Elo)	1619	1674	1314	1482	1535	1554
Agentic	MCPAtlas Public (Pass@1)	73.8	67.2	69.2	66.6	71.8	73.6
Agentic	Toolathlon (Pass@1)	47.2	54.6	48.8	50.0	40.7	51.8

Comparison across Modes

Category	Benchmark (Metric)	V4-Flash Non-Think	V4-Flash High	V4-Flash Max	V4-Pro Non-Think	V4-Pro High	V4-Pro Max
Knowledge & Reasoning	MMLU-Pro (EM)	83.0	86.4	86.2	82.9	87.1	87.5
Knowledge & Reasoning	SimpleQA-Verified (Pass@1)	23.1	28.9	34.1	45.0	46.2	57.9
Knowledge & Reasoning	Chinese-SimpleQA (Pass@1)	71.5	73.2	78.9	75.8	77.7	84.4
Knowledge & Reasoning	GPQA Diamond (Pass@1)	71.2	87.4	88.1	72.9	89.1	90.1
Knowledge & Reasoning	HLE (Pass@1)	8.1	29.4	34.8	7.7	34.5	37.7
Knowledge & Reasoning	LiveCodeBench (Pass@1)	55.2	88.4	91.6	56.8	89.8	93.5
Knowledge & Reasoning	Codeforces (Rating)	-	2816	3052	-	2919	3206
Knowledge & Reasoning	HMMT 2026 Feb (Pass@1)	40.8	91.9	94.8	31.7	94.0	95.2
Knowledge & Reasoning	IMOAnswerBench (Pass@1)	41.9	85.1	88.4	35.3	88.0	89.8
Knowledge & Reasoning	Apex (Pass@1)	1.0	19.1	33.0	0.4	27.4	38.3
Knowledge & Reasoning	Apex Shortlist (Pass@1)	9.3	72.1	85.7	9.2	85.5	90.2
Long Context	MRCR 1M (MMR)	37.5	76.9	78.7	44.7	83.3	83.5
Long Context	CorpusQA 1M (ACC)	15.5	59.3	60.5	35.6	56.5	62.0
Agentic	Terminal Bench 2.0 (Acc)	49.1	56.6	56.9	59.1	63.3	67.9
Agentic	SWE Verified (Resolved)	73.7	78.6	79.0	73.6	79.4	80.6
Agentic	SWE Pro (Resolved)	49.1	52.3	52.6	52.1	54.4	55.4
Agentic	SWE Multilingual (Resolved)	69.7	70.2	73.3	69.8	74.1	76.2
Agentic	BrowseComp (Pass@1)	-	53.5	73.2	-	80.4	83.4
Agentic	HLE w/ tools (Pass@1)	-	40.3	45.1	-	44.7	48.2
Agentic	MCPAtlas (Pass@1)	64.0	67.4	69.0	69.4	74.2	73.6
Agentic	GDPval-AA (Elo)	-	-	1395	-	-	1554
Agentic	Toolathlon (Pass@1)	40.7	43.5	47.8	46.3	49.0	51.8