Deploying the full DeepSeek model with ktransformers
This article walks through the basic steps for deploying DeepSeek locally with the ktransformers framework.
Introduction - KTransformers official documentation
1. Hardware environment:
Motherboard: X13SWA-TF
GPU: Intel Arc A770
2. Software environment:
OS: Ubuntu 25.04 (Intel x86_64 build)
Ubuntu 25.04 has improved support for Intel GPUs. Ubuntu 22.04 and 24.10 can both compile the XPU version of ktransformers from source. On Ubuntu 25.04, however, a source build reports that neither CUDA nor XPU can be found, so installing from source fails and only the desktop install route works.
Compiling ktransformers from source on Ubuntu 25.04 fails with the following error:
ValueError: Unsupported backend: CUDA_HOME ROCM_HOME MUSA_HOME are not set and XPU is not available.
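The message comes from ktransformers' build-time backend probe: it looks for the CUDA/ROCm/MUSA environment variables first, then falls back to XPU detection. A minimal sketch of that decision logic (an illustrative reimplementation; `detect_backend` is a hypothetical name, not ktransformers' actual function):

```python
def detect_backend(env, xpu_available):
    """Hypothetical sketch of the backend probe: pick the first backend
    whose environment variable is set, else fall back to XPU."""
    for var in ("CUDA_HOME", "ROCM_HOME", "MUSA_HOME"):
        if env.get(var):
            return var.split("_")[0]  # "CUDA", "ROCM", or "MUSA"
    if xpu_available:
        return "XPU"
    # No backend found: this is the error seen on Ubuntu 25.04
    raise ValueError(
        "Unsupported backend: CUDA_HOME ROCM_HOME MUSA_HOME are not set "
        "and XPU is not available."
    )
```

On Ubuntu 25.04 the XPU probe fails even with the driver installed, so every branch misses and the build aborts with the error above.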
3. Prerequisites:
1) Install the Intel GPU driver:
Client GPUs:
Installing Client GPUs — Intel® software for general purpose GPU capabilities documentation
Data Center GPUs:
Installing Data Center GPU: LTS Releases — Intel® software for general purpose GPU capabilities documentation
Note: only the Intel® Data Center GPU Max and Intel® Data Center GPU Flex series should use the Data Center GPU installation.
All other series, such as the Intel® Arc™ A-series, should use the Client GPU installation (good news: the Client GPU install is much easier than the Data Center one).
2) Install Intel oneAPI
Get the Intel® oneAPI Base Toolkit
Note: after installation, be sure to run the following two commands in a terminal, otherwise llama.cpp will not detect the SYCL backend:
sudo apt update
sudo apt -y install cmake pkg-config build-essential
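In addition to the packages above, the oneAPI environment variables have to be loaded into each shell session before building anything SYCL-based. A typical sequence looks like this (assuming a default oneAPI install under /opt/intel/oneapi; adjust the path if you installed elsewhere):

```shell
# Load oneAPI compiler and runtime variables into the current shell
source /opt/intel/oneapi/setvars.sh

# List SYCL devices; the Arc A770 should appear as a Level Zero GPU entry
sycl-ls
```

If sycl-ls prints no GPU entry, SYCL-based builds generally will not find the backend.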
4. ktransformers versions:
ktransformers/doc/en/xpu.md at main · kvcache-ai/ktransformers (XPU source build guide)
ktransformers/doc/en/Docker_xpu.md at main · kvcache-ai/ktransformers (Docker installation guide)
Introduction - KTransformers official documentation
5. Results
With deepseek-r1-671b Q4_K_M, the single-GPU setup runs at 3.0~3.7 t/s, while the 4-GPU setup runs at only 0.5 t/s.
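For reference, runs like these are typically launched through ktransformers' local_chat entry point. A launch sketch is shown below; all paths are placeholders, and flag names can vary between ktransformers releases, so check `python ktransformers/local_chat.py --help` for your version:

```shell
# Placeholder paths: point these at your local HF config dir and GGUF weights
python ktransformers/local_chat.py \
  --model_path deepseek-ai/DeepSeek-R1 \
  --gguf_path /models/DeepSeek-R1-Q4_K_M \
  --optimize_config_path ./xpu-4gpu.yaml \
  --max_new_tokens 512
```

The YAML passed via the optimize-config flag is what selects the single-GPU or 4-GPU placement.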
4-GPU configuration file:
# === Embed Tokens ===
- match:
    name: "^model.embed_tokens"
  replace:
    class: "default"
    kwargs:
      generate_device: "cpu"
      prefill_device: "cpu"

# === Rotary Embedding ===
- match:
    name: "^model\\.layers\\.(0|[1-9]|1[0-4])\\."
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3RotaryEmbedding
  replace:
    class: ktransformers.operators.RoPE.YarnRotaryEmbeddingV3
    kwargs:
      generate_device: "xpu:0"
      prefill_device: "xpu:0"
- match:
    name: "^model\\.layers\\.(1[5-9]|2[0-9])\\."
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3RotaryEmbedding
  replace:
    class: ktransformers.operators.RoPE.YarnRotaryEmbeddingV3
    kwargs:
      generate_device: "xpu:1"
      prefill_device: "xpu:1"
- match:
    name: "^model\\.layers\\.(3[0-9]|4[0-4])\\."
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3RotaryEmbedding
  replace:
    class: ktransformers.operators.RoPE.YarnRotaryEmbeddingV3
    kwargs:
      generate_device: "xpu:2"
      prefill_device: "xpu:2"
- match:
    name: "^model\\.layers\\.(4[5-9]|5[0-9]|60)\\."
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3RotaryEmbedding
  replace:
    class: ktransformers.operators.RoPE.YarnRotaryEmbeddingV3
    kwargs:
      generate_device: "xpu:3"
      prefill_device: "xpu:3"

# === Linear Layers (including kv_b_proj) ===
- match:
    name: "^model\\.layers\\.(0|[1-9]|1[0-4])\\..*"
    class: torch.nn.Linear
  replace:
    class: ktransformers.operators.linear.KTransformersLinear
    kwargs:
      generate_device: "xpu:0"
      prefill_device: "xpu:0"
      generate_op: "KLinearIPEXLLM"
      prefill_op: "KLinearIPEXLLM"
- match:
    name: "^model\\.layers\\.(1[5-9]|2[0-9])\\..*"
    class: torch.nn.Linear
  replace:
    class: ktransformers.operators.linear.KTransformersLinear
    kwargs:
      generate_device: "xpu:1"
      prefill_device: "xpu:1"
      generate_op: "KLinearIPEXLLM"
      prefill_op: "KLinearIPEXLLM"
- match:
    name: "^model\\.layers\\.(3[0-9]|4[0-4])\\..*"
    class: torch.nn.Linear
  replace:
    class: ktransformers.operators.linear.KTransformersLinear
    kwargs:
      generate_device: "xpu:2"
      prefill_device: "xpu:2"
      generate_op: "KLinearIPEXLLM"
      prefill_op: "KLinearIPEXLLM"
- match:
    name: "^model\\.layers\\.(4[5-9]|5[0-9]|60)\\..*"
    class: torch.nn.Linear
  replace:
    class: ktransformers.operators.linear.KTransformersLinear
    kwargs:
      generate_device: "xpu:3"
      prefill_device: "xpu:3"
      generate_op: "KLinearIPEXLLM"
      prefill_op: "KLinearIPEXLLM"

# === MLP ===
- match:
    name: "^model\\.layers\\.(0|[1-9]|1[0-4])\\.mlp$"
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3MoE
  replace:
    class: ktransformers.operators.experts.KDeepseekV3MoE
    kwargs:
      generate_device: "xpu:0"
      prefill_device: "xpu:0"
- match:
    name: "^model\\.layers\\.(1[5-9]|2[0-9])\\.mlp$"
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3MoE
  replace:
    class: ktransformers.operators.experts.KDeepseekV3MoE
    kwargs:
      generate_device: "xpu:1"
      prefill_device: "xpu:1"
- match:
    name: "^model\\.layers\\.(3[0-9]|4[0-4])\\.mlp$"
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3MoE
  replace:
    class: ktransformers.operators.experts.KDeepseekV3MoE
    kwargs:
      generate_device: "xpu:2"
      prefill_device: "xpu:2"
- match:
    name: "^model\\.layers\\.(4[5-9]|5[0-9]|60)\\.mlp$"
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3MoE
  replace:
    class: ktransformers.operators.experts.KDeepseekV3MoE
    kwargs:
      generate_device: "xpu:3"
      prefill_device: "xpu:3"

# === MoE Gate ===
- match:
    class: ktransformers.models.modeling_deepseek_v3.MoEGate
  replace:
    class: ktransformers.operators.gate.KMoEGateIPEXLLM
    kwargs:
      generate_device: "xpu:0"
      prefill_device: "xpu:0"

# === MoE Experts ===
- match:
    name: "^model\\.layers\\.(0|[1-9]|1[0-4])\\.mlp\\.experts$"
  replace:
    class: ktransformers.operators.experts.KTransformersExperts
    kwargs:
      prefill_device: "xpu:0"
      prefill_op: "KExpertsTorch"
      generate_device: "cpu"
      generate_op: "KExpertsCPU"
      out_device: "xpu:0"
  recursive: False
- match:
    name: "^model\\.layers\\.(1[5-9]|2[0-9])\\.mlp\\.experts$"
  replace:
    class: ktransformers.operators.experts.KTransformersExperts
    kwargs:
      prefill_device: "xpu:1"
      prefill_op: "KExpertsTorch"
      generate_device: "cpu"
      generate_op: "KExpertsCPU"
      out_device: "xpu:1"
  recursive: False
- match:
    name: "^model\\.layers\\.(3[0-9]|4[0-4])\\.mlp\\.experts$"
  replace:
    class: ktransformers.operators.experts.KTransformersExperts
    kwargs:
      prefill_device: "xpu:2"
      prefill_op: "KExpertsTorch"
      generate_device: "cpu"
      generate_op: "KExpertsCPU"
      out_device: "xpu:2"
  recursive: False
- match:
    name: "^model\\.layers\\.(4[5-9]|5[0-9]|60)\\.mlp\\.experts$"
  replace:
    class: ktransformers.operators.experts.KTransformersExperts
    kwargs:
      prefill_device: "xpu:3"
      prefill_op: "KExpertsTorch"
      generate_device: "cpu"
      generate_op: "KExpertsCPU"
      out_device: "xpu:3"
  recursive: False

# === Self Attention ===
- match:
    name: "^model\\.layers\\..*\\.self_attn$"
  replace:
    class: ktransformers.operators.attention.KDeepseekV2Attention
    kwargs:
      generate_device: "xpu"
      prefill_device: "xpu"
      absorb_for_prefill: False

# === LayerNorm ===
- match:
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3RMSNorm
  replace:
    class: ktransformers.operators.layernorm.KDeepseekRMSNormIPEXLLM
    kwargs:
      generate_device: "xpu"
      prefill_device: "xpu"

# === Final lm_head ===
- match:
    name: "^lm_head$"
    class: torch.nn.Linear
  replace:
    class: ktransformers.operators.linear.KTransformersLinear
    kwargs:
      generate_device: "xpu:3"
      prefill_device: "xpu:3"
      generate_op: "KLinearIPEXLLM"
      prefill_op: "KLinearIPEXLLM"

# === Final Norm + Layers on xpu:3 ===
- match:
    name: "(^model\\.layers\\.(4[5-9]|5[0-9]|60)\\.)|(^model\\.norm)"
  replace:
    class: "default"
    kwargs:
      generate_device: "xpu:3"
      prefill_device: "xpu:3"

# === Top-Level Model Wrapper ===
- match:
    name: "^model$"
  replace:
    class: ktransformers.operators.models.KDeepseekV2Model
    kwargs:
      per_layer_prefill_intput_threshold: 0
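One thing worth verifying in a hand-written config like this is that the four layer-range regexes really partition DeepSeek-R1's 61 decoder layers (0 through 60) with no gaps and no double assignments. A small standalone check (the `ranges` list is copied from the config above):

```python
import re

# Layer-range alternations copied from the config, in xpu:0..xpu:3 order
ranges = [
    r"(0|[1-9]|1[0-4])",    # xpu:0 -> layers 0-14
    r"(1[5-9]|2[0-9])",     # xpu:1 -> layers 15-29
    r"(3[0-9]|4[0-4])",     # xpu:2 -> layers 30-44
    r"(4[5-9]|5[0-9]|60)",  # xpu:3 -> layers 45-60
]

def device_for(layer):
    """Return the xpu index whose regex matches this layer, insisting on
    exactly one match (no gaps, no overlaps)."""
    hits = [i for i, rng in enumerate(ranges)
            if re.match(r"^model\.layers\." + rng + r"\.",
                        f"model.layers.{layer}.")]
    assert len(hits) == 1, f"layer {layer} matched {len(hits)} ranges"
    return hits[0]

assignment = [device_for(n) for n in range(61)]
print(assignment.count(0), assignment.count(1),
      assignment.count(2), assignment.count(3))  # 15 15 15 16
```

This is just a sanity check: an off-by-one in any one alternation would silently leave a layer unplaced or placed twice, which the assert catches immediately.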