Quantizing the deepseek moe chat 16b model to GGUF Q8 format
After quantization the model's memory footprint shrinks considerably. Combined with my earlier post on running MOE with llama, the quantized model chats normally, and its answers are accurate and fast. The model now occupies 16.22GB, compared with an original size of 30…
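I just realized that my earlier post covered running a GGUF-quantized model with llama but never explained where the GGUF file comes from, so this post fills that gap.
Quantization needs at least 64G of memory available.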
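Before starting, it can help to confirm how much RAM the machine actually has. The following is a minimal sketch, assuming a Linux or other POSIX system where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`; the helper names `total_ram_gib` and `has_enough_ram` are made up for illustration and are not part of llama.cpp.

```python
import os


def total_ram_gib() -> float:
    """Return the physical RAM size in GiB (POSIX only, via sysconf)."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * page_count / 1024 ** 3


def has_enough_ram(required_gib: float = 64.0, total_gib=None) -> bool:
    """Compare the RAM size against the 64G guideline from this post.

    total_gib can be passed explicitly (useful for testing); otherwise
    the value is read from the running system.
    """
    if total_gib is None:
        total_gib = total_ram_gib()
    return total_gib >= required_gib


if __name__ == "__main__":
    print(f"Total RAM: {total_ram_gib():.1f} GiB")
    print("Meets the 64G guideline:", has_enough_ram())
```

A machine sold as 64 GB usually reports slightly less than 64 GiB here, so treat a near miss as a prompt to check free memory by hand rather than as a hard failure.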
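First, download the deepseek moe chat 16b model from the Hugging Face website and install the basic llama.cpp dependencies (if this step is unclear, see my first post, "Running deepseek MOE 16b chat with llama.cpp", on my CSDN blog).
Enter the llama.cpp directory: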
cd llama.cpp
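I downloaded the deepseek model into a folder named MOE (MOE sits at the same level as llama.cpp). If your download path or file names differ from mine, change the path and name in the command below to match yours: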
python3 convert_hf_to_gguf.py ../MOE --outtype f16 --outfile deepseek-moe-16b-chat.f16.gguf
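If your paths differ, one way to avoid hand-editing the command is to assemble it from variables. This is only a sketch: `build_convert_command` is a hypothetical helper, and it uses nothing beyond the script name and the `--outtype` and `--outfile` flags shown in the command above.

```python
import shlex


def build_convert_command(model_dir: str, outfile: str, outtype: str = "f16",
                          script: str = "convert_hf_to_gguf.py") -> list:
    """Build the argument list for the conversion command used in this post."""
    return ["python3", script, model_dir, "--outtype", outtype, "--outfile", outfile]


# Defaults mirror the layout in this post: MOE sits next to llama.cpp.
cmd = build_convert_command("../MOE", "deepseek-moe-16b-chat.f16.gguf")

# Print the command so it can be checked before running it.
print(shlex.join(cmd))

# To actually run it from inside the llama.cpp directory:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list keeps paths with spaces intact when the command is later handed to `subprocess.run`.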
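The converter then starts writing right away. The full output is pasted below; it shows clearly the shape of each module in the model and how its data format is converted (my current work requires knowing the exact size and shape of every weight, so I pay close attention to this part; if that is not your area, just wait for the run to finish):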
(torchenv) wxy@YUSN01:~/llama.cpp$ python3 convert_hf_to_gguf.py ../MOE --outtype f16 --outfile deepseek-moe-16b-chat.f16.gguf
INFO:hf-to-gguf:Loading model: MOE
WARNING:hf-to-gguf:Failed to load model config from ../MOE: The repository ../MOE contains custom code which must be executed to correctly load the model. You can inspect the repository content at /ssd/users/wxy/MOE .
You can inspect the repository content at https://hf.co/../MOE.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: DeepseekForCausalLM
WARNING:hf-to-gguf:Failed to load model config from ../MOE: The repository ../MOE contains custom code which must be executed to correctly load the model. You can inspect the repository content at /ssd/users/wxy/MOE .
You can inspect the repository content at https://hf.co/../MOE.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00007.safetensors'
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {2048, 102400}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {10944, 2048}
INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 10944}
INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 10944}
INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.0.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.0.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.1.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.1.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.1.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.1.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.1.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.1.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.1.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.1.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.2.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.2.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.2.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.2.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.2.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.2.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.2.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.3.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.3.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.3.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.3.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.3.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.3.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.3.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.3.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:gguf: loading model part 'model-00002-of-00007.safetensors'
INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.4.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.4.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.4.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.4.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.4.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.4.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.4.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.5.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.5.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.5.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.5.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.5.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.5.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.5.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.6.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.6.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.6.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.6.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.6.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.6.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.6.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.6.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.7.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.7.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.7.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.7.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.7.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.7.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.7.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.7.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.8.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.8.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.8.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.8.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.8.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.8.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.8.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.8.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:gguf: loading model part 'model-00003-of-00007.safetensors'
INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.10.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.10.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.10.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.10.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.10.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.10.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.10.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.10.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.11.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.11.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.11.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.11.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.11.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.11.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.11.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.11.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.12.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.12.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.12.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.12.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.12.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.12.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.12.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.12.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.9.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.9.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.9.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.9.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.9.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.9.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:gguf: loading model part 'model-00004-of-00007.safetensors'
INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.13.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.13.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.13.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.13.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.13.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.13.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.13.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.14.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.14.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.14.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.14.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.14.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.14.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.14.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.14.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.15.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.15.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.15.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.15.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.15.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.15.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.15.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.15.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.16.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.16.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.16.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.16.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.16.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.16.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.16.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.16.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.16.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:gguf: loading model part 'model-00005-of-00007.safetensors'
INFO:hf-to-gguf:blk.17.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.17.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.17.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.17.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.17.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.17.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.17.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.17.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.17.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.18.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.18.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.18.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.18.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.18.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.18.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.18.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.18.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.18.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.18.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.19.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.19.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.19.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.19.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.19.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.19.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.19.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.19.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.19.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.20.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.20.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.20.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.20.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.20.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.20.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.20.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.20.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.20.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:gguf: loading model part 'model-00006-of-00007.safetensors'
INFO:hf-to-gguf:blk.21.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.21.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.21.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.21.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.21.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.21.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.21.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.21.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.21.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.22.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.22.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.22.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.22.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.22.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.22.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.22.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.22.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.22.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.22.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.23.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.23.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.23.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.23.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.23.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.23.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.23.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.23.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.23.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.24.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.24.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.24.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.24.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.24.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.24.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.24.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.24.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.24.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.24.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.24.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.24.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.24.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.25.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.25.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.25.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.25.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.25.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.25.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.25.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.25.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.25.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.25.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.25.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.25.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.25.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:gguf: loading model part 'model-00007-of-00007.safetensors'
INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F16, shape = {2048, 102400}
INFO:hf-to-gguf:blk.26.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.26.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.26.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.26.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.26.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.26.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.26.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.26.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.26.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.26.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.26.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.26.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.26.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.27.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.27.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.27.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.27.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.27.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.27.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.27.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.27.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.27.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.27.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.27.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.27.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.27.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 4096
INFO:hf-to-gguf:gguf: embedding length = 2048
INFO:hf-to-gguf:gguf: feed forward length = 10944
INFO:hf-to-gguf:gguf: head count = 16
INFO:hf-to-gguf:gguf: key-value head count = 16
INFO:hf-to-gguf:gguf: rope theta = 10000
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
INFO:hf-to-gguf:gguf: experts used count = 6
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Adding 99757 merge(s).
INFO:gguf.vocab:Setting special token type bos to 100000
INFO:gguf.vocab:Setting special token type eos to 100001
INFO:gguf.vocab:Setting special token type pad to 100001
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{{ bos_token }}{% for message in messages %}{% if message['role'] == 'user' %}{{ 'User: ' + message['content'] + '
' }}{% elif message['role'] == 'assistant' %}{{ 'Assistant: ' + message['content'] + eos_token }}{% elif message['role'] == 'system' %}{{ message['content'] + '
' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ 'Assistant:' }}{% endif %}
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:deepseek-moe-16b-chat.f16.gguf: n_tensors = 363, total_size = 32.8G
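If, like me, you care about the exact size of every weight, the totals in the log can be cross-checked by hand. Below is a minimal sketch that rebuilds the tensor list from the shapes printed above and sums the bytes. It assumes F16 takes 2 bytes and F32 takes 4 bytes per element, that the model has 28 blocks of which only the first is dense (the `block_count` and `leading_dense_block_count` values that llama-quantize reports for this file), and that the "G" in `total_size` means 10^9 bytes. The constant and function names are my own, not part of llama.cpp.

```python
# Shapes are copied from the convert_hf_to_gguf.py log.
# Assumption: F16 = 2 bytes per element, F32 = 4 bytes per element.
EMBD, VOCAB = 2048, 102400
DENSE_FF, EXPERT_FF, SHARED_FF = 10944, 1408, 2816
N_EXPERTS, N_LAYERS, N_DENSE = 64, 28, 1


def numel(*dims):
    """Number of elements in a tensor with the given dimensions."""
    n = 1
    for d in dims:
        n *= d
    return n


def layer_tensors(is_moe):
    """Return (bytes_per_element, element_count) for every tensor in one block."""
    t = [(2, numel(EMBD, EMBD))] * 4       # attn_q, attn_k, attn_v, attn_output (F16)
    t += [(4, EMBD)] * 2                   # attn_norm, ffn_norm (F32)
    if is_moe:
        t += [(2, numel(EXPERT_FF, EMBD, N_EXPERTS))] * 3  # ffn_down/gate/up_exps (F16)
        t += [(2, numel(SHARED_FF, EMBD))] * 3             # ffn_down/gate/up_shexp (F16)
        t += [(4, numel(EMBD, N_EXPERTS))]                 # ffn_gate_inp router (F32)
    else:
        t += [(2, numel(DENSE_FF, EMBD))] * 3              # ffn_down/gate/up (F16)
    return t


# token_embd and output are F16, output_norm is F32.
tensors = [(2, numel(EMBD, VOCAB))] * 2 + [(4, EMBD)]
for i in range(N_LAYERS):
    tensors += layer_tensors(is_moe=(i >= N_DENSE))

n_tensors = len(tensors)
total_bytes = sum(size * count for size, count in tensors)
print(n_tensors, f"{total_bytes / 1e9:.1f} GB", f"{total_bytes / 2**30:.1f} GiB")
```

Under these assumptions the count comes out to 363 tensors and about 32.8 GB, which lines up with the `n_tensors = 363, total_size = 32.8G` line in the log.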
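Once the conversion has finished and you have confirmed there are no errors, run the final step:
./build/bin/llama-quantize deepseek-moe-16b-chat.f16.gguf deepseek-moe-16b-chat.q8_0.gguf q8_0 # q8_0 can be replaced with whichever quantization type you need, e.g. q4_0, depending on the precision you require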
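To get a feel for what the chosen type does to each tensor, the per-tensor sizes that llama-quantize prints (for example 8.00 MiB -> 4.25 MiB for a 2048 x 2048 weight) can be reproduced with a little arithmetic. The sketch below assumes the usual ggml block layouts: every block covers 32 weights, Q8_0 stores one fp16 scale plus 32 int8 values (34 bytes), and Q4_0 stores one fp16 scale plus 16 bytes of packed 4-bit values (18 bytes). The helper name `size_mib` is just for illustration.

```python
# Assumed block layouts: 32 weights per block.
#   f16  : 32 * 2 bytes                    = 64 bytes
#   q8_0 : fp16 scale + 32 int8 values     = 34 bytes (8.5 bits per weight)
#   q4_0 : fp16 scale + 16 packed bytes    = 18 bytes (4.5 bits per weight)
BLOCK_WEIGHTS = 32
BLOCK_BYTES = {"f16": 64, "q8_0": 34, "q4_0": 18}


def size_mib(shape, qtype):
    """Size in MiB of a tensor with the given shape when stored as qtype."""
    n = 1
    for d in shape:
        n *= d
    # Simplified check: the shapes used here are all multiples of 32.
    assert n % BLOCK_WEIGHTS == 0, "element count must be a multiple of 32"
    return n // BLOCK_WEIGHTS * BLOCK_BYTES[qtype] / 2**20


# Shapes taken from the conversion log.
for name, shape in [
    ("attn_q", (2048, 2048)),
    ("ffn_down_exps", (1408, 2048, 64)),
    ("ffn_down_shexp", (2816, 2048)),
    ("token_embd", (2048, 102400)),
]:
    f16 = size_mib(shape, "f16")
    q8 = size_mib(shape, "q8_0")
    print(f"{name:15s} f16 {f16:7.2f} MiB -> q8_0 {q8:7.2f} MiB")
```

With the same assumptions, switching to q4_0 would bring a 2048 x 2048 tensor down to roughly 2.25 MiB, which is the trade-off between size and precision mentioned in the comment on the command.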
The full output of the llama-quantize run is as follows:
(torchenv) wxy@YUSN01:~/llama.cpp$ ./build/bin/llama-quantize deepseek-moe-16b-chat.f16.gguf deepseek-moe-16b-chat.q8_0.gguf q8_0
main: build = 5891 (0d922676)
main: built with gcc-11 (Ubuntu 11.4.0-2ubuntu1~18.04) 11.4.0 for x86_64-linux-gnu
main: quantizing 'deepseek-moe-16b-chat.f16.gguf' to 'deepseek-moe-16b-chat.q8_0.gguf' as Q8_0
llama_model_loader: loaded meta data with 37 key-value pairs and 363 tensors from deepseek-moe-16b-chat.f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = deepseek
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = MOE
llama_model_loader: - kv 3: general.size_label str = 64x1.7B
llama_model_loader: - kv 4: general.license str = other
llama_model_loader: - kv 5: general.license.name str = deepseek
llama_model_loader: - kv 6: general.license.link str = https://github.com/deepseek-ai/DeepSe...
llama_model_loader: - kv 7: deepseek.block_count u32 = 28
llama_model_loader: - kv 8: deepseek.context_length u32 = 4096
llama_model_loader: - kv 9: deepseek.embedding_length u32 = 2048
llama_model_loader: - kv 10: deepseek.feed_forward_length u32 = 10944
llama_model_loader: - kv 11: deepseek.attention.head_count u32 = 16
llama_model_loader: - kv 12: deepseek.attention.head_count_kv u32 = 16
llama_model_loader: - kv 13: deepseek.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 14: deepseek.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 15: deepseek.expert_used_count u32 = 6
llama_model_loader: - kv 16: general.file_type u32 = 1
llama_model_loader: - kv 17: deepseek.rope.dimension_count u32 = 128
llama_model_loader: - kv 18: deepseek.rope.scaling.type str = none
llama_model_loader: - kv 19: deepseek.leading_dense_block_count u32 = 1
llama_model_loader: - kv 20: deepseek.vocab_size u32 = 102400
llama_model_loader: - kv 21: deepseek.expert_feed_forward_length u32 = 1408
llama_model_loader: - kv 22: deepseek.expert_weights_scale f32 = 1.000000
llama_model_loader: - kv 23: deepseek.expert_count u32 = 64
llama_model_loader: - kv 24: deepseek.expert_shared_count u32 = 2
llama_model_loader: - kv 25: general.quantization_version u32 = 2
llama_model_loader: - kv 26: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 27: tokenizer.ggml.pre str = deepseek-llm
llama_model_loader: - kv 28: tokenizer.ggml.tokens arr[str,102400] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 29: tokenizer.ggml.token_type arr[i32,102400] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 30: tokenizer.ggml.merges arr[str,99757] = ["Ġ Ġ", "Ġ t", "Ġ a", "i n", "h e...
llama_model_loader: - kv 31: tokenizer.ggml.bos_token_id u32 = 100000
llama_model_loader: - kv 32: tokenizer.ggml.eos_token_id u32 = 100001
llama_model_loader: - kv 33: tokenizer.ggml.padding_token_id u32 = 100001
llama_model_loader: - kv 34: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 35: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 36: tokenizer.chat_template str = {% if not add_generation_prompt is de...
llama_model_loader: - type f32: 84 tensors
llama_model_loader: - type f16: 279 tensors
[ 1/ 363] output.weight - [ 2048, 102400, 1, 1], type = f16, converting to q8_0 .. size = 400.00 MiB -> 212.50 MiB
[ 2/ 363] output_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 3/ 363] token_embd.weight - [ 2048, 102400, 1, 1], type = f16, converting to q8_0 .. size = 400.00 MiB -> 212.50 MiB
[ 4/ 363] blk.0.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 5/ 363] blk.0.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 6/ 363] blk.0.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 7/ 363] blk.0.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 8/ 363] blk.0.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 9/ 363] blk.0.ffn_down.weight - [10944, 2048, 1, 1], type = f16, converting to q8_0 .. size = 42.75 MiB -> 22.71 MiB
[ 10/ 363] blk.0.ffn_gate.weight - [ 2048, 10944, 1, 1], type = f16, converting to q8_0 .. size = 42.75 MiB -> 22.71 MiB
[ 11/ 363] blk.0.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 12/ 363] blk.0.ffn_up.weight - [ 2048, 10944, 1, 1], type = f16, converting to q8_0 .. size = 42.75 MiB -> 22.71 MiB
[ 13/ 363] blk.1.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 14/ 363] blk.1.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 15/ 363] blk.1.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 16/ 363] blk.1.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 17/ 363] blk.1.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 18/ 363] blk.1.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 19/ 363] blk.1.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 20/ 363] blk.1.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 21/ 363] blk.1.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 22/ 363] blk.1.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 23/ 363] blk.1.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 24/ 363] blk.1.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 25/ 363] blk.1.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 26/ 363] blk.2.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 27/ 363] blk.2.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 28/ 363] blk.2.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 29/ 363] blk.2.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 30/ 363] blk.2.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 31/ 363] blk.2.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 32/ 363] blk.2.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 33/ 363] blk.2.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 34/ 363] blk.2.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 35/ 363] blk.2.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 36/ 363] blk.2.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 37/ 363] blk.2.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 38/ 363] blk.2.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 39/ 363] blk.3.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 40/ 363] blk.3.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 41/ 363] blk.3.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 42/ 363] blk.3.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 43/ 363] blk.3.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 44/ 363] blk.3.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 45/ 363] blk.3.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 46/ 363] blk.3.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 47/ 363] blk.3.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 48/ 363] blk.3.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 49/ 363] blk.3.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 50/ 363] blk.3.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 51/ 363] blk.3.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 52/ 363] blk.4.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 53/ 363] blk.4.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 54/ 363] blk.4.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 55/ 363] blk.4.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 56/ 363] blk.4.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 57/ 363] blk.4.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 58/ 363] blk.4.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 59/ 363] blk.4.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 60/ 363] blk.4.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 61/ 363] blk.4.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 62/ 363] blk.4.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 63/ 363] blk.4.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 64/ 363] blk.4.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 65/ 363] blk.5.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 66/ 363] blk.5.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 67/ 363] blk.5.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 68/ 363] blk.5.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 69/ 363] blk.5.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 70/ 363] blk.5.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 71/ 363] blk.5.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 72/ 363] blk.5.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 73/ 363] blk.5.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 74/ 363] blk.5.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 75/ 363] blk.5.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 76/ 363] blk.5.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 77/ 363] blk.5.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 78/ 363] blk.6.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 79/ 363] blk.6.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 80/ 363] blk.6.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 81/ 363] blk.6.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 82/ 363] blk.6.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 83/ 363] blk.6.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 84/ 363] blk.6.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 85/ 363] blk.6.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 86/ 363] blk.6.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 87/ 363] blk.6.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 88/ 363] blk.6.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 89/ 363] blk.6.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 90/ 363] blk.6.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 91/ 363] blk.7.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 92/ 363] blk.7.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 93/ 363] blk.7.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 94/ 363] blk.7.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 95/ 363] blk.7.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 96/ 363] blk.7.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 97/ 363] blk.7.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 98/ 363] blk.7.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 99/ 363] blk.7.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 100/ 363] blk.7.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 101/ 363] blk.7.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 102/ 363] blk.7.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 103/ 363] blk.7.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 104/ 363] blk.8.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 105/ 363] blk.8.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 106/ 363] blk.8.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 107/ 363] blk.8.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 108/ 363] blk.8.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 109/ 363] blk.8.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 110/ 363] blk.8.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 111/ 363] blk.8.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 112/ 363] blk.8.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 113/ 363] blk.8.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 114/ 363] blk.8.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 115/ 363] blk.8.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 116/ 363] blk.8.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 117/ 363] blk.9.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 118/ 363] blk.9.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 119/ 363] blk.9.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 120/ 363] blk.9.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 121/ 363] blk.9.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 122/ 363] blk.9.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 123/ 363] blk.9.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 124/ 363] blk.9.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 125/ 363] blk.9.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 126/ 363] blk.9.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 127/ 363] blk.9.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 128/ 363] blk.9.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 129/ 363] blk.9.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 130/ 363] blk.10.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 131/ 363] blk.10.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 132/ 363] blk.10.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 133/ 363] blk.10.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 134/ 363] blk.10.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 135/ 363] blk.10.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 136/ 363] blk.10.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 137/ 363] blk.10.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 138/ 363] blk.10.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 139/ 363] blk.10.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 140/ 363] blk.10.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 141/ 363] blk.10.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 142/ 363] blk.10.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 143/ 363] blk.11.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 144/ 363] blk.11.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 145/ 363] blk.11.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 146/ 363] blk.11.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 147/ 363] blk.11.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 148/ 363] blk.11.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 149/ 363] blk.11.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 150/ 363] blk.11.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 151/ 363] blk.11.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 152/ 363] blk.11.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 153/ 363] blk.11.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 154/ 363] blk.11.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 155/ 363] blk.11.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 156/ 363] blk.12.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 157/ 363] blk.12.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 158/ 363] blk.12.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 159/ 363] blk.12.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 160/ 363] blk.12.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 161/ 363] blk.12.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 162/ 363] blk.12.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 163/ 363] blk.12.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 164/ 363] blk.12.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 165/ 363] blk.12.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 166/ 363] blk.12.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 167/ 363] blk.12.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 168/ 363] blk.12.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 169/ 363] blk.13.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 170/ 363] blk.13.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 171/ 363] blk.13.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 172/ 363] blk.13.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 173/ 363] blk.13.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 174/ 363] blk.13.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 175/ 363] blk.13.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 176/ 363] blk.13.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 177/ 363] blk.13.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 178/ 363] blk.13.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 179/ 363] blk.13.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 180/ 363] blk.13.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 181/ 363] blk.13.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 182/ 363] blk.14.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 183/ 363] blk.14.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 184/ 363] blk.14.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 185/ 363] blk.14.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 186/ 363] blk.14.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 187/ 363] blk.14.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 188/ 363] blk.14.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 189/ 363] blk.14.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 190/ 363] blk.14.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 191/ 363] blk.14.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 192/ 363] blk.14.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 193/ 363] blk.14.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 194/ 363] blk.14.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 195/ 363] blk.15.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 196/ 363] blk.15.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 197/ 363] blk.15.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 198/ 363] blk.15.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 199/ 363] blk.15.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 200/ 363] blk.15.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 201/ 363] blk.15.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 202/ 363] blk.15.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 203/ 363] blk.15.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 204/ 363] blk.15.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 205/ 363] blk.15.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 206/ 363] blk.15.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 207/ 363] blk.15.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 208/ 363] blk.16.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 209/ 363] blk.16.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 210/ 363] blk.16.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 211/ 363] blk.16.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 212/ 363] blk.16.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 213/ 363] blk.16.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 214/ 363] blk.16.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 215/ 363] blk.16.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 216/ 363] blk.16.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 217/ 363] blk.16.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 218/ 363] blk.16.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 219/ 363] blk.16.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 220/ 363] blk.16.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 221/ 363] blk.17.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 222/ 363] blk.17.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 223/ 363] blk.17.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 224/ 363] blk.17.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 225/ 363] blk.17.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 226/ 363] blk.17.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 227/ 363] blk.17.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 228/ 363] blk.17.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 229/ 363] blk.17.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 230/ 363] blk.17.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 231/ 363] blk.17.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 232/ 363] blk.17.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 233/ 363] blk.17.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 234/ 363] blk.18.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 235/ 363] blk.18.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 236/ 363] blk.18.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 237/ 363] blk.18.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 238/ 363] blk.18.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 239/ 363] blk.18.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 240/ 363] blk.18.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 241/ 363] blk.18.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 242/ 363] blk.18.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 243/ 363] blk.18.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 244/ 363] blk.18.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 245/ 363] blk.18.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 246/ 363] blk.18.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 247/ 363] blk.19.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 248/ 363] blk.19.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 249/ 363] blk.19.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 250/ 363] blk.19.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 251/ 363] blk.19.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 252/ 363] blk.19.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 253/ 363] blk.19.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 254/ 363] blk.19.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 255/ 363] blk.19.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 256/ 363] blk.19.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 257/ 363] blk.19.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 258/ 363] blk.19.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 259/ 363] blk.19.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 260/ 363] blk.20.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 261/ 363] blk.20.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 262/ 363] blk.20.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 263/ 363] blk.20.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 264/ 363] blk.20.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 265/ 363] blk.20.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 266/ 363] blk.20.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 267/ 363] blk.20.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 268/ 363] blk.20.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 269/ 363] blk.20.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 270/ 363] blk.20.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 271/ 363] blk.20.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 272/ 363] blk.20.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 273/ 363] blk.21.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 274/ 363] blk.21.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 275/ 363] blk.21.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 276/ 363] blk.21.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 277/ 363] blk.21.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 278/ 363] blk.21.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 279/ 363] blk.21.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 280/ 363] blk.21.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 281/ 363] blk.21.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 282/ 363] blk.21.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 283/ 363] blk.21.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 284/ 363] blk.21.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 285/ 363] blk.21.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 286/ 363] blk.22.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 287/ 363] blk.22.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 288/ 363] blk.22.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 289/ 363] blk.22.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 290/ 363] blk.22.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 291/ 363] blk.22.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 292/ 363] blk.22.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 293/ 363] blk.22.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 294/ 363] blk.22.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 295/ 363] blk.22.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 296/ 363] blk.22.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 297/ 363] blk.22.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 298/ 363] blk.22.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 299/ 363] blk.23.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 300/ 363] blk.23.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 301/ 363] blk.23.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 302/ 363] blk.23.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 303/ 363] blk.23.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 304/ 363] blk.23.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 305/ 363] blk.23.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 306/ 363] blk.23.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 307/ 363] blk.23.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 308/ 363] blk.23.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 309/ 363] blk.23.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 310/ 363] blk.23.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 311/ 363] blk.23.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 312/ 363] blk.24.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 313/ 363] blk.24.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 314/ 363] blk.24.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 315/ 363] blk.24.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 316/ 363] blk.24.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 317/ 363] blk.24.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 318/ 363] blk.24.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 319/ 363] blk.24.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 320/ 363] blk.24.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 321/ 363] blk.24.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 322/ 363] blk.24.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 323/ 363] blk.24.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 324/ 363] blk.24.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 325/ 363] blk.25.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 326/ 363] blk.25.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 327/ 363] blk.25.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 328/ 363] blk.25.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 329/ 363] blk.25.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 330/ 363] blk.25.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 331/ 363] blk.25.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 332/ 363] blk.25.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 333/ 363] blk.25.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 334/ 363] blk.25.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 335/ 363] blk.25.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 336/ 363] blk.25.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 337/ 363] blk.25.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 338/ 363] blk.26.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 339/ 363] blk.26.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 340/ 363] blk.26.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 341/ 363] blk.26.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 342/ 363] blk.26.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 343/ 363] blk.26.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 344/ 363] blk.26.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 345/ 363] blk.26.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 346/ 363] blk.26.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 347/ 363] blk.26.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 348/ 363] blk.26.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 349/ 363] blk.26.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 350/ 363] blk.26.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 351/ 363] blk.27.attn_k.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 352/ 363] blk.27.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 353/ 363] blk.27.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 354/ 363] blk.27.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 355/ 363] blk.27.attn_v.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB
[ 356/ 363] blk.27.ffn_down_exps.weight - [ 1408, 2048, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 357/ 363] blk.27.ffn_down_shexp.weight - [ 2816, 2048, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 358/ 363] blk.27.ffn_gate_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 359/ 363] blk.27.ffn_gate_inp.weight - [ 2048, 64, 1, 1], type = f32, size = 0.500 MB
[ 360/ 363] blk.27.ffn_gate_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
[ 361/ 363] blk.27.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MB
[ 362/ 363] blk.27.ffn_up_exps.weight - [ 2048, 1408, 64, 1], type = f16, converting to q8_0 .. size = 352.00 MiB -> 187.00 MiB
[ 363/ 363] blk.27.ffn_up_shexp.weight - [ 2048, 2816, 1, 1], type = f16, converting to q8_0 .. size = 11.00 MiB -> 5.84 MiB
llama_model_quantize_impl: model size = 31241.20 MB
llama_model_quantize_impl: quant size = 16603.42 MB
main: quantize time = 127536.35 ms
main: total time = 127536.35 ms
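The per-tensor sizes in the log above can be reproduced by hand. The following is a minimal sketch, assuming the standard q8_0 layout of 32 int8 values plus one fp16 scale per block (34 bytes for 32 weights, i.e. 8.5 bits per weight). The helper names `f16_mib` and `q8_0_mib` are hypothetical and are not part of llama.cpp; the shapes are taken from the log.

```python
from math import prod

# Assumed q8_0 layout: each block holds 32 int8 weights plus one fp16 scale.
BLOCK_SIZE = 32
BLOCK_BYTES = BLOCK_SIZE * 1 + 2  # 34 bytes per block
MIB = 1024 * 1024


def f16_mib(shape):
    """Size in MiB of a tensor stored as f16 (2 bytes per element)."""
    return prod(shape) * 2 / MIB


def q8_0_mib(shape):
    """Size in MiB of the same tensor stored as q8_0 blocks."""
    # The first dimension is the row length and must be a whole number of blocks.
    assert shape[0] % BLOCK_SIZE == 0
    return prod(shape) // BLOCK_SIZE * BLOCK_BYTES / MIB


# Shapes copied from the log (trailing dimensions of 1 omitted).
tensors = {
    "attn_q": (2048, 2048),
    "ffn_gate_shexp": (2048, 2816),
    "ffn_up_exps": (2048, 1408, 64),
}

for name, shape in tensors.items():
    print(f"{name}: {f16_mib(shape):.2f} MiB -> {q8_0_mib(shape):.2f} MiB")
```

The 64 stacked experts in each `*_exps` tensor are why those entries dominate the file size: a single one is as large as 44 attention projections.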
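The quantized model now takes 16.22 GB, 46.8% less than the original 30.48 GB, so its memory footprint shrinks substantially. Run as described in my earlier post on running the MoE model with llama.cpp, the quantized model chats normally, and its answers are accurate and fast.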
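As a quick cross-check, the reduction can also be computed from the two totals that the quantize run printed. This is a small sketch using the log's MB figures rather than the GB figures quoted in the text, so the result differs slightly in the last digit. The bits-per-weight estimate assumes the source file is essentially all f16, which is an approximation because a few small tensors are f32.

```python
# Totals reported by llama_model_quantize_impl in the log.
model_mb = 31241.20  # f16 input
quant_mb = 16603.42  # q8_0 output

ratio = quant_mb / model_mb
reduction_pct = (1 - ratio) * 100

# Assumption: the input is treated as 16 bits per weight throughout.
effective_bpw = ratio * 16

print(f"size ratio: {ratio:.4f}")
print(f"reduction: {reduction_pct:.2f}%")
print(f"approx. bits per weight after quantization: {effective_bpw:.2f}")
```

The estimate lands close to 8.5 bits per weight, which is what q8_0 costs once the per-block scale is included.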
Quantization really is a great thing!