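I just realized that my earlier post covered running a GGUF quantized model with llama.cpp but never explained where the GGUF file comes from, so here is the missing piece.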

To complete the quantization, you need at least 64 GB of RAM.
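
To get a feel for where that number comes from, here is a rough back-of-the-envelope sketch. It only estimates the size of the weights when stored as 16-bit floats; it does not measure what the converter actually uses at its peak. The parameter count of about 16.4 billion is my assumption based on the model's "16b" name, not something measured in this post.

```python
# Rough size estimate for the weights of a model stored as 16-bit floats.
# ASSUMPTION: ~16.4e9 parameters (from the "16b" model name), not measured here.
PARAM_COUNT = 16.4e9
BYTES_PER_F16 = 2


def f16_size_gib(param_count: float) -> float:
    """Return the approximate size in GiB of param_count weights stored as F16."""
    return param_count * BYTES_PER_F16 / 2**30


if __name__ == "__main__":
    print(f"approx. F16 weights: {f16_size_gib(PARAM_COUNT):.1f} GiB")
```

The estimate lands at roughly 30 GiB for the weights alone, so 64 GB leaves comfortable headroom for the source checkpoint, Python, and everything else running on the machine.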

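First, download the deepseek moe chat 16b model from the Hugging Face website and install the basic dependencies for llama.cpp (if this step is unclear, see my first post, "Running deepseek MOE 16b chat with llama.cpp", on CSDN).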

Enter the llama.cpp directory:

cd llama.cpp

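I put the downloaded deepseek model in a folder named MOE (MOE sits at the same level as llama.cpp). If your download path or file names differ from mine, just change the path and name in the command below to match yours: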

python3 convert_hf_to_gguf.py ../MOE --outtype f16 --outfile deepseek-moe-16b-chat.f16.gguf
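
If you are unsure whether your model folder is laid out the way the command above expects, a quick check like the following can save a failed run. This is only a sketch: the helper name `missing_model_files` is made up for illustration, and the two file names are simply the ones that show up in my converter log, not a complete list of what the script needs.

```python
from pathlib import Path

# File names that appear in the converter log of this post.
# ASSUMPTION: illustrative only, not the full set of files the converter reads.
EXPECTED_FILES = ["config.json", "model.safetensors.index.json"]


def missing_model_files(model_dir: str) -> list:
    """Return the expected files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "../MOE" is the folder used in this post; change it to your own path.
    missing = missing_model_files("../MOE")
    print("missing:", missing if missing else "nothing")
```

Run it from inside the llama.cpp directory so that the relative path resolves the same way as in the conversion command.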

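It starts writing right away. The full output is pasted below; it shows the shape of every tensor and how its data type is converted. (My current work requires knowing the exact size and shape of each weight, so I pay close attention to this part. If that is not your area, just wait for the command to finish.)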
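
Since every tensor line in the log below has the same layout, the sizes can be tallied with a few lines of Python instead of by eye. The following is a minimal sketch under my own assumptions: the regular expression is written against the lines in this particular log, the helper name `parse_tensor_line` is hypothetical, and only the two output types that appear here (F16 and F32) are handled.

```python
import re
from math import prod

# Matches converter lines such as:
# INFO:hf-to-gguf:blk.0.attn_q.weight,  torch.bfloat16 --> F16, shape = {2048, 2048}
TENSOR_LINE = re.compile(
    r"INFO:hf-to-gguf:(?P<name>[\w.]+),\s+(?P<src>[\w.]+) --> (?P<dst>\w+), "
    r"shape = \{(?P<shape>[\d, ]+)\}"
)

# Bytes per element for the output types seen in this log only.
BYTES_PER_ELEMENT = {"F16": 2, "F32": 4}


def parse_tensor_line(line):
    """Return (name, output type, shape, size in bytes), or None for other lines."""
    m = TENSOR_LINE.search(line)
    if m is None:
        return None
    shape = tuple(int(dim) for dim in m["shape"].split(","))
    size_bytes = prod(shape) * BYTES_PER_ELEMENT[m["dst"]]
    return m["name"], m["dst"], shape, size_bytes


if __name__ == "__main__":
    sample = [
        "INFO:hf-to-gguf:Exporting model...",
        "INFO:hf-to-gguf:blk.1.attn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}",
        "INFO:hf-to-gguf:blk.1.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {1408, 2048, 64}",
    ]
    for line in sample:
        parsed = parse_tensor_line(line)
        if parsed is not None:
            name, dtype, shape, size_bytes = parsed
            print(f"{name:28s} {dtype} {shape} {size_bytes / 2**20:.1f} MiB")
```

To process a whole run, redirect the converter's output to a file and feed each line through `parse_tensor_line`, summing the byte counts. For example, the stacked expert tensor `blk.1.ffn_down_exps.weight` with shape {1408, 2048, 64} works out to 352 MiB in F16.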

(torchenv) wxy@YUSN01:~/llama.cpp$ python3 convert_hf_to_gguf.py ../MOE --outtype f16 --outfile deepseek-moe-16b-chat.f16.gguf
INFO:hf-to-gguf:Loading model: MOE
WARNING:hf-to-gguf:Failed to load model config from ../MOE: The repository ../MOE contains custom code which must be executed to correctly load the model. You can inspect the repository content at /ssd/users/wxy/MOE .
 You can inspect the repository content at https://hf.co/../MOE.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: DeepseekForCausalLM
WARNING:hf-to-gguf:Failed to load model config from ../MOE: The repository ../MOE contains custom code which must be executed to correctly load the model. You can inspect the repository content at /ssd/users/wxy/MOE .
 You can inspect the repository content at https://hf.co/../MOE.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00007.safetensors'
INFO:hf-to-gguf:token_embd.weight,            torch.bfloat16 --> F16, shape = {2048, 102400}
INFO:hf-to-gguf:blk.0.attn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.0.ffn_down.weight,        torch.bfloat16 --> F16, shape = {10944, 2048}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,        torch.bfloat16 --> F16, shape = {2048, 10944}
INFO:hf-to-gguf:blk.0.ffn_up.weight,          torch.bfloat16 --> F16, shape = {2048, 10944}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.0.attn_k.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.0.attn_output.weight,     torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.0.attn_q.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.0.attn_v.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.1.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.1.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.1.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.1.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.1.ffn_down_shexp.weight,  torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.1.ffn_gate_shexp.weight,  torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.1.ffn_up_shexp.weight,    torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.1.attn_k.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_output.weight,     torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_q.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_v.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.2.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.2.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.2.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.2.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.2.ffn_down_shexp.weight,  torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.2.ffn_gate_shexp.weight,  torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.2.ffn_up_shexp.weight,    torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.2.attn_k.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_output.weight,     torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_q.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_v.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.3.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.3.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.3.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.3.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.3.ffn_down_shexp.weight,  torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.3.ffn_gate_shexp.weight,  torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.3.ffn_up_shexp.weight,    torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.3.attn_k.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_output.weight,     torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_q.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_v.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_k.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_output.weight,     torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_q.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_v.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:gguf: loading model part 'model-00002-of-00007.safetensors'
INFO:hf-to-gguf:blk.4.attn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.4.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.4.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.4.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.4.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.4.ffn_down_shexp.weight,  torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.4.ffn_gate_shexp.weight,  torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.4.ffn_up_shexp.weight,    torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.5.attn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.5.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.5.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.5.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.5.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.5.ffn_down_shexp.weight,  torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.5.ffn_gate_shexp.weight,  torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.5.ffn_up_shexp.weight,    torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.5.attn_k.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_output.weight,     torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_q.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_v.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.6.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.6.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.6.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.6.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.6.ffn_down_shexp.weight,  torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.6.ffn_gate_shexp.weight,  torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.6.ffn_up_shexp.weight,    torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.6.attn_k.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_output.weight,     torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_q.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_v.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.7.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.7.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.7.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.7.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.7.ffn_down_shexp.weight,  torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.7.ffn_gate_shexp.weight,  torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.7.ffn_up_shexp.weight,    torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.7.attn_k.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_output.weight,     torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_q.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_v.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.8.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.8.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.8.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.8.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.8.ffn_down_shexp.weight,  torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.8.ffn_gate_shexp.weight,  torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.8.ffn_up_shexp.weight,    torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.8.attn_k.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_output.weight,     torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_q.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_v.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:gguf: loading model part 'model-00003-of-00007.safetensors'
INFO:hf-to-gguf:blk.10.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.10.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.10.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.10.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.10.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.10.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.10.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.10.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.10.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.11.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.11.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.11.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.11.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.11.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.11.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.11.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.11.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.12.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.12.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.12.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.12.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.12.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.12.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.12.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.12.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.9.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.9.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.9.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.9.ffn_down_shexp.weight,  torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.9.ffn_gate_shexp.weight,  torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.9.ffn_up_shexp.weight,    torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.attn_k.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_output.weight,     torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_q.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_v.weight,          torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:gguf: loading model part 'model-00004-of-00007.safetensors'
INFO:hf-to-gguf:blk.13.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.13.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.13.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.13.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.13.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.13.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.13.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.13.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.14.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.14.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.14.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.14.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.14.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.14.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.14.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.14.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.14.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.15.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.15.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.15.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.15.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.15.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.15.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.15.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.15.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.16.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.16.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.16.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.16.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.16.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.16.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.16.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.16.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:gguf: loading model part 'model-00005-of-00007.safetensors'
INFO:hf-to-gguf:blk.17.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.17.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.17.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.17.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.17.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.17.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.17.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.17.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.18.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.18.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.18.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.18.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.18.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.18.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.18.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.18.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.18.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.19.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.19.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.19.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.19.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.19.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.19.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.19.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.19.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.20.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.20.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.20.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.20.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.20.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.20.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.20.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.20.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:gguf: loading model part 'model-00006-of-00007.safetensors'
INFO:hf-to-gguf:blk.21.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.21.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.21.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.21.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.21.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.21.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.21.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.21.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.22.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.22.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.22.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.22.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.22.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.22.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.22.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.22.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.22.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.23.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.23.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.23.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.23.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.23.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.23.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.23.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.23.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.24.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.24.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.24.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.24.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.24.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.24.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.24.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.24.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.24.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.24.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.24.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.24.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.24.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.25.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.25.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.25.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.25.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.25.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.25.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.25.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.25.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.25.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.25.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.25.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.25.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.25.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:gguf: loading model part 'model-00007-of-00007.safetensors'
INFO:hf-to-gguf:output.weight,                torch.bfloat16 --> F16, shape = {2048, 102400}
INFO:hf-to-gguf:blk.26.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.26.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.26.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.26.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.26.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.26.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.26.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.26.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.26.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.26.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.26.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.26.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.26.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.27.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.27.ffn_down_exps.weight,  torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.27.ffn_gate_exps.weight,  torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.27.ffn_up_exps.weight,    torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.27.ffn_gate_inp.weight,   torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.27.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.27.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.27.ffn_up_shexp.weight,   torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.27.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.27.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.27.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.27.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.27.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:output_norm.weight,           torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 4096
INFO:hf-to-gguf:gguf: embedding length = 2048
INFO:hf-to-gguf:gguf: feed forward length = 10944
INFO:hf-to-gguf:gguf: head count = 16
INFO:hf-to-gguf:gguf: key-value head count = 16
INFO:hf-to-gguf:gguf: rope theta = 10000
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
INFO:hf-to-gguf:gguf: experts used count = 6
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Adding 99757 merge(s).
INFO:gguf.vocab:Setting special token type bos to 100000
INFO:gguf.vocab:Setting special token type eos to 100001
INFO:gguf.vocab:Setting special token type pad to 100001
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{{ bos_token }}{% for message in messages %}{% if message['role'] == 'user' %}{{ 'User: ' + message['content'] + '

' }}{% elif message['role'] == 'assistant' %}{{ 'Assistant: ' + message['content'] + eos_token }}{% elif message['role'] == 'system' %}{{ message['content'] + '

' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ 'Assistant:' }}{% endif %}
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:deepseek-moe-16b-chat.f16.gguf: n_tensors = 363, total_size = 32.8G
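
As a sanity check, the reported n_tensors and total_size can be reproduced from the tensor shapes printed in the log. The sketch below assumes the layout visible there: 28 blocks (blk.0 to blk.27), where blk.0 is a dense block and the remaining 27 are MoE blocks, with F16 tensors taking 2 bytes per element and F32 tensors 4 bytes. It counts tensor data only (no metadata or alignment padding), and it assumes the writer's "G" means 10^9 bytes, which is inferred from the arithmetic rather than stated in the log. The names used here are made up for illustration.

```python
from math import prod

F16_BYTES, F32_BYTES = 2, 4

N_BLOCKS = 28                 # blk.0 .. blk.27
N_DENSE = 1                   # blk.0 uses a plain FFN (assumption read off the log)
N_MOE = N_BLOCKS - N_DENSE    # the remaining blocks carry expert tensors


def nbytes(shape, elem_bytes):
    """Raw size in bytes of one tensor with the given shape."""
    return prod(shape) * elem_bytes


# (shape, bytes per element, how many tensors of this kind)
tensors = [
    ((2048, 102400), F16_BYTES, 2),           # token_embd, output
    ((2048,), F32_BYTES, 1),                  # output_norm
    ((2048, 2048), F16_BYTES, 4 * N_BLOCKS),  # attn_q / attn_k / attn_v / attn_output
    ((2048,), F32_BYTES, 2 * N_BLOCKS),       # attn_norm, ffn_norm
    ((10944, 2048), F16_BYTES, 3 * N_DENSE),  # dense ffn_down / ffn_gate / ffn_up
    ((1408, 2048, 64), F16_BYTES, 3 * N_MOE), # ffn_*_exps (64 routed experts)
    ((2048, 64), F32_BYTES, N_MOE),           # ffn_gate_inp (router)
    ((2816, 2048), F16_BYTES, 3 * N_MOE),     # ffn_*_shexp (shared experts)
]

n_tensors = sum(count for _, _, count in tensors)
total_bytes = sum(nbytes(shape, eb) * count for shape, eb, count in tensors)

print(f"n_tensors = {n_tensors}")
print(f"total_size = {total_bytes / 1e9:.1f}G")
```

Under these assumptions the script arrives at the same tensor count and size as the writer's summary line, and it shows that the routed-expert tensors account for the vast majority of the file.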

Once the conversion has finished and you have confirmed there are no errors, run the final step:

./build/bin/llama-quantize deepseek-moe-16b-chat.f16.gguf deepseek-moe-16b-chat.q8_0.gguf q8_0 # q8_0 can be swapped for another quantization type such as q4_0, depending on the precision you need; rename the output file to match
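
To get a feel for how much space a format saves, the Q8_0 case can be worked out by hand. The sketch below assumes ggml's Q8_0 block layout: every 32 weights are stored as 32 int8 values plus one fp16 scale, i.e. 34 bytes per block or 8.5 bits per weight. The helper names are illustrative and are not part of llama.cpp.

```python
from math import prod

QK8_0 = 32             # weights per Q8_0 block (assumed ggml layout)
BLOCK_BYTES = 2 + 32   # one fp16 scale + 32 int8 quantized values
MIB = 1024 * 1024


def f16_mib(shape):
    """Size of an F16 tensor in MiB (2 bytes per element)."""
    return prod(shape) * 2 / MIB


def q8_0_mib(shape):
    """Size of the same tensor in MiB after Q8_0 quantization."""
    n = prod(shape)
    assert n % QK8_0 == 0, "element count must be a multiple of the block size"
    return n // QK8_0 * BLOCK_BYTES / MIB


# Shapes taken from the conversion log of this model.
examples = {
    "output.weight": (2048, 102400),
    "blk.0.attn_q.weight": (2048, 2048),
    "blk.0.ffn_down.weight": (10944, 2048),
    "blk.1.ffn_down_exps.weight": (1408, 2048, 64),
    "blk.1.ffn_down_shexp.weight": (2816, 2048),
}

for name, shape in examples.items():
    print(f"{name:28s} {f16_mib(shape):8.2f} MiB -> {q8_0_mib(shape):8.2f} MiB")
```

The per-tensor sizes that llama-quantize prints (for example 400.00 MiB -> 212.50 MiB for output.weight) can be compared against these numbers. The F32 norm and router tensors are left unquantized in that log, so they are not covered here.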

The full output of the quantization command is as follows:

(torchenv) wxy@YUSN01:~/llama.cpp$ ./build/bin/llama-quantize deepseek-moe-16b-chat.f16.gguf deepseek-moe-16b-chat.q8_0.gguf q8_0
main: build = 5891 (0d922676)
main: built with gcc-11 (Ubuntu 11.4.0-2ubuntu1~18.04) 11.4.0 for x86_64-linux-gnu
main: quantizing 'deepseek-moe-16b-chat.f16.gguf' to 'deepseek-moe-16b-chat.q8_0.gguf' as Q8_0
llama_model_loader: loaded meta data with 37 key-value pairs and 363 tensors from deepseek-moe-16b-chat.f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = deepseek
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = MOE
llama_model_loader: - kv   3:                         general.size_label str              = 64x1.7B
llama_model_loader: - kv   4:                            general.license str              = other
llama_model_loader: - kv   5:                       general.license.name str              = deepseek
llama_model_loader: - kv   6:                       general.license.link str              = https://github.com/deepseek-ai/DeepSe...
llama_model_loader: - kv   7:                       deepseek.block_count u32              = 28
llama_model_loader: - kv   8:                    deepseek.context_length u32              = 4096
llama_model_loader: - kv   9:                  deepseek.embedding_length u32              = 2048
llama_model_loader: - kv  10:               deepseek.feed_forward_length u32              = 10944
llama_model_loader: - kv  11:              deepseek.attention.head_count u32              = 16
llama_model_loader: - kv  12:           deepseek.attention.head_count_kv u32              = 16
llama_model_loader: - kv  13:                    deepseek.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  14:  deepseek.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  15:                 deepseek.expert_used_count u32              = 6
llama_model_loader: - kv  16:                          general.file_type u32              = 1
llama_model_loader: - kv  17:              deepseek.rope.dimension_count u32              = 128
llama_model_loader: - kv  18:                 deepseek.rope.scaling.type str              = none
llama_model_loader: - kv  19:         deepseek.leading_dense_block_count u32              = 1
llama_model_loader: - kv  20:                        deepseek.vocab_size u32              = 102400
llama_model_loader: - kv  21:        deepseek.expert_feed_forward_length u32              = 1408
llama_model_loader: - kv  22:              deepseek.expert_weights_scale f32              = 1.000000
llama_model_loader: - kv  23:                      deepseek.expert_count u32              = 64
llama_model_loader: - kv  24:               deepseek.expert_shared_count u32              = 2
llama_model_loader: - kv  25:               general.quantization_version u32              = 2
llama_model_loader: - kv  26:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  27:                         tokenizer.ggml.pre str              = deepseek-llm
llama_model_loader: - kv  28:                      tokenizer.ggml.tokens arr[str,102400]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  29:                  tokenizer.ggml.token_type arr[i32,102400]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  30:                      tokenizer.ggml.merges arr[str,99757]   = ["Ġ Ġ", "Ġ t", "Ġ a", "i n", "h e...
llama_model_loader: - kv  31:                tokenizer.ggml.bos_token_id u32              = 100000
llama_model_loader: - kv  32:                tokenizer.ggml.eos_token_id u32              = 100001
llama_model_loader: - kv  33:            tokenizer.ggml.padding_token_id u32              = 100001
llama_model_loader: - kv  34:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  35:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  36:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
llama_model_loader: - type  f32:   84 tensors
llama_model_loader: - type  f16:  279 tensors
[   1/ 363]                        output.weight - [ 2048, 102400,     1,     1], type =    f16, converting to q8_0 .. size =   400.00 MiB ->   212.50 MiB
[   2/ 363]                   output_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[   3/ 363]                    token_embd.weight - [ 2048, 102400,     1,     1], type =    f16, converting to q8_0 .. size =   400.00 MiB ->   212.50 MiB
[   4/ 363]                  blk.0.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[   5/ 363]               blk.0.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[   6/ 363]             blk.0.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[   7/ 363]                  blk.0.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[   8/ 363]                  blk.0.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[   9/ 363]                blk.0.ffn_down.weight - [10944,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    42.75 MiB ->    22.71 MiB
[  10/ 363]                blk.0.ffn_gate.weight - [ 2048, 10944,     1,     1], type =    f16, converting to q8_0 .. size =    42.75 MiB ->    22.71 MiB
[  11/ 363]                blk.0.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[  12/ 363]                  blk.0.ffn_up.weight - [ 2048, 10944,     1,     1], type =    f16, converting to q8_0 .. size =    42.75 MiB ->    22.71 MiB
[  13/ 363]                  blk.1.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  14/ 363]               blk.1.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[  15/ 363]             blk.1.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  16/ 363]                  blk.1.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  17/ 363]                  blk.1.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  18/ 363]           blk.1.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  19/ 363]          blk.1.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  20/ 363]           blk.1.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  21/ 363]            blk.1.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[  22/ 363]          blk.1.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  23/ 363]                blk.1.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[  24/ 363]             blk.1.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  25/ 363]            blk.1.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  26/ 363]                  blk.2.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  27/ 363]               blk.2.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[  28/ 363]             blk.2.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  29/ 363]                  blk.2.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  30/ 363]                  blk.2.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  31/ 363]           blk.2.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  32/ 363]          blk.2.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  33/ 363]           blk.2.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  34/ 363]            blk.2.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[  35/ 363]          blk.2.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  36/ 363]                blk.2.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[  37/ 363]             blk.2.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  38/ 363]            blk.2.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  39/ 363]                  blk.3.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  40/ 363]               blk.3.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[  41/ 363]             blk.3.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  42/ 363]                  blk.3.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  43/ 363]                  blk.3.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  44/ 363]           blk.3.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  45/ 363]          blk.3.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  46/ 363]           blk.3.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  47/ 363]            blk.3.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[  48/ 363]          blk.3.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  49/ 363]                blk.3.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[  50/ 363]             blk.3.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  51/ 363]            blk.3.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  52/ 363]                  blk.4.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  53/ 363]               blk.4.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[  54/ 363]             blk.4.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  55/ 363]                  blk.4.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  56/ 363]                  blk.4.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  57/ 363]           blk.4.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  58/ 363]          blk.4.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  59/ 363]           blk.4.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  60/ 363]            blk.4.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[  61/ 363]          blk.4.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  62/ 363]                blk.4.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[  63/ 363]             blk.4.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  64/ 363]            blk.4.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  65/ 363]                  blk.5.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  66/ 363]               blk.5.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[  67/ 363]             blk.5.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  68/ 363]                  blk.5.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  69/ 363]                  blk.5.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  70/ 363]           blk.5.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  71/ 363]          blk.5.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  72/ 363]           blk.5.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  73/ 363]            blk.5.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[  74/ 363]          blk.5.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  75/ 363]                blk.5.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[  76/ 363]             blk.5.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  77/ 363]            blk.5.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  78/ 363]                  blk.6.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  79/ 363]               blk.6.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[  80/ 363]             blk.6.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  81/ 363]                  blk.6.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  82/ 363]                  blk.6.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  83/ 363]           blk.6.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  84/ 363]          blk.6.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  85/ 363]           blk.6.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  86/ 363]            blk.6.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[  87/ 363]          blk.6.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  88/ 363]                blk.6.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[  89/ 363]             blk.6.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  90/ 363]            blk.6.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  91/ 363]                  blk.7.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  92/ 363]               blk.7.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[  93/ 363]             blk.7.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  94/ 363]                  blk.7.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  95/ 363]                  blk.7.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[  96/ 363]           blk.7.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  97/ 363]          blk.7.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[  98/ 363]           blk.7.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[  99/ 363]            blk.7.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 100/ 363]          blk.7.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 101/ 363]                blk.7.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 102/ 363]             blk.7.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 103/ 363]            blk.7.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 104/ 363]                  blk.8.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 105/ 363]               blk.8.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 106/ 363]             blk.8.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 107/ 363]                  blk.8.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 108/ 363]                  blk.8.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 109/ 363]           blk.8.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 110/ 363]          blk.8.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 111/ 363]           blk.8.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 112/ 363]            blk.8.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 113/ 363]          blk.8.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 114/ 363]                blk.8.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 115/ 363]             blk.8.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 116/ 363]            blk.8.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 117/ 363]                  blk.9.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 118/ 363]               blk.9.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 119/ 363]             blk.9.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 120/ 363]                  blk.9.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 121/ 363]                  blk.9.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 122/ 363]           blk.9.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 123/ 363]          blk.9.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 124/ 363]           blk.9.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 125/ 363]            blk.9.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 126/ 363]          blk.9.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 127/ 363]                blk.9.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 128/ 363]             blk.9.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 129/ 363]            blk.9.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 130/ 363]                 blk.10.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 131/ 363]              blk.10.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 132/ 363]            blk.10.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 133/ 363]                 blk.10.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 134/ 363]                 blk.10.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 135/ 363]          blk.10.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 136/ 363]         blk.10.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 137/ 363]          blk.10.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 138/ 363]           blk.10.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 139/ 363]         blk.10.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 140/ 363]               blk.10.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 141/ 363]            blk.10.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 142/ 363]           blk.10.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 143/ 363]                 blk.11.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 144/ 363]              blk.11.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 145/ 363]            blk.11.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 146/ 363]                 blk.11.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 147/ 363]                 blk.11.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 148/ 363]          blk.11.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 149/ 363]         blk.11.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 150/ 363]          blk.11.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 151/ 363]           blk.11.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 152/ 363]         blk.11.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 153/ 363]               blk.11.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 154/ 363]            blk.11.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 155/ 363]           blk.11.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 156/ 363]                 blk.12.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 157/ 363]              blk.12.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 158/ 363]            blk.12.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 159/ 363]                 blk.12.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 160/ 363]                 blk.12.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 161/ 363]          blk.12.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 162/ 363]         blk.12.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 163/ 363]          blk.12.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 164/ 363]           blk.12.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 165/ 363]         blk.12.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 166/ 363]               blk.12.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 167/ 363]            blk.12.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 168/ 363]           blk.12.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 169/ 363]                 blk.13.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 170/ 363]              blk.13.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 171/ 363]            blk.13.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 172/ 363]                 blk.13.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 173/ 363]                 blk.13.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 174/ 363]          blk.13.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 175/ 363]         blk.13.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 176/ 363]          blk.13.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 177/ 363]           blk.13.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 178/ 363]         blk.13.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 179/ 363]               blk.13.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 180/ 363]            blk.13.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 181/ 363]           blk.13.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 182/ 363]                 blk.14.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 183/ 363]              blk.14.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 184/ 363]            blk.14.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 185/ 363]                 blk.14.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 186/ 363]                 blk.14.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 187/ 363]          blk.14.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 188/ 363]         blk.14.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 189/ 363]          blk.14.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 190/ 363]           blk.14.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 191/ 363]         blk.14.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 192/ 363]               blk.14.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 193/ 363]            blk.14.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 194/ 363]           blk.14.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 195/ 363]                 blk.15.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 196/ 363]              blk.15.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 197/ 363]            blk.15.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 198/ 363]                 blk.15.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 199/ 363]                 blk.15.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 200/ 363]          blk.15.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 201/ 363]         blk.15.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 202/ 363]          blk.15.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 203/ 363]           blk.15.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 204/ 363]         blk.15.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 205/ 363]               blk.15.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 206/ 363]            blk.15.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 207/ 363]           blk.15.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 208/ 363]                 blk.16.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 209/ 363]              blk.16.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 210/ 363]            blk.16.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 211/ 363]                 blk.16.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 212/ 363]                 blk.16.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 213/ 363]          blk.16.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 214/ 363]         blk.16.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 215/ 363]          blk.16.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 216/ 363]           blk.16.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 217/ 363]         blk.16.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 218/ 363]               blk.16.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 219/ 363]            blk.16.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 220/ 363]           blk.16.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 221/ 363]                 blk.17.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 222/ 363]              blk.17.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 223/ 363]            blk.17.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 224/ 363]                 blk.17.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 225/ 363]                 blk.17.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 226/ 363]          blk.17.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 227/ 363]         blk.17.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 228/ 363]          blk.17.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 229/ 363]           blk.17.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 230/ 363]         blk.17.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 231/ 363]               blk.17.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 232/ 363]            blk.17.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 233/ 363]           blk.17.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 234/ 363]                 blk.18.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 235/ 363]              blk.18.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 236/ 363]            blk.18.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 237/ 363]                 blk.18.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 238/ 363]                 blk.18.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 239/ 363]          blk.18.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 240/ 363]         blk.18.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 241/ 363]          blk.18.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 242/ 363]           blk.18.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 243/ 363]         blk.18.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 244/ 363]               blk.18.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 245/ 363]            blk.18.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 246/ 363]           blk.18.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 247/ 363]                 blk.19.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 248/ 363]              blk.19.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 249/ 363]            blk.19.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 250/ 363]                 blk.19.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 251/ 363]                 blk.19.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 252/ 363]          blk.19.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 253/ 363]         blk.19.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 254/ 363]          blk.19.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 255/ 363]           blk.19.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 256/ 363]         blk.19.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 257/ 363]               blk.19.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 258/ 363]            blk.19.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 259/ 363]           blk.19.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 260/ 363]                 blk.20.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 261/ 363]              blk.20.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 262/ 363]            blk.20.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 263/ 363]                 blk.20.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 264/ 363]                 blk.20.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 265/ 363]          blk.20.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 266/ 363]         blk.20.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 267/ 363]          blk.20.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 268/ 363]           blk.20.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 269/ 363]         blk.20.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 270/ 363]               blk.20.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 271/ 363]            blk.20.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 272/ 363]           blk.20.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 273/ 363]                 blk.21.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 274/ 363]              blk.21.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 275/ 363]            blk.21.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 276/ 363]                 blk.21.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 277/ 363]                 blk.21.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 278/ 363]          blk.21.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 279/ 363]         blk.21.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 280/ 363]          blk.21.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 281/ 363]           blk.21.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 282/ 363]         blk.21.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 283/ 363]               blk.21.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 284/ 363]            blk.21.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 285/ 363]           blk.21.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 286/ 363]                 blk.22.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 287/ 363]              blk.22.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 288/ 363]            blk.22.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 289/ 363]                 blk.22.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 290/ 363]                 blk.22.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 291/ 363]          blk.22.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 292/ 363]         blk.22.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 293/ 363]          blk.22.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 294/ 363]           blk.22.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 295/ 363]         blk.22.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 296/ 363]               blk.22.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 297/ 363]            blk.22.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 298/ 363]           blk.22.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 299/ 363]                 blk.23.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 300/ 363]              blk.23.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 301/ 363]            blk.23.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 302/ 363]                 blk.23.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 303/ 363]                 blk.23.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 304/ 363]          blk.23.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 305/ 363]         blk.23.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 306/ 363]          blk.23.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 307/ 363]           blk.23.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 308/ 363]         blk.23.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 309/ 363]               blk.23.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 310/ 363]            blk.23.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 311/ 363]           blk.23.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 312/ 363]                 blk.24.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 313/ 363]              blk.24.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 314/ 363]            blk.24.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 315/ 363]                 blk.24.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 316/ 363]                 blk.24.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 317/ 363]          blk.24.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 318/ 363]         blk.24.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 319/ 363]          blk.24.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 320/ 363]           blk.24.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 321/ 363]         blk.24.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 322/ 363]               blk.24.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 323/ 363]            blk.24.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 324/ 363]           blk.24.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 325/ 363]                 blk.25.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 326/ 363]              blk.25.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 327/ 363]            blk.25.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 328/ 363]                 blk.25.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 329/ 363]                 blk.25.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 330/ 363]          blk.25.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 331/ 363]         blk.25.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 332/ 363]          blk.25.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 333/ 363]           blk.25.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 334/ 363]         blk.25.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 335/ 363]               blk.25.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 336/ 363]            blk.25.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 337/ 363]           blk.25.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 338/ 363]                 blk.26.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 339/ 363]              blk.26.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 340/ 363]            blk.26.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 341/ 363]                 blk.26.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 342/ 363]                 blk.26.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 343/ 363]          blk.26.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 344/ 363]         blk.26.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 345/ 363]          blk.26.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 346/ 363]           blk.26.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 347/ 363]         blk.26.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 348/ 363]               blk.26.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 349/ 363]            blk.26.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 350/ 363]           blk.26.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 351/ 363]                 blk.27.attn_k.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 352/ 363]              blk.27.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 353/ 363]            blk.27.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 354/ 363]                 blk.27.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 355/ 363]                 blk.27.attn_v.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB
[ 356/ 363]          blk.27.ffn_down_exps.weight - [ 1408,  2048,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 357/ 363]         blk.27.ffn_down_shexp.weight - [ 2816,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 358/ 363]          blk.27.ffn_gate_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 359/ 363]           blk.27.ffn_gate_inp.weight - [ 2048,    64,     1,     1], type =    f32, size =    0.500 MB
[ 360/ 363]         blk.27.ffn_gate_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
[ 361/ 363]               blk.27.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MB
[ 362/ 363]            blk.27.ffn_up_exps.weight - [ 2048,  1408,    64,     1], type =    f16, converting to q8_0 .. size =   352.00 MiB ->   187.00 MiB
[ 363/ 363]           blk.27.ffn_up_shexp.weight - [ 2048,  2816,     1,     1], type =    f16, converting to q8_0 .. size =    11.00 MiB ->     5.84 MiB
llama_model_quantize_impl: model size  = 31241.20 MB
llama_model_quantize_impl: quant size  = 16603.42 MB

main: quantize time = 127536.35 ms
main:    total time = 127536.35 ms
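
The per-tensor sizes in the log above can be checked by hand. Here is a minimal sketch, assuming the usual ggml Q8_0 layout: blocks of 32 weights, each stored as one fp16 scale (2 bytes) plus 32 int8 values, so 34 bytes per block. The helper names `f16_mib` and `q8_0_mib` are made up for this illustration and are not part of llama.cpp.

```python
# Sanity check of the "f16 -> q8_0" sizes printed in the quantize log.
# Assumption: Q8_0 stores each block of 32 weights as 2 bytes (fp16 scale)
# plus 32 bytes (int8 values), i.e. 34 bytes per block.
Q8_0_BLOCK_WEIGHTS = 32
Q8_0_BLOCK_BYTES = 2 + 32
MIB = 1024 * 1024


def element_count(shape):
    """Multiply all dimensions of a tensor shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n


def f16_mib(shape):
    """Size in MiB of a tensor stored as f16 (2 bytes per weight)."""
    return element_count(shape) * 2 / MIB


def q8_0_mib(shape):
    """Size in MiB of the same tensor in Q8_0.

    Assumes the first dimension (row length) is a multiple of 32,
    which holds for every shape used below.
    """
    blocks = element_count(shape) // Q8_0_BLOCK_WEIGHTS
    return blocks * Q8_0_BLOCK_BYTES / MIB


# Shapes copied from the log lines above.
examples = {
    "blk.11.attn_q.weight": (2048, 2048, 1, 1),
    "blk.11.ffn_down_shexp.weight": (2816, 2048, 1, 1),
    "blk.11.ffn_down_exps.weight": (1408, 2048, 64, 1),
}

for name, shape in examples.items():
    print(f"{name}: {f16_mib(shape):.2f} MiB -> {q8_0_mib(shape):.2f} MiB")
```

Under that assumption each weight costs 8.5 bits instead of 16, which is why every quantized tensor in the log shrinks to 34/64 of its f16 size (8.00 MiB to 4.25 MiB, 352.00 MiB to 187.00 MiB). The norm and `ffn_gate_inp` tensors stay in f32 and are not touched.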

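The model now takes up about 16.22 GB, roughly 46.8% less than the original 30.48 GB (the log above reports 31241.20 MB before quantization and 16603.42 MB after). That is a big drop in footprint. Following my earlier post on running the MoE model with llama.cpp, the quantized model chats normally, answers accurately, and responds quickly.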
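
If you want to redo the reduction arithmetic for your own run, here is a small sketch. It only uses the two summary values printed by `llama_model_quantize_impl` in the log above. I am assuming the "MB" in those lines is 1024-based, and `reduction_percent` is a helper name I made up for this example.

```python
# Recompute the size reduction from the two summary lines of the quantize log.
# Assumption: the "MB" printed by the tool is 1024-based, so dividing by 1024
# gives GiB. Replace the two values with the ones from your own run.
model_size_mb = 31241.20  # "model size" line
quant_size_mb = 16603.42  # "quant size" line


def reduction_percent(original, quantized):
    """Percentage by which the quantized size is smaller than the original."""
    return (1 - quantized / original) * 100


print(f"original : {model_size_mb / 1024:.2f} GiB")
print(f"quantized: {quant_size_mb / 1024:.2f} GiB")
print(f"reduction: {reduction_percent(model_size_mb, quant_size_mb):.1f}%")
```

The result lands within a rounding step of the figures quoted above. A small difference is expected if the sizes are read from the files on disk rather than from the log.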

Quantization really is a great thing!
