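LLM Interview Hand-Coding Guide
# A note up front: I have found an internship and will not be maintaining this project for now, so if new common hand-coding questions come up, feel free to fork the project and modify it yourself, or send me a PR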
---
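This is a collection of hand-written code that may come up in LLM interviews. It is only meant to show the basic logic and is not guaranteed to drop directly into a model. AI was used to correct and optimize the code, reducing the readability problems that heavily engineered code brings.
The more basic parts are finished so far; issues and PRs are welcome.
Written explanations may be added once I have found an internship.
Link: https://github.com/Ashside/LLM-HandCoding-Interview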
---
# LLM-HandCoding-Interview
A collection of hand-written code for LLM interview preparation
## Common Attention
- [x] Self-Attention
- [x] Multi-Head Attention
- [x] Cross-Attention
- [x] Causal Attention (Masked Self-Attention)
- [x] Multi-Query Attention (MQA)
- [x] Grouped Query Attention (GQA)
- [x] Gated Attention
- [ ] Multi-Head Latent Attention (MLA)
- [x] Rotary Position Embedding (RoPE)
- [x] Sinusoidal Position Embedding
- [x] KV Cache
- [ ] Flash Attention
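
As an illustration of the kind of snippet this list covers, here is a minimal sketch of single-head causal (masked) self-attention. It is not taken from the repository: it assumes plain Python lists instead of tensors, omits the Q/K/V projection matrices, and the names `softmax` and `causal_attention` are hypothetical helpers chosen for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Scaled dot-product attention with a causal mask.

    q, k: lists of seq_len vectors of length d.
    v:    list of seq_len vectors (any length).
    Returns one output vector per position.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i, qi in enumerate(q):
        # Causal mask: position i only sees keys at positions 0..i.
        scores = [
            scale * sum(a * b for a, b in zip(qi, k[j]))
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        out.append([
            sum(w * v[j][c] for j, w in enumerate(weights))
            for c in range(len(v[0]))
        ])
    return out
```

Because the mask is applied by simply not scoring future positions, the first output always equals the first value vector. A tensor version would more likely compute all scores at once and fill the upper triangle with negative infinity before the softmax.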
## Common RL Methods
- [x] LoRA (Low-Rank Adaptation)
- [ ] Distillation
- [x] PPO (Proximal Policy Optimization)
- [x] DPO (Direct Preference Optimization)
- [x] GRPO
- [ ] SPO
- [ ] DAPO
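
For a sense of scale, the DPO objective from the list above fits in a few lines. The following is a minimal sketch for a single preference pair, not the repository's implementation: it assumes the four inputs are already-summed sequence log-probabilities under the policy and a frozen reference model, and the function names and the default `beta=0.1` are assumptions made for this example.

```python
import math


def softplus(x):
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Scaled margin between the two log-ratios.
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(logits)) equals softplus(-logits).
    return softplus(-logits)
```

When the policy equals the reference, the margin is zero and the loss is log 2; the loss falls as the policy raises the chosen response relative to the rejected one. A batched version would average this quantity over pairs.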
## Common Utils
- [x] Softmax
- [x] LayerNorm
- [x] RMSNorm
- [x] SwiGLU
- [x] AdamW
- [x] Learning Rate Scheduler
- [x] Gradient Clipping
- [ ] Mixed Precision Training
- [ ] Distributed Data Parallel (DDP)
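
Two of the utilities above, RMSNorm and gradient clipping, can be sketched on plain Python lists as follows. This is an illustrative sketch under simplifying assumptions (a single vector rather than a batch of tensors, a flat list of gradients), and the names `rms_norm` and `clip_grad_norm` and the `eps` values are assumptions for this example rather than the repository's code.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: rescale x by its root mean square, then apply a learned gain.

    Unlike LayerNorm, no mean is subtracted and there is no bias term.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]


def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Clip a flat list of gradients by their global L2 norm.

    Returns (clipped_grads, total_norm_before_clipping).
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Shrink only when the norm exceeds max_norm; never scale up.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm
```

For example, clipping the gradients `[3.0, 4.0]` (norm 5) to `max_norm=1.0` scales both entries by roughly 0.2, keeping the direction and shrinking the length to about 1.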