How to Fine-Tune Stable Diffusion with LoRA and Enable Gradient Checkpointing


Lil Shake · Published 2025-02-21 20:07:38

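Introduction to LoRA and Gradient Checkpointing

The method descriptions in this section were provided by DeepSeek.

LoRA (parameter-efficient fine-tuning via low-rank matrix decomposition)

LoRA is a Parameter-Efficient Fine-Tuning (PEFT) technique. Its core idea is to use low-rank matrix decomposition to add lightweight adapter layers on top of the weight matrices of a large pretrained model (such as an LLM or Stable Diffusion), so that training only a small number of parameters is enough to adapt the model to a downstream task.

Mathematical principle

Suppose a weight matrix in the original model is $W \in \mathbb{R}^{d\times k}$. LoRA adds the product of two low-rank matrices $A$ and $B$ to it: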

$W' = W+\Delta W = W + B \cdot A \quad (B \in \mathbb{R}^{d\times r},\ A \in \mathbb{R}^{r \times k},\ r \ll \min(d,k))$

where:

$r$ is the rank, which controls the capacity of the adapter layer
Only $A$ and $B$ are updated during fine-tuning; the original $W$ stays frozen
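
To make the parameter savings concrete, here is a minimal sketch in plain Python (no deep learning library). It assumes a rank-$r$ update built from one $d \times r$ factor and one $r \times k$ factor; the function names and the example sizes are illustrative and do not come from any library.

```python
def lora_param_counts(d, k, r):
    """Return (full, lora): trainable parameters for a d x k weight matrix
    when fine-tuned directly versus through a rank-r LoRA update."""
    full = d * k
    lora = d * r + r * k  # one d x r factor plus one r x k factor
    return full, lora


def matmul(x, y):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def apply_lora(w, left, right):
    """Return W' = W + left @ right without modifying W."""
    delta = matmul(left, right)
    return [[w_ij + d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Illustrative sizes: a 768 x 768 projection adapted with rank 4.
full, lora = lora_param_counts(768, 768, 4)
print(full, lora, f"{lora / full:.2%}")

# Tiny worked example with d = 2, k = 3, r = 1.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
left = [[1.0], [2.0]]        # d x r factor
right = [[0.5, 0.0, -1.0]]   # r x k factor
w_adapted = apply_lora(w, left, right)
print(w_adapted)
```

With these sizes the adapter trains 6,144 parameters instead of 589,824, roughly 1% of the full matrix, while $W$ itself is left untouched.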

Gradient Checkpointing (reducing GPU memory usage)

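Checkpoint segmentation: split the forward pass into several segments and cache only the activations at segment boundaries (each segment's input, which is also the previous segment's output).
On-the-fly recomputation: during the backward pass, rerun the forward computation of each segment to recover its intermediate activations, which are discarded as soon as they have been used.
Memory savings: with roughly √N segments, activation memory drops from O(N) (N = number of layers) to O(√N), at the cost of about one extra forward pass.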
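
The O(√N) figure can be checked with a small counting model. The sketch below is a simplification, not a measurement: it assumes every layer output has the same size, that one checkpoint is kept per segment, and that only one segment is recomputed at a time during the backward pass. The function names are hypothetical.

```python
import math


def peak_cached_activations(n_layers, segment_size):
    """Rough count of activations held at once under segment checkpointing.

    One checkpoint is stored per segment, plus the intermediate activations
    of the single segment currently being recomputed.
    """
    n_segments = math.ceil(n_layers / segment_size)
    return n_segments + segment_size


def best_segment_size(n_layers):
    """Segment size that minimises the count above (close to sqrt(N))."""
    return min(range(1, n_layers + 1),
               key=lambda s: peak_cached_activations(n_layers, s))


n = 100
s = best_segment_size(n)
print(s, peak_cached_activations(n, s))
```

For N = 100 this model picks a segment size of 10 and holds 20 activations at peak, compared with about 100 when every layer's activation is cached.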

Model loading and parameter setup

1. Load the pretrained model architecture and weights.

from diffusers import DDIMScheduler, StableDiffusionPipeline
pipeline = StableDiffusionPipeline.from_pretrained(config.pretrained.model, revision=config.pretrained.revision)

2. Freeze the parameters and disable the safety checker.

# freeze parameters of models to save more memory
pipeline.vae.requires_grad_(False)
pipeline.text_encoder.requires_grad_(False)
pipeline.unet.requires_grad_(not config.use_lora)

# disable safety checker
pipeline.safety_checker = None

3. Set the sampling scheduler (e.g. DDIM).

# switch to DDIM scheduler
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

4. Set up the LoRA layers.

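# Assumes an older diffusers release that still provides LoRAAttnProcessor
from diffusers.loaders import AttnProcsLayers
from diffusers.models.attention_processor import LoRAAttnProcessor

# Set correct lora layers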
lora_attn_procs = {}
for name in pipeline.unet.attn_processors.keys():
    cross_attention_dim = (None if name.endswith("attn1.processor") else pipeline.unet.config.cross_attention_dim)
    if name.startswith("mid_block"):
        hidden_size = pipeline.unet.config.block_out_channels[-1]
    elif name.startswith("up_blocks"):
        block_id = int(name[len("up_blocks.")])
        hidden_size = list(reversed(pipeline.unet.config.block_out_channels))[block_id]
    elif name.startswith("down_blocks"):
        block_id = int(name[len("down_blocks.")])
        hidden_size = pipeline.unet.config.block_out_channels[block_id]

    lora_attn_procs[name] = LoRAAttnProcessor(hidden_size=hidden_size, cross_attention_dim=cross_attention_dim)
# install the processors once, after the loop has built the full dict
pipeline.unet.set_attn_processor(lora_attn_procs)

# this is a hack to synchronize gradients properly. the module that registers the parameters we care about (in
# this case, AttnProcsLayers) needs to also be used for the forward pass. AttnProcsLayers doesn't have a
# `forward` method, so we wrap it to add one and capture the rest of the unet parameters using a closure.
class _Wrapper(AttnProcsLayers):
    def forward(self, *args, **kwargs):
        return pipeline.unet(*args, **kwargs)

unet = _Wrapper(pipeline.unet.attn_processors)
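
The name-to-size lookup in the loop above can be tried out without loading a model. The sketch below pulls that logic into two standalone helpers (hypothetical names, not part of diffusers) and runs them on a few processor names. The channel list and the cross-attention width are assumed values in the style of a Stable Diffusion v1 UNet config; read the real ones from `pipeline.unet.config`.

```python
def hidden_size_for(name, block_out_channels):
    """Map an attention-processor name to the channel width of its block."""
    if name.startswith("mid_block"):
        return block_out_channels[-1]
    if name.startswith("up_blocks"):
        # The character right after the prefix is the block index.
        block_id = int(name[len("up_blocks.")])
        # Up blocks mirror the down blocks, so the widths are reversed.
        return list(reversed(block_out_channels))[block_id]
    if name.startswith("down_blocks"):
        block_id = int(name[len("down_blocks.")])
        return block_out_channels[block_id]
    raise ValueError(f"unexpected attention processor name: {name}")


def cross_attention_dim_for(name, cross_attention_dim):
    """attn1 is self-attention (no text conditioning); attn2 is cross-attention."""
    return None if name.endswith("attn1.processor") else cross_attention_dim


# Assumed config values, for illustration only.
channels = [320, 640, 1280, 1280]
names = [
    "down_blocks.0.attentions.0.transformer_blocks.0.attn1.processor",
    "mid_block.attentions.0.transformer_blocks.0.attn2.processor",
    "up_blocks.3.attentions.2.transformer_blocks.0.attn2.processor",
]
for n in names:
    print(n, hidden_size_for(n, channels), cross_attention_dim_for(n, 768))
```

Raising on an unrecognized prefix, rather than silently reusing the previous `hidden_size`, makes a naming mismatch between diffusers versions easier to spot.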

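5. Call `_set_gradient_checkpointing`, a function built into the `UNet2DConditionModel` class of the diffusers package, to enable Gradient Checkpointing for the LoRA module. Note that `_set_gradient_checkpointing` is a private helper whose behavior can differ between diffusers versions; the public method `pipeline.unet.enable_gradient_checkpointing()` is the documented way to turn the feature on for the whole UNet.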

# set gradient checkpointing
pipeline.unet._set_gradient_checkpointing(unet, True)

6. Initialize the optimizer.

optimizer = optimizer_cls(
    unet.parameters(),
    lr=config.train.learning_rate,
    betas=(config.train.adam_beta1, config.train.adam_beta2),
    weight_decay=config.train.adam_weight_decay,
    eps=config.train.adam_epsilon,
)

unet, optimizer = accelerator.prepare(unet, optimizer)

7. Run the training code.

The following warning appears:

UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.5 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.

I have not found a suitable fix yet, and I am not sure whether it needs to be fixed at all.
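
The warning text itself recommends passing `use_reentrant=False` to `torch.utils.checkpoint.checkpoint`. In this setup that call is most likely made inside the library rather than in the training script, so whether the argument can be set directly depends on the installed diffusers version. If the warning turns out to be harmless noise, one option is to filter just this message with Python's standard `warnings` module. The sketch below is a workaround for the log output only, not a change to the underlying checkpoint call; the helper names are hypothetical.

```python
import warnings

# Regex matched (case-insensitively) against the start of the warning message.
REENTRANT_PATTERN = r"torch\.utils\.checkpoint: the use_reentrant parameter"


def silence_reentrant_warning():
    """Install a filter that hides only this specific UserWarning."""
    warnings.filterwarnings("ignore", message=REENTRANT_PATTERN, category=UserWarning)


def count_visible_warnings(messages):
    """Emit each message as a UserWarning and count how many get through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # show everything by default
        silence_reentrant_warning()      # takes priority over the rule above
        for msg in messages:
            warnings.warn(msg, UserWarning)
        return len(caught)


demo = [
    "torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly.",
    "some unrelated warning",
]
print(count_visible_warnings(demo))
```

In a training script, calling `silence_reentrant_warning()` once before the training loop would be enough; all other warnings stay visible.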
