CVPR 2025 Image/Video/3D Generation Paper Roundup (with Papers and Code)

yyyyyybw · Published 2025-06-11 15:44:12

Awesome-CVPR2025-AIGC

A Collection of Papers and Codes for CVPR2025 AIGC

This post collects CVPR 2025 papers on AIGC together with their code, organized as follows.

【Contents】

Image Generation/Image Synthesis
Image Editing
Video Generation/Video Synthesis
Video Editing
3D Generation/3D Synthesis
3D Editing
Multi-Modal Large Language Models
Others

Collection of reproducible CVPR 2025 papers, with code: https://docs.qq.com/doc/DQ25HbWt6WmdOZEta?u=7f01826fa3f140bb8e36e875087997e8&nlc=1

1. Image Generation/Image Synthesis

Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

Paper: https://arxiv.org/abs/2411.17787
Code: https://github.com/czg1225/CoDe

Paper: https://arxiv.org/abs/2408.16266
Code: https://github.com/scuwyh2000/Diff-II

Paper: https://arxiv.org/abs/2412.15119
Code: https://github.com/Epiphqny/PAR

Paper: https://arxiv.org/abs/2412.03177
Code: https://github.com/hqhQAQ/PatchDPO

Paper: https://arxiv.org/abs/2501.01423
Code: https://github.com/hustvl/LightningDiT

Paper: https://arxiv.org/abs/2410.18737
Code: https://github.com/thuxmf/recfg

Paper: https://arxiv.org/abs/2403.09055
Code: https://github.com/ironjr/semantic-draw

Paper: https://arxiv.org/abs/2412.04852
Code: https://github.com/taco-group/SleeperMark

Paper: https://arxiv.org/abs/2412.03069
Code: https://github.com/ByteFlow-AI/TokenFlow

2. Image Editing

Attention Distillation: A Unified Approach to Visual Characteristics Transfer

Paper: https://arxiv.org/abs/2502.20235
Code: https://github.com/xugao97/AttentionDistillation

Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing

Paper: https://arxiv.org/abs/2411.16832
Code: https://github.com/taco-group/FaceLock

EmoEdit: Evoking Emotions through Image Manipulation

Paper: https://arxiv.org/abs/2405.12661
Code: https://github.com/JingyuanYY/EmoEdit

K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs

Paper: https://arxiv.org/abs/2502.18461
Code: https://github.com/HVision-NKU/K-LoRA

StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements

Paper: https://arxiv.org/abs/2412.08503
Code: https://github.com/Westlake-AGI-Lab/StyleStudio

3. Video Generation/Video Synthesis

ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way

Paper: https://arxiv.org/abs/2410.06241
Code: https://github.com/Bujiazi/ByTheWay

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

Paper: https://arxiv.org/abs/2411.17440
Code: https://github.com/PKU-YuanGroup/ConsisID

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

Paper: https://arxiv.org/abs/2412.09283
Code: https://github.com/NJU-PCALab/InstanceCap

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

Paper: https://arxiv.org/abs/2411.17459
Code: https://github.com/PKU-YuanGroup/WF-VAE

4. Video Editing

Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models

Paper: https://arxiv.org/abs/2407.15642
Code: https://github.com/maxin-cn/Cinemo

Generative Inbetweening through Frame-wise Conditions-Driven Video Generation

Paper: https://arxiv.org/abs/2412.11755
Code: https://github.com/Tian-one/FCVG

X-Dyna: Expressive Dynamic Human Image Animation

Paper: https://arxiv.org/abs/2501.10021
Code: https://github.com/bytedance/X-Dyna

5. 3D Generation/3D Synthesis

Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation

Paper: https://arxiv.org/abs/2411.16185
Code: https://github.com/YuQiao0303/Fancy123

Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

Paper: https://arxiv.org/abs/2501.13928
Code: https://github.com/facebookresearch/fast3r

GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation

Paper: https://arxiv.org/abs/2406.06526
Code: https://github.com/hzxie/GaussianCity

LT3SD: Latent Trees for 3D Scene Diffusion

Paper: https://arxiv.org/abs/2409.08215
Code: https://github.com/quan-meng/lt3sd

Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture

Paper: https://arxiv.org/abs/2503.00495
Code: https://github.com/XuanchenLi/TexTalk

You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale

Paper: https://arxiv.org/abs/2412.06699
Code: https://github.com/baaivision/See3D

6. 3D Editing

DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters

Paper: https://arxiv.org/abs/2411.17423
Code: https://github.com/yisuanwang/DRiVE

FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video

Paper: https://arxiv.org/abs/2411.15604
Code: https://github.com/zjwfufu/FateAvatar

Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters

Paper: https://arxiv.org/abs/2411.18197
Code: https://github.com/jasongzy/Make-It-Animatable

7. Multi-Modal Large Language Models

Automated Generation of Challenging Multiple Choice Questions for Vision Language Model Evaluation

Paper: https://arxiv.org/abs/2501.03225
Code: https://github.com/yuhui-zh15/AutoConverter

RAP-MLLM: Retrieval-Augmented Personalization for Multimodal Large Language Model

Paper: https://arxiv.org/abs/2410.13360
Code: https://github.com/Hoar012/RAP-MLLM

SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model

Paper: https://arxiv.org/abs/2412.01550
Code: https://github.com/hq-King/SeqAfford

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper: https://arxiv.org/abs/2411.17465
Code: https://github.com/showlab/ShowUI

8. Others

Continuous and Locomotive Crowd Behavior Generation

Paper:
Code: https://github.com/InhwanBae/Crowd-Behavior-Generation

Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Paper: https://arxiv.org/abs/2412.15322
Code: https://github.com/hkchengrex/MMAudio
