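CVPR 2025 Image/Video/3D Generation Paper Roundup (with Papers/Code)
A collection of reproducible CVPR 2025 papers, with code: multimodal large language models (Multi-Modal Large Language Model), image generation (Image Generation/Image Synthesis), video generation (Video Generation/Video Synthesis), 3D generation (3D Generation/3D Synthesis). Image editing (Image Editing), video editing (Video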
Awesome-CVPR2025-AIGC
A Collection of Papers and Codes for CVPR2025 AIGC
This post collects CVPR 2025 papers on AIGC together with their code, organized as follows.
【Contents】
- Image Generation / Image Synthesis
- Image Editing
- Video Generation / Video Synthesis
- Video Editing
- 3D Generation / 3D Synthesis
- 3D Editing
- Multi-Modal Large Language Models
- Other Tasks
1. Image Generation / Image Synthesis
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Paper: https://arxiv.org/abs/2411.17787
Code: https://github.com/czg1225/CoDe
Paper: https://arxiv.org/abs/2408.16266
Code: https://github.com/scuwyh2000/Diff-II
Paper: https://arxiv.org/abs/2412.15119
Code: https://github.com/Epiphqny/PAR
Paper: https://arxiv.org/abs/2412.03177
Code: https://github.com/hqhQAQ/PatchDPO
Paper: https://arxiv.org/abs/2501.01423
Code: https://github.com/hustvl/LightningDiT
Paper: https://arxiv.org/abs/2410.18737
Code: https://github.com/thuxmf/recfg
Paper: https://arxiv.org/abs/2403.09055
Code: https://github.com/ironjr/semantic-draw
Paper: https://arxiv.org/abs/2412.04852
Code: https://github.com/taco-group/SleeperMark
Paper: https://arxiv.org/abs/2412.03069
Code: https://github.com/ByteFlow-AI/TokenFlow
2. Image Editing
Attention Distillation: A Unified Approach to Visual Characteristics Transfer
- Paper: https://arxiv.org/abs/2502.20235
- Code: https://github.com/xugao97/AttentionDistillation
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing
- Paper: https://arxiv.org/abs/2411.16832
- Code: https://github.com/taco-group/FaceLock
EmoEdit: Evoking Emotions through Image Manipulation
- Paper: https://arxiv.org/abs/2405.12661
- Code: https://github.com/JingyuanYY/EmoEdit
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs
- Paper: https://arxiv.org/abs/2502.18461
- Code: https://github.com/HVision-NKU/K-LoRA
StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements
- Paper: https://arxiv.org/abs/2412.08503
- Code: https://github.com/Westlake-AGI-Lab/StyleStudio
3. Video Generation / Video Synthesis
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
- Paper: https://arxiv.org/abs/2410.06241
- Code: https://github.com/Bujiazi/ByTheWay
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
- Paper: https://arxiv.org/abs/2411.17440
- Code: https://github.com/PKU-YuanGroup/ConsisID
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
- Paper: https://arxiv.org/abs/2412.09283
- Code: https://github.com/NJU-PCALab/InstanceCap
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
- Paper: https://arxiv.org/abs/2411.17459
- Code: https://github.com/PKU-YuanGroup/WF-VAE
4. Video Editing
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
- Paper: https://arxiv.org/abs/2407.15642
- Code: https://github.com/maxin-cn/Cinemo
Generative Inbetweening through Frame-wise Conditions-Driven Video Generation
- Paper: https://arxiv.org/abs/2412.11755
- Code: https://github.com/Tian-one/FCVG
X-Dyna: Expressive Dynamic Human Image Animation
- Paper: https://arxiv.org/abs/2501.10021
- Code: https://github.com/bytedance/X-Dyna
5. 3D Generation / 3D Synthesis
Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation
- Paper: https://arxiv.org/abs/2411.16185
- Code: https://github.com/YuQiao0303/Fancy123
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
- Paper: https://arxiv.org/abs/2501.13928
- Code: https://github.com/facebookresearch/fast3r
GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation
- Paper: https://arxiv.org/abs/2406.06526
- Code: https://github.com/hzxie/GaussianCity
LT3SD: Latent Trees for 3D Scene Diffusion
- Paper: https://arxiv.org/abs/2409.08215
- Code: https://github.com/quan-meng/lt3sd
Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture
- Paper: https://arxiv.org/abs/2503.00495
- Code: https://github.com/XuanchenLi/TexTalk
You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale
- Paper: https://arxiv.org/abs/2412.06699
- Code: https://github.com/baaivision/See3D
6. 3D Editing
DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters
- Paper: https://arxiv.org/abs/2411.17423
- Code: https://github.com/yisuanwang/DRiVE
FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video
- Paper: https://arxiv.org/abs/2411.15604
- Code: https://github.com/zjwfufu/FateAvatar
Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters
- Paper: https://arxiv.org/abs/2411.18197
- Code: https://github.com/jasongzy/Make-It-Animatable
7. Multi-Modal Large Language Models
Automated Generation of Challenging Multiple Choice Questions for Vision Language Model Evaluation
- Paper: https://arxiv.org/abs/2501.03225
- Code: https://github.com/yuhui-zh15/AutoConverter
RAP-MLLM: Retrieval-Augmented Personalization for Multimodal Large Language Model
- Paper: https://arxiv.org/abs/2410.13360
- Code: https://github.com/Hoar012/RAP-MLLM
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
- Paper: https://arxiv.org/abs/2412.01550
- Code: https://github.com/hq-King/SeqAfford
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
- Paper: https://arxiv.org/abs/2411.17465
- Code: https://github.com/showlab/ShowUI
8. Other Tasks
Continuous and Locomotive Crowd Behavior Generation
- Paper:
- Code: https://github.com/InhwanBae/Crowd-Behavior-Generation
Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
- Paper: https://arxiv.org/abs/2412.15322
- Code: https://github.com/hkchengrex/MMAudio