Video Understanding Surveys
CVPR2025
CVPR25 Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
https://github.com/BradyFU/Video-MME?tab=readme-ov-file
1. Awesome-LLMs-for-Video-Understanding
https://github.com/yunlong10/Awesome-LLMs-for-Video-Understanding
https://arxiv.org/pdf/2312.17432v4 (revised version, 2024-07)

From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
https://arxiv.org/pdf/2409.18938
https://github.com/Vincent-ZHQ/LV-LLMs

Papers with Code: Video Understanding
https://paperswithcode.com/task/video-understanding/latest
Awesome-Multimodal-Large-Language-Models
https://github.com/yfzhang114/Awesome-Multimodal-Large-Language-Models?tab=readme-ov-file
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
https://arxiv.org/pdf/2411.15296
A 10,000-Word Summary of the Latest Progress in Multimodal LLM Evaluation - article by yearn - Zhihu
https://zhuanlan.zhihu.com/p/16815782175
Another Awesome-Multimodal-Large-Language-Models
https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models?tab=readme-ov-file
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
What are some good research directions in multimodal learning? - answer by 梦想成真 - Zhihu
https://www.zhihu.com/question/332876504/answer/130142183129