This post collects research papers on multimodal large language models (MLLMs) and gives a link to each. Highlights include:

  1. ClearSight paper: ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large language Models, available at https://arxiv.org/abs/2503.13107

  2. VCR paper: VCR: A “Cone of Experience” Driven Synthetic Data Generation Framework for Mathematical Reasoning, available at https://ojs.aaai.org/index.php/AAAI/article/view/34645

  3. Data-centric survey: A Survey of Multimodal Large Language Model from A Data-centric Perspective, available at https://arxiv.org/abs/2405.16640v1

  4. MLLM survey: The Revolution of Multimodal Large Language Models: A Survey, available at https://aclanthology.org/2024.findings-acl.807.pdf

  5. EarthGPT paper: EarthGPT: A Universal Multimodal Large Language Model for Multisensor Image Comprehension in Remote Sensing Domain, available at https://ieeexplore.ieee.org/abstract/document/10547418?signout=success

  6. Chain of Images paper: Chain of Images for Intuitively Reasoning, available at https://arxiv.org/abs/2311.09241

  7. GPT-4 multimodal analysis paper: GPT-4 Multimodal Analysis on Ophthalmology Clinical Cases Including Text and Images, available at https://www.medrxiv.org/content/10.1101/2023.11.24.23298953v1

1. ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large language Models

Paper link: https://arxiv.org/abs/2503.13107

2. VCR: A “Cone of Experience” Driven Synthetic Data Generation Framework for Mathematical Reasoning

Paper link: https://ojs.aaai.org/index.php/AAAI/article/view/34645

3. A Survey of Multimodal Large Language Model from A Data-centric Perspective

Paper link: https://arxiv.org/abs/2405.16640v1

4. The Revolution of Multimodal Large Language Models: A Survey

Paper link: https://aclanthology.org/2024.findings-acl.807.pdf

5. EarthGPT: A Universal Multimodal Large Language Model for Multisensor Image Comprehension in Remote Sensing Domain

Paper link: https://ieeexplore.ieee.org/abstract/document/10547418?signout=success

6. Chain of Images for Intuitively Reasoning

Paper link: https://arxiv.org/abs/2311.09241

7. GPT-4 Multimodal Analysis on Ophthalmology Clinical Cases Including Text and Images

Paper link: https://www.medrxiv.org/content/10.1101/2023.11.24.23298953v1

8. LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

Paper link: https://arxiv.org/abs/2311.11860

9. MM-LLMs: Recent Advances in MultiModal Large Language Models

Paper link: https://arxiv.org/abs/2401.13601

10. SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation

Paper link: https://arxiv.org/abs/2409.06633

11. VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

Paper link: https://openreview.net/forum?id=BH7ZAmkWVc

12. RocketEval: Efficient automated LLM evaluation via grading checklist

Paper link: https://openreview.net/forum?id=zJjzNj6QUe
