论文网址:[2502.15786] MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding

The English text is entirely hand-typed! It summarizes and paraphrases the original paper. Spelling and grammar mistakes are hard to avoid; if you spot any, feel free to point them out in the comments. This article leans toward study notes, so read with caution.

Table of Contents

1. Takeaways

2. Section-by-Section Reading

2.1. Abstract

2.2. Introduction

2.3. Related Works

2.4. Method

2.4.1. Method Overview

2.4.2. fMRI Encoder

2.4.3. Brain Instruction Tuning (BIT)

2.5. Experiments

2.5.1. Settings

2.5.2. Brain Captioning

2.5.3. Versatile Decoding

2.5.4. Unseen Subject Generalization

2.5.5. Adapting to New Tasks

2.5.6. Ablation Study

2.5.7. Visualizations and Interpretations

2.6. Conclusion

1. Takeaways

(1) A substantial amount of work was done

2. Section-by-Section Reading

2.1. Abstract

        ①Challenges: suboptimal performance, limited task variety, and poor generalization across subjects

2.2. Introduction

        ①Design and implementation of MindLLM:

        ②Selecting responsive voxels brings higher performance but yields a different number of voxels per subject. Pooling or sampling them down to the same number may cause loss of information

        ③Their method aims to complete tasks of perception & scene understanding, memory & knowledge retrieval, language & symbolic processing, and complex reasoning

prosthesis  n. an artificial body part (e.g., an artificial limb, eye, or tooth)

2.3. Related Works

        ①⭐VQA returns answers that are not relevant to the β values

        ②⭐Cross-subject methods do not handle voxel differences well; flattening or sampling may cause loss of spatial/individual information:

        ③Designing a separate encoder for each subject is inherently limiting, and relying solely on caption annotations is another limitation

2.4. Method

2.4.1. Method Overview

        ①Overall framework of MindLLM:

where the LLM is Vicuna-7b (suited to open-ended dialogue?? long-text understanding??)

        ②Input brain signal of each subject: \boldsymbol{v}=[v_{1},v_{2},\cdots,v_{N}]\in\mathbb{R}^{N}, where N\in[12682,17907] denotes the number of voxels (it varies across subjects)

        ③The fMRI encoder f_\theta encodes \boldsymbol{v} into fMRI tokens X_{v}=[\boldsymbol{x}_{v,1},\boldsymbol{x}_{v,2},\cdots,\boldsymbol{x}_{v,L}]\in\mathbb{R}^{L\times d} with L tokens of dimension d

2.4.2. fMRI Encoder

        ①In the attention, the value V is a voxel's activation, and the key K concatenates that voxel's Fourier positional coordinates with several region embeddings of the ROIs it belongs to in different brain atlases:

k_i=k_i^{\mathrm{pos}}\|k_i^{\mathrm{reg},\mathcal{P}^1}\|k_i^{\mathrm{reg},\mathcal{P}^2}\|\cdots

        ②The attention layer outputs \boldsymbol{z}_{q}\in\mathbb{R}^{N_{q}}, which is then passed through an MLP:

X_{v}=\mathrm{reshape}\left(\mathrm{MLP}(\{\boldsymbol{z}_{q}\})\right)\in\mathbb{R}^{L\times d}
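The idea above can be sketched as follows (a minimal illustration, assuming hypothetical shapes and names; the real architecture may differ): a fixed set of learned queries cross-attends over a variable number of voxels, each key mixes the voxel's positional features with its region embeddings, each value is the scalar voxel activation, and the pooled vector z_q is mapped by an MLP to L fMRI tokens.

```python
import torch
import torch.nn as nn

class FMRIEncoderSketch(nn.Module):
    """Hypothetical sketch: learned queries attend over voxels whose
    keys concatenate positional features and atlas region embeddings."""
    def __init__(self, d_key=32, n_queries=64, n_tokens=8, d_model=16):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, d_key))
        # MLP maps the pooled vector z_q in R^{N_q} to L*d, then reshape
        self.mlp = nn.Linear(n_queries, n_tokens * d_model)
        self.n_tokens, self.d_model = n_tokens, d_model

    def forward(self, values, keys):
        # values: (N,) voxel activations v; keys: (N, d_key), each row
        # k_i = [Fourier position features || region embeddings]
        attn = torch.softmax(
            self.queries @ keys.T / keys.shape[1] ** 0.5, dim=-1
        )                    # (N_q, N); N may vary across subjects
        z_q = attn @ values  # (N_q,) -- scalar activations pooled per query
        return self.mlp(z_q).view(self.n_tokens, self.d_model)  # X_v: (L, d)
```

Because the voxel dimension N only appears inside the attention, the same encoder accepts any number of voxels, which is what makes the design subject-agnostic.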

2.4.3. Brain Instruction Tuning (BIT)

        ①Tasks of MindLLM:

signifier  n. the form of a linguistic sign

        ②Each sample \boldsymbol{v} is paired with a multi-turn conversation X_{t}=(X_u^1,X_a^1,\cdots,X_u^T,X_a^T) with T\geq1 turns, where X_a^t is a message from the assistant and X_u^t is a message from the user

        ③Training objective:

\arg\max_\theta p(X_a|X_v,X_{\mathrm{inst}})=\prod_{t=1}^Tp(X_a^t\mid X_u^{\leq t},X_a^{<t},X_{\mathrm{inst}},X_v)
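In implementation terms this objective is ordinary next-token cross-entropy computed only on assistant tokens, with the instruction, the fMRI tokens, and the user turns serving as context. A minimal sketch (the function name and mask construction are illustrative, not from the paper):

```python
import torch
import torch.nn.functional as F

def bit_loss(logits, labels, assistant_mask):
    """logits: (T, V) LM outputs; labels: (T,) token ids;
    assistant_mask: (T,) bool, True where the token belongs to an
    assistant message X_a (only those positions contribute loss)."""
    # Shift so that position t predicts token t+1.
    shifted_logits, shifted_labels = logits[:-1], labels[1:]
    mask = assistant_mask[1:].float()
    per_token = F.cross_entropy(shifted_logits, shifted_labels,
                                reduction="none")
    # Average over assistant tokens only; context tokens are masked out.
    return (per_token * mask).sum() / mask.sum().clamp(min=1.0)
```

Masking this way means changing a context token's label leaves the loss untouched, which is exactly the conditioning structure of the product over turns above.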

        ④Examples of Q&A:

2.5. Experiments

2.5.1. Settings

        ①Datasets: NSD and other downstream datasets

2.5.2. Brain Captioning

        ①Captioning performance:

where CIDEr is scaled by a factor of 100

2.5.3. Versatile Decoding

        ①Performance of versatile decoding:

2.5.4. Unseen Subject Generalization

        ①Train on 1–7 subjects but evaluate on the 8th:

2.5.5. Adapting to New Tasks

        ①Performance on sentiment understanding and utility/affordance tasks:

2.5.6. Ablation Study

        ①Ablation of position encoding:
 

2.5.7. Visualizations and Interpretations

        ①Attention of brain voxels:
 

2.6. Conclusion

        ~
