ELMo: Explanation and Usage

suixinm · published 2025-06-22 20:09:45

A walkthrough of ELMo (Embeddings from Language Models), covering how it works, how to use it, the problems it solves, and code examples:

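1. Introduction to ELMo

ELMo (Embeddings from Language Models) is a contextual word embedding model proposed in 2018 by researchers at the Allen Institute for AI, the group behind the AllenNLP library. Unlike traditional static word embeddings (such as Word2Vec and GloVe), which assign one fixed vector per word, ELMo produces word vectors that change with the surrounding context, which addresses the representation of polysemous words and of meaning in complex contexts.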

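2. Core Ideas of ELMo

Bidirectional language model (biLM): ELMo trains a forward and a backward LSTM language model and uses both to capture context from either side of a word.
- Forward language model: predicts the current word from the preceding words.
- Backward language model: predicts the current word from the following words.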
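For reference, the joint training objective of the two directions can be written as follows, using the notation of the original ELMo paper. For a sequence of N tokens (t_1, ..., t_N), the token-representation parameters Θ_x and the softmax parameters Θ_s are shared between directions, while each direction has its own LSTM parameters:

```latex
\sum_{k=1}^{N} \Big(
  \log p\big(t_k \mid t_1, \ldots, t_{k-1};\ \Theta_x, \overrightarrow{\Theta}_{\mathrm{LSTM}}, \Theta_s\big)
  + \log p\big(t_k \mid t_{k+1}, \ldots, t_N;\ \Theta_x, \overleftarrow{\Theta}_{\mathrm{LSTM}}, \Theta_s\big)
\Big)
```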
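Multi-layer representation fusion: the hidden states of the different layers are combined (lower layers tend to capture syntax, higher layers tend to capture semantics) into a single context-dependent word vector.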
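In the ELMo paper, this fusion is a task-specific weighted sum: the per-layer weights are softmax-normalized and the result is rescaled by a scalar gamma. The following is a minimal sketch of that idea in plain Python, not AllenNLP's implementation. The function names `softmax` and `scalar_mix` are illustrative, and toy 2-dimensional vectors stand in for the 1024-dimensional layer outputs.

```python
import math

def softmax(logits):
    """Normalize raw layer weights so they are positive and sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def scalar_mix(layers, layer_logits, gamma=1.0):
    """Combine per-layer token vectors into one vector per token.

    layers: list of L layers, each a list of token vectors (lists of floats).
    layer_logits: L raw (pre-softmax) weights; in ELMo these are learned per task.
    gamma: scalar that rescales the mixed vector.
    """
    weights = softmax(layer_logits)
    num_tokens = len(layers[0])
    dim = len(layers[0][0])
    mixed = []
    for t in range(num_tokens):
        vec = [gamma * sum(w * layer[t][d] for w, layer in zip(weights, layers))
               for d in range(dim)]
        mixed.append(vec)
    return mixed

# Toy input: 3 layers, 2 tokens, 2 dimensions (real ELMo: 3 x n x 1024).
layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
    [[3.0, 0.0], [0.0, 3.0]],
]
mixed = scalar_mix(layers, layer_logits=[0.0, 0.0, 0.0])  # equal weights
print(mixed)
```

With all logits equal, every layer gets weight 1/3, so the mix is a plain average of the layers. A downstream task can shift weight toward whichever layer helps it most.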

[Figure: ELMo architecture diagram]

[Figure: schematic illustration]

3. Implementation Process

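4. Problems ELMo Addresses

Problem	Limitation of traditional methods	ELMo's improvement
Polysemy	Word2Vec gives each word a single vector, whatever its sense	Produces different embeddings depending on context (e.g., "bank" in a financial vs. a river setting)
Understanding complex context	Sentence structure is ignored	Bidirectional LSTMs capture dependencies on both preceding and following words
Task-specific feature extraction	Models must be trained from scratch	Provides pretrained embeddings that can be fine-tuned for downstream tasks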

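5. Using ELMo

Installing dependencies

# Install (run in a shell)
# ElmoEmbedder (allennlp.commands.elmo) is only available in allennlp 0.9.0 and earlier;
# for the ElmoEmbedder examples, use: pip install allennlp==0.9.0
# Elmo and batch_to_ids (allennlp.modules.elmo) are also available in later releases.

pip install allennlp allennlp-models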
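Because the available ELMo entry points differ between AllenNLP releases, it can help to check what is installed before running the examples. The sketch below uses only the standard library plus a guarded import. The helper names `installed_version` and `has_elmo_embedder` are made up for illustration.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def has_elmo_embedder():
    """Report whether allennlp.commands.elmo.ElmoEmbedder can be imported."""
    try:
        from allennlp.commands.elmo import ElmoEmbedder  # noqa: F401
        return True
    except ImportError:
        return False

print("allennlp version:", installed_version("allennlp"))
print("ElmoEmbedder available:", has_elmo_embedder())
```

If `has_elmo_embedder()` returns False, the examples based on `allennlp.modules.elmo.Elmo` and `batch_to_ids` are the ones to follow.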

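Custom ELMo embedding extraction

from allennlp.commands.elmo import ElmoEmbedder  # requires allennlp <= 0.9.0

# Load the default pretrained ELMo model
elmo = ElmoEmbedder()

# Embed a single tokenized sentence
sentence = ["I", "ate", "an", "apple"]
vectors = elmo.embed_sentence(sentence)  # NumPy array: one character-CNN token layer plus two biLSTM layers, 1024 dims each
print(vectors.shape)  # (3, 4, 1024): 3 layers x 4 tokens x 1024 dims

# Embed a batch of sentences (embed_sentences yields one array per sentence)
batch = [["Hello", "world"], ["ELMo", "is", "awesome"]]
batch_vectors = list(elmo.embed_sentences(batch))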

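import torch
from allennlp.modules.elmo import Elmo, batch_to_ids

# Configure ELMo (replace the placeholders with real options and weight files)
options_file = "path/to/options.json"
weight_file = "path/to/weights.hdf5"
elmo = Elmo(options_file, weight_file, num_output_representations=1)

# Elmo expects integer character IDs of shape (batch, seq_len, 50), not random floats;
# batch_to_ids builds them from tokenized sentences
sentences = [["Hello", "world"], ["ELMo", "is", "awesome"]]
character_ids = batch_to_ids(sentences)  # (2, 3, 50)
embeddings = elmo(character_ids)["elmo_representations"][0]  # (2, 3, 1024)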
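To make the "character ID" input more concrete, here is a simplified, dependency-free sketch of how a tokenized batch can be turned into a (batch, seq_len, 50) grid of integers. The constants mirror my understanding of AllenNLP's ELMo character mapper (UTF-8 bytes, begin/end-of-word markers, a shift by one so that 0 means padding). Treat them as assumptions and use `batch_to_ids` in real code. The function names are illustrative only.

```python
MAX_WORD_LENGTH = 50  # assumed: number of character slots per token
BEGIN_WORD = 258      # assumed marker values, just above the 0-255 byte range
END_WORD = 259
PAD_CHAR = 260

def word_to_char_ids(word):
    """Encode one token as MAX_WORD_LENGTH integers."""
    encoded = word.encode("utf-8", "ignore")[: MAX_WORD_LENGTH - 2]
    ids = [PAD_CHAR] * MAX_WORD_LENGTH
    ids[0] = BEGIN_WORD
    for position, byte in enumerate(encoded, start=1):
        ids[position] = byte
    ids[len(encoded) + 1] = END_WORD
    # Shift everything by one so that 0 stays free for padded tokens.
    return [i + 1 for i in ids]

def sentences_to_char_ids(sentences):
    """Pad every sentence to the longest one; padded tokens are all zeros."""
    max_len = max(len(s) for s in sentences)
    batch = []
    for sentence in sentences:
        rows = [word_to_char_ids(w) for w in sentence]
        rows += [[0] * MAX_WORD_LENGTH for _ in range(max_len - len(sentence))]
        batch.append(rows)
    return batch

char_ids = sentences_to_char_ids([["Hello", "world"], ["ELMo", "is", "awesome"]])
print(len(char_ids), len(char_ids[0]), len(char_ids[0][0]))  # 2 3 50
```

Because tokens are encoded from their characters rather than looked up in a word vocabulary, ELMo can still produce a vector for a word it never saw during training.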

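6. Use Cases: Text Classification, Named Entity Recognition (NER), Semantic Similarity, and More

Text classification: ELMo's context-dependent word vectors enrich the input representation, which can improve classification quality (e.g., sentiment analysis, news categorization).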

from allennlp.modules.elmo import Elmo, batch_to_ids
import torch
import torch.nn as nn

# Configure ELMo (pretrained options and weights are fetched from these URLs)
options_file = "https://allennlp.s3.amazonaws.com/models/elmo/2x4096_512_2048cnn/2x4096_512_2048cnn_elmo_options.json"
weight_file = "https://allennlp.s3.amazonaws.com/models/elmo/2x4096_512_2048cnn/2x4096_512_2048cnn_elmo_weights.hdf5"

# Define the classification model
class ELMoTextClassifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0)
        self.lstm = nn.LSTM(input_size=1024, hidden_size=256, batch_first=True)
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, sentences):
        # Build ELMo embeddings from tokenized sentences
        character_ids = batch_to_ids(sentences)  # text -> character IDs
        elmo_out = self.elmo(character_ids)
        elmo_emb = elmo_out["elmo_representations"][0]  # (batch, seq_len, 1024)

        # Run the LSTM, then classify from the last real (non-padding) token of each sentence;
        # indexing with -1 would read a padded position for the shorter sentences in a batch
        lstm_out, _ = self.lstm(elmo_emb)
        last_idx = elmo_out["mask"].sum(dim=1).long() - 1  # (batch,)
        batch_idx = torch.arange(lstm_out.size(0))
        logits = self.classifier(lstm_out[batch_idx, last_idx])
        return logits

# Example usage
model = ELMoTextClassifier(num_classes=2)
sentences = [["I", "love", "this", "movie"], ["This", "is", "terrible"]]
output = model(sentences)
print(output.shape)  # torch.Size([2, 2])

Named entity recognition (NER): ELMo's context sensitivity helps identify entity boundaries (e.g., person and place names).

from allennlp.modules.elmo import Elmo, batch_to_ids
import torch
import torch.nn as nn

# options_file and weight_file are the same as in the text classification example

class ELMoForNER(nn.Module):
    def __init__(self, num_tags):
        super().__init__()
        self.elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0)
        self.lstm = nn.LSTM(input_size=1024, hidden_size=256, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(512, num_tags)  # forward and backward LSTM outputs concatenated

    def forward(self, sentences):
        character_ids = batch_to_ids(sentences)
        elmo_emb = self.elmo(character_ids)["elmo_representations"][0]  # (batch, seq_len, 1024)

        # Bidirectional LSTM
        lstm_out, _ = self.lstm(elmo_emb)  # (batch, seq_len, 512)

        # Tag logits for every token
        tag_logits = self.classifier(lstm_out)  # (batch, seq_len, num_tags)
        return tag_logits

# Example usage
model = ELMoForNER(num_tags=5)  # assume 5 tag classes
sentences = [["Apple", "is", "based", "in", "Cupertino"]]
output = model(sentences)
print(output.shape)  # torch.Size([1, 5, 5])
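The model above returns raw per-token logits. To read them as entity tags, one simple option is an independent argmax per token, sketched below in plain Python. The tag inventory `TAGS` is hypothetical (chosen so that its size matches `num_tags=5`) and the logits are made up. Note that a per-token argmax ignores constraints between neighbouring tags, which is why taggers often add a CRF layer on top.

```python
# Hypothetical tag inventory; its size must match num_tags.
TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]

def decode_tags(tag_logits, lengths):
    """Pick the highest-scoring tag for every real token.

    tag_logits: nested lists shaped (batch, seq_len, num_tags), e.g. tensor.tolist().
    lengths: number of real tokens per sentence, used to skip padding.
    """
    decoded = []
    for sentence_logits, length in zip(tag_logits, lengths):
        tags = []
        for token_logits in sentence_logits[:length]:
            best = max(range(len(token_logits)), key=token_logits.__getitem__)
            tags.append(TAGS[best])
        decoded.append(tags)
    return decoded

# Made-up logits for one 3-token sentence padded to length 4.
toy_logits = [[
    [0.1, 2.0, 0.3, 0.0, 0.2],  # highest score at index 1 -> B-PER
    [0.2, 0.1, 1.5, 0.0, 0.3],  # highest score at index 2 -> I-PER
    [3.0, 0.1, 0.2, 0.4, 0.1],  # highest score at index 0 -> O
    [0.0, 0.0, 0.0, 0.0, 0.0],  # padding, skipped via lengths
]]
print(decode_tags(toy_logits, lengths=[3]))  # [['B-PER', 'I-PER', 'O']]
```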

Semantic similarity

Compute the semantic similarity of a sentence pair (e.g., question-answer matching, paraphrase detection).

Implementation:

from allennlp.modules.elmo import Elmo, batch_to_ids
import torch
import torch.nn.functional as F

# Load ELMo once and reuse it (options_file and weight_file as in the text classification example)
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0)
elmo.eval()

def elmo_sentence_similarity(sentence1, sentence2):
    # Embed both sentences in one batch
    char_ids = batch_to_ids([sentence1, sentence2])
    with torch.no_grad():
        output = elmo(char_ids)
    embeddings = output["elmo_representations"][0]  # (2, seq_len, 1024)
    mask = output["mask"].unsqueeze(-1).float()  # (2, seq_len, 1), 1 for real tokens

    # Sentence embedding: mean over real tokens only, so padding is excluded
    sent_embs = (embeddings * mask).sum(dim=1) / mask.sum(dim=1)  # (2, 1024)

    # Cosine similarity
    similarity = F.cosine_similarity(sent_embs[0:1], sent_embs[1:2], dim=1)
    return similarity.item()

# Example usage
sentence1 = ["The", "cat", "sat", "on", "the", "mat"]
sentence2 = ["A", "feline", "is", "sitting", "on", "a", "rug"]
similarity = elmo_sentence_similarity(sentence1, sentence2)
print(f"Similarity: {similarity:.4f}")  # value in [-1, 1]

Word sense disambiguation

Distinguish the different senses of a polysemous word based on its context.

Implementation:

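import torch
import torch.nn.functional as F
from allennlp.commands.elmo import ElmoEmbedder  # lives in allennlp.commands.elmo; requires allennlp <= 0.9.0

elmo = ElmoEmbedder()  # load once and reuse for every call

def disambiguate_word_sense(word, context):
    embeddings = elmo.embed_sentence(context)  # (3 layers, seq_len, 1024)

    # ELMo embedding of the target word (first occurrence), all layers concatenated
    word_index = context.index(word)
    word_embedding = torch.cat([
        torch.tensor(embeddings[i][word_index]) for i in range(3)
    ], dim=0)  # 3072 dims

    return word_embedding

# Example: telling apart two senses of "bank"
context1 = ["He", "went", "to", "the", "bank", "to", "deposit", "money"]  # financial institution
context2 = ["They", "fished", "by", "the", "bank", "of", "the", "river"]  # river bank

embedding1 = disambiguate_word_sense("bank", context1)
embedding2 = disambiguate_word_sense("bank", context2)

similarity = F.cosine_similarity(embedding1.unsqueeze(0), embedding2.unsqueeze(0), dim=1)
# Expected to be lower than for two contexts that use the same sense.
# Layer 0 is context-independent, so it is identical in both embeddings and pulls the score up;
# comparing only layers 1 and 2 may separate the senses more clearly.
print(f"Similarity between 'bank' senses: {similarity.item():.4f}")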
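Comparing two embeddings shows that the senses differ, but it does not yet say which sense a new occurrence has. One simple approach, sketched here under stated assumptions, is to keep one prototype vector per sense (for example, the average embedding of a few labelled occurrences) and assign the nearest prototype by cosine similarity. The toy 3-dimensional vectors and the sense labels below are made up. In practice the vectors would be ELMo embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_sense(target, prototypes):
    """Return the sense label whose prototype is most similar to target."""
    return max(prototypes, key=lambda sense: cosine(target, prototypes[sense]))

# Toy vectors standing in for ELMo embeddings of "bank".
prototypes = {
    "financial institution": [0.9, 0.1, 0.0],
    "river bank": [0.1, 0.9, 0.2],
}
query = [0.2, 0.8, 0.1]  # made-up embedding of "bank" in a new sentence
print(nearest_sense(query, prototypes))  # river bank
```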
