The Complete Guide to Voice Interaction with langchain4j: From Real-Time Voice Input to Intelligent Actions

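Still struggling to add voice features to a Java application? Traditional voice pipelines require you to stitch together STT (speech-to-text), an LLM (large language model), and tool calling by hand. That is complex to build and raises problems such as audio format compatibility and context management. This article walks through building an end-to-end voice interaction system with langchain4j, closing the loop from microphone input to intelligent action in five core steps, with code examples for each step.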

黎连研Shana

Published 2025-09-07 09:36:21 · 1278 views

langchain4j is a Java library that simplifies integrating AI/LLM (large language model) capabilities into Java applications. Project repository: https://gitcode.com/GitHub_Trending/la/langchain4j

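Introduction: Pain Points of Voice Interaction and How to Address Them

After reading this article you will have:

A complete approach to building a voice interaction system with langchain4j and OpenAI
Best practices for handling audio files and real-time streams
A way to trigger tool calls from voice commands
Techniques for managing multi-turn conversation state and context
Practical experience with performance tuning and error handling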

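How It Works: Technology Stack and Workflow

Comparison of core approaches

Approach	Implementation complexity	Response latency	Development cost	Best suited for
Hand-assembled pipeline	High (STT, LLM, and TTS integrated separately)	>500 ms	High (multiple APIs to integrate)	Heavily customized requirements
langchain4j integration	Low (single unified API)	<300 ms	Low (no middle layer to manage)	Rapid development and iteration
Client-only approach	Medium (depends on browser APIs)	Low (processed locally)	Medium (model deployment is complex)	Offline scenarios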

Workflow diagram

(Mermaid diagram not preserved in this copy.)

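Setup: Development Environment and Dependencies

Development environment requirements

JDK 17+
Maven/Gradle
OpenAI API key (with access to gpt-4o-audio-preview)
Audio tooling: FFmpeg (optional, for format conversion)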
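
Since FFmpeg is listed only as an optional converter, here is a minimal sketch of how it might be invoked from Java to produce 16 kHz mono 16-bit WAV input. The class name `FfmpegConverter` and its methods are illustrative assumptions, not part of langchain4j, and the sketch assumes an `ffmpeg` binary is available on the PATH.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: shells out to an ffmpeg binary assumed to be on the PATH.
final class FfmpegConverter {

    /** Builds the ffmpeg command line for a 16 kHz, mono, 16-bit PCM WAV output. */
    static List<String> buildCommand(Path input, Path output) {
        return List.of(
                "ffmpeg",
                "-y",                    // overwrite the output file if it exists
                "-i", input.toString(),  // source file in any format ffmpeg can decode
                "-ar", "16000",          // resample to 16 kHz
                "-ac", "1",              // downmix to mono
                "-c:a", "pcm_s16le",     // encode as 16-bit little-endian PCM
                output.toString());
    }

    /** Runs ffmpeg and fails loudly on a non-zero exit code. */
    static void convert(Path input, Path output) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, output))
                .inheritIO()
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("ffmpeg exited with code " + exitCode);
        }
    }
}
```

Keeping command construction separate from process execution makes the argument list easy to inspect or log without running the tool.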

Maven dependencies

<dependencies>
    <!-- langchain4j core. The ChatRequest API used below requires a 1.x release;
         1.0.0 is shown as an example, prefer the latest available version -->
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-core</artifactId>
        <version>1.0.0</version>
    </dependency>
    
    <!-- OpenAI integration -->
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-open-ai</artifactId>
        <version>1.0.0</version>
    </dependency>
    
    <!-- General I/O utilities (published under the commons-io group) -->
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.16.1</version>
    </dependency>
</dependencies>

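API key configuration

# src/main/resources/application.properties
# sk-xxx is a placeholder. Never commit a real key; the code below reads the
# OPENAI_API_KEY environment variable instead.
openai.api.key=sk-xxx
openai.model.name=gpt-4o-audio-preview
openai.temperature=0.3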
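
The properties file and the `OPENAI_API_KEY` environment variable used by the later code are two separate sources for the same setting. One possible way to reconcile them is to prefer the environment and fall back to the file. The sketch below assumes that precedence; `ApiKeyResolver` is a hypothetical helper and not something langchain4j provides.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: environment variable first, properties file second.
final class ApiKeyResolver {

    /** Returns the key from the environment map if present, otherwise from the properties. */
    static String resolve(Map<String, String> env, Properties props) {
        String fromEnv = env.get("OPENAI_API_KEY");
        if (fromEnv != null && !fromEnv.isBlank()) {
            return fromEnv;
        }
        String fromProps = props.getProperty("openai.api.key");
        if (fromProps == null || fromProps.isBlank()) {
            throw new IllegalStateException("No OpenAI API key configured");
        }
        return fromProps;
    }

    /** Parses properties text; an application would load application.properties instead. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

In an application the call would look like `ApiKeyResolver.resolve(System.getenv(), props)`. Passing the environment as a map keeps the lookup testable.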

Core Implementation: Five Steps from Voice to Action

Step 1: Audio capture and encoding

1.1 Reading a local audio file

import dev.langchain4j.data.audio.Audio;
import dev.langchain4j.data.message.AudioContent;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class AudioUtils {
    public static AudioContent fromFile(String filePath, String mimeType) throws IOException {
        // Read the audio file and encode it as Base64
        byte[] audioBytes = Files.readAllBytes(Path.of(filePath));
        String base64Audio = Base64.getEncoder().encodeToString(audioBytes);
        
        // Wrap the encoded data and its MIME type in an Audio object
        Audio audio = Audio.builder()
                .base64Data(base64Audio)
                .mimeType(mimeType)
                .build();
                
        return AudioContent.from(audio);
    }
}

// Usage example
AudioContent audioContent = AudioUtils.fromFile("src/main/resources/audio.wav", "audio/wav");
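
Because the file is sent as Base64, the request payload is about one third larger than the file on disk. The following sketch computes the encoded size up front so oversized files can be rejected before they are read into memory. `AudioPayloadCheck` and the 20 MB ceiling are assumptions made for illustration; check the provider's documented limits for real values.

```java
// Hypothetical pre-flight check; the limit below is an illustrative assumption.
final class AudioPayloadCheck {

    static final long MAX_ENCODED_BYTES = 20L * 1024 * 1024;

    /** Length of the padded Base64 encoding: 4 characters for every 3-byte group. */
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    /** Throws if the Base64-encoded size would exceed the configured ceiling. */
    static void ensureWithinLimit(long rawBytes) {
        long encoded = encodedLength(rawBytes);
        if (encoded > MAX_ENCODED_BYTES) {
            throw new IllegalArgumentException(
                    "Encoded audio would be " + encoded
                            + " bytes, above the limit of " + MAX_ENCODED_BYTES);
        }
    }
}
```

A caller could run `AudioPayloadCheck.ensureWithinLimit(Files.size(path))` before `Files.readAllBytes`, and fall back to chunking when the check fails.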

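1.2 Real-time microphone recording (based on JavaSound)

import javax.sound.sampled.*;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.concurrent.CountDownLatch;

public class MicrophoneRecorder {
    // 16 kHz, 16-bit, mono, signed, little-endian (the byte order WAV files use)
    private static final AudioFormat FORMAT = new AudioFormat(16000, 16, 1, true, false);

    private TargetDataLine line;
    // volatile so the capture thread sees the change made by stopRecording()
    private volatile boolean isRecording = false;

    /** Records for the given duration and returns the audio as a complete WAV file. */
    public byte[] record(int durationSeconds) throws Exception {
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, FORMAT);
        
        if (!AudioSystem.isLineSupported(info)) {
            throw new LineUnavailableException("Audio format not supported: " + FORMAT);
        }
        
        TargetDataLine activeLine = (TargetDataLine) AudioSystem.getLine(info);
        line = activeLine;
        activeLine.open(FORMAT);
        activeLine.start();
        
        ByteArrayOutputStream pcm = new ByteArrayOutputStream();
        isRecording = true;
        CountDownLatch finished = new CountDownLatch(1);
        
        // Capture thread: copy microphone data into the buffer until stopped
        new Thread(() -> {
            byte[] buffer = new byte[1024];
            while (isRecording) {
                int bytesRead = activeLine.read(buffer, 0, buffer.length);
                if (bytesRead > 0) {
                    pcm.write(buffer, 0, bytesRead);
                }
            }
            finished.countDown();
        }).start();
        
        // Record for the requested duration, then wait for the capture thread to exit
        Thread.sleep(durationSeconds * 1000L);
        stopRecording();
        finished.await();
        
        return toWav(pcm.toByteArray());
    }
    
    /** Safe to call even when no recording is in progress. */
    public void stopRecording() {
        isRecording = false;
        TargetDataLine current = line;
        if (current != null) {
            current.stop();
            current.close();
        }
    }

    // Raw PCM has no header; wrap it in a WAV container so it matches "audio/wav"
    private static byte[] toWav(byte[] pcmBytes) throws IOException {
        long frames = pcmBytes.length / FORMAT.getFrameSize();
        try (AudioInputStream stream =
                     new AudioInputStream(new ByteArrayInputStream(pcmBytes), FORMAT, frames)) {
            ByteArrayOutputStream wav = new ByteArrayOutputStream();
            AudioSystem.write(stream, AudioFileFormat.Type.WAVE, wav);
            return wav.toByteArray();
        }
    }
}

// Usage example
MicrophoneRecorder recorder = new MicrophoneRecorder();
byte[] audioBytes = recorder.record(5); // record for 5 seconds
String base64Audio = Base64.getEncoder().encodeToString(audioBytes);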
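
A fixed-length recording often contains nothing but background noise, and sending it to the model wastes a request. As a rough sketch, the RMS level of the samples can be checked first. `SilenceDetector` is a hypothetical helper, and it assumes 16-bit little-endian mono PCM. For WAV data, pass the header length as `offset` (commonly 44 bytes for a plain PCM WAV, though that is an assumption worth verifying for your files). The threshold is also something to tune per microphone.

```java
// Hypothetical helper; assumes 16-bit little-endian PCM samples starting at `offset`.
final class SilenceDetector {

    /** Root-mean-square amplitude of the samples, normalized to the range 0.0 to 1.0. */
    static double rms(byte[] pcm, int offset) {
        int sampleCount = (pcm.length - offset) / 2;
        if (sampleCount <= 0) {
            return 0.0;
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < sampleCount; i++) {
            int lo = pcm[offset + 2 * i] & 0xFF;   // low byte, unsigned
            int hi = pcm[offset + 2 * i + 1];      // high byte, sign-extended
            double sample = ((hi << 8) | lo) / 32768.0;
            sumOfSquares += sample * sample;
        }
        return Math.sqrt(sumOfSquares / sampleCount);
    }

    /** True when the overall level stays below the given threshold. */
    static boolean isSilent(byte[] pcm, int offset, double threshold) {
        return rms(pcm, offset) < threshold;
    }
}
```

A caller might skip the model call when `SilenceDetector.isSilent(bytes, headerLength, 0.01)` returns true and simply prompt the user to speak again.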

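Step 2: Wrapping audio content and calling the model

2.1 Creating the OpenAI audio model

import dev.langchain4j.data.message.AudioContent;
import dev.langchain4j.data.message.TextContent;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.openai.OpenAiChatModel;
import java.time.Duration;

public class AudioChatService {
    private final OpenAiChatModel audioModel;
    
    public AudioChatService() {
        this.audioModel = OpenAiChatModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName("gpt-4o-audio-preview")
                .temperature(0.0)
                .timeout(Duration.ofSeconds(30)) // allow up to 30 seconds for audio processing
                .build();
    }
    
    public String processAudio(AudioContent audioContent) {
        // Wrap the prompt in TextContent: UserMessage.from(String, Content...) treats
        // a leading String as the user name, not as message text
        UserMessage userMessage = UserMessage.from(
            TextContent.from("Transcribe and interpret this audio, then extract the key instructions:"),
            audioContent
        );
        
        ChatResponse response = audioModel.chat(userMessage);
        return response.aiMessage().text();
    }
}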

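2.2 Handling multimodal input (voice plus text)

// Requires dev.langchain4j.data.message.TextContent
UserMessage multiModalMessage = UserMessage.from(
    TextContent.from("Interpret the audio using this context: the current city is Beijing and the year is 2025"),
    audioContent
);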

Step 3: Tool calling and action execution

3.1 Defining tool specifications

import dev.langchain4j.agent.tool.ToolSpecification;
import dev.langchain4j.data.message.AudioContent;
import dev.langchain4j.data.message.TextContent;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.request.json.JsonObjectSchema;

public class VoiceAssistant {
    // Calculator tool; parameters are described with a JSON schema object
    private final ToolSpecification calculatorTool = ToolSpecification.builder()
            .name("calculator")
            .description("Evaluates a mathematical expression and returns the result")
            .parameters(JsonObjectSchema.builder()
                    .addStringProperty("expression", "A mathematical expression, e.g. 2+2")
                    .required("expression")
                    .build())
            .build();
    
    // Web search tool
    private final ToolSpecification webSearchTool = ToolSpecification.builder()
            .name("web_search")
            .description("Fetches real-time information such as weather, news, or stock prices")
            .parameters(JsonObjectSchema.builder()
                    .addStringProperty("query", "Search keywords")
                    .required("query")
                    .build())
            .build();
    
    // Build a chat request that exposes both tools to the model
    public ChatRequest createToolEnabledRequest(String text, AudioContent audio) {
        return ChatRequest.builder()
                .messages(UserMessage.from(TextContent.from(text), audio))
                .toolSpecifications(calculatorTool, webSearchTool)
                .build();
    }
}

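3.2 Handling tool call results

import dev.langchain4j.agent.tool.ToolExecutionRequest;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.ToolExecutionResultMessage;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import java.util.ArrayList;
import java.util.List;

public String processWithTools(ChatRequest request) {
    ChatResponse response = audioModel.chat(request);
    
    // No tool call requested: the model answered directly
    if (!response.aiMessage().hasToolExecutionRequests()) {
        return response.aiMessage().text();
    }
    
    // Conversation so far, plus the model's message that requested the tools
    List<ChatMessage> messages = new ArrayList<>(request.messages());
    messages.add(response.aiMessage());
    
    // Execute every requested tool and append one result message per request
    for (ToolExecutionRequest toolRequest : response.aiMessage().toolExecutionRequests()) {
        String toolResult = executeTool(toolRequest);
        messages.add(ToolExecutionResultMessage.from(toolRequest, toolResult));
    }
    
    // Send the tool results back to the model to obtain the final answer
    ChatRequest followUpRequest = ChatRequest.builder()
            .messages(messages)
            .toolSpecifications(request.toolSpecifications())
            .build();
    
    return audioModel.chat(followUpRequest).aiMessage().text();
}

// parseJsonParameter, evaluateExpression and webSearchService are application
// helpers that are not shown in this article
private String executeTool(ToolExecutionRequest request) {
    if ("calculator".equals(request.name())) {
        // Parse the arguments and evaluate the expression
        String expression = parseJsonParameter(request.arguments(), "expression");
        return evaluateExpression(expression);
    } else if ("web_search".equals(request.name())) {
        String query = parseJsonParameter(request.arguments(), "query");
        return webSearchService.search(query);
    }
    return "Unknown tool: " + request.name();
}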
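
The `evaluateExpression` helper called above is not defined in the article. As one possible stand-in, the sketch below is a small recursive-descent evaluator for `+`, `-`, `*`, `/`, parentheses, and unary minus. `ExpressionEvaluator` is a hypothetical class written for illustration. It returns a `double`, so the caller would convert the result with `String.valueOf` before handing it back as a tool result.

```java
// Hypothetical stand-in for the undefined evaluateExpression helper.
final class ExpressionEvaluator {
    private final String input;
    private int pos;

    private ExpressionEvaluator(String input) {
        this.input = input.replaceAll("\\s+", ""); // ignore whitespace
    }

    /** Evaluates an arithmetic expression such as "2+3*4". */
    static double evaluate(String expression) {
        ExpressionEvaluator parser = new ExpressionEvaluator(expression);
        double value = parser.parseExpression();
        if (parser.pos != parser.input.length()) {
            throw new IllegalArgumentException("Unexpected character at position " + parser.pos);
        }
        return value;
    }

    // expression := term (('+' | '-') term)*
    private double parseExpression() {
        double value = parseTerm();
        while (pos < input.length()) {
            char op = input.charAt(pos);
            if (op == '+') { pos++; value += parseTerm(); }
            else if (op == '-') { pos++; value -= parseTerm(); }
            else break;
        }
        return value;
    }

    // term := factor (('*' | '/') factor)*
    private double parseTerm() {
        double value = parseFactor();
        while (pos < input.length()) {
            char op = input.charAt(pos);
            if (op == '*') { pos++; value *= parseFactor(); }
            else if (op == '/') { pos++; value /= parseFactor(); }
            else break;
        }
        return value;
    }

    // factor := number | '(' expression ')' | '-' factor
    private double parseFactor() {
        if (pos >= input.length()) {
            throw new IllegalArgumentException("Unexpected end of expression");
        }
        char c = input.charAt(pos);
        if (c == '-') { pos++; return -parseFactor(); }
        if (c == '(') {
            pos++;
            double value = parseExpression();
            if (pos >= input.length() || input.charAt(pos) != ')') {
                throw new IllegalArgumentException("Missing closing parenthesis");
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < input.length()
                && (Character.isDigit(input.charAt(pos)) || input.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at position " + pos);
        }
        return Double.parseDouble(input.substring(start, pos));
    }
}
```

Parsing the expression directly, rather than passing it to a script engine, keeps model-generated input from executing arbitrary code.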

Step 4: Multi-turn conversation management

4.1 Managing conversation state

import dev.langchain4j.data.message.ChatMessage;
import java.util.ArrayList;
import java.util.List;

public class ConversationManager {
    private final List<ChatMessage> conversationHistory = new ArrayList<>();
    private static final int MAX_HISTORY_SIZE = 10; // keep at most 10 messages (not 10 turns)
    
    public void addMessage(ChatMessage message) {
        conversationHistory.add(message);
        // Drop the oldest messages so the history stays within the token budget
        while (conversationHistory.size() > MAX_HISTORY_SIZE) {
            conversationHistory.remove(0);
        }
    }
    
    public List<ChatMessage> getHistory() {
        // Return a copy so callers cannot modify the internal list
        return new ArrayList<>(conversationHistory);
    }
    
    public void clear() {
        conversationHistory.clear();
    }
}
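
A plain sliding window eventually evicts whatever message came first, which is a problem if that message carries standing instructions. One possible variation keeps a single pinned entry and slides the window over the rest. The sketch below is generic so it can run without langchain4j. `PinnedWindowHistory` is a hypothetical name, and in the article's setting `T` would be `ChatMessage`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical variation on ConversationManager: one pinned entry plus a sliding window.
final class PinnedWindowHistory<T> {
    private final T pinned;                        // e.g. a system prompt that is never evicted
    private final int maxRecent;                   // how many recent messages to retain
    private final Deque<T> recent = new ArrayDeque<>();

    PinnedWindowHistory(T pinned, int maxRecent) {
        if (maxRecent < 1) {
            throw new IllegalArgumentException("maxRecent must be at least 1");
        }
        this.pinned = pinned;
        this.maxRecent = maxRecent;
    }

    /** Appends a message and evicts the oldest unpinned ones beyond the window. */
    void add(T message) {
        recent.addLast(message);
        while (recent.size() > maxRecent) {
            recent.removeFirst();
        }
    }

    /** Returns the pinned entry followed by the retained messages, oldest first. */
    List<T> snapshot() {
        List<T> all = new ArrayList<>(recent.size() + 1);
        all.add(pinned);
        all.addAll(recent);
        return all;
    }
}
```

When tool calls are involved, the window should be sized so that a tool result is never kept without the assistant message that requested it.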

4.2 Context-aware conversation

// Prepend the conversation history before adding the new audio message
List<ChatMessage> messages = new ArrayList<>(conversationManager.getHistory());
messages.add(UserMessage.from(audioContent));

ChatRequest request = ChatRequest.builder()
        .messages(messages)
        .toolSpecifications(calculatorTool, webSearchTool)
        .build();

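Step 5: Response handling and voice output (optional)

5.1 Text-to-speech (via the OpenAI speech endpoint over HTTP)

import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class TextToSpeechService {
    private static final String TTS_API_URL = "https://api.openai.com/v1/audio/speech";
    private final HttpClient client = HttpClient.newHttpClient();
    
    public void textToSpeech(String text, String outputFilePath) throws IOException, InterruptedException {
        // Escape the text so quotes, backslashes, and line breaks cannot corrupt the JSON body
        String jsonBody = "{\"model\":\"tts-1\",\"input\":\"" + escapeJson(text) + "\",\"voice\":\"alloy\"}";
        
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(TTS_API_URL))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + System.getenv("OPENAI_API_KEY"))
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
        
        HttpResponse<byte[]> response = client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        // Do not write an error payload to disk as if it were audio
        if (response.statusCode() != 200) {
            throw new IOException("TTS request failed with HTTP status " + response.statusCode());
        }
        Files.write(Path.of(outputFilePath), response.body());
    }

    // Minimal JSON string escaping; a JSON library would normally do this
    private static String escapeJson(String text) {
        StringBuilder escaped = new StringBuilder(text.length() + 16);
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"' -> escaped.append("\\\"");
                case '\\' -> escaped.append("\\\\");
                case '\n' -> escaped.append("\\n");
                case '\r' -> escaped.append("\\r");
                case '\t' -> escaped.append("\\t");
                default -> {
                    if (c < 0x20) {
                        escaped.append(String.format("\\u%04x", (int) c));
                    } else {
                        escaped.append(c);
                    }
                }
            }
        }
        return escaped.toString();
    }
}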

Case Study: A Voice-Controlled Smart Home System

System architecture

(Mermaid diagram not preserved in this copy.)

Core implementation

import dev.langchain4j.agent.tool.ToolSpecification;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.AudioContent;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.request.ChatRequest;
import java.util.Base64;

// SmartHomeTool is an application class that is not shown in this article.
// processWithTools (section 3.2) is assumed to be a public method of AudioChatService.
public class SmartHomeVoiceAssistant {
    private final AudioChatService audioService;
    private final ConversationManager conversationManager;
    private final SmartHomeTool smartHomeTool;
    
    public SmartHomeVoiceAssistant() {
        this.audioService = new AudioChatService();
        this.conversationManager = new ConversationManager();
        this.smartHomeTool = new SmartHomeTool();
    }
    
    public void startListening() {
        System.out.println("Listening... (say 'exit' to stop)");
        MicrophoneRecorder recorder = new MicrophoneRecorder();
        
        try {
            while (true) {
                System.out.println("Please speak...");
                // Record 5 seconds at a time; the bytes must be WAV-encoded to match the MIME type
                byte[] audioBytes = recorder.record(5);
                String base64Audio = Base64.getEncoder().encodeToString(audioBytes);
                AudioContent audioContent = AudioContent.from(base64Audio, "audio/wav");
                
                // Process the audio
                String response = processVoiceCommand(audioContent);
                System.out.println("Assistant: " + response);
                
                // Play the response (optional)
                // new TextToSpeechService().textToSpeech(response, "response.mp3");
                // playAudio("response.mp3");
                
                // Stop once the assistant's reply mentions the exit command
                if (response != null && response.contains("exit")) {
                    break;
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    
    private String processVoiceCommand(AudioContent audioContent) {
        // Build a request that exposes the home-control tool
        ToolSpecification homeControlTool = smartHomeTool.createToolSpecification();
        
        UserMessage userMessage = UserMessage.from(audioContent);
        conversationManager.addMessage(userMessage);
        
        ChatRequest request = ChatRequest.builder()
                .messages(conversationManager.getHistory())
                .toolSpecifications(homeControlTool)
                .build();
        
        // Run the request and record the reply in the conversation history
        String responseText = audioService.processWithTools(request);
        conversationManager.addMessage(AiMessage.from(responseText));
        
        return responseText;
    }
    
    public static void main(String[] args) {
        SmartHomeVoiceAssistant assistant = new SmartHomeVoiceAssistant();
        assistant.startListening();
    }
}

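Performance Optimization and Best Practices

Audio processing optimization

Format: prefer WAV with a 16 kHz sample rate, mono, 16-bit PCM encoding, which balances quality against size
Chunking: split long audio into 10-second segments to reduce the size of each request
Caching: cache transcription results for common commands to avoid repeated processing

// Audio chunking example. The input is raw PCM (16 kHz, 16-bit, mono, little-endian);
// each chunk gets its own WAV header so that it is a valid "audio/wav" payload.
// Requires javax.sound.sampled.*, java.io.*, java.util.*, and AudioContent.
public List<AudioContent> splitAudio(byte[] pcmBytes, int chunkSizeSeconds) throws IOException {
    AudioFormat format = new AudioFormat(16000, 16, 1, true, false);
    List<AudioContent> chunks = new ArrayList<>();
    int bytesPerSecond = 16000 * 2; // 16 kHz, 16-bit (2 bytes), mono
    int chunkSizeBytes = chunkSizeSeconds * bytesPerSecond;
    
    for (int i = 0; i < pcmBytes.length; i += chunkSizeBytes) {
        int length = Math.min(chunkSizeBytes, pcmBytes.length - i);
        AudioInputStream stream = new AudioInputStream(
                new ByteArrayInputStream(pcmBytes, i, length), format, length / format.getFrameSize());
        ByteArrayOutputStream wav = new ByteArrayOutputStream();
        AudioSystem.write(stream, AudioFileFormat.Type.WAVE, wav);
        String base64Chunk = Base64.getEncoder().encodeToString(wav.toByteArray());
        chunks.add(AudioContent.from(base64Chunk, "audio/wav"));
    }
    
    return chunks;
}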
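
The caching strategy listed above can be sketched as a small LRU cache keyed by a hash of the audio bytes. `TranscriptionCache` is a hypothetical class and the transcriber function is a stand-in for the real model call. Note the limitation: only byte-identical audio produces a hit, so this helps with replayed files or canned prompts rather than with fresh microphone recordings of the same phrase.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache for transcription results, keyed by SHA-256 of the audio.
final class TranscriptionCache {
    private final Map<String, String> cache;

    TranscriptionCache(int maxEntries) {
        // An access-ordered LinkedHashMap evicts the least recently used entry
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached transcript, or computes and stores it on a miss. */
    synchronized String getOrCompute(byte[] audioBytes, Function<byte[], String> transcriber) {
        String key = sha256Hex(audioBytes);
        String cached = cache.get(key);
        if (cached == null) {
            cached = transcriber.apply(audioBytes);
            cache.put(key, cached);
        }
        return cached;
    }

    /** Hex-encoded SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }
}
```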

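Error handling strategy

Error type	Handling	Retry strategy	User feedback
API timeout	Switch to a fallback endpoint	Exponential backoff (1 s, 2 s, 4 s)	"The network is busy, please try again later"
Invalid audio format	Convert the format automatically	Retry immediately	"Unsupported audio format; it has been converted automatically"
Tool call failure	Fall back to manual mode	Fixed delay (3 s)	"The device is temporarily unreachable, please check the network"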
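
The exponential backoff row in the table can be sketched as a small retry helper that doubles the delay after each failed attempt. `Retry` is a hypothetical utility, not a langchain4j API. It retries on any `RuntimeException`, whereas a real implementation would usually retry only on timeouts and other transient errors.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical retry helper implementing exponential backoff.
final class Retry {

    /** Runs the action up to maxAttempts times, doubling the delay after each failure. */
    static <T> T withBackoff(Supplier<T> action, int maxAttempts, Duration initialDelay) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Duration delay = initialDelay;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleep(delay);
                    delay = delay.multipliedBy(2); // 1 s, 2 s, 4 s, ... for a 1 s start
                }
            }
        }
        throw last; // every attempt failed; surface the final error
    }

    private static void sleep(Duration duration) {
        try {
            Thread.sleep(duration.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

With an initial delay of `Duration.ofSeconds(1)` and four attempts, the waits between attempts are 1 s, 2 s, and 4 s, matching the table.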

Resource release and memory management

// Example of releasing resources reliably
public class ResourceManagedAudioService implements AutoCloseable {
    private final OpenAiChatModel model;
    private final MicrophoneRecorder recorder;
    
    public ResourceManagedAudioService() {
        this.model = OpenAiChatModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName("gpt-4o-audio-preview")
                .build();
        this.recorder = new MicrophoneRecorder();
    }
    
    // Business methods...
    
    @Override
    public void close() throws Exception {
        // Close the model only if the implementation in use is actually closeable
        if (model instanceof AutoCloseable closeable) {
            closeable.close();
        }
        // Stop recording and release the audio device; stopRecording() must
        // tolerate being called when no recording was ever started
        recorder.stopRecording();
    }
}

// try-with-resources guarantees that close() is called
try (ResourceManagedAudioService service = new ResourceManagedAudioService()) {
    service.processAudio(...);
} catch (Exception e) {
    e.printStackTrace();
}

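Extensions: Customizing Voice Interaction

Integrating a local speech model

For privacy-sensitive scenarios, a local speech model can replace the cloud API:

// Sketch: plugging in a local Whisper model. LocalWhisperEngine is a hypothetical
// interface standing in for whichever local Whisper binding you choose (for example
// one that loads weights from models/whisper-base); it is not part of langchain4j.
public interface LocalWhisperEngine {
    String transcribe(byte[] wavBytes);
}

public class LocalSpeechToTextService {
    private final LocalWhisperEngine localModel;
    
    public LocalSpeechToTextService(LocalWhisperEngine localModel) {
        this.localModel = java.util.Objects.requireNonNull(localModel);
    }
    
    public String transcribe(byte[] audioBytes) {
        // The audio never leaves the machine; only the transcript goes to the LLM
        return localModel.transcribe(audioBytes);
    }
}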

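Custom wake-word detection

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;
import java.io.IOException;

public class WakeWordDetector {
    // The wake word must be listed in wakeword.dic and wakeword.lm. An English phrase
    // is used here because the acoustic model configured below is en-us.
    private static final String WAKE_WORD = "hey assistant";

    private final LiveSpeechRecognizer recognizer;
    // volatile so other threads see updates made by the detection thread
    private volatile boolean wakeWordDetected = false;
    
    public WakeWordDetector() throws IOException {
        Configuration config = new Configuration();
        config.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        config.setDictionaryPath("wakeword.dic");
        config.setLanguageModelPath("wakeword.lm");
        
        recognizer = new LiveSpeechRecognizer(config);
        recognizer.startRecognition(true);
    }
    
    public void startDetection() {
        Thread detector = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                SpeechResult result = recognizer.getResult();
                if (result == null) {
                    break; // no more results, e.g. the microphone was closed
                }
                if (WAKE_WORD.equals(result.getHypothesis())) {
                    wakeWordDetected = true;
                    System.out.println("Wake word detected, listening for a command...");
                    // Keep the flag set for 5 seconds, then wait for the next wake word
                    try {
                        Thread.sleep(5000);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    wakeWordDetected = false;
                }
            }
        });
        detector.setDaemon(true);
        detector.start();
    }
    
    public boolean isWakeWordDetected() {
        return wakeWordDetected;
    }
}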

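Summary and Outlook

This article presented a complete approach to building a voice interaction system with langchain4j, covering the full development flow from audio capture and model invocation to tool execution. Using langchain4j's abstractions substantially reduces the complexity of implementing voice interaction while keeping the system extensible and maintainable.

As local large language models mature, voice interaction is likely to move toward device-cloud collaboration: lightweight on-device models handle real-time transcription and wake-word detection, while complex intent understanding and tool calling run in the cloud, balancing low latency against capability. As a widely used LLM application framework in the Java ecosystem, langchain4j is well placed to play a growing role here.

Finally, to keep the system stable, track langchain4j releases, especially changes to the OpenAI client and audio-related APIs, and fix compatibility issues promptly.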
