010-Spring AI Alibaba Audio: A Complete Worked Example

rengang66 · Published 2025-10-27 07:22:24

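This walkthrough builds a Spring Boot application step by step and shows how to use the audio features of Spring AI Alibaba for speech recognition (speech-to-text) and speech synthesis (text-to-speech).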

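1. Goals

We will build a web application with two core features:

Speech recognition (Speech-to-Text, STT): converts an audio file to text, with synchronous, streaming, and asynchronous variants.
Speech synthesis (Text-to-Speech, TTS): converts text to an audio file, with synchronous and streaming variants.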

2. Tech stack and core dependencies

Spring Boot 3.x
Spring AI Alibaba (connects to the Tongyi models on Alibaba Cloud DashScope)
Maven (build tool)

Add the following core dependencies to pom.xml:

<dependencies>
    <!-- Spring AI Alibaba core starter with DashScope integration -->
    <dependency>
        <groupId>com.alibaba.cloud.ai</groupId>
        <artifactId>spring-ai-alibaba-starter-dashscope</artifactId>
        <version>1.0.0-M2</version> <!-- use the latest version -->
    </dependency>
    <!-- Spring Web, used to build the RESTful API -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Apache Commons IO, used for file operations -->
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.18.0</version>
    </dependency>
</dependencies>
<!-- Version management for Spring Boot and Spring Cloud Alibaba -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-dependencies</artifactId>
            <version>3.3.1</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
        <dependency>
            <groupId>com.alibaba.cloud</groupId>
            <artifactId>spring-cloud-alibaba-dependencies</artifactId>
            <version>2023.0.1.2</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

3. Project configuration

Configure your DashScope API key in src/main/resources/application.yml.

server:
  port: 10009

spring:
  application:
    name: spring-ai-alibaba-audio-example-application

  ai:
    dashscope:
      api-key: ${AI_DASHSCOPE_API_KEY}

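Important: set the AI_DASHSCOPE_API_KEY environment variable to a valid API key obtained from Alibaba Cloud. You can also write the key directly in the configuration file, but that is not recommended for production.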
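Because the key comes from the environment, a missing variable may otherwise only show up later, for example as an authentication error on the first request. The following is a minimal sketch of a fail-fast check you could run before starting the application. The class ApiKeyCheck and its methods are hypothetical helpers written for this illustration; they are not part of Spring AI Alibaba.

```java
import java.util.Map;

/** Hypothetical helper (not library API): validates that the DashScope key is present. */
class ApiKeyCheck {

    static final String ENV_NAME = "AI_DASHSCOPE_API_KEY";

    /** Returns the trimmed key, or throws if the variable is missing or blank. */
    static String requireApiKey(Map<String, String> env) {
        String key = env.get(ENV_NAME);
        if (key == null || key.isBlank()) {
            throw new IllegalStateException("Environment variable " + ENV_NAME + " is not set");
        }
        return key.trim();
    }

    /** Masks everything except the last four characters so the key can be logged. */
    static String mask(String key) {
        int visible = Math.min(4, key.length());
        return "*".repeat(key.length() - visible) + key.substring(key.length() - visible);
    }

    public static void main(String[] args) {
        String key = requireApiKey(System.getenv());
        System.out.println("Using DashScope API key " + mask(key));
    }
}
```

Passing the environment in as a Map keeps the check easy to exercise without touching the real process environment.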

4. Project structure

Create the following project structure:

spring-ai-alibaba-audio-example/
├── dashscope-audio/
│   ├── src/
│   │   └── main/
│   │       ├── java/
│   │       │   └── com/
│   │       │       └── alibaba/
│   │       │           └── cloud/
│   │       │               └── ai/
│   │       │                   └── example/
│   │       │                       └── audio/
│   │       │                           ├── AudioSpeechController.java
│   │       │                           ├── AudioTranscriptionController.java
│   │       │                           └── DashScopeAudioApplication.java
│   │       └── resources/
│   │           ├── application.yml
│   │           └── hello_world_male_16k_16bit_mono.wav
│   ├── pom.xml
│   ├── README.md
│   └── dashscope-audio.http
└── pom.xml

5. Writing the Java code

5.1 Main application class

Create the main application class DashScopeAudioApplication.java:

package com.alibaba.cloud.ai.example.audio;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

/**
 * @author yuluo
 * @author <a href="mailto:yuluo08290126@gmail.com">yuluo</a>
 */

@SpringBootApplication
public class DashScopeAudioApplication {

    public static void main(String[] args) {
        SpringApplication.run(DashScopeAudioApplication.class, args);
    }
}

5.2 Speech recognition controller

Create the speech recognition controller AudioTranscriptionController.java, which implements the three recognition modes:

package com.alibaba.cloud.ai.example.audio;

import com.alibaba.cloud.ai.dashscope.api.DashScopeAudioTranscriptionApi;
import com.alibaba.cloud.ai.dashscope.audio.DashScopeAudioTranscriptionOptions;
import com.alibaba.cloud.ai.dashscope.audio.transcription.AudioTranscriptionModel;
import com.alibaba.cloud.ai.dashscope.common.DashScopeException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.audio.transcription.AudioTranscriptionPrompt;
import org.springframework.ai.audio.transcription.AudioTranscriptionResponse;
import org.springframework.core.io.FileSystemResource;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Objects;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Speech-to-text (speech recognition)
 * @author yuluo
 * @author <a href="mailto:yuluo08290126@gmail.com">yuluo</a>
 */

@RestController
@RequestMapping("/ai/transcription")
public class AudioTranscriptionController {

    private final AudioTranscriptionModel transcriptionModel;

    private static final Logger log = LoggerFactory.getLogger(AudioTranscriptionController.class);

    // Model list: https://help.aliyun.com/zh/model-studio/sambert-websocket-api
    private static final String DEFAULT_MODEL = DashScopeAudioTranscriptionApi.AudioTranscriptionModel.PARAFORMER_REALTIME_V2.getValue();

    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

    public AudioTranscriptionController(AudioTranscriptionModel transcriptionModel) {
        this.transcriptionModel = transcriptionModel;
    }

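    /**
     * Synchronous speech recognition.
     */
    @GetMapping
    public String stt() {
        // The sample file is resolved against the working directory (user.dir),
        // so start the application from the directory that contains the .wav file
        // or adjust this path.
        String currentDir = System.getProperty("user.dir");
        Path filePath = Paths.get(currentDir, "hello_world_male_16k_16bit_mono.wav");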

        AudioTranscriptionResponse response = transcriptionModel.call(
                new AudioTranscriptionPrompt(
                        new FileSystemResource(filePath),
                        DashScopeAudioTranscriptionOptions.builder()
                                .withModel(DEFAULT_MODEL)
                                .build()
                )
        );

        return response.getResult().getOutput();
    }

    /**
     * Streaming speech recognition.
     */
    @GetMapping("/stream")
    public String streamSTT() {
        // Reads the file written by the synchronous speech synthesis endpoint (/ai/speech)
        String currentDir = System.getProperty("user.dir");
        Path filePath = Paths.get(currentDir, "spring-ai-alibaba-audio-example/dashscope-audio/src/main/resources/gen/tts/output.mp3");

        CountDownLatch latch = new CountDownLatch(1);
        StringBuilder stringBuilder = new StringBuilder();

        Flux<AudioTranscriptionResponse> response = transcriptionModel
                .stream(
                        new AudioTranscriptionPrompt(
                                new FileSystemResource(filePath),
                                DashScopeAudioTranscriptionOptions.builder()
                                        .withModel(DEFAULT_MODEL)
                                        .withSampleRate(16000)
                                        .withFormat(DashScopeAudioTranscriptionOptions.AudioFormat.PCM)
                                        .withDisfluencyRemovalEnabled(false)
                                        .build()
                        )
                );

        response.doFinally(
                signal -> latch.countDown()
        ).subscribe(
                resp -> stringBuilder.append(resp.getResult().getOutput())
        );

        try {
            latch.await();
        }
        catch (InterruptedException e) {
            // Restore the interrupt flag before propagating
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }

        return stringBuilder.toString();
    }

    /**
     * Asynchronous speech recognition: submit a task, then poll for the result.
     */
    @GetMapping("/async")
    public String asyncSTT() {
        StringBuilder stringBuilder = new StringBuilder();
        CountDownLatch latch = new CountDownLatch(1);

        // Reads the file written by the streaming speech synthesis endpoint (/ai/speech/stream)
        String currentDir = System.getProperty("user.dir");
        Path filePath = Paths.get(currentDir, "spring-ai-alibaba-audio-example/dashscope-audio/src/main/resources/gen/tts/output-stream.mp3");

        // Handle of the periodic polling task, cancelled once this request completes
        java.util.concurrent.ScheduledFuture<?> poller = null;
        try {
            AudioTranscriptionResponse submitResponse = transcriptionModel.asyncCall(
                    new AudioTranscriptionPrompt(
                            new FileSystemResource(filePath),
                            DashScopeAudioTranscriptionOptions.builder()
                                    .withModel(DEFAULT_MODEL)
                                    .build()
                    )
            );

            DashScopeAudioTranscriptionApi.Response.Output submitOutput = Objects.requireNonNull(submitResponse.getMetadata()
                    .get("output"));
            String taskId = submitOutput.taskId();

            // Poll the task status once per second until it succeeds or fails
            poller = scheduler.scheduleAtFixedRate(
                    () -> checkTaskStatus(taskId, stringBuilder, latch), 0, 1, TimeUnit.SECONDS);
            latch.await();

        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new DashScopeException("Thread was interrupted: " + e.getMessage());
        }
        finally {
            // Cancel only this request's polling task. Shutting down the shared
            // scheduler here would make every later call to this endpoint fail.
            if (poller != null) {
                poller.cancel(false);
            }
        }

        return stringBuilder.toString();
    }

    private void checkTaskStatus(String taskId, StringBuilder stringBuilder, CountDownLatch latch) {
        try {
            AudioTranscriptionResponse fetchResponse = transcriptionModel.fetch(taskId);
            DashScopeAudioTranscriptionApi.Response.Output fetchOutput =
                    Objects.requireNonNull(fetchResponse.getMetadata().get("output"));
            DashScopeAudioTranscriptionApi.TaskStatus taskStatus = fetchOutput.taskStatus();

            if (taskStatus.equals(DashScopeAudioTranscriptionApi.TaskStatus.SUCCEEDED)) {
                stringBuilder.append(fetchResponse.getResult().getOutput());
                latch.countDown();
            }
            else if (taskStatus.equals(DashScopeAudioTranscriptionApi.TaskStatus.FAILED)) {
                log.warn("Transcription failed.");
                latch.countDown();
            }
        }
        catch (Exception e) {
            latch.countDown();
            // Keep the original exception as the cause
            throw new RuntimeException("Error occurred while checking task status: " + e.getMessage(), e);
        }
    }
}
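The asynchronous endpoint follows a submit-then-poll pattern: submit the task, keep the task ID, and query the status at a fixed interval until it reaches a terminal state. The sketch below shows the same pattern with only the JDK, plus a timeout so a stuck task cannot block the caller forever. The TaskStatus enum, pollUntilDone, and the status supplier are hypothetical stand-ins for the DashScope types, not library API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class PollingSketch {

    /** Hypothetical task states for this sketch. */
    enum TaskStatus { PENDING, RUNNING, SUCCEEDED, FAILED }

    /** Polls statusSupplier at a fixed interval until a terminal status or the timeout. */
    static TaskStatus pollUntilDone(Supplier<TaskStatus> statusSupplier,
                                    long intervalMillis, long timeoutMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CompletableFuture<TaskStatus> done = new CompletableFuture<>();
        ScheduledFuture<?> poller = scheduler.scheduleAtFixedRate(() -> {
            try {
                TaskStatus status = statusSupplier.get();
                if (status == TaskStatus.SUCCEEDED || status == TaskStatus.FAILED) {
                    done.complete(status);
                }
            } catch (RuntimeException e) {
                done.completeExceptionally(e);
            }
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
        try {
            return done.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while polling", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("Polling failed or timed out", e);
        } finally {
            // Stop polling and release the thread, whatever the outcome
            poller.cancel(false);
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Simulated task: reports RUNNING twice, then SUCCEEDED
        AtomicInteger calls = new AtomicInteger();
        TaskStatus result = pollUntilDone(
                () -> calls.incrementAndGet() < 3 ? TaskStatus.RUNNING : TaskStatus.SUCCEEDED,
                50, 5_000);
        System.out.println("Final status: " + result);
    }
}
```

The sketch creates and shuts down its own scheduler on every call. A controller that shares one scheduler across requests should cancel only the per-request polling task instead.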

5.3 Speech synthesis controller

Create the speech synthesis controller AudioSpeechController.java, which implements the two synthesis modes:

package com.alibaba.cloud.ai.example.audio;

import com.alibaba.cloud.ai.dashscope.api.DashScopeAudioSpeechApi;
import com.alibaba.cloud.ai.dashscope.audio.DashScopeAudioSpeechOptions;
import com.alibaba.cloud.ai.dashscope.audio.synthesis.SpeechSynthesisModel;
import com.alibaba.cloud.ai.dashscope.audio.synthesis.SpeechSynthesisPrompt;
import com.alibaba.cloud.ai.dashscope.audio.synthesis.SpeechSynthesisResponse;
import jakarta.annotation.PreDestroy;
import org.apache.commons.io.FileUtils;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.concurrent.CountDownLatch;

/**
 * Text-to-speech (speech synthesis)
 * @author yuluo
 * @author <a href="mailto:yuluo08290126@gmail.com">yuluo</a>
 */

@RestController
@RequestMapping("/ai/speech")
public class AudioSpeechController implements ApplicationRunner {

    private final SpeechSynthesisModel speechSynthesisModel;

    private static final String TEXT = "白日依山尽，黄河入海流。这是测试";

    private static final String FILE_PATH = "spring-ai-alibaba-audio-example/dashscope-audio/src/main/resources/gen/tts";

    public AudioSpeechController(SpeechSynthesisModel speechSynthesisModel) {
        this.speechSynthesisModel = speechSynthesisModel;
    }

    /**
     * Synchronous speech synthesis.
     */
    @GetMapping
    public void tts() throws IOException {
        SpeechSynthesisResponse response = speechSynthesisModel.call(
                new SpeechSynthesisPrompt(
                        TEXT,
                        DashScopeAudioSpeechOptions.builder()
                                .model(DashScopeAudioSpeechApi.AudioSpeechModel.SAM_BERT_ZHICHU_V1.getValue())
                                .build()
                )
        );

        File file = new File(FILE_PATH + "/output.mp3");
        try (FileOutputStream fos = new FileOutputStream(file)) {
            ByteBuffer byteBuffer = response.getResult().getOutput().getAudio();
            // Copy only the readable bytes; array() would return the whole backing
            // array and fails on read-only or direct buffers
            byte[] bytes = new byte[byteBuffer.remaining()];
            byteBuffer.get(bytes);
            fos.write(bytes);
        }
    }

    /**
     * Streaming speech synthesis.
     */
    @GetMapping("/stream")
    public void streamTTS() {
        Flux<SpeechSynthesisResponse> response = speechSynthesisModel.stream(
                new SpeechSynthesisPrompt(
                        TEXT,
                        DashScopeAudioSpeechOptions.builder()
                                .model(DashScopeAudioSpeechApi.AudioSpeechModel.SAM_BERT_ZHITING_V1.getValue())
                                .build()
                )
        );

        CountDownLatch latch = new CountDownLatch(1);
        File file = new File(FILE_PATH + "/output-stream.mp3");
        try (FileOutputStream fos = new FileOutputStream(file)) {

            response.doFinally(
                    signal -> latch.countDown()
            ).subscribe(synthesisResponse -> {
                ByteBuffer byteBuffer = synthesisResponse.getResult().getOutput().getAudio();
                byte[] bytes = new byte[byteBuffer.remaining()];
                byteBuffer.get(bytes);
                try {
                    fos.write(bytes);
                }
                catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });

            latch.await();
        }
        catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void run(ApplicationArguments args) {
        File file = new File(FILE_PATH);
        if (!file.exists()) {
            file.mkdirs();
        }
    }

    @PreDestroy
    public void destroy() throws IOException {
        // Remove the generated audio files when the application shuts down
        FileUtils.deleteDirectory(new File(FILE_PATH));
    }
}
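Both synthesis endpoints write the audio ByteBuffer returned by the model to a file. ByteBuffer.array() returns the entire backing array regardless of the buffer's position and limit, and it throws for read-only or direct buffers, so copying the remaining() bytes is the more robust way to get the payload. The sketch below illustrates the difference; AudioBytes.toBytes is a hypothetical helper, not part of Spring AI Alibaba.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class AudioBytes {

    /** Copies the readable bytes (position..limit) without moving the caller's position. */
    static byte[] toBytes(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // own position/limit, shared content
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        // A buffer whose readable window is smaller than its backing array
        ByteBuffer buffer = ByteBuffer.wrap("xxAUDIOxx".getBytes(StandardCharsets.US_ASCII));
        buffer.position(2);
        buffer.limit(7);

        System.out.println(buffer.array().length);                                  // prints 9
        System.out.println(new String(toBytes(buffer), StandardCharsets.US_ASCII)); // prints AUDIO
    }
}
```

Working on a duplicate means the caller can still read the same buffer afterwards, which helps if the response object is reused.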

6. Running and testing

1. Start the application: run your Spring Boot main class.

2. Test with a browser or an API tool such as Postman or curl.

Test 1: synchronous speech recognition

Call the following URL to transcribe the sample audio file:

GET http://127.0.0.1:10009/ai/transcription

Expected response:

你好世界，这是一个语音识别测试。

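Test 2: streaming speech recognition

Call the following URL to transcribe an audio file in streaming mode. The endpoint reads gen/tts/output.mp3, so run Test 4 first to generate that file: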

GET http://127.0.0.1:10009/ai/transcription/stream

Expected response:

白日依山尽，黄河入海流。这是测试

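Test 3: asynchronous speech recognition

Call the following URL to transcribe an audio file in asynchronous mode. The endpoint reads gen/tts/output-stream.mp3, so run Test 5 first to generate that file: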

GET http://127.0.0.1:10009/ai/transcription/async

Expected response:

白日依山尽，黄河入海流。这是测试

Test 4: synchronous speech synthesis

Call the following URL to convert the text to an audio file:

GET http://127.0.0.1:10009/ai/speech

Expected result:

The application writes an MP3 audio file to spring-ai-alibaba-audio-example/dashscope-audio/src/main/resources/gen/tts/output.mp3.

Test 5: streaming speech synthesis

Call the following URL to convert the text to an audio file in streaming mode:

GET http://127.0.0.1:10009/ai/speech/stream

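Expected result:

The application writes an MP3 audio file to spring-ai-alibaba-audio-example/dashscope-audio/src/main/resources/gen/tts/output-stream.mp3. Note that the gen/tts directory is deleted when the application shuts down.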
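If you prefer to drive all five endpoints from code rather than a browser, the following is a minimal sketch using the JDK's built-in HTTP client (Java 11+). It assumes the application is running locally on port 10009 as configured in section 3. The class name AudioEndpointsClient and the 60-second timeout are choices made for this illustration. The synthesis endpoints are called first because the streaming and asynchronous recognition endpoints read the files they generate.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class AudioEndpointsClient {

    static final String BASE_URL = "http://127.0.0.1:10009";

    /** Builds a GET request for one of the example endpoints. */
    static HttpRequest get(String path) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + path))
                .timeout(Duration.ofSeconds(60)) // assumed upper bound for a model call
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String[] paths = {
                "/ai/speech",               // writes gen/tts/output.mp3
                "/ai/speech/stream",        // writes gen/tts/output-stream.mp3
                "/ai/transcription",
                "/ai/transcription/stream", // reads output.mp3
                "/ai/transcription/async"   // reads output-stream.mp3
        };
        for (String path : paths) {
            HttpResponse<String> response =
                    client.send(get(path), HttpResponse.BodyHandlers.ofString());
            System.out.println(path + " -> HTTP " + response.statusCode() + " " + response.body());
        }
    }
}
```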

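7. Design notes and extension ideas

Design notes

The core idea of this example is to simplify the audio processing pipeline. With the audio interfaces provided by Spring AI Alibaba, developers can implement speech recognition and speech synthesis without dealing with the underlying API calls and audio-handling details. As a result:

Development is fast: each endpoint needs only a few lines of code around the model call.
The modes are flexible: synchronous and streaming processing for both features, plus asynchronous processing for recognition, cover different scenarios.
The backend is capable: recognition and synthesis are provided by the Tongyi models on Alibaba Cloud DashScope.

Extension ideas

Multi-language support: configure different model parameters to recognize and synthesize more languages.
Audio format conversion: integrate format conversion to accept and produce more audio formats.
Audio quality tuning: adjust audio parameters for the target scenario to favor either quality or processing speed.
Real-time communication: integrate recognition and synthesis into a real-time communication application to build a voice chatbot.
Voice command control: use speech recognition to drive a voice command system, such as smart home control.
Voice assistant development: combine recognition and synthesis to build an intelligent voice assistant.

Note: the latest version of this module's example is based on spring ai alibaba 1.0.0.3, which has not been published to the central repository; build and install it locally.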
