1 Related software links

2 Hardware environment

● 16 vCPUs (virtual CPU cores)

● 60 GB RAM

● 80 GB disk

● NVIDIA A10 GPU with 24 GB of video memory

3 Software environment

Ubuntu 24.04 (noble)

4 Software to install

Miniconda3

NVIDIA A10 GPU driver: a release that supports CUDA 12.4 (section 5.3 uses 550.127.08)

CUDA: 12.4.0

cuDNN: 9.6.0

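5 Installation steps

5.1 Check or change the package sources

# Show the current package sources
# (on a fresh Ubuntu 24.04 install they may instead be in /etc/apt/sources.list.d/ubuntu.sources)
cat /etc/apt/sources.list

# If the Aliyun mirror is not listed, add it.
# The Aliyun package sources are: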
deb https://mirrors.aliyun.com/ubuntu/ noble main restricted universe multiverse
deb-src https://mirrors.aliyun.com/ubuntu/ noble main restricted universe multiverse

deb https://mirrors.aliyun.com/ubuntu/ noble-security main restricted universe multiverse
deb-src https://mirrors.aliyun.com/ubuntu/ noble-security main restricted universe multiverse

deb https://mirrors.aliyun.com/ubuntu/ noble-updates main restricted universe multiverse
deb-src https://mirrors.aliyun.com/ubuntu/ noble-updates main restricted universe multiverse

# deb https://mirrors.aliyun.com/ubuntu/ noble-proposed main restricted universe multiverse
# deb-src https://mirrors.aliyun.com/ubuntu/ noble-proposed main restricted universe multiverse

deb https://mirrors.aliyun.com/ubuntu/ noble-backports main restricted universe multiverse
deb-src https://mirrors.aliyun.com/ubuntu/ noble-backports main restricted universe multiverse
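
The entries above follow one pattern: each suite of the release is paired with the same component list, once as deb and once as deb-src. As a minimal sketch (the function name build_sources is hypothetical; the mirror URL and component list are copied from the listing above), the following Python regenerates those lines for a given release codename, which may help when adapting the list to another Ubuntu release.

```python
MIRROR = "https://mirrors.aliyun.com/ubuntu/"
COMPONENTS = "main restricted universe multiverse"
# Suite suffixes enabled in the listing above; "-proposed" is left out
# because the listing keeps it commented out.
SUFFIXES = ["", "-security", "-updates", "-backports"]


def build_sources(codename: str) -> list[str]:
    """Return the deb and deb-src lines for each enabled suite of a release."""
    lines = []
    for suffix in SUFFIXES:
        suite = f"{codename}{suffix}"
        for kind in ("deb", "deb-src"):
            lines.append(f"{kind} {MIRROR} {suite} {COMPONENTS}")
    return lines


if __name__ == "__main__":
    print("\n".join(build_sources("noble")))
```

The printed lines can be compared against the existing sources file before editing it by hand.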

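5.2 Install Miniconda3 (Python virtual environments)

Miniconda3 installs and manages Python, which keeps each project's version and dependency requirements isolated in its own environment.

  1. Download the installer script
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  2. Run the installer script
chmod +x Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

The installer then asks the following questions:

After the script starts, keep pressing Enter until the yes/no prompt (the license agreement) appears, then type yes.

When asked about the install location, press Enter to keep the default path, or type a new install path and press Enter.

When asked whether to run the initialization during installation, the default is no; type yes here.

After installation finishes, run source ~/.bashrc to reload the environment variables.

Run conda to verify the installation.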

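5.3 Install the GPU driver

  • Download the driver from the official site

https://www.nvidia.cn/drivers/lookup/

Select the filters that match the GPU, set the CUDA Toolkit version to 12.4, then click "Find".

Click "View".

Click "Download" to download the driver. Alternatively, right-click the download button, copy the link, and download the file on the server with wget. For example:

wget https://cn.download.nvidia.com/tesla/550.127.08/NVIDIA-Linux-x86_64-550.127.08.run
  • Make sure gcc-12 and make are installed
sudo apt update
sudo apt install gcc-12 make
  • Install the GPU driver
chmod +x NVIDIA-Linux-x86_64-550.127.08.run
sudo ./NVIDIA-Linux-x86_64-550.127.08.run

Select "Continue installation" and press Enter; choose "OK" for every prompt after that.

  • Verify

Run nvidia-smi and check the output. It should list the A10 and the installed driver version.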
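
The nvidia-smi check can also be scripted. The sketch below is one possible approach, not part of the original procedure: the helper names parse_nvidia_smi and query_gpu are hypothetical, and SAMPLE is an illustrative header line rather than output captured from the machine in this guide.

```python
import re
import subprocess


def parse_nvidia_smi(text: str) -> dict:
    """Extract the driver and CUDA versions from nvidia-smi's header line."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return {
        "driver": driver.group(1) if driver else None,
        "cuda": cuda.group(1) if cuda else None,
    }


def query_gpu() -> dict:
    """Run nvidia-smi and parse its output; raises if the command is missing or fails."""
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=True
    )
    return parse_nvidia_smi(result.stdout)


# Illustrative header line only, not captured from a real machine.
SAMPLE = "| NVIDIA-SMI 550.127.08    Driver Version: 550.127.08    CUDA Version: 12.4 |"

if __name__ == "__main__":
    print(parse_nvidia_smi(SAMPLE))
```

On the target server, query_gpu() runs the real command. The CUDA version that nvidia-smi prints is the highest version the driver supports, not the version of an installed toolkit.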

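5.4 Install CUDA

  1. Download

https://developer.nvidia.com/cuda-toolkit-archive

Select version 12.4.0.

Click through the selector tabs to set the target platform. Once the installer package is chosen, the download and install commands for CUDA appear at the bottom of the page, for example:

wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run
sudo sh cuda_12.4.0_550.54.14_linux.run
  2. Install

Run the commands above to install CUDA. The installer prompts as follows:

Type accept and press Enter.

Deselect the "Driver" option, because the GPU driver is already installed and does not need to be installed again, then select "Install".

  3. Configure the environment variables

Run vim ~/.bashrc and add the following environment variables:

export CUDA_HOME=/usr/local/cuda-12.4
export PATH=$PATH:$CUDA_HOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib64

Save the file, then run source ~/.bashrc to reload the environment variables.

  4. Verify
nvcc -V

The output should report CUDA release 12.4.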
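
To check the nvcc result in a script rather than by eye, the release number can be pulled out of the command's output. This is a minimal sketch under assumptions: parse_nvcc_release is a hypothetical helper, and SAMPLE imitates the style of line nvcc prints (its build number is illustrative, not taken from this installation).

```python
import re


def parse_nvcc_release(text: str):
    """Return the (major, minor) CUDA release reported by `nvcc -V`, or None."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Illustrative line in the style nvcc prints; the build number is an assumption.
SAMPLE = "Cuda compilation tools, release 12.4, V12.4.99"

if __name__ == "__main__":
    release = parse_nvcc_release(SAMPLE)
    print("CUDA release:", release)
    print("matches 12.4:", release == (12, 4))
```

If the function returns None on the real output, the most likely cause is that $CUDA_HOME/bin is not on PATH in the current shell.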

5.5 Install cuDNN

https://developer.nvidia.com/rdp/cudnn-download

Click through the selector tabs and pick the required version; the page then generates the install commands at the bottom, for example:

wget https://developer.download.nvidia.com/compute/cudnn/9.6.0/local_installers/cudnn-local-repo-ubuntu2404-9.6.0_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2404-9.6.0_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2404-9.6.0/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cudnn

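Run the commands above to complete the installation.

**Note:** Install a cuDNN version that matches the CUDA version. The support matrix at the link below lists the compatible combinations.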

https://docs.nvidia.com/deeplearning/cudnn/latest/reference/support-matrix.html#support-matrix

Downloads for earlier cuDNN releases:

https://developer.nvidia.com/cudnn-archive

5.6 Install the model

On the server, create the model directory and download the model files:

mkdir -p /xcloud/qwen2-vl-7b/model

cd /xcloud/qwen2-vl-7b/model

wget https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/model-00001-of-00005.safetensors

wget https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/model-00002-of-00005.safetensors

wget https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/model-00003-of-00005.safetensors

wget https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/model-00004-of-00005.safetensors

wget https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/model-00005-of-00005.safetensors

wget https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/chat_template.json

wget https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/config.json

wget https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/configuration.json

wget https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/generation_config.json

wget https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/LICENSE

wget https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/merges.txt

wget https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/model.safetensors.index.json

wget https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/preprocessor_config.json

wget https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/README.md

wget https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/tokenizer.json

wget https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/tokenizer_config.json

wget https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/vocab.json
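
With this many separate downloads, it is easy to miss a file or end up with an interrupted one. The sketch below rebuilds the file list from the commands above and reports which files are absent from the model directory. The helper names (expected_files, download_urls, missing_files) are hypothetical; the base URL and file names are copied from the wget commands.

```python
from pathlib import Path

BASE_URL = "https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct/resolve/master/"
SHARD_COUNT = 5  # number of model-*.safetensors shards listed above
OTHER_FILES = [
    "chat_template.json", "config.json", "configuration.json",
    "generation_config.json", "LICENSE", "merges.txt",
    "model.safetensors.index.json", "preprocessor_config.json", "README.md",
    "tokenizer.json", "tokenizer_config.json", "vocab.json",
]


def expected_files() -> list[str]:
    """File names the wget commands above are expected to produce."""
    shards = [
        f"model-{i:05d}-of-{SHARD_COUNT:05d}.safetensors"
        for i in range(1, SHARD_COUNT + 1)
    ]
    return shards + OTHER_FILES


def download_urls() -> list[str]:
    """Full download URL for every expected file."""
    return [BASE_URL + name for name in expected_files()]


def missing_files(model_dir: str) -> list[str]:
    """Expected files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in expected_files() if not (root / name).is_file()]


if __name__ == "__main__":
    for name in missing_files("/xcloud/qwen2-vl-7b/model"):
        print("missing:", name)
```

This only checks that each file exists; it does not detect a truncated download, so comparing file sizes against the model page is still worthwhile.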

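Install the tools needed to compile software, along with the CMake build system:

sudo apt-get install build-essential cmake

Create and activate the virtual environment:

# Create a virtual environment named qwen2-vl-7b with a pinned Python version
conda create -n qwen2-vl-7b python=3.11
# List all virtual environments
conda env list
# Activate the qwen2-vl-7b environment
conda activate qwen2-vl-7b

Install the dependencies: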

pip install transformers
pip install modelscope
pip install qwen-vl-utils
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu124
pip install accelerate==0.26.0
pip install ninja
pip install flash-attn -i https://mirrors.aliyun.com/pypi/simple
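
Before writing the launch script, it may be worth confirming that the packages landed in the active environment. The following is a minimal sketch: missing_packages is a hypothetical helper, and REQUIRED lists the import names that correspond to the pip packages above (ninja is left out because it is only a build tool).

```python
import importlib.util

# Import names for the pip packages installed above.
REQUIRED = [
    "transformers", "modelscope", "qwen_vl_utils", "torch",
    "torchvision", "torchaudio", "accelerate", "flash_attn",
]


def missing_packages(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("not installed:", ", ".join(missing))
    else:
        import torch  # imported late so the check above works without it

        print("torch", torch.__version__, "CUDA available:", torch.cuda.is_available())
```

Run it inside the activated qwen2-vl-7b environment; if CUDA is reported as unavailable, the installed torch build or the driver is the first thing to check.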
Create the model launch script

In the /xcloud/qwen2-vl-7b directory, create a file named qwen2-vl-7b.py with the following code:

import time

from modelscope import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info
from modelscope import snapshot_download
import torch
model_dir = "/xcloud/qwen2-vl-7b/model"

# default: Load the model on the available device(s)
# model = Qwen2VLForConditionalGeneration.from_pretrained(
#     model_dir, torch_dtype="auto", device_map="auto"
# )

# We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

# default processor
# processor = AutoProcessor.from_pretrained(model_dir)

# The default range for the number of visual tokens per image in the model is 4-16384. You can set min_pixels and max_pixels according to your needs, such as a token count range of 256-1280, to balance speed and memory usage.
min_pixels = 256*28*28
max_pixels = 1280*28*28
processor = AutoProcessor.from_pretrained(model_dir, min_pixels=min_pixels, max_pixels=max_pixels)

while True:
    path = input("Enter the image path:\n")
    start = time.time()

    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": path,
                },
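                # Prompt (kept in Chinese): "Extract the content of this image in a structured format; answer directly, with nothing extra."
                {"type": "text", "text": "请格式化提取这张图片的内容,直接回答,不需要多余的回答。"},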
            ],
        }
    ]

    # Preparation for inference
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    )
    inputs = inputs.to("cuda")

    # Inference: Generation of the output
    generated_ids = model.generate(**inputs, max_new_tokens=8192)
    generated_ids_trimmed = [
        out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
    ]
    output_text = processor.batch_decode(
        generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )
    end = time.time()
    print(f"Elapsed: {end - start:.2f}s")
    print("Recognition result:")
    print(output_text)
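Start the model service
cd /xcloud/qwen2-vl-7b
python qwen2-vl-7b.py

Verify: prepare an image that contains text and upload it to the /xcloud directory. Then, at the prompt of the running script, enter the image path to run recognition on it. Press Ctrl+C to stop the script.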
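
The min_pixels and max_pixels values in the script are written as a token count multiplied by 28*28. The sketch below spells out that arithmetic, assuming (as the script's own comment suggests) that one visual token corresponds to a 28x28 pixel patch. The helper names are hypothetical, and approx_tokens is only a rough estimate that ignores the processor's actual resizing and rounding.

```python
PATCH_SIDE = 28  # pixels per side covered by one visual token (assumption from the script's comment)


def pixel_budget(tokens: int) -> int:
    """Pixel count that corresponds to a given number of visual tokens."""
    return tokens * PATCH_SIDE * PATCH_SIDE


def approx_tokens(width: int, height: int) -> int:
    """Rough token estimate for an image, ignoring the processor's resizing."""
    return (width * height) // (PATCH_SIDE * PATCH_SIDE)


if __name__ == "__main__":
    print("min_pixels =", pixel_budget(256))   # 200704
    print("max_pixels =", pixel_budget(1280))  # 1003520
    print("1920x1080 before resizing:", approx_tokens(1920, 1080), "tokens")
```

A 1920x1080 image works out to roughly 2,644 tokens by this estimate, above the 1280-token ceiling, so the processor would scale it down. Raising max_pixels keeps more detail at the cost of more GPU memory and slower inference.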
