1. 开通阿里云PAI

https://pai.console.aliyun.com/#/quick-start/models

问题:

BladeLLM与vLLM区别是什么?

选择标准部署。

服务脚本:

{
    "cloud": {
        "computing": {
            "instances": [
                {
                    "type": "ecs.gn7i-c16g1.4xlarge"
                }
            ]
        }
    },
    "metadata": {
        "instance": 1,
        "rpc": {
            "keepalive": 9000000,
            "worker_threads": 1
        },
        "enable_webservice": true,
        "name": "quickstart_20250205_sfmn"
    },
    "containers": [
        {
            "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.6",
            "port": 8000,
            "script": "python webui/webui_server.py --model-path=/model_dir/ --model-type=qwen2"
        }
    ],
    "storage": [
        {
            "mount_path": "/model_dir/",
            "properties": {
                "resource_type": "model",
                "resource_use": "base"
            },
            "oss": {
                "path": "oss://pai-quickstart-cn-hangzhou/modelscope/models/DeepSeek-R1-Distill-Qwen-7B/",
                "endpoint": "oss-cn-hangzhou-internal.aliyuncs.com"
            }
        }
    ],
    "SupportedInstanceTypes": [
        "ecs.gn7i-c16g1.4xlarge",
        "ecs.gn7i-c32g1.16xlarge",
        "ecs.gn7i-c32g1.32xlarge",
        "ecs.gn7i-c32g1.8xlarge",
        "ecs.gn7i-c8g1.2xlarge",
        "ecs.gn7i-c8g1.2xlarge.limit",
        "ecs.gn8is-2x.8xlarge",
        "ecs.gn8is-4x.16xlarge",
        "ecs.gn8is-8x.32xlarge",
        "ecs.gn8is.2xlarge",
        "ecs.gn8is.4xlarge",
        "ecs.gn8v.6xlarge",
        "ecs.gn8v-2x.12xlarge",
        "ecs.gn8v-4x.24xlarge",
        "ecs.gn8v-8x.48xlarge",
        "ml.gu7i.c128m752.4-gu30",
        "ml.gu7i.c16m60.1-gu30",
        "ml.gu7i.c32m188.1-gu30",
        "ml.gu7i.c64m376.2-gu30",
        "ml.gu7i.c8m30.1-gu30",
        "ml.gu8is.c128m1024.8-gu60",
        "ml.gu8is.c16m128.1-gu60",
        "ml.gu8is.c32m256.2-gu60",
        "ml.gu8is.c64m512.4-gu60",
        "ml.gu8v.c192m1024.8-gu120",
        "ml.gu8v.c24m128.1-gu120",
        "ml.gu8v.c48m256.2-gu120",
        "ml.gu8v.c96m512.4-gu120"
    ]
}


监控:

Stream output 流式输出会陆续显示新的内容,而不是让enduser等十几秒再一起显示。

参考:3步,0代码!一键部署DeepSeek-V3、DeepSeek-R1

Logo

火山引擎开发者社区是火山引擎打造的AI技术生态平台,聚焦Agent与大模型开发,提供豆包系列模型(图像/视频/视觉)、智能分析与会话工具,并配套评测集、动手实验室及行业案例库。社区通过技术沙龙、挑战赛等活动促进开发者成长,新用户可领50万Tokens权益,助力构建智能应用。

更多推荐