Preface

Hugging Face's framework for quickly building agents: https://github.com/huggingface/smolagents

Run powerful agents with just a few lines of code:

  • Simple: the agent logic fits in roughly 1,000 lines of code, ref: https://github.com/huggingface/smolagents/blob/main/src/smolagents/agents.py
  • Provides CodeAgent: this agent is not there to write code for you; it accomplishes tasks by writing and executing code
  • Hugging Face Hub integration: tools you write can be pushed to and pulled from the Hub
  • Model support: any model from OpenAI, Anthropic, and many other providers, as well as local transformers or ollama models
  • Tool support: tools from LangChain and Anthropic's MCP; a Hub Space can even be used as a tool

Documentation: https://huggingface.co/docs/smolagents/index

1. Supported Models

1.1 Hugging Face

from smolagents import HfApiModel

model = HfApiModel(
    model_id="deepseek-ai/DeepSeek-R1",
    provider="together",
)

1.2 LiteLLMModel

import os

from smolagents import LiteLLMModel

model = LiteLLMModel(
    model_id="anthropic/claude-3-5-sonnet-latest",
    temperature=0.2,
    api_key=os.environ["ANTHROPIC_API_KEY"]
)

1.3 OpenAI

import os

from smolagents import OpenAIServerModel

model = OpenAIServerModel(
    model_id="deepseek-ai/DeepSeek-R1",
    api_base="https://api.together.xyz/v1/",  # Leave this blank to query OpenAI servers.
    api_key=os.environ["TOGETHER_API_KEY"],  # Switch to the API key for the server you're targeting.
)

1.4 Local transformers

from smolagents import TransformersModel

model = TransformersModel(
    model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
    max_new_tokens=4096,
    device_map="auto"
)

1.5 Azure Model

import os

from smolagents import AzureOpenAIServerModel


model = AzureOpenAIServerModel(
    model_id = os.environ.get("AZURE_OPENAI_MODEL"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    api_version=os.environ.get("OPENAI_API_VERSION")    
)

2. The Framework

2.1 How it runs

The loop is essentially the standard one: there is memory, and it follows the ReAct agent framework. The only real difference is that the model's output is an extracted code action: the agent reaches its goal by writing code and executing it.
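This loop can be sketched in plain Python. Everything below is a hypothetical illustration, not smolagents' actual code: fake_model stands in for an LLM, and <code> tags stand in for the framework's real code-block markers.

```python
import re

# Minimal sketch of a code-action agent loop: the model emits a "code action",
# the agent extracts and executes it, and the observation goes back into memory.

def fake_model(memory):
    # A real model would generate the next code action from the conversation so far.
    return (
        "Thought: add the two numbers, then return the result.\n"
        "<code>result = 2 + 3\nfinal_answer(result)</code>"
    )

def run_code_agent(task, model, max_steps=5):
    memory = [f"Task: {task}"]
    final = {}

    def final_answer(value):  # the tool that generated code calls to finish the run
        final["value"] = value

    for _ in range(max_steps):
        output = model(memory)
        match = re.search(r"<code>(.*?)</code>", output, re.DOTALL)
        if not match:
            break
        exec(match.group(1), {"final_answer": final_answer})  # run the code action
        memory.append(output + "\nObservation: code executed.")
        if "value" in final:  # the code action produced a final answer
            return final["value"]
    return None

print(run_code_agent("What is 2 + 3?", fake_model))  # 5
```

The real framework additionally sandboxes execution and exposes the tools as callables inside the generated code's namespace.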

2.2 Building an agent

An agent boils down to a few parts: model, tools, memory, history, and so on; the simplest setup needs only a model and a tool.

The supported models were covered above, so no need to repeat them. Tools are defined in essentially the same way as in LangChain, via two approaches:

  1. With the @tool decorator directly
  2. By subclassing

For how LangChain tools are used, see: https://zhuanlan.zhihu.com/p/714150769

2.2.1 Defining a tool

  • name: the tool's name
  • description: what the tool does
  • Input types: the input parameters' types and descriptions
  • output type: the type of the return value

Here is a tool that looks up the most downloaded model for a given task:

from huggingface_hub import list_models
from smolagents import tool

@tool
def model_download_tool(task: str) -> str:
    """
    This is a tool that returns the most downloaded model of a given task on the Hugging Face Hub.
    It returns the name of the checkpoint.

    Args:
        task: The task for which to get the download count.
    """
    most_downloaded_model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
    return most_downloaded_model.id

Type annotations and a docstring are mandatory, and the docstring must include an Args section.
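This metadata is required because the framework introspects it to build the tool schema shown to the model. The following is only an illustration of that idea, not smolagents' actual implementation; build_tool_schema is a hypothetical helper:

```python
import inspect
from typing import get_type_hints

def build_tool_schema(fn):
    """Derive a tool schema from a function's annotations and Args docstring."""
    hints = get_type_hints(fn)
    doc = inspect.getdoc(fn) or ""
    # Parse "Args:" lines of the form "name: description".
    arg_docs, in_args = {}, False
    for line in doc.splitlines():
        line = line.strip()
        if line == "Args:":
            in_args = True
            continue
        if in_args and ":" in line:
            name, desc = line.split(":", 1)
            arg_docs[name.strip()] = desc.strip()
    return {
        "name": fn.__name__,
        "description": doc.split("Args:")[0].strip(),
        "inputs": {
            p: {"type": hints[p].__name__, "description": arg_docs.get(p, "")}
            for p in hints if p != "return"
        },
        "output_type": hints.get("return", type(None)).__name__,
    }

def model_download_tool(task: str) -> str:
    """Returns the most downloaded model for a task.

    Args:
        task: The task for which to get the download count.
    """
    return "dummy"

schema = build_tool_schema(model_download_tool)
print(schema["inputs"]["task"])
```

Without the annotations or the Args section, no such schema can be derived, which is why smolagents rejects tools that lack them.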

The same tool, using the subclassing approach:

from huggingface_hub import list_models
from smolagents import Tool

class ModelDownloadTool(Tool):
    name = "model_download_tool"
    description = "This is a tool that returns the most downloaded model of a given task on the Hugging Face Hub. It returns the name of the checkpoint."
    inputs = {"task": {"type": "string", "description": "The task for which to get the download count."}}
    output_type = "string"

    def forward(self, task: str) -> str:
        most_downloaded_model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
        return most_downloaded_model.id

2.2.2 Built-in tools worth studying

Three default tools already ship with the library:

  • DuckDuckGo web search: performs a web search via DuckDuckGo.
  • Python code interpreter
  • Transcriber: transcribes audio to text, based on Whisper-Turbo

These three are good references, and more default tools are still being added, including common ones such as Google search:

"PythonInterpreterTool",
"FinalAnswerTool",
"UserInputTool",
"DuckDuckGoSearchTool",
"GoogleSearchTool",
"VisitWebpageTool",
"SpeechToTextTool",

Source: https://github.com/huggingface/smolagents/blob/main/src/smolagents/default_tools.py

2.2.3 Agent examples

https://github.com/huggingface/smolagents/tree/main/examples

For example, answering a user's question by executing SQL:

from sqlalchemy import (
    Column,
    Float,
    Integer,
    MetaData,
    String,
    Table,
    create_engine,
    insert,
    inspect,
    text,
)


engine = create_engine("sqlite:///:memory:")
metadata_obj = MetaData()

# create city SQL table
table_name = "receipts"
receipts = Table(
    table_name,
    metadata_obj,
    Column("receipt_id", Integer, primary_key=True),
    Column("customer_name", String(16), primary_key=True),
    Column("price", Float),
    Column("tip", Float),
)
metadata_obj.create_all(engine)

rows = [
    {"receipt_id": 1, "customer_name": "Alan Payne", "price": 12.06, "tip": 1.20},
    {"receipt_id": 2, "customer_name": "Alex Mason", "price": 23.86, "tip": 0.24},
    {"receipt_id": 3, "customer_name": "Woodrow Wilson", "price": 53.43, "tip": 5.43},
    {"receipt_id": 4, "customer_name": "Margaret James", "price": 21.11, "tip": 1.00},
]
for row in rows:
    stmt = insert(receipts).values(**row)
    with engine.begin() as connection:
        cursor = connection.execute(stmt)

inspector = inspect(engine)
columns_info = [(col["name"], col["type"]) for col in inspector.get_columns("receipts")]

table_description = "Columns:\n" + "\n".join([f"  - {name}: {col_type}" for name, col_type in columns_info])
print(table_description)

from smolagents import tool


@tool
def sql_engine(query: str) -> str:
    """
    Allows you to perform SQL queries on the table. Returns a string representation of the result.
    The table is named 'receipts'. Its description is as follows:
        Columns:
        - receipt_id: INTEGER
        - customer_name: VARCHAR(16)
        - price: FLOAT
        - tip: FLOAT

    Args:
        query: The query to perform. This should be correct SQL.
    """
    output = ""
    with engine.connect() as con:
        rows = con.execute(text(query))
        for row in rows:
            output += "\n" + str(row)
    return output


from smolagents import CodeAgent, HfApiModel


agent = CodeAgent(
    tools=[sql_engine],
    model=HfApiModel(model_id="meta-llama/Meta-Llama-3.1-8B-Instruct"),
)
agent.run("Can you give me the name of the client who got the most expensive receipt?")

2.3 Custom models

For example, to create a vLLM model, subclass Model and reimplement __call__:


# Excerpt from smolagents' models.py; Model, ChatMessage, MessageRole, Tool, and
# the helper functions used below are defined or imported in that module.
class VLLMModel(Model):
    """Model to use [vLLM](https://docs.vllm.ai/) for fast LLM inference and serving.

    Parameters:
        model_id (`str`):
            The Hugging Face model ID to be used for inference.
            This can be a path or model identifier from the Hugging Face model hub.
    """

    def __init__(self, model_id, **kwargs):
        if not _is_package_available("vllm"):
            raise ModuleNotFoundError("Please install 'vllm' extra to use VLLMModel: `pip install 'smolagents[vllm]'`")

        from vllm import LLM
        from vllm.transformers_utils.tokenizer import get_tokenizer

        super().__init__(**kwargs)

        self.model_id = model_id
        self.model = LLM(model=model_id)
        self.tokenizer = get_tokenizer(model_id)
        self._is_vlm = False  # VLLMModel does not support vision models yet.

    def cleanup(self):
        import gc

        import torch
        from vllm.distributed.parallel_state import destroy_distributed_environment, destroy_model_parallel

        destroy_model_parallel()
        if self.model is not None:
            # taken from https://github.com/vllm-project/vllm/issues/1908#issuecomment-2076870351
            del self.model.llm_engine.model_executor.driver_worker
        self.model = None
        gc.collect()
        destroy_distributed_environment()
        torch.cuda.empty_cache()

    def __call__(
        self,
        messages: List[Dict[str, str]],
        stop_sequences: Optional[List[str]] = None,
        grammar: Optional[str] = None,
        tools_to_call_from: Optional[List[Tool]] = None,
        **kwargs,
    ) -> ChatMessage:
        from vllm import SamplingParams

        completion_kwargs = self._prepare_completion_kwargs(
            messages=messages,
            flatten_messages_as_text=(not self._is_vlm),
            stop_sequences=stop_sequences,
            grammar=grammar,
            tools_to_call_from=tools_to_call_from,
            **kwargs,
        )
        messages = completion_kwargs.pop("messages")
        prepared_stop_sequences = completion_kwargs.pop("stop", [])
        tools = completion_kwargs.pop("tools", None)
        completion_kwargs.pop("tool_choice", None)

        if tools_to_call_from is not None:
            prompt = self.tokenizer.apply_chat_template(
                messages,
                tools=tools,
                add_generation_prompt=True,
                tokenize=False,
            )
        else:
            prompt = self.tokenizer.apply_chat_template(
                messages,
                tokenize=False,
            )

        sampling_params = SamplingParams(
            n=kwargs.get("n", 1),
            temperature=kwargs.get("temperature", 0.0),
            max_tokens=kwargs.get("max_tokens", 2048),
            stop=prepared_stop_sequences,
        )

        out = self.model.generate(
            prompt,
            sampling_params=sampling_params,
        )
        output_text = out[0].outputs[0].text
        self.last_input_token_count = len(out[0].prompt_token_ids)
        self.last_output_token_count = len(out[0].outputs[0].token_ids)
        chat_message = ChatMessage(
            role=MessageRole.ASSISTANT,
            content=output_text,
            raw={"out": output_text, "completion_kwargs": completion_kwargs},
        )
        if tools_to_call_from:
            chat_message.tool_calls = [
                get_tool_call_from_text(output_text, self.tool_name_key, self.tool_arguments_key)
            ]
        return chat_message

2.4 How to build better agents

2.4.1 A badly defined tool

from datetime import datetime

from smolagents import tool

def get_weather_report_at_coordinates(coordinates, date_time):
    # Dummy function, returns a list of [temperature in °C, risk of rain on a scale 0-1, wave height in m]
    return [28.0, 0.35, 0.85]

def convert_location_to_coordinates(location):
    # Returns dummy coordinates
    return [3.3, -42.0]

@tool
def get_weather_api(location: str, date_time: str) -> str:
    """
    Returns the weather report.

    Args:
        location: the name of the place that you want the weather for.
        date_time: the date and time for which you want the report.
    """
    lon, lat = convert_location_to_coordinates(location)
    date_time = datetime.strptime(date_time)
    return str(get_weather_report_at_coordinates((lon, lat), date_time))

2.4.2 A well-defined tool

@tool
def get_weather_api(location: str, date_time: str) -> str:
    """
    Returns the weather report.

    Args:
        location: the name of the place that you want the weather for. Should be a place name, followed by possibly a city name, then a country, like "Anchor Point, Taghazout, Morocco".
        date_time: the date and time for which you want the report, formatted as '%m/%d/%y %H:%M:%S'.
    """
    lon, lat = convert_location_to_coordinates(location)
    try:
        date_time = datetime.strptime(date_time)
    except Exception as e:
        raise ValueError("Conversion of `date_time` to datetime format failed, make sure to provide a string in format '%m/%d/%y %H:%M:%S'. Full trace:" + str(e))
    temperature_celsius, risk_of_rain, wave_height = get_weather_report_at_coordinates((lon, lat), date_time)
    return f"Weather report for {location}, {date_time}: Temperature will be {temperature_celsius}°C, risk of rain is {risk_of_rain*100:.0f}%, wave height is {wave_height}m."

Improvements over the bad version:

  1. The input format is specified far more explicitly and in more detail.
  2. The call into an external package is wrapped in exception handling, with an informative error message.
  3. The output adds semantic phrasing that makes the result easier for the LLM to understand.

2.4.3 Controlling the number of thinking rounds

The agent can think for several consecutive rounds without calling any tools; the planning_interval argument controls how often this planning step runs:

from smolagents import load_tool, CodeAgent, HfApiModel, DuckDuckGoSearchTool
from dotenv import load_dotenv

load_dotenv()

# Import tool from Hub
image_generation_tool = load_tool("m-ric/text-to-image", trust_remote_code=True)

search_tool = DuckDuckGoSearchTool()

agent = CodeAgent(
    tools=[search_tool, image_generation_tool],
    model=HfApiModel(model_id="Qwen/Qwen2.5-72B-Instruct"),
    planning_interval=3 # This is where you activate planning!
)

# Run it!
result = agent.run(
    "How long would a cheetah at full speed take to run the length of Pont Alexandre III?",
)

2.4.4 Other tips

The docs cover further techniques, mainly on how to debug your agent; they are instructive and worth a read: https://huggingface.co/docs/smolagents/tutorials/building_good_agents

The system prompt there is written in a template style; see: https://zhuanlan.zhihu.com/p/710177783
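The templating idea is simply that tool names and descriptions are rendered into a prompt skeleton at run time. A conceptual sketch using the standard library (smolagents' actual prompts use Jinja-style templates; string.Template here only illustrates the idea):

```python
from string import Template

# Prompt skeleton with a placeholder that tool descriptions are rendered into.
system_prompt = Template(
    "You are an agent that solves tasks by writing code.\n"
    "You have access to these tools:\n$tool_descriptions"
)

# Example tool registry; in the real framework this comes from the Tool objects.
tools = {"sql_engine": "Run a SQL query on the receipts table."}

rendered = system_prompt.substitute(
    tool_descriptions="\n".join(f"- {n}: {d}" for n, d in tools.items())
)
print(rendered)
```

Keeping the prompt as a template means adding a tool changes the system prompt automatically, with no hand-editing.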

3. Integrating with GradioUI

from smolagents import CodeAgent, GradioUI, HfApiModel


agent = CodeAgent(
    tools=[],
    model=HfApiModel(),
    max_steps=4,
    verbosity_level=1,
    name="example_agent",
    description="This is an example agent that has no tools and uses only code.",
)

GradioUI(agent, file_upload_folder="./data").launch()