Guide: Monitoring Hugging Face Models with Litefuse

This guide shows you how to monitor Hugging Face models through Litefuse's OpenAI SDK integration. It lets you collaboratively debug, monitor, and evaluate your LLM applications.

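With this integration, you can test and evaluate different models, monitor your application's cost, and assign scores such as user feedback or human annotations.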

ℹ️

Note that this example uses the OpenAI SDK to access the Hugging Face inference API. You can also use other frameworks, such as Langchain, or ingest data via our API.

Setup

Install the required dependencies

%pip install langfuse openai --upgrade

Set environment variables

Set up the environment variables with the required keys. Get the keys for your Litefuse project from Litefuse Cloud, and get an access token from Hugging Face.

import os
 
# Get keys for your project from https://litefuse.cloud
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..." # Private Project
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..." # Private Project
os.environ["LANGFUSE_BASE_URL"] = "https://litefuse.cloud"
 
os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "hf_..."
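
Before moving on, it can help to confirm that none of the keys were left empty. The following is a minimal sketch of such a check; the helper name missing_env_vars and the REQUIRED_VARS list are illustrative and not part of any SDK.

```python
import os

# Variable names taken from the setup step above
REQUIRED_VARS = [
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_BASE_URL",
    "HUGGINGFACE_ACCESS_TOKEN",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```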

Import the required modules

Instead of importing openai directly, import it from langfuse.openai. Also import any other modules you need.

# Instead of: import openai
from langfuse.openai import OpenAI
from langfuse import observe

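Initialize the OpenAI client for Hugging Face models

Initialize the OpenAI client, but point it to the Hugging Face model endpoint. You can use any model hosted on Hugging Face that supports the OpenAI API format. Replace the model URL and access token with your own.

This example uses the Meta-Llama-3-8B-Instruct model.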

# Initialize the OpenAI client, pointing it to the Hugging Face Inference API
client = OpenAI(
    base_url="https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct" + "/v1/",  # replace with your endpoint url
    api_key= os.getenv('HUGGINGFACE_ACCESS_TOKEN'),  # replace with your token
)
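
If you switch between several models, building the base URL from the model id avoids copy-and-paste mistakes. This is a small sketch that assumes the URL pattern used in the example above; hf_base_url and HF_INFERENCE_ROOT are hypothetical names, and a dedicated endpoint would have a different URL that you should use as-is.

```python
# Root of the URL used in the example above (assumed pattern, not an official constant)
HF_INFERENCE_ROOT = "https://api-inference.huggingface.co/models"


def hf_base_url(model_id: str) -> str:
    """Build the OpenAI-compatible base URL for a model id such as 'org/name'."""
    return f"{HF_INFERENCE_ROOT}/{model_id.strip('/')}/v1/"


print(hf_base_url("meta-llama/Meta-Llama-3-8B-Instruct"))
```

The result can be passed as base_url when creating the client.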

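Examples

Chat completion request

Use the client to send a chat completion request to the Hugging Face model. The model parameter can be any identifier, because the actual model is determined by the base_url. This example sets the model parameter to tgi, short for Text Generation Inference.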

completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Write a poem about language models"
        }
    ]
)
print(completion.choices[0].message.content)
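
The message content of a chat completion can be empty or None, so a small guard around the lookup can make notebooks more robust. The sketch below is one possible way to do this; first_message_text is a hypothetical helper, and the stand-in object only mimics the shape of a response for illustration.

```python
from types import SimpleNamespace


def first_message_text(completion) -> str:
    """Return the text of the first choice, or "" if there is no usable content."""
    choices = getattr(completion, "choices", None) or []
    if not choices:
        return ""
    content = choices[0].message.content
    return content or ""


# Stand-in shaped like a chat completion response (illustration only, no API call)
fake = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hello"))]
)
print(first_message_text(fake))
```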

Example trace in Litefuse

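Observing requests with Litefuse

Because you use the OpenAI client from langfuse.openai, your requests are automatically traced in Litefuse. You can also use the @observe() decorator to group multiple generation steps into a single trace.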

@observe()  # Decorator to automatically create a trace and nest generations
def generate_rap():
    completion = client.chat.completions.create(
        name="rap-generator",
        model="tgi",
        messages=[
            {"role": "system", "content": "You are a poet."},
            {"role": "user", "content": "Compose a rap about the open source AI agent observability and evaluation platform Litefuse."}
        ],
        metadata={"category": "rap"},
    )
    return completion.choices[0].message.content
 
rap = generate_rap()
print(rap)
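
To illustrate grouping several generation steps, here is a hedged sketch with two chained calls inside one decorated function. The function names ask and write_and_summarize are hypothetical, the client is passed in as an argument only to keep the block self-contained, and the no-op fallback for observe exists solely so the sketch runs without the SDK installed (in which case nothing is traced).

```python
try:
    from langfuse import observe  # same decorator as in the example above
except ImportError:
    # Fallback so the sketch still runs without the SDK (no tracing in that case)
    def observe(*args, **kwargs):
        def decorator(func):
            return func
        return decorator


def ask(client, system_prompt: str, user_prompt: str) -> str:
    """Send one chat completion request and return the text of the first choice."""
    completion = client.chat.completions.create(
        model="tgi",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return completion.choices[0].message.content


@observe()  # both calls below run inside this function, so they are grouped together
def write_and_summarize(client, topic: str) -> dict:
    poem = ask(client, "You are a poet.", f"Write a short poem about {topic}.")
    summary = ask(
        client,
        "You are an editor.",
        f"Summarize this poem in one sentence:\n{poem}",
    )
    return {"poem": poem, "summary": summary}
```

Calling write_and_summarize(client, "language models") with the client created earlier would run both steps under the same decorated function.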

Example trace in Litefuse

Interoperability with the Python SDK

You can use this integration together with the Litefuse SDKs to add additional attributes to the observation.

The @observe() decorator provides a convenient way to automatically wrap your instrumented code and add additional attributes to the observation.

from langfuse import observe, propagate_attributes, get_client
 
langfuse = get_client()
 
@observe()
def my_llm_pipeline(input):
    # Add additional attributes (user_id, session_id, metadata, version, tags) to all spans created within this execution scope
    with propagate_attributes(
        user_id="user_123",
        session_id="session_abc",
        tags=["agent", "my-observation"],
        metadata={"email": "user@litefuse.ai"},
        version="1.0.0"
    ):
 
        # YOUR APPLICATION CODE HERE
        result = call_llm(input)
 
        return result
 
# Run the function
my_llm_pipeline("Hi")

Learn more about using the Decorator in the Langfuse SDK instrumentation docs.

Troubleshooting

No observations appearing

First, enable debug mode in the Python SDK:

export LANGFUSE_DEBUG="True"
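
In a notebook, where a shell export does not reach the running kernel, the same variable can be set from Python. This is a sketch that assumes the SDK reads the variable when its client is first created, so it should run before any Langfuse import or client call.

```python
import os

# Set this before the Langfuse client is created (assumption: the SDK reads
# the variable at client initialization, not afterwards).
os.environ["LANGFUSE_DEBUG"] = "True"
```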

Then run your application and check the debug logs:

  • OTel observations appear in the logs: Your application is instrumented correctly but observations are not reaching Litefuse. To resolve this:
    1. Call langfuse.flush() at the end of your application to ensure all observations are exported.
    2. Verify that you are using the correct API keys and base URL.
  • No OTel spans in the logs: Your application is not instrumented correctly. Make sure the instrumentation runs before your application code.
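
The flush step above can be sketched as a small wrapper that always flushes, even when the application code raises. run_then_flush is a hypothetical helper, not part of the SDK; the flush callable is passed in so the pattern stays independent of how the client was obtained.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_then_flush(main: Callable[[], T], flush: Callable[[], None]) -> T:
    """Run `main`, then always call `flush`, even if `main` raises."""
    try:
        return main()
    finally:
        flush()
```

With the names from the interoperability example, a call could look like run_then_flush(lambda: my_llm_pipeline("Hi"), langfuse.flush).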

Unwanted observations in Litefuse

The Langfuse SDK is based on OpenTelemetry. Other libraries in your application may emit OTel spans that are not relevant to you. These still count toward your billable units, so you should filter them out. See Unwanted spans in Litefuse for details.

Missing attributes

Some attributes may be stored in the metadata object of the observation rather than being mapped to the Litefuse data model. If a mapping or integration does not work as expected, please raise an issue on GitHub.

Next Steps

Once you have instrumented your code, you can manage, evaluate, and debug your application.
