Cookbook: LiteLLM (Proxy) + Litefuse OpenAI Integration + `@observe` Decorator

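We want to share a stack that is widely used in the Litefuse community and lets you quickly try out 100+ models from different providers without changing your code. The stack consists of: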

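LiteLLM Proxy (GitHub): unifies the APIs of 100+ model providers under the OpenAI API schema. It routes calls to these APIs through a single endpoint, which removes the complexity of calling each provider's API directly. LiteLLM Proxy is open source, and you can self-host it.
Litefuse OpenAI SDK Wrapper (Python, JS): natively instruments calls to these 100+ models through the OpenAI SDK. It automatically captures token counts, latency, streaming response time (time to first token), API errors, and more.
Litefuse: an open-source LLM observability platform. A full introduction is available in the Litefuse documentation.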

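This cookbook is an end-to-end guide to setting up and using this stack. The example uses Python, and we also use the @observe decorator to create nested traces. More on this below.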

Let's dive right in!

Install dependencies

%pip install "litellm[proxy]" langfuse openai

Set up the environment

import os
 
# Get keys for your project from the project settings page: https://litefuse.cloud
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..." 
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..." 
os.environ["LANGFUSE_BASE_URL"] = "https://litefuse.cloud"
 
# Your openai key
os.environ["OPENAI_API_KEY"] = "sk-proj-"

from litellm import completion
from langfuse import observe, get_client
 
langfuse = get_client()
 
@observe()
def fn():
  # set custom langfuse trace params and generation params
  response = completion(
    model="gpt-3.5-turbo",
    messages=[
      {"role": "user", "content": "Hi 👋 - i'm openai"}
    ],
    metadata={
        "existing_trace_id": langfuse.get_current_trace_id(),   # set langfuse trace ID
        "parent_observation_id": langfuse.get_current_observation_id(),
    },
  )
 
  print(response)

Set up the LiteLLM Proxy

In this example, we use GPT-3.5-turbo directly via OpenAI, and llama3 and mistral via a local Ollama instance.

Steps

Create a litellm_config.yaml to configure the available models (docs). This example uses gpt-3.5-turbo, plus llama3 and mistral via Ollama. Replace <openai_key> with your OpenAI API key.

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: <openai_key>
  - model_name: ollama/llama3
    litellm_params:
      model: ollama/llama3
  - model_name: ollama/mistral
    litellm_params:
      model: ollama/mistral

Make sure you have installed Ollama and pulled the llama3 (8b) and mistral (7b) models: ollama pull llama3 && ollama pull mistral
Run the following CLI command to start the proxy: litellm --config litellm_config.yaml

The LiteLLM Proxy should now be running at http://0.0.0.0:4000

To verify the connection, you can run litellm --test.
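
Beyond `litellm --test`, you may also want to confirm that the proxy lists every model from the config. The sketch below is one way to do that with only the standard library. It assumes the proxy serves the OpenAI-compatible `/v1/models` route at the default address and runs without a master key, so any bearer token is accepted. The helper names `parse_model_ids`, `missing_models`, and `fetch_model_ids` are hypothetical and not part of LiteLLM.

```python
import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def missing_models(expected: list[str], available: list[str]) -> list[str]:
    """Return the expected model names that are not in `available`, in order."""
    available_set = set(available)
    return [name for name in expected if name not in available_set]


def fetch_model_ids(
    base_url: str = "http://0.0.0.0:4000",  # assumption: default proxy address
    api_key: str = "sk-anything",  # assumption: proxy started without a master key
) -> list[str]:
    """Ask the running proxy which models it exposes (performs a network call)."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_model_ids(json.load(response))
```

With the proxy running, `missing_models(["gpt-3.5-turbo", "ollama/llama3", "ollama/mistral"], fetch_model_ids())` should return an empty list if all three models from the config were registered.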

Log single LLM calls via the Litefuse OpenAI Wrapper

The Langfuse SDK provides a wrapper around the OpenAI SDK that automatically logs all OpenAI calls to Litefuse as generations.

For more details, please refer to our documentation.

from langfuse.openai import openai
 
# Set PROXY_URL to the url of your lite_llm_proxy (by default: http://0.0.0.0:4000)
PROXY_URL="http://0.0.0.0:4000"
 
system_prompt = "You are a very accurate calculator. You output only the result of the calculation."
 
# Configure the OpenAI client to use the LiteLLM proxy
client = openai.OpenAI(base_url=PROXY_URL)
 
gpt_completion = client.chat.completions.create(
  model="gpt-3.5-turbo",
  name="gpt-3.5", # optional name of the generation in langfuse
  messages=[
      {"role": "system", "content": system_prompt},
      {"role": "user", "content": "1 + 1 = "}],
)
print(gpt_completion.choices[0].message.content)
 
llama_completion = client.chat.completions.create(
  model="ollama/llama3",
  name="llama3", # optional name of the generation in langfuse
  messages=[
      {"role": "system", "content": system_prompt},
      {"role": "user", "content": "3 + 3 = "}],
)
print(llama_completion.choices[0].message.content)
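
Because every model sits behind the same OpenAI-compatible endpoint, switching models only means changing the `model` string. The sketch below sends one prompt to several models in a loop. `ask_models` is a hypothetical helper. It assumes `client` behaves like the `openai.OpenAI` client created above (imported from `langfuse.openai`, which is what accepts the extra `name` argument).

```python
from typing import Any


def ask_models(
    client: Any, models: list[str], system_prompt: str, question: str
) -> dict[str, str]:
    """Send the same prompt to each model and collect the replies by model name."""
    answers = {}
    for model in models:
        completion = client.chat.completions.create(
            model=model,  # the only value that changes between providers
            name=f"calc-{model}",  # generation name (argument of the wrapped client)
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question},
            ],
        )
        answers[model] = completion.choices[0].message.content
    return answers
```

For example, `ask_models(client, ["gpt-3.5-turbo", "ollama/llama3", "ollama/mistral"], system_prompt, "2 + 2 = ")` would query all three models configured in the proxy and return their answers in a dictionary.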


Trace nested LLM calls via the Litefuse OpenAI Wrapper and the `@observe` decorator

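With Litefuse's @observe() decorator, we can automatically capture the execution details of any Python function, including inputs, outputs, and timings. The decorator adds in-depth observability to your application with minimal code, and it is especially useful when non-LLM calls are involved, such as knowledge retrieval (RAG) or API calls (agents).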

To learn how to use the decorator and customize tracing behavior, please refer to our documentation.

Let's look at a simple example that uses all three models we configured in the LiteLLM Proxy:

from langfuse import observe
from langfuse.openai import openai
 
@observe()
def rap_battle(topic: str):
    client = openai.OpenAI(
        base_url=PROXY_URL,
    )
 
    messages = [
        {"role": "system", "content": "You are a rap artist. Drop a fresh line."},
        {"role": "user", "content": f"Kick it off, today's topic is {topic}, here's the mic..."}
    ]
 
    # First model (gpt-3.5-turbo) starts the rap
    gpt_completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        name="rap-gpt-3.5-turbo", # add custom name to Litefuse observation
        messages=messages,
    )
    first_rap = gpt_completion.choices[0].message.content
    messages.append({"role": "assistant", "content": first_rap})
    print("Rap 1:", first_rap)
 
    # Second model (ollama/llama3) responds
    llama_completion = client.chat.completions.create(
        model="ollama/llama3",
        name="rap-llama3",
        messages=messages,
    )
    second_rap = llama_completion.choices[0].message.content
    messages.append({"role": "assistant", "content": second_rap})
    print("Rap 2:", second_rap)
 
    # Third model (ollama/mistral) adds the final touch
    mistral_completion = client.chat.completions.create(
        model="ollama/mistral",
        name="rap-mistral",
        messages=messages,
    )
    third_rap = mistral_completion.choices[0].message.content
    messages.append({"role": "assistant", "content": third_rap})
    print("Rap 3:", third_rap)
    
    return messages
 
# Call the function
rap_battle("typography")