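Add scores via API/SDK

You can use the Langfuse SDK or the API to add scores to traces, observations, sessions, and dataset runs. This evaluation method supports custom evaluation workflows and extends Litefuse's scoring capabilities. See Scores for the data model.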

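Common use cases

Collecting user feedback: gather in-app feedback from users on the quality or performance of your application. It can be captured in the frontend via our Browser SDK. -> Example notebook
Custom evaluation data pipeline: monitor quality continuously by fetching traces from Litefuse, running custom evaluations, and writing the scores back to Litefuse. -> Example notebook
Guardrails and safety checks: check whether the output contains a certain keyword, follows a specified structure or format, or exceeds a certain length. -> Example notebook
Custom internal workflow tooling: build custom internal tools to manage human-in-the-loop processes. Write scores back to Litefuse, optionally referencing a config so that they follow your custom schema.
Custom runtime evaluations: for example, track whether generated SQL actually runs or whether structured output is valid JSON.
Session-level quality tracking: score complete conversations (for example, support chats or agent threads) by attaching a sessionId via the SDK/API.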
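
The guardrail and runtime-evaluation use cases above can be sketched as plain Python checks whose results are shaped like score payloads. This is only a minimal sketch: the helper name `evaluate_output`, the score names, and the 500-character limit are illustrative assumptions, not part of the SDK.

```python
import json


def evaluate_output(trace_id: str, output: str, max_length: int = 500) -> list[dict]:
    """Run simple runtime checks and return score payloads (hypothetical helper)."""
    # Check 1: is the output valid JSON?
    try:
        json.loads(output)
        is_valid_json = 1.0
    except json.JSONDecodeError:
        is_valid_json = 0.0

    # Check 2: does the output stay within the (assumed) length limit?
    within_limit = 1.0 if len(output) <= max_length else 0.0

    return [
        {"name": "valid_json", "value": is_valid_json,
         "trace_id": trace_id, "data_type": "BOOLEAN"},
        {"name": "within_length_limit", "value": within_limit,
         "trace_id": trace_id, "data_type": "BOOLEAN"},
        {"name": "output_length", "value": float(len(output)),
         "trace_id": trace_id, "data_type": "NUMERIC"},
    ]


payloads = evaluate_output("trace_id_here", '{"answer": 42}')
```

Each resulting dict uses the keyword names of `create_score`, so it could be written back with `langfuse.create_score(**payload)`.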

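Ingest scores via API/SDK

Scores can be attached at different levels of granularity: a single trace, an observation within a trace, or an entire session.

See the API reference for full details on the score and score config POST/GET endpoints.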

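Trace- or observation-level scores

You can add scores via the Langfuse SDK or the API. Scores support three data types: numeric, categorical, and boolean.

If you ingest a score manually and link it to a trace via trace_id, you do not need to wait for the trace to be created. The score appears in the scores table and is linked automatically once a trace with the same trace_id is created.

The examples below are grouped by score data type.

For trace- and observation-level scores, trace_id/traceId is required and observation_id/observationId is optional. When you attach a score to an observation, always provide both the observation ID and the corresponding trace ID.

Numeric score values must be provided as floats.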

from langfuse import get_client
langfuse = get_client()
 
# Method 1: Score via low-level method
langfuse.create_score(
    name="correctness",
    value=0.9,
    trace_id="trace_id_here",
    observation_id="observation_id_here", # optional
    data_type="NUMERIC", # optional, inferred if not provided
    comment="Factually correct", # optional
)
 
# Method 2: Score current span/generation (within context)
with langfuse.start_as_current_observation(as_type="span", name="my-operation") as span:
    # Score the current span
    span.score(
        name="correctness",
        value=0.9,
        data_type="NUMERIC",
        comment="Factually correct"
    )
 
    # Score the trace
    span.score_trace(
        name="overall_quality",
        value=0.95,
        data_type="NUMERIC"
    )
 
 
# Method 3: Score via the current context
with langfuse.start_as_current_observation(as_type="span", name="my-operation"):
    # Score the current span
    langfuse.score_current_span(
        name="correctness",
        value=0.9,
        data_type="NUMERIC",
        comment="Factually correct"
    )
 
    # Score the trace
    langfuse.score_current_trace(
        name="overall_quality",
        value=0.95,
        data_type="NUMERIC"
    )

Categorical score values must be provided as strings.

from langfuse import get_client
langfuse = get_client()
 
# Method 1: Score via low-level method
langfuse.create_score(
    name="accuracy",
    value="partially correct",
    trace_id="trace_id_here",
    observation_id="observation_id_here", # optional
    data_type="CATEGORICAL", # optional, inferred if not provided
    comment="Some factual errors", # optional
)
 
# Method 2: Score current span/generation (within context)
with langfuse.start_as_current_observation(as_type="span", name="my-operation") as span:
    # Score the current span
    span.score(
        name="accuracy",
        value="partially correct",
        data_type="CATEGORICAL",
        comment="Some factual errors"
    )
 
    # Score the trace
    span.score_trace(
        name="overall_quality",
        value="partially correct",
        data_type="CATEGORICAL"
    )
 
# Method 3: Score via the current context
with langfuse.start_as_current_observation(as_type="span", name="my-operation"):
    # Score the current span
    langfuse.score_current_span(
        name="accuracy",
        value="partially correct",
        data_type="CATEGORICAL",
        comment="Some factual errors"
    )
 
    # Score the trace
    langfuse.score_current_trace(
        name="overall_quality",
        value="partially correct",
        data_type="CATEGORICAL"
    )

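Boolean score values must be provided as floats (0 or 1). The string representation is generated automatically and is available when the score is read. See the API reference for more details on the POST/GET score endpoints.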

from langfuse import get_client
langfuse = get_client()
 
# Method 1: Score via low-level method
langfuse.create_score(
    name="helpfulness",
    value=0, # 0 or 1
    trace_id="trace_id_here",
    observation_id="observation_id_here", # optional
    data_type="BOOLEAN", # required, numeric values without data type would be inferred as NUMERIC
    comment="Incorrect answer", # optional
)
 
# Method 2: Score current span/generation (within context)
with langfuse.start_as_current_observation(as_type="span", name="my-operation") as span:
    # Score the current span
    span.score(
        name="helpfulness",
        value=1, # 0 or 1
        data_type="BOOLEAN",
        comment="Very helpful response"
    )
 
    # Score the trace
    span.score_trace(
        name="overall_quality",
        value=1, # 0 or 1
        data_type="BOOLEAN"
    )
# Method 3: Score via the current context
with langfuse.start_as_current_observation(as_type="span", name="my-operation"):
    # Score the current span
    langfuse.score_current_span(
        name="helpfulness",
        value=1, # 0 or 1
        data_type="BOOLEAN",
        comment="Very helpful response"
    )
 
    # Score the trace
    langfuse.score_current_trace(
        name="overall_quality",
        value=1, # 0 or 1
        data_type="BOOLEAN"
    )

Numeric score values must be provided as floats.

import { LangfuseClient } from "@langfuse/client";
 
const langfuse = new LangfuseClient();
 
langfuse.score.create({
  id: "unique_id", // optional, can be used as an idempotency key to update the score subsequently
  traceId: message.traceId,
  observationId: message.generationId, // optional
  name: "correctness",
  value: 0.9,
  dataType: "NUMERIC", // optional, inferred if not provided
  comment: "Factually correct", // optional
});
 
// Flush the scores in short-lived environments
await langfuse.flush();

Categorical score values must be provided as strings.

import { LangfuseClient } from "@langfuse/client";
 
const langfuse = new LangfuseClient();
 
langfuse.score.create({
  id: "unique_id", // optional, can be used as an idempotency key to update the score subsequently
  traceId: message.traceId,
  observationId: message.generationId, // optional
  name: "accuracy",
  value: "partially correct",
  dataType: "CATEGORICAL", // optional, inferred if not provided
  comment: "Some factual errors", // optional
});
 
// Flush the scores in short-lived environments
await langfuse.flush();

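Boolean score values must be provided as floats (0 or 1). The string representation is generated automatically and is available when the score is read. See the API reference for more details on the POST/GET score endpoints.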

import { LangfuseClient } from "@langfuse/client";
 
const langfuse = new LangfuseClient();
 
langfuse.score.create({
  id: "unique_id", // optional, can be used as an idempotency key to update the score subsequently
  traceId: message.traceId,
  observationId: message.generationId, // optional
  name: "helpfulness",
  value: 0, // 0 or 1
  dataType: "BOOLEAN", // required, numeric values without data type would be inferred as NUMERIC
  comment: "Incorrect answer", // optional
});
 
// Flush the scores in short-lived environments
await langfuse.flush();

You can also create scores directly via the REST API. Use HTTP Basic Auth with the Litefuse Public Key as the username and the Secret Key as the password.

Numeric score values must be provided as floats.

curl -X POST https://litefuse.cloud/api/public/scores \
  -u "pk-lf-...":"sk-lf-..." \
  -H "Content-Type: application/json" \
  -d '{
    "traceId": "trace_id_here",
    "observationId": "observation_id_here",
    "name": "correctness",
    "value": 0.9,
    "dataType": "NUMERIC",
    "comment": "Factually correct"
  }'

Categorical score values must be provided as strings.

curl -X POST https://litefuse.cloud/api/public/scores \
  -u "pk-lf-...":"sk-lf-..." \
  -H "Content-Type: application/json" \
  -d '{
    "traceId": "trace_id_here",
    "observationId": "observation_id_here",
    "name": "accuracy",
    "value": "partially correct",
    "dataType": "CATEGORICAL",
    "comment": "Some factual errors"
  }'

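Boolean score values must be provided as floats (0 or 1). The string representation is generated automatically and is available when the score is read.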

curl -X POST https://litefuse.cloud/api/public/scores \
  -u "pk-lf-...":"sk-lf-..." \
  -H "Content-Type: application/json" \
  -d '{
    "traceId": "trace_id_here",
    "observationId": "observation_id_here",
    "name": "helpfulness",
    "value": 0,
    "dataType": "BOOLEAN",
    "comment": "Incorrect answer"
  }'
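
The same request can also be assembled in Python using only the standard library. This is a minimal sketch, assuming the endpoint and Basic Auth scheme shown in the curl examples; `build_score_request` is a hypothetical helper and the keys are placeholders. The request is only constructed here, not sent.

```python
import base64
import json
import urllib.request


def build_score_request(public_key: str, secret_key: str, payload: dict,
                        base_url: str = "https://litefuse.cloud") -> urllib.request.Request:
    """Build a POST request for the scores endpoint (hypothetical helper)."""
    # HTTP Basic Auth: base64("<public key>:<secret key>")
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/api/public/scores",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_score_request(
    "pk-lf-...", "sk-lf-...",
    {"traceId": "trace_id_here", "name": "correctness",
     "value": 0.9, "dataType": "NUMERIC"},
)
# To actually send it: urllib.request.urlopen(request)
```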

Session-level scores

To score an entire session (without attaching the score to a trace or observation), provide only session_id (Python SDK) or sessionId (JS/TS SDK and API).

from langfuse import get_client
langfuse = get_client()
 
langfuse.create_score(
    name="session_quality",
    value=0.85,
    session_id="session_id_here",
    data_type="NUMERIC",
    comment="Overall conversation quality"
)

import { LangfuseClient } from "@langfuse/client";
 
const langfuse = new LangfuseClient();
 
langfuse.score.create({
  name: "session_quality",
  value: 0.85,
  sessionId: "session_id_here",
  dataType: "NUMERIC",
  comment: "Overall conversation quality",
});
 
await langfuse.flush();

curl -X POST https://litefuse.cloud/api/public/scores \
  -u "pk-lf-...":"sk-lf-..." \
  -H "Content-Type: application/json" \
  -d '{
    "sessionId": "session_id_here",
    "name": "session_quality",
    "value": 0.85,
    "dataType": "NUMERIC",
    "comment": "Overall conversation quality"
  }'

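Advanced

Preventing duplicate scores

By default, Litefuse allows multiple scores with the same name on the same trace. This is useful when you want to track how a score changes over time, or when the same trace receives user feedback more than once.

In some cases you may want to avoid this behavior or update an existing score. To do so, create an idempotency key for the score and pass it as id (JS/TS) / score_id (Python) when creating the score, for example <trace_id>-<score_name>.

Note that if you expect API calls for the same score to be more than 60 days apart, you should also use the same timestamp. See How to update traces, observations, and scores for details.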
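
One way to build such an idempotency key is to derive it deterministically from the trace ID and the score name. This is a minimal sketch, assuming the `<trace_id>-<score_name>` pattern mentioned above; `score_idempotency_key` is a hypothetical helper, not an SDK function.

```python
def score_idempotency_key(trace_id: str, score_name: str) -> str:
    """Derive a deterministic score ID so repeated writes target the same score."""
    return f"{trace_id}-{score_name}"


key = score_idempotency_key("trace_id_here", "correctness")
# Pass it as score_id (Python) or id (JS/TS), for example:
# langfuse.create_score(score_id=key, trace_id="trace_id_here",
#                       name="correctness", value=0.9)
```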

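Enforcing a score config

Score configs are useful when you want to standardize scores for later analysis.

To enforce a score config, provide a configId when creating a score; it must reference a previously created ScoreConfig. Score configs can be defined in the Litefuse UI or via our API. See Score configs for details.

Whenever you provide a ScoreConfig, the score data is validated against it. The rules are:

Score name: must equal the config's name
Score data type: if provided, must match the config's data type
Value of a numeric score: must fall between the minimum and maximum defined in the config (if provided; min and max are optional and are treated as -∞ and +∞ respectively when undefined)
Value of a categorical score: must map to one of the categories defined in the config
Value of a boolean score: must equal 0 or 1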
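
The rules above can be mirrored in a small client-side check before a score is sent. This is a sketch under stated assumptions, not the server's implementation: the `ScoreConfig` dataclass, its field names, and the inclusive range bounds are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class ScoreConfig:
    """Hypothetical local mirror of a score config; field names are assumptions."""
    name: str
    data_type: str  # "NUMERIC", "CATEGORICAL", or "BOOLEAN"
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    categories: list[str] = field(default_factory=list)


def validate_score(config: ScoreConfig, name: str, value: Union[float, str],
                   data_type: Optional[str] = None) -> list[str]:
    """Return a list of rule violations; an empty list means the score passes."""
    errors = []
    if name != config.name:
        errors.append("score name must equal the config name")
    if data_type is not None and data_type != config.data_type:
        errors.append("data type must match the config data type")

    if config.data_type == "NUMERIC":
        if not isinstance(value, (int, float)):
            errors.append("numeric value expected")
        else:
            # Undefined bounds are treated as -inf / +inf; bounds assumed inclusive.
            low = float("-inf") if config.min_value is None else config.min_value
            high = float("inf") if config.max_value is None else config.max_value
            if not low <= value <= high:
                errors.append("value is outside the config range")
    elif config.data_type == "CATEGORICAL":
        if value not in config.categories:
            errors.append("value must map to a config category")
    elif config.data_type == "BOOLEAN":
        if value not in (0, 1):
            errors.append("boolean value must be 0 or 1")
    return errors


config = ScoreConfig(name="accuracy", data_type="NUMERIC", min_value=0.0, max_value=1.0)
problems = validate_score(config, "accuracy", 0.9, "NUMERIC")
```

Returning a list of violations rather than raising on the first one makes it easier to report every problem with a score at once.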

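When ingesting numeric scores, provide the value as a float. If you provide a configId, the score value is validated against the config's numeric range (which may be defined by a minimum and/or a maximum).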

from langfuse import get_client
langfuse = get_client()
 
# Method 1: Score via low-level method
langfuse.create_score(
    trace_id="trace_id_here",
    observation_id="observation_id_here", # optional
    session_id="session_id_here", # optional, ID of the session the score relates to
    name="accuracy",
    value=0.9,
    comment="Factually correct", # optional
    score_id="unique_id", # optional, can be used as an idempotency key to update the score subsequently
    config_id="78545-6565-3453654-43543", # optional, to ensure that the score follows a specific min/max value range
    data_type="NUMERIC" # optional, possibly inferred
)
 
# Method 2: Score within context
with langfuse.start_as_current_observation(as_type="span", name="my-operation") as span:
    span.score(
        name="accuracy",
        value=0.9,
        comment="Factually correct",
        config_id="78545-6565-3453654-43543",
        data_type="NUMERIC"
    )

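Categorical scores are used to evaluate data that falls into specific categories. When ingesting categorical scores, provide the value as a string. If you provide a configId, the score value is validated against the categories defined in the config.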

from langfuse import get_client
langfuse = get_client()
 
# Method 1: Score via low-level method
langfuse.create_score(
    trace_id="trace_id_here",
    observation_id="observation_id_here", # optional
    name="correctness",
    value="correct",
    comment="Factually correct", # optional
    score_id="unique_id", # optional, can be used as an idempotency key to update the score subsequently
    config_id="12345-6565-3453654-43543", # optional, to ensure that the score maps to a specific category defined in a score config
    data_type="CATEGORICAL" # optional, possibly inferred
)
 
# Method 2: Score within context
with langfuse.start_as_current_observation(as_type="span", name="my-operation") as span:
    span.score(
        name="correctness",
        value="correct",
        comment="Factually correct",
        config_id="12345-6565-3453654-43543",
        data_type="CATEGORICAL"
    )

When ingesting boolean scores, provide the value as a float. If you provide a configId, the score's name and data type must both match those of the config.

from langfuse import get_client
langfuse = get_client()
 
# Method 1: Score via low-level method
langfuse.create_score(
    trace_id="trace_id_here",
    observation_id="observation_id_here", # optional
    name="helpfulness",
    value=1,
    comment="Factually correct", # optional
    score_id="unique_id", # optional, can be used as an idempotency key to update the score subsequently
    config_id="93547-6565-3453654-43543", # optional, can be used to infer the score data type and validate the score value
    data_type="BOOLEAN" # optional, possibly inferred
)
 
# Method 2: Score within context
with langfuse.start_as_current_observation(as_type="span", name="my-operation") as span:
    span.score(
        name="helpfulness",
        value=1,
        comment="Factually correct",
        config_id="93547-6565-3453654-43543",
        data_type="BOOLEAN"
    )

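When ingesting numeric scores, provide the value as a float. If you provide a configId, the score value is validated against the config's numeric range (which may be defined by a minimum and/or a maximum).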

import { LangfuseClient } from "@langfuse/client";
 
const langfuse = new LangfuseClient();
 
langfuse.score.create({
  traceId: message.traceId,
  observationId: message.generationId, // optional
  name: "accuracy",
  value: 0.9,
  comment: "Factually correct", // optional
  id: "unique_id", // optional, can be used as an idempotency key to update the score subsequently
  configId: "78545-6565-3453654-43543", // optional, to ensure that the score follows a specific min/max value range
  dataType: "NUMERIC", // optional, possibly inferred
});
 
// Flush the scores in short-lived environments
await langfuse.flush();

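Categorical scores are used to evaluate data that falls into specific categories. When ingesting categorical scores, provide the value as a string. If you provide a configId, the score value is validated against the categories defined in the config.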

import { LangfuseClient } from "@langfuse/client";
 
const langfuse = new LangfuseClient();
 
langfuse.score.create({
  traceId: message.traceId,
  observationId: message.generationId, // optional
  name: "correctness",
  value: "correct",
  comment: "Factually correct", // optional
  id: "unique_id", // optional, can be used as an idempotency key to update the score subsequently
  configId: "12345-6565-3453654-43543", // optional, to ensure that a score maps to a specific category defined in a score config
  dataType: "CATEGORICAL", // optional, possibly inferred
});
 
// Flush the scores in short-lived environments
await langfuse.flush();

When ingesting boolean scores, provide the value as a float. If you provide a configId, the score's name and data type must both match those of the config.

import { LangfuseClient } from "@langfuse/client";
 
const langfuse = new LangfuseClient();
 
langfuse.score.create({
  traceId: message.traceId,
  observationId: message.generationId, // optional
  name: "helpfulness",
  value: 1,
  comment: "Factually correct", // optional
  id: "unique_id", // optional, can be used as an idempotency key to update the score subsequently
  configId: "93547-6565-3453654-43543", // optional, can be used to infer the score data type and validate the score value
  dataType: "BOOLEAN", // optional, possibly inferred
});
 
// Flush the scores in short-lived environments
await langfuse.flush();

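You can also enforce a score config via the REST API by providing a configId.

When ingesting numeric scores, provide the value as a float. If you provide a configId, the score value is validated against the config's numeric range.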

curl -X POST https://litefuse.cloud/api/public/scores \
  -u "pk-lf-...":"sk-lf-..." \
  -H "Content-Type: application/json" \
  -d '{
    "id": "unique_id",
    "traceId": "trace_id_here",
    "observationId": "observation_id_here",
    "name": "accuracy",
    "value": 0.9,
    "dataType": "NUMERIC",
    "configId": "78545-6565-3453654-43543",
    "comment": "Factually correct"
  }'

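Categorical scores are used to evaluate data that falls into specific categories. If you provide a configId, the score value is validated against the categories defined in the config.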

curl -X POST https://litefuse.cloud/api/public/scores \
  -u "pk-lf-...":"sk-lf-..." \
  -H "Content-Type: application/json" \
  -d '{
    "id": "unique_id",
    "traceId": "trace_id_here",
    "observationId": "observation_id_here",
    "name": "correctness",
    "value": "correct",
    "dataType": "CATEGORICAL",
    "configId": "12345-6565-3453654-43543",
    "comment": "Factually correct"
  }'

When ingesting boolean scores, provide the value as a float. If you provide a configId, the score's name and data type must both match those of the config.

curl -X POST https://litefuse.cloud/api/public/scores \
  -u "pk-lf-...":"sk-lf-..." \
  -H "Content-Type: application/json" \
  -d '{
    "id": "unique_id",
    "traceId": "trace_id_here",
    "observationId": "observation_id_here",
    "name": "helpfulness",
    "value": 1,
    "dataType": "BOOLEAN",
    "configId": "93547-6565-3453654-43543",
    "comment": "Factually correct"
  }'

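See the API reference for more details on the POST/GET score config endpoints.

Inferred score properties

Some score properties may be inferred from your input:

If you do not provide a score data type, it is always inferred. See the tables below.
For boolean and categorical scores, we provide the value in both numeric and string form wherever possible. The form that was not supplied as input is what the tables below call the "inferred value".
When a boolean score is read, both the numeric and the string representation are returned, for example 1 and True.
For categorical scores, the string representation is always provided; a numeric mapping of the category is produced only if a ScoreConfig was provided.

Detailed examples: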

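Suppose you want to ingest a numeric score that measures accuracy. The table below lists several possible ingestion scenarios.

Value	Data Type	Config Id	Description	Inferred data type	Valid
`0.9`	`Null`	`Null`	Data type inferred	`NUMERIC`	Yes
`0.9`	`NUMERIC`	`Null`	No properties inferred		Yes
`depth`	`NUMERIC`	`Null`	Error: data type of the value does not match the provided data type		No
`0.9`	`NUMERIC`	`78545`	No properties inferred		Conditional on config validation
`0.9`	`Null`	`78545`	Data type inferred	`NUMERIC`	Conditional on config validation
`depth`	`NUMERIC`	`78545`	Error: data type of the value does not match the provided data type		No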

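Suppose you want to ingest a categorical score that measures correctness. The table below lists several possible ingestion scenarios.

Value	Data Type	Config Id	Description	Inferred data type	Inferred value representation	Valid
`correct`	`Null`	`Null`	Data type inferred	`CATEGORICAL`		Yes
`correct`	`CATEGORICAL`	`Null`	No properties inferred			Yes
`1`	`CATEGORICAL`	`Null`	Error: data type of the value does not match the provided data type			No
`correct`	`CATEGORICAL`	`12345`	Numeric value inferred		`4`, per the config's category mapping	Conditional on config validation
`correct`	`Null`	`12345`	Data type inferred	`CATEGORICAL`		Conditional on config validation
`1`	`CATEGORICAL`	`12345`	Error: data type of the value does not match the provided data type			No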

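Suppose you want to ingest a boolean score that measures helpfulness. The table below lists several possible ingestion scenarios.

Value	Data Type	Config Id	Description	Inferred data type	Inferred value representation	Valid
`1`	`BOOLEAN`	`Null`	Corresponding string value inferred		`True`	Yes
`true`	`BOOLEAN`	`Null`	Error: data type of the value does not match the provided data type			No
`3`	`BOOLEAN`	`Null`	Error: the boolean data type only accepts `0` or `1`			No
`0.9`	`Null`	`93547`	Data type and corresponding string value inferred	`BOOLEAN`	`True`	Conditional on config validation
`depth`	`BOOLEAN`	`93547`	Error: data type of the value does not match the provided data type			No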
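
The data-type column of the tables above can be approximated with a small function. This is a sketch of the inference logic as the tables describe it, not the server's actual implementation; `infer_data_type` and its parameters are hypothetical, and rejecting Python `bool` values is a choice of this sketch. It only resolves the data type and type mismatches; validation against a config's range or categories is out of scope.

```python
from typing import Optional, Union


def infer_data_type(value: Union[float, str], data_type: Optional[str] = None,
                    config_data_type: Optional[str] = None) -> str:
    """Return the effective data type, or raise ValueError on a type mismatch."""
    # Sketch choice: require 0 or 1 rather than a Python bool.
    if isinstance(value, bool):
        raise ValueError("use 0 or 1 instead of a bool")
    is_number = isinstance(value, (int, float))

    if data_type is None:
        # Not provided: take it from the config if there is one, else from the value.
        if config_data_type is not None:
            return config_data_type
        return "NUMERIC" if is_number else "CATEGORICAL"

    if data_type in ("NUMERIC", "BOOLEAN") and not is_number:
        raise ValueError("value does not match the provided data type")
    if data_type == "CATEGORICAL" and not isinstance(value, str):
        raise ValueError("value does not match the provided data type")
    if data_type == "BOOLEAN" and value not in (0, 1):
        raise ValueError("boolean data type expects 0 or 1")
    return data_type


inferred = infer_data_type(0.9)  # no data type and no config given
```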

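Update existing scores via API/SDK

When creating a score, you can provide an optional id (JS/TS) / score_id (Python) parameter. If a score with that ID already exists in your project, it is updated.

If you do not want to fetch the list of existing scores from Litefuse before updating a score, pass your own id as an idempotency key when you first create the score.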
