Hermes Agent Tracing with Litefuse

Hermes Agent is Nous Research’s terminal-based AI assistant with a pluggable tool-calling and gateway architecture. This integration installs as a Hermes plugin under ~/.hermes/plugins/litefuse/. The plugin runs in-process inside Hermes’ Python interpreter, subscribes to ten lifecycle hook events, and emits one Litefuse trace per user turn. The trace shape matches the Pi and Claude Code integrations, so one dashboard query works across all your agents.

The plugin works on every Hermes surface — CLI (hermes -z, hermes chat -q), Gateway (Feishu, Discord, Slack, Telegram, WhatsApp, IRC), and TUI.

For AI — automated install

If you’re chatting with Hermes Agent right now, paste this prompt and the agent will handle the whole install end-to-end:

Read https://litefuse.ai/SKILL.md and follow the instructions to install and configure Litefuse for Hermes Agent.

The skill will ask for your Litefuse API keys (or walk you through signing up if you don’t have an account yet), then configure everything in place. For step-by-step manual setup instead, continue below.

What gets captured

Data	Captured as	Notes
User prompt	trace `input`	header reaches Litefuse with the first observation that completes — no waiting for the turn to end
Each LLM API call	`plan (n tools) #N` / `response` / `think #N` generation	named by what the model did, never by model name; real request latency from `pre_api_request` → `post_api_request`
Full request/response content	generation `input` (complete messages) / `output` (thinking + text + toolCall blocks)	backfilled from the conversation history at end of turn — thinking is captured
Token usage	`usage_details` on each generation	Anthropic-style keys (`input`, `output`, `cache_read_input_tokens`, `cache_creation_input_tokens`) so Litefuse cost mapping works
Tool invocation	`tool: <name> (<key info>) #N` tool observation	input = args object, output = result, `agent_duration_ms`; key info = command word / file basename / pattern, truncated to 24 chars
Tool errors	tool observation `level=ERROR` + `status_message`	JSON-aware: `"error": null` / `exit_code: 0` do not false-positive
Subagent delegation (`delegate_task`)	`tool (n subagents) #N` → nested `subagent` AGENT container → child steps	three-level subtree; child usage rolls into the parent trace’s total cost
Final assistant response	the `response` generation is the final LLM call	carries its real usage and latency; no duplicate summary observation
Model name	`model` on each generation, trace tag `model:<name>`	used by Litefuse for cost computation
Session grouping	trace `session_id`	Hermes’ `YYYYMMDD_HHMMSS_<6or8hex>` session id
User identity	trace `user_id`	`$LITEFUSE_USER_ID`, falls back to `$USER`, then `"hermes-user"`
Turn number	trace name `Hermes Agent — Turn N` + `agent_turn_number`	counts user messages, continues across session resume
Agent metadata	flat `agent_*` keys in metadata	same key set as every other agent integration — cross-agent dashboard filters just work
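
The "JSON-aware" error detection in the Tool errors row can be pictured with a small sketch. This is not the plugin's actual code: the helper name `looks_like_error` and the exact rules are assumptions chosen to match the two cases the table names (`"error": null` and `exit_code: 0` must not be flagged).

```python
import json
from typing import Any


def looks_like_error(result: Any) -> bool:
    """Hypothetical helper: decide whether a tool result signals failure."""
    # Tool results often arrive as JSON text; parse when possible.
    if isinstance(result, str):
        try:
            result = json.loads(result)
        except ValueError:
            # Plain text has no structure to inspect, so do not guess
            # from substrings such as the word "error".
            return False
    if not isinstance(result, dict):
        return False
    # An "error" key only counts when its value is non-empty.
    if result.get("error"):
        return True
    # A non-zero exit code counts; 0 or a missing key does not.
    exit_code = result.get("exit_code")
    return isinstance(exit_code, int) and exit_code != 0
```

Inspecting parsed values rather than searching the raw text is what avoids the false positives: a payload that merely contains the key `error` with a null value is treated as success.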

Trace structure

A multi-tool turn that also delegates to a subagent produces a trace shaped like:

Hermes Agent — Turn 7                     (AGENT, root, real turn duration)
├── plan (2 tools) #1                     (GENERATION, usage + real latency, thinking + toolCalls)
├── tool: terminal (grep) #2              (TOOL, input=args, output=result)
├── tool: read_file (index.ts) #3         (TOOL)
├── plan (1 tool) #4                      (GENERATION, usage)
├── tool (1 subagent) #5                  (TOOL — the delegate_task batch container)
│   └── subagent                          (AGENT — child container, input=task, output=child answer)
│       ├── plan (1 tool) #1              (step numbering restarts per container)
│       ├── tool: read_file (README.md) #2
│       └── subagent response             (GENERATION — child usage rolls into trace cost)
└── response                              (GENERATION — the final LLM call itself, usage + latency)
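
The tool rows in the tree above follow the pattern `tool: <name> (<key info>) #N`, with key info truncated to 24 characters. A rough sketch of how such a name could be derived is below. The argument keys (`command`, `path`, `pattern`) and both function names are assumptions for illustration; the real tool schemas may differ.

```python
import os
from typing import Any, Mapping

MAX_KEY_INFO_CHARS = 24  # truncation limit stated in the capture table


def key_info(args: Mapping[str, Any]) -> str:
    """Pick a short label from tool args. The arg names are assumptions."""
    if "command" in args:
        # Shell-style tools: keep only the first word of the command.
        words = str(args["command"]).split()
        info = words[0] if words else ""
    elif "path" in args:
        # File tools: keep only the basename.
        info = os.path.basename(str(args["path"]))
    elif "pattern" in args:
        # Search tools: keep the pattern itself.
        info = str(args["pattern"])
    else:
        info = ""
    return info[:MAX_KEY_INFO_CHARS]


def tool_span_name(tool_name: str, args: Mapping[str, Any], step: int) -> str:
    """Build a span name in the 'tool: <name> (<key info>) #N' shape."""
    info = key_info(args)
    label = f"tool: {tool_name} ({info})" if info else f"tool: {tool_name}"
    return f"{label} #{step}"
```

With these assumptions, a `terminal` call running `grep -rn TODO src` at step 2 would be named `tool: terminal (grep) #2`.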

Notes on the design:

Behaviour-based generation names. plan (n tools) #N when the model requested tool calls, response for the final answer, think #N for thinking-only steps. The model name is the generation’s model attribute — switching models never breaks your name-based filters or dashboards.
One shared step counter. Generations and tools draw #N from the same per-container sequence, so the number in a span’s name always equals its agent_step_index in metadata: one numbering scheme, not two.
Tools link to their plan via metadata, not tree depth. Each tool carries agent_plan_step (the step index of the generation that requested it) and agent_tool_call_id. The tree stays flat; the join is queryable.
Real wall-clock timestamps. Generation spans open at pre_api_request and close with the time captured at post_api_request — the timeline shows actual LLM latency. Tool spans run pre_tool_call → post_tool_call.
Full I/O via end-of-turn backfill. Hermes’ hook payloads carry counts, not content — so the plugin holds generation spans open and backfills input (the complete message array) and output (thinking / text / toolCall blocks) from post_llm_call’s conversation history before exporting. Each span is still sent exactly once.
Trace header rides on every span. The trace appears in Litefuse (name / session / input / tags) as soon as the first observation completes, with no extra bootstrap observation.
Subagent subtrees. A delegate_task call produces a tool (n subagents) #N span wrapping one subagent AGENT container per child. The gap between the tool span and the container is your real delegation overhead (process spawn, result parsing).
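
The naming and numbering rules in these notes can be sketched in a few lines. The `Container` class and `generation_name` function are hypothetical names, and the rule for choosing between `response` and `think #N` (presence of answer text) is an assumption; only the name formats and the shared per-container counter come from the notes above.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """One numbering scope: the turn root or a subagent container."""
    next_step: int = 1

    def take_step(self) -> int:
        # Generations and tools both draw from this single sequence.
        step = self.next_step
        self.next_step += 1
        return step


def generation_name(tool_calls: int, has_text: bool, step: int) -> str:
    """Name a generation by what the model did, never by model name."""
    if tool_calls > 0:
        noun = "tool" if tool_calls == 1 else "tools"
        return f"plan ({tool_calls} {noun}) #{step}"
    if has_text:
        return "response"
    return f"think #{step}"


# A plan that requests two tools: the tools record the plan's step as their join key.
root = Container()
plan_step = root.take_step()
tools = [
    {"agent_step_index": root.take_step(), "agent_plan_step": plan_step}
    for _ in range(2)
]
```

Because tools carry `agent_plan_step` instead of being nested under their plan, the tree stays flat while the plan-to-tool relationship remains a simple equality join on metadata.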

Quick Start

Prerequisites

A Litefuse project at https://litefuse.cloud with public + secret keys.
Hermes Agent installed — check hermes --version. Recent versions (≥ v0.12) load plugins on every surface including TUI and oneshot.

Install Langfuse SDK v4 into Hermes’ own venv

The plugin runs in-process inside Hermes’ Python interpreter, so the SDK has to live in Hermes’ venv (not a separate one):

~/.hermes/hermes-agent/venv/bin/pip install 'langfuse>=4,<5'
 
# Verify
~/.hermes/hermes-agent/venv/bin/python3 -c "import langfuse; print(langfuse.__version__)"
# Expect: 4.x.y

Drop the plugin files into `~/.hermes/plugins/litefuse/`

mkdir -p ~/.hermes/plugins/litefuse
curl -fsSL https://litefuse.ai/integrations/hermes-agent/plugin.yaml \
  -o ~/.hermes/plugins/litefuse/plugin.yaml
curl -fsSL https://litefuse.ai/integrations/hermes-agent/__init__.py \
  -o ~/.hermes/plugins/litefuse/__init__.py

The plugin source is served from the same URLs, so you can read it before deploying.

Add Litefuse credentials to `~/.hermes/.env`

The plugin reads credentials from the process environment (which Hermes seeds from its own .env file at startup):

cat >> ~/.hermes/.env <<'EOF'
 
# Litefuse observability
LITEFUSE_PUBLIC_KEY=pk-lf-xxx
LITEFUSE_SECRET_KEY=sk-lf-xxx
LITEFUSE_BASE_URL=https://litefuse.cloud
EOF

Replace the placeholder values. LITEFUSE_* variables take precedence; the same-name LANGFUSE_* variables work as a fallback, so existing installs configured with LANGFUSE_PUBLIC_KEY / LANGFUSE_HOST keep working unchanged. The plugin also loads ~/.hermes/state/litefuse.env (then ~/.hermes/litefuse.env) as a fallback env file.

Enable the plugin

Hermes plugins are opt-in, so enable this one explicitly:

hermes plugins enable litefuse
hermes plugins list   # confirm: status = enabled

Restart any running gateway

The gateway daemon caches its plugin manager at startup, so a restart is needed for the new plugin to load:

hermes gateway restart  # only if you have `hermes gateway` running

CLI invocations load the plugin fresh on each run — no restart needed.

Verify

hermes -z "use the shell tool to count the lines in ~/.hermes/config.yaml, then tell me the number"
tail -3 ~/.hermes/state/litefuse_plugin.log
# Expected: "litefuse plugin v0.2.0 registered (10 hooks, spec v1.2)"
#         + "turn closed session=YYYYMMDD_HHMMSS_xxxxxx turn=1 steps=3 api=2 tools=1 final=True"

Open https://litefuse.cloud and look at the latest trace named Hermes Agent — Turn 1. It should have:

correct name / sessionId / input / output and tags hermes-agent, model:<name>
one AGENT root; everything else flat underneath
[GENERATION] plan (1 tool) #1 with token usage, real latency, and the model’s thinking in the output
[TOOL] tool: terminal (wc) #2 with input args, output result, and agent_plan_step: 1
[GENERATION] response carrying the final answer and its own usage
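
To check the "turn closed" log line from the verify step in a script, a small parser is enough. This sketch assumes the line has exactly the fields shown in the expected output above; real log lines may carry a prefix (such as a timestamp) or extra fields, which is why it searches rather than matches from the start. The function name is hypothetical.

```python
import re

TURN_CLOSED = re.compile(
    r"turn closed session=(?P<session>\S+) turn=(?P<turn>\d+) "
    r"steps=(?P<steps>\d+) api=(?P<api>\d+) tools=(?P<tools>\d+) "
    r"final=(?P<final>True|False)"
)


def parse_turn_closed(line: str):
    """Return the counters from a 'turn closed' log line, or None."""
    m = TURN_CLOSED.search(line)
    if m is None:
        return None
    out = {k: int(m[k]) for k in ("turn", "steps", "api", "tools")}
    out["session"] = m["session"]
    out["final"] = m["final"] == "True"
    return out
```

For the verification prompt above, the parsed result should report one tool call and `final` set to true; a line with `final=False` suggests the turn ended without a final answer.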

Environment variables

Variable	Required	Description
`LITEFUSE_PUBLIC_KEY`	Yes	Litefuse project public key (`pk-lf-...`). Falls back to `LANGFUSE_PUBLIC_KEY`.
`LITEFUSE_SECRET_KEY`	Yes	Litefuse project secret key (`sk-lf-...`). Falls back to `LANGFUSE_SECRET_KEY`.
`LITEFUSE_BASE_URL`	No	Defaults to `https://cloud.langfuse.com`. Set to `https://litefuse.cloud` (or your self-hosted URL). Aliases: `LITEFUSE_HOST`, `LANGFUSE_BASE_URL`, `LANGFUSE_HOST`.
`LITEFUSE_TRACING_ENVIRONMENT`	No	Litefuse `environment` for emitted traces (e.g. `test` to keep experiments out of production dashboards). Falls back to `LANGFUSE_TRACING_ENVIRONMENT`.
`LITEFUSE_USER_ID`	No	Overrides the trace `user_id`. Falls back to `$USER` then `"hermes-user"`.
`LITEFUSE_RELEASE`	No	Release tag on emitted traces. Falls back to `LANGFUSE_RELEASE`.
`HERMES_LITEFUSE_DEBUG`	No	Set to `"true"` for verbose plugin logging at `~/.hermes/state/litefuse_plugin.log`.
`HERMES_LITEFUSE_MAX_CHARS`	No	Truncation threshold (in characters) for span inputs/outputs. Default `1000000` (~1MB of text).

Credential lookup order: process env → ~/.hermes/state/litefuse.env → ~/.hermes/litefuse.env (first existing file wins; env always takes precedence over files).
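
The lookup order can be sketched as follows. This is one reading of the rules above, not the plugin's actual code: it assumes "first existing file wins" means only the first fallback file that exists is consulted, and the env-file parser handles only simple `KEY=VALUE` lines. The function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional, Sequence


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; comments and blank lines are skipped."""
    values: Dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve(suffix: str, env: Mapping[str, str], files: Sequence[Path]) -> Optional[str]:
    """Resolve e.g. suffix='PUBLIC_KEY' using the documented precedence."""
    # LITEFUSE_* is checked before the same-name LANGFUSE_* fallback.
    names = (f"LITEFUSE_{suffix}", f"LANGFUSE_{suffix}")
    # 1. The process environment always takes precedence over files.
    for name in names:
        if env.get(name):
            return env[name]
    # 2. Otherwise consult only the first fallback file that exists.
    for path in files:
        if path.is_file():
            file_values = parse_env_file(path)
            for name in names:
                if file_values.get(name):
                    return file_values[name]
            return None
    return None


if __name__ == "__main__":
    home = Path.home() / ".hermes"
    fallback_files = [home / "state" / "litefuse.env", home / "litefuse.env"]
    found = resolve("PUBLIC_KEY", os.environ, fallback_files)
    print("public key set:", found is not None)
```

Running the script prints only whether a public key was found, never the key itself, which makes it safe to paste into a support thread.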

How it works

The plugin subscribes to ten Hermes plugin-hook events via ctx.register_hook(...):

Hook	Role
`on_session_start`	Caches session model / platform metadata.
`pre_llm_call`	Opens the turn’s AGENT root container, computes the turn number from the conversation history (so resumed sessions keep counting), and attaches the trace header. Also detects subagent sessions (see below).
`pre_api_request`	Opens a generation span at the real request start time and assigns the next shared step number.
`post_api_request`	Records the call’s outcome on the open generation — usage, finish_reason, tool-call count, real end time — and names it provisionally (`plan (n tools) #N` / `response`). The span stays open for content backfill.
`pre_tool_call`	Opens a `tool: <name> (<key info>) #N` span with input = the args object, `agent_plan_step` pointing at the requesting generation, and the next shared step number. For `delegate_task`, registers a pending delegation so child sessions can attach.
`post_tool_call`	Closes the matching tool span with output, `agent_duration_ms`, `agent_tool_call_id`, and ERROR level + `status_message` on failure. Matched via a per-session FIFO queue (Hermes v0.12’s `pre_tool_call` call sites don’t pass `session_id` / `tool_call_id`).
`post_llm_call`	End of turn: backfills every held-open generation’s `input` (full messages) and `output` (thinking / text / toolCall blocks) from the conversation history, finalises names, ends each at its recorded real end time, then closes the root with the final answer and turn stats.
`on_session_end`	Defensive cleanup for interrupted turns — leaked tool spans get WARNING (`"turn ended before tool completed"`), the root closes with WARNING if no final text was produced.
`on_session_reset` / `on_session_finalize`	Drop in-memory state when the session is truly torn down.

Subagent subtrees. Hermes runs delegate_task children in-process on worker threads, and its hooks don’t carry a parent-session link — so the plugin uses a heuristic: a session whose first turn starts while a delegate_task tool span is open is treated as that delegation’s child. The child’s container (subagent, AGENT type) is parented under the delegate tool span via the trace context; child steps renumber from #1; the child’s generations carry their own usage, which Litefuse rolls into the parent trace’s total cost. The linkage is recorded as agent_subagent_link: "heuristic" in the container metadata.
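
A minimal sketch of this heuristic is shown below. The `DelegationRegistry` class, its method names, and the choice to attach to the most recently opened delegation are all assumptions; only the rule itself (a session whose first turn starts while a `delegate_task` span is open becomes that delegation's child) comes from the description above.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Delegation:
    parent_session: str
    tool_span_id: str
    children: List[str] = field(default_factory=list)


class DelegationRegistry:
    def __init__(self) -> None:
        self._lock = threading.Lock()        # children run on worker threads
        self._open: List[Delegation] = []    # delegate_task spans still running
        self._seen: Set[str] = set()         # sessions that already had a turn

    def open_delegation(self, parent_session: str, tool_span_id: str) -> None:
        with self._lock:
            self._open.append(Delegation(parent_session, tool_span_id))

    def close_delegation(self, tool_span_id: str) -> None:
        with self._lock:
            self._open = [d for d in self._open if d.tool_span_id != tool_span_id]

    def attach_if_child(self, session_id: str) -> Optional[Delegation]:
        """Call at turn start; returns the delegation to parent under, if any."""
        with self._lock:
            first_turn = session_id not in self._seen
            self._seen.add(session_id)
            if not first_turn:
                return None
            # Assumption: prefer the most recently opened delegation.
            for delegation in reversed(self._open):
                if delegation.parent_session != session_id:
                    delegation.children.append(session_id)
                    return delegation
            return None
```

The sketch also makes the failure mode visible: any new session that starts while a delegation is open gets attached, related or not, which is why the link is labelled as heuristic in metadata.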

The plugin is fail-open: any unexpected error is logged to ~/.hermes/state/litefuse_plugin.log and the callback returns silently so it never blocks Hermes’ main loop. A single Langfuse client is held in module-level state across the lifetime of the Hermes process, guarded by a thread lock so multiple concurrent gateway sessions share it safely.
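
The two patterns in that paragraph, fail-open callbacks and a lock-guarded shared client, could look roughly like this. The decorator, the logger name, and the `factory` parameter are illustrative assumptions; the sketch deliberately takes a factory instead of constructing a real SDK client, so it makes no claims about the SDK's constructor.

```python
import functools
import logging
import threading
from typing import Any, Callable, Optional

log = logging.getLogger("litefuse_plugin_sketch")  # hypothetical logger name


def fail_open(hook: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a hook so any unexpected error is logged and swallowed."""
    @functools.wraps(hook)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return hook(*args, **kwargs)
        except Exception:
            # Log with traceback, then return quietly so the host loop continues.
            log.exception("hook %s failed; continuing", hook.__name__)
            return None
    return wrapper


_client: Optional[Any] = None
_client_lock = threading.Lock()


def get_client(factory: Callable[[], Any]) -> Any:
    """Create the shared client once; later callers reuse the same object."""
    global _client
    with _client_lock:
        if _client is None:
            _client = factory()
        return _client
```

Holding the lock around the check and the construction means two gateway sessions that start at the same moment cannot both build a client.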

Trace metadata reference

All metadata is flat, agent_-prefixed scalars — the same key set as every other Litefuse agent integration, so metadata.agent_duration_ms > 60000 filters work across Hermes, Pi, and Claude Code traces alike. Fields absent from the source data are omitted entirely (no null padding).
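
As an illustration of the flat, `agent_`-prefixed convention, the sketch below builds such a dictionary and applies the duration filter mentioned above to plain Python rows. Both function names are hypothetical, and stringifying non-scalar values is an assumption about how "scalars only" might be enforced.

```python
from typing import Any, Dict, Iterable, List, Mapping


def flat_agent_metadata(fields: Mapping[str, Any]) -> Dict[str, Any]:
    """Prefix keys with 'agent_' and omit absent (None) fields entirely."""
    out: Dict[str, Any] = {}
    for key, value in fields.items():
        if value is None:
            continue  # no null padding
        if not isinstance(value, (str, int, float, bool)):
            value = str(value)  # assumption: keep every value a scalar
        out[f"agent_{key}"] = value
    return out


def slow_spans(rows: Iterable[Mapping[str, Any]], threshold_ms: int = 60000) -> List[Mapping[str, Any]]:
    """Local equivalent of the filter metadata.agent_duration_ms > 60000."""
    return [r for r in rows if r.get("agent_duration_ms", 0) > threshold_ms]
```

Because every integration emits the same flat keys, a filter like `slow_spans` needs no per-agent branches.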

Trace root (AGENT observation):

agent_turn_number, agent_session_id, agent_cwd, agent_model, agent_platform, agent_history_messages
closed with: agent_api_calls, agent_tool_calls, agent_steps, agent_message_count, agent_duration_ms

Generations (plan / response / think):

agent_turn_number, agent_step_index, agent_provider, agent_api (api_mode), agent_stop_reason, agent_api_duration_ms, agent_tool_call_count, agent_message_count
agent_thinking_chars when the message carried reasoning

Tools (tool: ...):

agent_tool_name, agent_step_index, agent_plan_step (join key: equals the requesting generation’s agent_step_index), agent_tool_call_id, agent_duration_ms, agent_is_error
agent_output_truncated / agent_output_orig_len when truncation hit
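
The truncation fields relate to `HERMES_LITEFUSE_MAX_CHARS` roughly as in this sketch. The function names are hypothetical, and falling back to the default on an invalid or non-positive value is an assumption; the default of 1000000 characters and the two metadata keys come from this page.

```python
import os
from typing import Dict, Tuple

DEFAULT_MAX_CHARS = 1_000_000  # documented default


def max_chars() -> int:
    """Read the threshold from the environment, falling back to the default."""
    raw = os.environ.get("HERMES_LITEFUSE_MAX_CHARS", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_MAX_CHARS
    return value if value > 0 else DEFAULT_MAX_CHARS


def truncate_output(text: str, limit: int) -> Tuple[str, Dict[str, object]]:
    """Return the (possibly cut) text plus metadata describing the cut."""
    if len(text) <= limit:
        return text, {}  # nothing truncated, so no extra metadata keys
    meta = {"agent_output_truncated": True, "agent_output_orig_len": len(text)}
    return text[:limit], meta
```

Recording the original length alongside the flag lets a dashboard query show how much output was dropped without storing the dropped text.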

Subagent containers (subagent):

agent_subagent: true, agent_subagent_link: "heuristic", agent_model, agent_platform, agent_session_id (the child’s)

Troubleshooting

No traces appear in Litefuse. Tail ~/.hermes/state/litefuse_plugin.log:

tail -20 ~/.hermes/state/litefuse_plugin.log

Expected: at startup "litefuse plugin v0.2.0 registered (10 hooks, spec v1.2)", then "Litefuse client ready" and per-turn "turn closed session=... turn=N steps=N api=N tools=N final=True". If the file is empty, the plugin isn’t loading:

hermes plugins list           # confirm: litefuse → enabled
hermes plugins enable litefuse  # if not enabled
hermes gateway restart        # if gateway is running

Plugin log says Litefuse credentials not set. The plugin couldn’t find keys in the environment or fallback env files. Add them to ~/.hermes/.env:

cat >> ~/.hermes/.env <<'EOF'
LITEFUSE_PUBLIC_KEY=pk-lf-...
LITEFUSE_SECRET_KEY=sk-lf-...
LITEFUSE_BASE_URL=https://litefuse.cloud
EOF

Then re-run any hermes -z "..." or restart the gateway. (LANGFUSE_* names also work as a fallback.)

Plugin log says langfuse SDK not importable. The Langfuse SDK isn’t installed in Hermes’ venv:

~/.hermes/hermes-agent/venv/bin/pip install --upgrade 'langfuse>=4,<5'

Generations show up only when the turn ends. By design: generation content (full messages, thinking) is only available at end of turn, so generation spans export then — with their real recorded timestamps. Tool spans export live as each tool finishes, and the trace header appears with the first completed observation.

Old trace shape (api: <model> #N, user message events, metadata.hermes_agent.*). You’re running plugin v0.1.0. Re-fetch both files from https://litefuse.ai/integrations/hermes-agent/ and restart the gateway.

TUI or hermes -z oneshot doesn’t produce traces. This only affects Hermes versions older than v0.12. Upgrade Hermes, or use hermes chat -q "..." which has always loaded plugins.

Limitations

In-flight steps are invisible until they end. Each span is sent exactly once, at its end — a long-running tool or streaming LLM call won’t show in the live view until it completes. During a delegation, child steps can appear before their parent container (“orphan” rows at the list tail); the tree is complete and correct once the turn ends.
Subagent linking is heuristic. On a busy multi-user gateway, an unrelated session whose first turn starts while a delegation is in flight could be mis-attached (it would still be tagged agent_subagent_link: "heuristic"). Single-user setups are unaffected.
No TTFT / sampling params. Hermes hooks don’t expose first-token timing or temperature/top_p, so completion_start_time and full model_parameters (beyond max_tokens) are absent.

Resources

Hermes Agent hooks user guide
Hermes Agent plugin reference
Langfuse Python SDK v4
Litefuse Cloud
Plugin source: plugin.yaml · __init__.py
