Kimi Code Tracing with Litefuse

Kimi Code is Moonshot AI’s terminal-based coding agent (the kimi CLI, running Kimi-k2 models). Kimi Code has no plugin or hook system, but it writes a complete event log, wire.jsonl, for every session. This integration is a polling collector: a single Python file with zero dependencies (standard library only) that runs every 30 seconds via launchd (macOS) or every minute via cron (Linux), parses new wire events, and ships spans straight to Litefuse’s OTLP endpoint. No Kimi Code source changes, no SDK, no virtualenv.

The collector emits one Litefuse trace per user turn: one generation per LLM API call, one tool observation per tool execution, and a full subtree for every subagent delegation.

For AI — automated install

If you’re chatting with an AI agent right now (Kimi Code itself works fine), paste this prompt and the agent will handle the whole install end-to-end:

Read https://litefuse.ai/SKILL.md and follow the instructions to install and configure Litefuse for Kimi Code.

The skill will ask for your Litefuse API keys (or walk you through signing up if you don’t have an account yet), then configure everything in place. For step-by-step manual setup instead, continue below.

What gets captured

Data	Captured as	Notes
User prompt	trace input	text; image/video inputs leave a block count in metadata
Each LLM API call	generation observation	`plan (n tools) #N` / `response` / `think #N`, named after what the model did; thinking / text / toolCall block structure preserved in the output
Tool executions (input + output)	tool observation	`tool: bash (grep) #N` — key info in the name, full args in the input
Subagents (`Agent` tool)	subtree	`tool (1 subagent) #N` → `subagent` container → the child’s own plan/tool/response steps, parsed from the child’s own `wire.jsonl`; child usage rolls up into the parent trace
Token usage	`usage_details` on generation	Anthropic-style keys (`input` / `output` / `cache_read_input_tokens` / `cache_creation_input_tokens`), mapped from Kimi’s `inputOther` / `output` / `inputCacheRead` / `inputCacheCreation`
Model name	`model` on generation	e.g. `kimi-code/kimi-for-coding` — used by Litefuse for cost computation
Time to first token	`completion_start_time` on generation	from Kimi’s `llmFirstTokenLatencyMs`
Tool errors (`isError`)	tool observation, `level=ERROR`	with a status-message preview
Cancelled turns (`turn.cancel`)	root span `level=WARNING`	status message `turn cancelled by user`
Interrupted turns	root span `level=WARNING`	a turn whose LLM call never completed (e.g. expired credentials, killed process)
Session grouping	trace `session_id`	Kimi Code session id (`session_<uuid>`)
User identity	trace `user_id`	`$LITEFUSE_USER_ID`, falls back to the OS username
Working directory	trace metadata (`agent_cwd`)	from Kimi Code’s session index
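
The token-usage row above amounts to a key rename. Here is a minimal sketch of that mapping, assuming the usage record arrives as a plain dict. The function names `map_usage` and `add_usage` are hypothetical and not taken from the collector source.

```python
from typing import Dict, Mapping

# Kimi usage field -> Litefuse usage_details key, as listed in the table above.
KIMI_TO_LITEFUSE_USAGE = {
    "inputOther": "input",
    "output": "output",
    "inputCacheRead": "cache_read_input_tokens",
    "inputCacheCreation": "cache_creation_input_tokens",
}


def map_usage(kimi_usage: Mapping[str, object]) -> Dict[str, int]:
    """Rename Kimi usage fields; fields absent from the source are omitted."""
    details: Dict[str, int] = {}
    for kimi_key, litefuse_key in KIMI_TO_LITEFUSE_USAGE.items():
        value = kimi_usage.get(kimi_key)
        # Accept only real numbers (bool is a subclass of int, so exclude it).
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            details[litefuse_key] = int(value)
    return details


def add_usage(total: Mapping[str, int], extra: Mapping[str, int]) -> Dict[str, int]:
    """Sum two usage_details dicts, e.g. to roll child usage into a parent."""
    merged = dict(total)
    for key, value in extra.items():
        merged[key] = merged.get(key, 0) + value
    return merged


if __name__ == "__main__":
    parent = map_usage({"inputOther": 120, "output": 30, "inputCacheRead": 900})
    child = map_usage({"inputOther": 40, "output": 10})
    print(add_usage(parent, child))
```

Omitting absent fields rather than writing zeros follows the convention stated in the metadata reference (fields missing from the source are never padded).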

Trace structure

A turn that delegates to a subagent produces a trace shaped like this (real example):

Kimi Code — Turn 7                          (AGENT root span, trace headers)
├── plan (3 tools) #1                       (generation — usage, real latency, TTFT)
├── tool: read (overview.txt) #2
├── tool: read (events.jsonl) #3
├── tool: read (queue.jsonl) #4
├── plan (1 tool) #5
├── tool (1 subagent) #6                    (tool — the delegation as seen by the parent)
│   └── subagent                            (AGENT container — parsed from the child's wire.jsonl)
│       ├── plan (1 tool) #1                (child-local numbering restarts at #1)
│       ├── tool: read (overview.txt) #2
│       ├── plan (1 tool) #3
│       ├── tool: bash (grep) #4
│       └── subagent response               (generation — the child's final answer)
└── response                                (generation — the final answer, ends the turn)

Design notes:

One trace per user turn, complete turns only. The collector only emits turns that have finished — the last step ended with end_turn, the turn was cancelled, or a newer prompt has superseded it. An in-progress turn is left in place and re-read on the next poll, so a turn is never split across two traces and every span is sent exactly once.
Honest time semantics. Kimi’s step.end timestamp includes tool execution (it marks the end of the agent step, not the LLM call). The collector reconstructs the real end of the LLM call as step.begin + llmFirstTokenLatencyMs + llmStreamDurationMs, and times each tool span from its tool.call event to its tool.result event. Permission-approval wait shows up as a real gap between the generation and the tool, which is exactly what happened.
One step counter per agent container. #N is a single chronological sequence shared by generations and tools; each subagent container restarts at #1. A tool’s agent_plan_step metadata points at the agent_step_index of the generation that requested it.
Subagent subtrees. The Agent tool’s result carries an agent_id: header that the collector resolves to the child’s own wire.jsonl under <session>/agents/agent-<n>/. The delegation tool span deliberately wraps the container: tool-span duration − container duration = the real overhead of delegating. The resolution is recursive, though note that Kimi Code currently does not give subagents the Agent tool, so trees deeper than two agent levels cannot occur in practice.
Deterministic IDs. Trace and span IDs are derived from the session id, turn number, and event UUIDs — a re-run after a state reset upserts instead of duplicating.
Flat agent_* metadata. All integration fields live at the metadata top level with an agent_ prefix (agent_step_index, agent_plan_step, agent_duration_ms…) — the same keys as every other Litefuse agent integration, so one dashboard query works across all of them.
Turn-local generation input. wire.jsonl does not record the full request payload of each API call, so generation inputs are reconstructed from the messages of the current turn (marked agent_input_scope: "turn" in metadata); cross-turn history and the system prompt are not included.
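
The arithmetic behind "Honest time semantics" and the delegation overhead can be sketched as follows. This is an illustration under assumptions: timestamps are taken to be timezone-aware datetimes, and the helper names `generation_times` and `delegation_overhead_ms` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def generation_times(
    step_begin: datetime,
    first_token_latency_ms: float,
    stream_duration_ms: float,
) -> Tuple[datetime, datetime, datetime]:
    """Return (start, completion_start, end) for one LLM call.

    The end is step.begin + llmFirstTokenLatencyMs + llmStreamDurationMs,
    deliberately ignoring step.end, which also covers tool execution.
    """
    completion_start = step_begin + timedelta(milliseconds=first_token_latency_ms)
    end = completion_start + timedelta(milliseconds=stream_duration_ms)
    return step_begin, completion_start, end


def delegation_overhead_ms(tool_span_ms: float, container_ms: float) -> float:
    """Tool-span duration minus subagent container duration."""
    return tool_span_ms - container_ms


if __name__ == "__main__":
    begin = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    start, first_token, end = generation_times(begin, 800, 4200)
    print(first_token.isoformat(), end.isoformat())
    print(delegation_overhead_ms(61_000, 60_250))
```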

When do traces appear?

With the launchd schedule the collector polls every 30 seconds (every minute under cron) and only uploads finished turns, so a trace appears up to ~30 s (cron: ~60 s) after the turn’s final answer; nothing is visible mid-turn. Keep this in mind for long-running turns (a multi-subagent delegation can run for many minutes before anything shows up). This is a deliberate difference from event-driven integrations like Pi, which send each observation the moment it ends: Kimi Code offers no in-process extension point to do that.

Two special cases: a turn abandoned mid-flight is emitted (with a WARNING root) as soon as a newer prompt supersedes it, and a turn idle for more than 30 minutes (configurable) is emitted as interrupted.
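
The emit rules described above (finished, cancelled, superseded, stale) can be pictured as one small decision function. This is a sketch only. It assumes each wire event is a dict with a `type` field and that the stop reason sits under a `stop_reason` key on `step.end`. Both field names are assumptions, as is the function name `classify_turn`.

```python
from typing import List, Mapping, Optional

STALE_MINUTES = 30  # mirrors the documented KIMI_LITEFUSE_STALE_MINUTES default


def classify_turn(
    events: List[Mapping[str, object]],
    superseded: bool,
    idle_minutes: float,
    stale_minutes: float = STALE_MINUTES,
) -> Optional[str]:
    """Return why a turn may be emitted, or None while it is still in progress."""
    if any(e.get("type") == "turn.cancel" for e in events):
        return "cancelled"  # root span gets level=WARNING
    step_ends = [e for e in events if e.get("type") == "step.end"]
    if step_ends and step_ends[-1].get("stop_reason") == "end_turn":
        return "finished"
    if superseded:
        return "superseded"  # a newer prompt arrived; emitted with a WARNING root
    if idle_minutes > stale_minutes:
        return "interrupted"  # idle past the stale threshold
    return None  # leave in place and re-read on the next poll


if __name__ == "__main__":
    running = [{"type": "turn.prompt"}, {"type": "step.begin"}]
    print(classify_turn(running, superseded=False, idle_minutes=2))
    print(classify_turn(running, superseded=False, idle_minutes=45))
```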

Quick Start

Prerequisites

Python ≥ 3.8 — any python3 works, including macOS’s system Python. The collector has zero third-party dependencies: no SDK, no virtualenv, no pip install.
Kimi Code installed (~/.kimi-code/ exists).
A Litefuse project at https://litefuse.cloud with public + secret keys.

Download the collector script

mkdir -p ~/.kimi-code/hooks
curl -fsSL https://litefuse.ai/integrations/kimi-code/litefuse_hook.py \
  -o ~/.kimi-code/hooks/litefuse_hook.py
chmod +x ~/.kimi-code/hooks/litefuse_hook.py

The source is also browseable at the same URL — feel free to read it before deploying.

Configure `~/.kimi-code/litefuse.env`

The collector runs under launchd/cron with a minimal environment that does not include your shell’s variables, so credentials live in a dedicated env file that it reads on every run (no restart needed after edits):

cat > ~/.kimi-code/litefuse.env <<'EOF'
TRACE_TO_LITEFUSE=true
LITEFUSE_PUBLIC_KEY=pk-lf-xxx
LITEFUSE_SECRET_KEY=sk-lf-xxx
LITEFUSE_BASE_URL=https://litefuse.cloud
EOF
chmod 600 ~/.kimi-code/litefuse.env

Schedule the collector

macOS (launchd) — runs every 30 seconds:

cat > ~/Library/LaunchAgents/com.kimi.litefuse.plist <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key><string>com.kimi.litefuse</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/env</string>
        <string>python3</string>
        <string>$HOME/.kimi-code/hooks/litefuse_hook.py</string>
    </array>
    <key>StartInterval</key><integer>30</integer>
    <key>RunAtLoad</key><true/>
</dict>
</plist>
EOF
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.kimi.litefuse.plist

Linux (cron), runs every minute (the shortest interval cron supports):

(crontab -l 2>/dev/null; echo "* * * * * python3 \$HOME/.kimi-code/hooks/litefuse_hook.py") | crontab -

Verify

Send a message in Kimi Code, wait for the answer plus one poll interval, then watch the collector log:

tail -f ~/.kimi-code/state/litefuse_hook.log
# Expected: "Emitted 1 turn(s) in X.XXs -> https://litefuse.cloud"

Open the project in Litefuse — each user turn becomes one trace with the structure shown above.

Note: on its first run the collector catches up on all existing sessions and emits one trace per past turn. If you want a clean slate instead, deleting ~/.kimi-code/state/litefuse_state.json is not the way to get one (that re-emits everything); simply start tracing from a fresh Kimi session.
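
Re-emitting everything is harmless because of the deterministic IDs described in the design notes: the same turn always maps to the same trace. The sketch below shows one way such IDs could be derived. It assumes SHA-256 truncated to the OTLP sizes (16-byte trace ID, 8-byte span ID, hex-encoded). The hash choice, the separator, and the function names are assumptions, and the collector's actual derivation may differ.

```python
import hashlib


def trace_id(session_id: str, turn_number: int) -> str:
    """32 hex chars (16 bytes) derived from session id and turn number."""
    seed = f"{session_id}:{turn_number}".encode("utf-8")
    return hashlib.sha256(seed).hexdigest()[:32]


def span_id(session_id: str, turn_number: int, event_uuid: str) -> str:
    """16 hex chars (8 bytes) derived from the turn plus one event UUID."""
    seed = f"{session_id}:{turn_number}:{event_uuid}".encode("utf-8")
    return hashlib.sha256(seed).hexdigest()[:16]


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    session = "session_00000000-0000-0000-0000-000000000000"
    print(trace_id(session, 7))
    print(span_id(session, 7, "11111111-1111-1111-1111-111111111111"))
```

Running the derivation twice with the same inputs yields the same IDs, so a second upload overwrites the first instead of creating a duplicate.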

Upgrading from v1

The previous version of this hook used the Langfuse Python SDK and flattened every turn into synthetic Kimi Response (step N) generations. v2 needs no SDK and follows the shared trace structure:

Download the new script over the old one (back up first if you’ve customized it). The launchd/cron entry can stay — only the script path matters.
Keep ~/.kimi-code/litefuse.env; the same keys work. LANGFUSE_* names keep working as a fallback, but LITEFUSE_* takes precedence.
v2 renames observations (plan (n tools) #N / response instead of Kimi Response (step N), lowercase tool: bash (…) #N instead of Tool: Bash #N) and flattens metadata to agent_* keys — update any saved dashboard filters.
v2 only emits complete turns; if v1 left a half-consumed session behind, the first v2 run resynchronizes from its saved offset automatically.

Environment variables

All variables live in ~/.kimi-code/litefuse.env (or the process environment — the env file never overrides variables that are already set). LITEFUSE_* takes precedence; the equivalent LANGFUSE_* names are accepted as an ecosystem-compatible fallback.

Variable	Required	Description
`TRACE_TO_LITEFUSE`	Yes	Must be `true` for the collector to do anything.
`LITEFUSE_PUBLIC_KEY`	Yes	Litefuse project public key (`pk-lf-...`).
`LITEFUSE_SECRET_KEY`	Yes	Litefuse project secret key (`sk-lf-...`).
`LITEFUSE_BASE_URL`	No	Defaults to `https://litefuse.cloud`. Alias: `LITEFUSE_HOST`.
`LITEFUSE_TRACING_ENVIRONMENT`	No	Litefuse environment for emitted traces. Defaults to `production`; use `development` for experiments.
`LITEFUSE_USER_ID`	No	Overrides the trace `user_id`. Falls back to the OS username.
`LITEFUSE_EXTRA_TARGETS`	No	JSON array of extra targets (`[{"publicKey", "secretKey", "baseUrl", "environment"}]`) to double-write traces to (e.g. self-hosted + cloud).
`KIMI_LITEFUSE_DEBUG`	No	Set to `"true"` for verbose collector logging.
`KIMI_LITEFUSE_MAX_CHARS`	No	Truncation threshold (in characters) for span inputs/outputs. Default `1000000`.
`KIMI_LITEFUSE_STALE_MINUTES`	No	Idle minutes after which an unfinished turn is emitted as interrupted. Default `30`.
`KIMI_LITEFUSE_BATCH_BYTES`	No	OTLP request-body batch limit. Default `800000`; oversized batches also split-and-retry automatically on HTTP 413.
`KIMI_LITEFUSE_STATE_DIR`	No	Overrides the state directory (`~/.kimi-code/state`). Mainly for testing.
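
The precedence rules above (process environment beats the env file, LITEFUSE_* beats LANGFUSE_*) can be sketched in a few lines. The parser is an assumption about the file format (simple KEY=VALUE lines with optional quotes and # comments), and the helper names are hypothetical.

```python
from typing import Dict, Mapping, Optional


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def merge_env(file_values: Mapping[str, str], environ: Mapping[str, str]) -> Dict[str, str]:
    """The env file only fills gaps; variables that are already set win."""
    merged = dict(file_values)
    merged.update(environ)
    return merged


def setting(env: Mapping[str, str], name: str) -> Optional[str]:
    """Look up LITEFUSE_<name>, falling back to LANGFUSE_<name>."""
    return env.get(f"LITEFUSE_{name}") or env.get(f"LANGFUSE_{name}")


if __name__ == "__main__":
    file_text = "LITEFUSE_PUBLIC_KEY=pk-lf-xxx\nLANGFUSE_SECRET_KEY=sk-lf-old\n"
    env = merge_env(parse_env_file(file_text), {"LITEFUSE_SECRET_KEY": "sk-lf-xxx"})
    print(setting(env, "PUBLIC_KEY"), setting(env, "SECRET_KEY"))
```

In real use the second argument to `merge_env` would be `os.environ`.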

Metadata reference

All integration fields are flat top-level metadata keys with an agent_ prefix (shared across Litefuse agent integrations). Fields absent from the source data are omitted entirely, never padded with null.

Trace root: agent_turn_number, agent_session_id, agent_cwd, agent_model, agent_provider, agent_transcript_path, agent_api_calls, agent_tool_calls, agent_steps, agent_duration_ms; agent_image_blocks when the prompt contains media; agent_cancelled on cancelled turns; truncation markers (agent_prompt_truncated + _orig_len).

Generation: agent_turn_number, agent_step_index, agent_provider, agent_stop_reason, agent_api_duration_ms, agent_time_to_first_token_ms, agent_stream_duration_ms, agent_tool_call_count, agent_thinking_chars, agent_step_uuid, agent_input_scope, truncation markers.

Tool: agent_turn_number, agent_step_index, agent_plan_step (join key: tool.agent_plan_step == generation.agent_step_index), agent_tool_name (original casing, e.g. Bash), agent_tool_call_id, agent_duration_ms, agent_is_error, agent_subagent_id on delegations, truncation markers.

Subagent container: agent_subagent: true, agent_subagent_id, agent_subagent_type, agent_subagent_status, plus that run’s agent_api_calls / agent_tool_calls / agent_steps / agent_duration_ms.
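
The join key mentioned for tools (tool.agent_plan_step == generation.agent_step_index) can be applied as below. The observation shape, a dict with `type`, `name` and `metadata`, is an assumption made for illustration and is not the Litefuse export format. The sample data mirrors the parent-level steps of the trace shown earlier.

```python
from collections import defaultdict
from typing import Dict, List, Mapping

# Parent-level observations of "Turn 7" from the trace structure section.
TURN_7 = [
    {"type": "generation", "name": "plan (3 tools) #1", "metadata": {"agent_step_index": 1}},
    {"type": "tool", "name": "tool: read (overview.txt) #2", "metadata": {"agent_step_index": 2, "agent_plan_step": 1}},
    {"type": "tool", "name": "tool: read (events.jsonl) #3", "metadata": {"agent_step_index": 3, "agent_plan_step": 1}},
    {"type": "tool", "name": "tool: read (queue.jsonl) #4", "metadata": {"agent_step_index": 4, "agent_plan_step": 1}},
    {"type": "generation", "name": "plan (1 tool) #5", "metadata": {"agent_step_index": 5}},
    {"type": "tool", "name": "tool (1 subagent) #6", "metadata": {"agent_step_index": 6, "agent_plan_step": 5}},
]


def tools_by_plan_step(observations: List[Mapping]) -> Dict[int, List[str]]:
    """Group tool names under the step index of the generation that requested them.

    Step numbering restarts inside each subagent container, so pass in the
    observations of one container at a time.
    """
    generation_steps = {
        obs["metadata"]["agent_step_index"]
        for obs in observations
        if obs["type"] == "generation" and "agent_step_index" in obs["metadata"]
    }
    grouped: Dict[int, List[str]] = defaultdict(list)
    for obs in observations:
        plan_step = obs["metadata"].get("agent_plan_step")
        if obs["type"] == "tool" and plan_step in generation_steps:
            grouped[plan_step].append(obs["name"])
    return dict(grouped)


if __name__ == "__main__":
    for step, tools in tools_by_plan_step(TURN_7).items():
        print(step, tools)
```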

How it works

On every scheduled run the script:

Loads ~/.kimi-code/litefuse.env, then lists all sessions from ~/.kimi-code/session_index.jsonl.
Reads new lines from each session’s agents/main/wire.jsonl since the last offset (state in ~/.kimi-code/state/litefuse_state.json, keyed by sha256(session_id::wire_path), guarded by a file lock).
Splits events into turns at turn.prompt boundaries and assembles steps (step.begin / content.part / tool.call / tool.result / step.end / usage.record).
Emits finished turns only (final end_turn, turn.cancel, superseded by a newer prompt, or stale past the idle threshold); the offset stops before any in-progress turn so it is re-read in full next time.
Resolves Agent-tool results to child wire.jsonl files and recursively expands subagent subtrees.
Sends everything as OTLP/HTTP JSON to <base_url>/api/public/otel/v1/traces (batched under the endpoint’s body-size limit with split-and-retry on 413, Basic auth, 10 s timeout). Trace headers ride on every span.
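
The offset-based reading and turn splitting in the steps above can be sketched as follows. It assumes every wire event is one JSON object per line with a `type` field. The function names are hypothetical, and the turn-level rule (rewinding the saved offset to the start of an in-progress turn) is left out for brevity.

```python
import json
from typing import List, Tuple


def read_new_events(path: str, offset: int) -> Tuple[List[dict], int]:
    """Return (events, new_offset) for complete lines appended since offset."""
    events: List[dict] = []
    with open(path, "rb") as handle:
        handle.seek(offset)
        while True:
            line = handle.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a half-written line: retry next poll
            offset = handle.tell()
            if not line.strip():
                continue
            try:
                events.append(json.loads(line))
            except ValueError:
                continue  # skip a malformed line rather than abort the run
    return events, offset


def split_turns(events: List[dict]) -> List[List[dict]]:
    """Start a new turn at every turn.prompt event."""
    turns: List[List[dict]] = []
    for event in events:
        if event.get("type") == "turn.prompt":
            turns.append([event])
        elif turns:
            turns[-1].append(event)  # events before the first prompt are dropped
    return turns
```

Advancing the offset only past newline-terminated lines means a line that Kimi Code is still writing is never half-parsed; it is simply picked up on the next run.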

The collector is fail-open: any unexpected error is logged to ~/.kimi-code/state/litefuse_hook.log and the script exits 0, so it never affects Kimi Code itself.
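
For the final send step, here is a minimal sketch of building the OTLP/HTTP JSON request and packing spans into size-limited batches. Two things are assumptions not confirmed by this page: that Basic auth uses the public key as username and the secret key as password, and that the collector batches per span in this greedy way. The function names are hypothetical.

```python
import base64
import json
import urllib.request
from typing import List


def basic_auth_header(public_key: str, secret_key: str) -> str:
    # Assumption: public key = username, secret key = password.
    token = base64.b64encode(f"{public_key}:{secret_key}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_request(base_url: str, public_key: str, secret_key: str,
                  spans: List[dict]) -> urllib.request.Request:
    """Wrap spans in the OTLP/HTTP JSON envelope and address the traces endpoint."""
    body = json.dumps({"resourceSpans": [{"scopeSpans": [{"spans": spans}]}]})
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/public/otel/v1/traces",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(public_key, secret_key),
        },
        method="POST",
    )


def batch_spans(spans: List[dict], max_bytes: int = 800_000) -> List[List[dict]]:
    """Greedily pack spans so each batch's summed JSON size stays under max_bytes."""
    batches: List[List[dict]] = []
    current: List[dict] = []
    size = 0
    for span in spans:
        span_size = len(json.dumps(span).encode("utf-8"))
        if current and size + span_size > max_bytes:
            batches.append(current)
            current, size = [], 0
        current.append(span)  # an oversized single span still gets its own batch
        size += span_size
    if current:
        batches.append(current)
    return batches
```

Each request would then be sent with `urllib.request.urlopen(request, timeout=10)`, matching the 10 s timeout noted above, inside a try/except so that a failed send is logged rather than raised.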

Troubleshooting

No traces appear in Litefuse. Tail ~/.kimi-code/state/litefuse_hook.log. If the log is empty, first confirm that the scheduler is running the script: check launchctl print gui/$(id -u)/com.kimi.litefuse (or your crontab). If the scheduler is running and the log is still empty, the collector is exiting silently, which usually means TRACE_TO_LITEFUSE isn’t true or keys are missing from ~/.kimi-code/litefuse.env. A send failed: line points to wrong keys or a network problem.

The latest turn is missing. It probably hasn’t finished: the collector only uploads complete turns, and only on the next poll after the turn ends (every 30 s under launchd, every minute under cron). Multi-subagent turns can take minutes before anything appears.

A trace has a WARNING root saying “interrupted”. That turn genuinely never completed — Kimi was killed, restarted, or the LLM call hung (an expired login is a classic: the turn dies on the first call, you run /login and resend, and the dead attempt is recorded as its own interrupted trace). It is not a collection error.

Cost shows 0. Litefuse computes cost from the model name; add a price entry matching the model (e.g. kimi-code/kimi-for-coding) under Settings → Models in your Litefuse project.

Test the collector manually (uses a development environment so production stays clean):

LITEFUSE_TRACING_ENVIRONMENT="development" \
KIMI_LITEFUSE_DEBUG=true \
python3 ~/.kimi-code/hooks/litefuse_hook.py
tail ~/.kimi-code/state/litefuse_hook.log

Resources

Kimi Code
Litefuse Cloud
Collector script source: litefuse_hook.py
