Kimi Code Tracing with Litefuse
Kimi Code is Moonshot AI’s terminal-based coding agent (the kimi CLI, running Kimi-k2 models). Kimi Code has no plugin or hook system, but it writes a complete event log, wire.jsonl, for every session. This integration is a polling collector: a single Python file with zero dependencies (standard library only) that runs every 30 seconds via launchd (macOS) or every minute via cron (Linux), parses new wire events, and ships spans straight to Litefuse’s OTLP endpoint. No Kimi Code source changes, no SDK, no virtualenv.
The collector emits one Litefuse trace per user turn: one generation per LLM API call, one tool observation per tool execution, and a full subtree for every subagent delegation.
For AI — automated install
If you’re chatting with an AI agent right now (Kimi Code itself works fine), paste this prompt and the agent will handle the whole install end-to-end:
Read https://litefuse.ai/SKILL.md and follow the instructions to install and configure Litefuse for Kimi Code.
The skill will ask for your Litefuse API keys (or walk you through signing up if you don’t have an account yet), then configure everything in place. For step-by-step manual setup instead, continue below.
What gets captured
| Data | Captured as | Notes |
|---|---|---|
| User prompt | trace input | text; image/video inputs leave a block count in metadata |
| Each LLM API call | generation observation | plan (n tools) #N / response / think #N, named after what the model did; thinking / text / toolCall block structure preserved in the output |
| Tool executions (input + output) | tool observation | tool: bash (grep) #N — key info in the name, full args in the input |
| Subagents (Agent tool) | subtree | tool (1 subagent) #N → subagent container → the child’s own plan/tool/response steps, parsed from the child’s own wire.jsonl; child usage rolls up into the parent trace |
| Token usage | usage_details on generation | Anthropic-style keys (input / output / cache_read_input_tokens / cache_creation_input_tokens), mapped from Kimi’s inputOther / output / inputCacheRead / inputCacheCreation |
| Model name | model on generation | e.g. kimi-code/kimi-for-coding — used by Litefuse for cost computation |
| Time to first token | completion_start_time on generation | from Kimi’s llmFirstTokenLatencyMs |
| Tool errors (isError) | tool observation, level=ERROR | with a status-message preview |
| Cancelled turns (turn.cancel) | root span level=WARNING | status message turn cancelled by user |
| Interrupted turns | root span level=WARNING | a turn whose LLM call never completed (e.g. expired credentials, killed process) |
| Session grouping | trace session_id | Kimi Code session id (session_<uuid>) |
| User identity | trace user_id | $LITEFUSE_USER_ID, falls back to the OS username |
| Working directory | trace metadata (agent_cwd) | from Kimi Code’s session index |
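
The token-usage row can be illustrated with a small mapping. This is a minimal sketch, assuming the two key lists in the table pair up in the order they are written; the helper name `map_usage` is hypothetical and is not taken from the collector's source.

```python
# Field names on both sides come from the table above. The pairing order
# and the helper itself are assumptions made for illustration.
KIMI_TO_USAGE_KEYS = {
    "inputOther": "input",
    "output": "output",
    "inputCacheRead": "cache_read_input_tokens",
    "inputCacheCreation": "cache_creation_input_tokens",
}


def map_usage(kimi_usage):
    """Translate a Kimi usage record into Anthropic-style usage_details."""
    details = {}
    for kimi_key, target_key in KIMI_TO_USAGE_KEYS.items():
        value = kimi_usage.get(kimi_key)
        # Keep only real integer counts; fields absent from the source are omitted.
        if isinstance(value, int) and not isinstance(value, bool):
            details[target_key] = value
    return details


print(map_usage({"inputOther": 1200, "output": 340, "inputCacheRead": 8000}))
```

Missing counters are dropped rather than reported as zero, in keeping with the rule that absent fields are omitted from the emitted data.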
Trace structure
A turn that delegates to a subagent produces a trace shaped like this (real example):
Kimi Code — Turn 7 (AGENT root span, trace headers)
├── plan (3 tools) #1 (generation: usage, real latency, TTFT)
├── tool: read (overview.txt) #2
├── tool: read (events.jsonl) #3
├── tool: read (queue.jsonl) #4
├── plan (1 tool) #5
├── tool (1 subagent) #6 (tool: the delegation as seen by the parent)
│   └── subagent (AGENT container: parsed from the child's wire.jsonl)
│       ├── plan (1 tool) #1 (child-local numbering restarts at #1)
│       ├── tool: read (overview.txt) #2
│       ├── plan (1 tool) #3
│       ├── tool: bash (grep) #4
│       └── subagent response (generation: the child's final answer)
└── response (generation: the final answer, ends the turn)

Design notes:
- One trace per user turn, complete turns only. The collector only emits turns that have finished: the last step ended with end_turn, the turn was cancelled, or a newer prompt has superseded it. An in-progress turn is left in place and re-read on the next poll, so a turn is never split across two traces and every span is sent exactly once.
- Honest time semantics. Kimi’s step.end timestamp includes tool execution (it marks the end of the agent step, not the LLM call). The collector reconstructs the real end of the LLM call as step.begin + llmFirstTokenLatencyMs + llmStreamDurationMs, and times each tool span from its tool.call event to its tool.result event. Permission-approval wait shows up as a real gap between the generation and the tool, which is exactly what happened.
- One step counter per agent container. #N is a single chronological sequence shared by generations and tools; each subagent container restarts at #1. A tool’s agent_plan_step metadata points at the agent_step_index of the generation that requested it.
- Subagent subtrees. The Agent tool’s result carries an agent_id: header that the collector resolves to the child’s own wire.jsonl under <session>/agents/agent-<n>/. The delegation tool span deliberately wraps the container: tool-span duration − container duration = the real overhead of delegating. The resolution is recursive, though note that Kimi Code currently does not give subagents the Agent tool, so trees deeper than two agent levels cannot occur in practice.
- Deterministic IDs. Trace and span IDs are derived from the session id, turn number, and event UUIDs, so a re-run after a state reset upserts instead of duplicating.
- Flat agent_* metadata. All integration fields live at the metadata top level with an agent_ prefix (agent_step_index, agent_plan_step, agent_duration_ms…). These are the same keys as every other Litefuse agent integration, so one dashboard query works across all of them.
- Turn-local generation input. wire.jsonl does not record the full request payload of each API call, so generation inputs are reconstructed from the messages of the current turn (marked agent_input_scope: "turn" in metadata); cross-turn history and the system prompt are not included.
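
Two of the design notes above lend themselves to a short sketch: the timing reconstruction and the deterministic IDs. The timing formula is the one stated in the notes. The ID scheme is an assumption: the notes only say that IDs are derived from the session id, turn number, and event UUIDs, so the SHA-256 truncation below and both function names are hypothetical. The only fixed constraint is that OTLP trace IDs are 16 bytes (32 hex characters) and span IDs are 8 bytes (16 hex characters).

```python
import hashlib
from datetime import datetime, timedelta, timezone


def llm_window(step_begin, first_token_latency_ms, stream_duration_ms):
    """Return (start, first_token, end) of the LLM call inside one agent step."""
    first_token = step_begin + timedelta(milliseconds=first_token_latency_ms)
    end = first_token + timedelta(milliseconds=stream_duration_ms)
    return step_begin, first_token, end


def deterministic_ids(session_id, turn_number, event_uuid):
    """Derive stable ids from stable inputs (hypothetical hashing scheme)."""
    trace_seed = f"{session_id}:{turn_number}".encode("utf-8")
    trace_id = hashlib.sha256(trace_seed).hexdigest()[:32]  # 16 bytes as hex
    span_seed = f"{trace_id}:{event_uuid}".encode("utf-8")
    span_id = hashlib.sha256(span_seed).hexdigest()[:16]  # 8 bytes as hex
    return trace_id, span_id


begin = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
start, first_token, end = llm_window(begin, 850, 4200)
print((end - start).total_seconds())  # 5.05
print(deterministic_ids("session_1234", 7, "event-uuid-1"))
```

Because the same inputs always hash to the same IDs, re-sending a turn addresses the same trace and spans instead of creating new ones, which is what makes an upsert possible.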
When do traces appear?
The collector polls every 30 seconds (every minute with the cron schedule on Linux) and only uploads finished turns, so a trace appears up to one poll interval after the turn’s final answer; nothing is visible mid-turn. Keep this in mind for long-running turns (a multi-subagent delegation can run for many minutes before anything shows up). This is a deliberate difference from event-driven integrations like Pi, which send each observation the moment it ends: Kimi Code offers no in-process extension point to do that.
Two special cases: a turn abandoned mid-flight is emitted (with a WARNING root) as soon as a newer prompt supersedes it, and a turn idle for more than 30 minutes (configurable) is emitted as interrupted.
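
The emit rules described in this section can be sketched as a single decision function. This is an illustration under assumptions: the function name, its parameters, and the four status labels are hypothetical; only the conditions themselves (end_turn, cancellation, a superseding prompt, and the 30-minute idle threshold) come from the text.

```python
from datetime import datetime, timedelta, timezone


def turn_status(last_stop_reason, cancelled, superseded,
                last_event_at, now, stale_minutes=30):
    """Classify a turn as complete, cancelled, interrupted, or in_progress."""
    if cancelled:
        return "cancelled"  # emitted with a WARNING root
    if last_stop_reason == "end_turn":
        return "complete"  # emitted normally
    if superseded:
        return "interrupted"  # a newer prompt arrived; emitted with a WARNING root
    if now - last_event_at > timedelta(minutes=stale_minutes):
        return "interrupted"  # idle past the configurable threshold
    return "in_progress"  # not emitted; re-read on the next poll


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(turn_status("tool_use", False, False, now - timedelta(minutes=5), now))
print(turn_status("tool_use", False, False, now - timedelta(minutes=45), now))
```

Only the `in_progress` result leaves the turn untouched; every other result means the turn is safe to emit exactly once.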
Quick Start
Prerequisites
- Python ≥ 3.8: any python3 works, including macOS’s system Python. The collector has zero third-party dependencies: no SDK, no virtualenv, no pip install.
- Kimi Code installed (~/.kimi-code/ exists).
- A Litefuse project at https://litefuse.cloud with public + secret keys.
Download the collector script
mkdir -p ~/.kimi-code/hooks
curl -fsSL https://litefuse.ai/integrations/kimi-code/litefuse_hook.py \
  -o ~/.kimi-code/hooks/litefuse_hook.py
chmod +x ~/.kimi-code/hooks/litefuse_hook.py

The source is also browseable at the same URL; feel free to read it before deploying.
Configure ~/.kimi-code/litefuse.env
The collector runs under launchd/cron with an empty environment, so credentials live in a dedicated env file that it reads on every run (no restart needed after edits):
cat > ~/.kimi-code/litefuse.env <<'EOF'
TRACE_TO_LITEFUSE=true
LITEFUSE_PUBLIC_KEY=pk-lf-xxx
LITEFUSE_SECRET_KEY=sk-lf-xxx
LITEFUSE_BASE_URL=https://litefuse.cloud
EOF
chmod 600 ~/.kimi-code/litefuse.env

Schedule the collector
macOS (launchd) — runs every 30 seconds:
cat > ~/Library/LaunchAgents/com.kimi.litefuse.plist <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key><string>com.kimi.litefuse</string>
<key>ProgramArguments</key>
<array>
<string>/usr/bin/env</string>
<string>python3</string>
<string>$HOME/.kimi-code/hooks/litefuse_hook.py</string>
</array>
<key>StartInterval</key><integer>30</integer>
<key>RunAtLoad</key><true/>
</dict>
</plist>
EOF
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.kimi.litefuse.plist

Linux (cron), runs every minute:
(crontab -l 2>/dev/null; echo "* * * * * python3 \$HOME/.kimi-code/hooks/litefuse_hook.py") | crontab -

Verify
Send a message in Kimi Code, wait for the answer plus one poll interval, then watch the collector log:
tail -f ~/.kimi-code/state/litefuse_hook.log
# Expected: "Emitted 1 turn(s) in X.XXs -> https://litefuse.cloud"

Open the project in Litefuse. Each user turn becomes one trace with the structure shown above.
Note: on its first run the collector catches up on all existing sessions and emits one trace per past turn. If you want a clean slate instead, deleting ~/.kimi-code/state/litefuse_state.json is not the way (that re-emits everything); simply start tracing from a fresh Kimi session.
Upgrading from v1
The previous version of this hook used the Langfuse Python SDK and flattened every turn into synthetic Kimi Response (step N) generations. v2 needs no SDK and follows the shared trace structure:
- Download the new script over the old one (back up first if you’ve customized it). The launchd/cron entry can stay; only the script path matters.
- Keep ~/.kimi-code/litefuse.env; the same keys work. LANGFUSE_* names keep working as a fallback, but LITEFUSE_* takes precedence.
- v2 renames observations (plan (n tools) #N / response instead of Kimi Response (step N), lowercase tool: bash (…) #N instead of Tool: Bash #N) and flattens metadata to agent_* keys; update any saved dashboard filters.
- v2 only emits complete turns; if v1 left a half-consumed session behind, the first v2 run resynchronizes from its saved offset automatically.
Environment variables
All variables live in ~/.kimi-code/litefuse.env (or the process environment — the env file never overrides variables that are already set). LITEFUSE_* takes precedence; the equivalent LANGFUSE_* names are accepted as an ecosystem-compatible fallback.
| Variable | Required | Description |
|---|---|---|
| TRACE_TO_LITEFUSE | Yes | Must be true for the collector to do anything. |
| LITEFUSE_PUBLIC_KEY | Yes | Litefuse project public key (pk-lf-...). |
| LITEFUSE_SECRET_KEY | Yes | Litefuse project secret key (sk-lf-...). |
| LITEFUSE_BASE_URL | No | Defaults to https://litefuse.cloud. Alias: LITEFUSE_HOST. |
| LITEFUSE_TRACING_ENVIRONMENT | No | Litefuse environment for emitted traces. Defaults to production; use development for experiments. |
| LITEFUSE_USER_ID | No | Overrides the trace user_id. Falls back to the OS username. |
| LITEFUSE_EXTRA_TARGETS | No | JSON array of extra targets ([{"publicKey", "secretKey", "baseUrl", "environment"}]) to double-write traces to (e.g. self-hosted + cloud). |
| KIMI_LITEFUSE_DEBUG | No | Set to "true" for verbose collector logging. |
| KIMI_LITEFUSE_MAX_CHARS | No | Truncation threshold (in characters) for span inputs/outputs. Default 1000000. |
| KIMI_LITEFUSE_STALE_MINUTES | No | Idle minutes after which an unfinished turn is emitted as interrupted. Default 30. |
| KIMI_LITEFUSE_BATCH_BYTES | No | OTLP request-body batch limit. Default 800000; oversized batches also split-and-retry automatically on HTTP 413. |
| KIMI_LITEFUSE_STATE_DIR | No | Overrides the state directory (~/.kimi-code/state). Mainly for testing. |
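
The two precedence rules stated above (the process environment beats the env file, and LITEFUSE_* beats LANGFUSE_*) can be sketched as follows. This is a minimal illustration, not the collector's parser: the function names are hypothetical, and the file syntax handled here (KEY=VALUE lines, # comments, optional quotes) is an assumption based on the example file in Quick Start.

```python
def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def effective_settings(file_values, environ):
    """Merge both sources; variables already set in the process win."""
    merged = dict(file_values)
    merged.update(environ)
    return merged


def lookup(settings, name):
    """Prefer LITEFUSE_<name>, then fall back to LANGFUSE_<name>."""
    for prefix in ("LITEFUSE_", "LANGFUSE_"):
        value = settings.get(prefix + name)
        if value:
            return value
    return None


file_text = "TRACE_TO_LITEFUSE=true\nLANGFUSE_PUBLIC_KEY=pk-lf-old\nLITEFUSE_PUBLIC_KEY=pk-lf-new\n"
settings = effective_settings(parse_env_file(file_text), {"LITEFUSE_BASE_URL": "https://litefuse.cloud"})
print(lookup(settings, "PUBLIC_KEY"), lookup(settings, "BASE_URL"))
```

In real use the second argument would be `os.environ`; a plain dict is used here so the example has no side effects.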
Metadata reference
All integration fields are flat top-level metadata keys with an agent_ prefix (shared across Litefuse agent integrations). Fields absent from the source data are omitted entirely, never padded with null.
Trace root: agent_turn_number, agent_session_id, agent_cwd, agent_model, agent_provider, agent_transcript_path, agent_api_calls, agent_tool_calls, agent_steps, agent_duration_ms; agent_image_blocks when the prompt contains media; agent_cancelled on cancelled turns; truncation markers (agent_prompt_truncated + _orig_len).
Generation: agent_turn_number, agent_step_index, agent_provider, agent_stop_reason, agent_api_duration_ms, agent_time_to_first_token_ms, agent_stream_duration_ms, agent_tool_call_count, agent_thinking_chars, agent_step_uuid, agent_input_scope, truncation markers.
Tool: agent_turn_number, agent_step_index, agent_plan_step (join key: tool.agent_plan_step == generation.agent_step_index), agent_tool_name (original casing, e.g. Bash), agent_tool_call_id, agent_duration_ms, agent_is_error, agent_subagent_id on delegations, truncation markers.
Subagent container: agent_subagent: true, agent_subagent_id, agent_subagent_type, agent_subagent_status, plus that run’s agent_api_calls / agent_tool_calls / agent_steps / agent_duration_ms.
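
The join key named in the Tool row (tool.agent_plan_step == generation.agent_step_index) can be illustrated on exported observations. The dict shape used here (type, name, metadata) is a hypothetical export format chosen for the example, not a documented Litefuse schema. Because step numbering restarts inside each subagent container, the join should be applied per container.

```python
def tools_by_generation(observations):
    """Map each generation's agent_step_index to the tools it requested."""
    grouped = {
        obs["metadata"]["agent_step_index"]: []
        for obs in observations
        if obs["type"] == "generation"
    }
    for obs in observations:
        if obs["type"] != "tool":
            continue
        plan_step = obs["metadata"].get("agent_plan_step")
        if plan_step in grouped:
            grouped[plan_step].append(obs["name"])
    return grouped


# Observations of one agent container, modeled on the trace example above.
observations = [
    {"type": "generation", "name": "plan (3 tools) #1", "metadata": {"agent_step_index": 1}},
    {"type": "tool", "name": "tool: read (overview.txt) #2", "metadata": {"agent_step_index": 2, "agent_plan_step": 1}},
    {"type": "tool", "name": "tool: read (events.jsonl) #3", "metadata": {"agent_step_index": 3, "agent_plan_step": 1}},
    {"type": "tool", "name": "tool: read (queue.jsonl) #4", "metadata": {"agent_step_index": 4, "agent_plan_step": 1}},
    {"type": "generation", "name": "plan (1 tool) #5", "metadata": {"agent_step_index": 5}},
]
print(tools_by_generation(observations))
```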
How it works
On every scheduled run the script:
- Loads ~/.kimi-code/litefuse.env, then lists all sessions from ~/.kimi-code/session_index.jsonl.
- Reads new lines from each session’s agents/main/wire.jsonl since the last offset (state in ~/.kimi-code/state/litefuse_state.json, keyed by sha256(session_id::wire_path), guarded by a file lock).
- Splits events into turns at turn.prompt boundaries and assembles steps (step.begin / content.part / tool.call / tool.result / step.end / usage.record).
- Emits finished turns only (final end_turn, turn.cancel, superseded by a newer prompt, or stale past the idle threshold); the offset stops before any in-progress turn so it is re-read in full next time.
- Resolves Agent-tool results to child wire.jsonl files and recursively expands subagent subtrees.
- Sends everything as OTLP/HTTP JSON to <base_url>/api/public/otel/v1/traces (batched under the endpoint’s body-size limit with split-and-retry on 413, Basic auth, 10 s timeout). Trace headers ride on every span.
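
The read and split steps above can be sketched in a few lines. This is an illustration under assumptions: it presumes each wire.jsonl line is a JSON object whose event name sits in a "type" field, which the text does not confirm, and both function names are hypothetical. It also leaves out the file lock and the rule that the saved offset stops before an in-progress turn.

```python
import json


def read_new_events(path, offset):
    """Return (events, new_offset), consuming only complete lines after offset."""
    events = []
    with open(path, "rb") as handle:
        handle.seek(offset)
        for raw in handle:
            if not raw.endswith(b"\n"):
                break  # partially written line; pick it up on the next poll
            offset += len(raw)
            if raw.strip():
                events.append(json.loads(raw))
    return events, offset


def split_turns(events):
    """Start a new turn at every turn.prompt event."""
    turns = []
    for event in events:
        if event.get("type") == "turn.prompt" or not turns:
            turns.append([])
        turns[-1].append(event)
    return turns
```

Tracking a byte offset rather than a line count means each poll reads only what was appended since the previous run, regardless of how large the log has grown.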
The collector is fail-open: any unexpected error is logged to ~/.kimi-code/state/litefuse_hook.log and the script exits 0, so it never affects Kimi Code itself.
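
A fail-open entry point of this kind can be sketched as a wrapper that logs and swallows every unexpected error. The function name and the log line format are hypothetical; only the behavior (log the error, exit 0) comes from the description above.

```python
import traceback
from datetime import datetime, timezone


def run_fail_open(collect, log_path):
    """Run collect(); on any unexpected error, log it and still return 0."""
    try:
        collect()
    except Exception:
        try:
            with open(log_path, "a", encoding="utf-8") as log:
                stamp = datetime.now(timezone.utc).isoformat()
                log.write(f"{stamp} unexpected error\n")
                log.write(traceback.format_exc())
        except OSError:
            pass  # even a failing log write must not change the exit code
    return 0


# Typical use at the bottom of a script (main and log_path are placeholders):
#     sys.exit(run_fail_open(main, log_path))
```

The trade-off is that failures are only visible in the log file, which is why the troubleshooting steps start by tailing it.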
Troubleshooting
No traces appear in Litefuse. Tail ~/.kimi-code/state/litefuse_hook.log. An empty log means the scheduler isn’t running the script; check launchctl print gui/$(id -u)/com.kimi.litefuse (or your crontab). A silent exit with no log lines usually means TRACE_TO_LITEFUSE isn’t true or keys are missing from ~/.kimi-code/litefuse.env. A line starting with "send failed:" points to invalid keys or a network problem.
The latest turn is missing. It probably hasn’t finished — the collector only uploads complete turns, then waits for the next 30 s poll. Multi-subagent turns can take minutes before anything appears.
A trace has a WARNING root saying “interrupted”. That turn genuinely never completed — Kimi was killed, restarted, or the LLM call hung (an expired login is a classic: the turn dies on the first call, you run /login and resend, and the dead attempt is recorded as its own interrupted trace). It is not a collection error.
Cost shows 0. Litefuse computes cost from the model name; add a price entry matching the model (e.g. kimi-code/kimi-for-coding) under Settings → Models in your Litefuse project.
Test the collector manually (uses a development environment so production stays clean):
LITEFUSE_TRACING_ENVIRONMENT="development" \
KIMI_LITEFUSE_DEBUG=true \
python3 ~/.kimi-code/hooks/litefuse_hook.py
tail ~/.kimi-code/state/litefuse_hook.log

Resources
- Kimi Code
- Litefuse Cloud
- Collector script source: litefuse_hook.py