
Metrics

Mooncake exposes live system utilization — CPU, GPU, memory, load, network — as a metrics surface separate from facts.

Facts vs. metrics:

  • Facts describe what the machine is: cores, total memory, installed tools, package manager. Cached for the lifetime of the process.
  • Metrics describe what it's doing right now: how busy the CPU is, how much VRAM is used, current load average. Sampled on demand with short TTLs.

Both flow into the same variable namespace, so a when: expression can read either kind without caring which:

- name: Only train if the GPU is idle
  shell: python train.py
  when: gpu_usage_pct < 20

- name: Only run heavy installs on multi-core boxes
  apt: { name: build-essential, state: present }
  when: cpu_cores >= 8 and load_avg_1m < 4
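The shared-namespace behavior can be sketched in a few lines; the eval-based evaluator, the merge order, and the fact names here are illustrative assumptions, not Mooncake's actual implementation:

```python
# Sketch: facts and metrics merged into one namespace for `when:` evaluation.
facts = {"cpu_cores": 16, "memory_total_mb": 64000}    # cached for process lifetime
metrics = {"gpu_usage_pct": 12.0, "load_avg_1m": 1.3}  # sampled with short TTLs

namespace = {**facts, **metrics}  # expressions don't care which side a name came from

def should_run(when_expr: str) -> bool:
    # A real engine would use a safe expression parser, not eval().
    return bool(eval(when_expr, {"__builtins__": {}}, namespace))

print(should_run("gpu_usage_pct < 20"))                  # True
print(should_run("cpu_cores >= 8 and load_avg_1m < 4"))  # True
```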

CLI

mooncake metrics                              # full JSON/text dump
mooncake metrics --format json
mooncake metrics --query cpu_usage_pct
mooncake metrics --query cpu_usage_pct --query load_avg_1m
mooncake metrics --fields cpu_usage_pct,gpus_metrics
mooncake metrics --refresh                    # force re-sample, bypass TTL

--fields filters the output to the specified keys and adds a sibling _collected_at map with each key's last-sample timestamp (RFC3339), so callers can see freshness without leaking TTL internals.

Available metrics

Each metric has a TTL — within that window, repeated reads serve the cached value rather than re-sampling. TTLs are tuned for the kinds of decisions agents make on these values.

Key                  Type            Description                                                TTL
cpu_usage_pct        float (0–100)   System-wide CPU utilization                                2s
cpu_usage_per_core   []float         Per-core CPU utilization (Linux only)                      2s
load_avg_1m          float           1-minute load average                                      5s
load_avg_5m          float           5-minute load average                                      5s
load_avg_15m         float           15-minute load average                                     5s
memory_used_mb       int64           Resident memory in MB                                      5s
memory_used_pct      float (0–100)   Resident memory as % of total                              5s
swap_used_mb         int64           Swap used in MB                                            5s
net_rx_bps           int64           Bytes/sec received (non-loopback)                          2s
net_tx_bps           int64           Bytes/sec transmitted (non-loopback)                       2s
gpus_metrics         array           Per-GPU live metrics (NVIDIA on Linux only)                2s
temperatures         array           Hardware temperature sensors (Linux only — hwmon)          2s
cpu_temp_c           float           Derived CPU package temperature in °C, 0 if unavailable    2s

Each temperatures entry has:

{
  "Chip": "coretemp",
  "Label": "Package id 0",
  "TempC": 50,
  "CritC": 105
}

The collector reads /sys/class/hwmon/ (the canonical Linux sensor surface that lm_sensors exposes), so anything visible there — CPU package and per-core temps, NVMe SSD, WiFi card, motherboard sensors — shows up here.
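A minimal hwmon walker producing entries in the shape shown above might look like this; the attribute discovery is simplified (real hwmon exposes more variants), and this is a sketch, not the actual collector:

```python
from pathlib import Path

# Sketch: walk /sys/class/hwmon and emit {Chip, Label, TempC, CritC} entries.
# hwmon reports temperatures in millidegrees Celsius.
def read_hwmon(root="/sys/class/hwmon"):
    entries = []
    for dev in sorted(Path(root).glob("hwmon*")):
        chip = (dev / "name").read_text().strip()
        for inp in sorted(dev.glob("temp*_input")):
            n = inp.name[len("temp"):-len("_input")]       # sensor channel number
            label_f = dev / f"temp{n}_label"
            crit_f = dev / f"temp{n}_crit"
            entries.append({
                "Chip": chip,
                "TempC": int(inp.read_text()) / 1000.0,    # millidegrees -> °C
                "Label": label_f.read_text().strip() if label_f.exists() else f"temp{n}",
                "CritC": int(crit_f.read_text()) / 1000.0 if crit_f.exists() else 0.0,
            })
    return entries
```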

cpu_temp_c is a derived convenience value that picks the most authoritative CPU sensor available. Priority order (Linux):

  1. AMD Tctl (throttle-control temp from k10temp / zenpower)
  2. Intel Package id 0 (from coretemp)
  3. AMD Tdie (die temp fallback)
  4. max(Core *) when only per-core sensors are exposed
  5. cpu_thermal (ARM)

It returns 0 when no CPU sensor is detectable.
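The priority ordering above can be sketched as a simple selection over the sensor list; the function name and input shape are assumptions for illustration:

```python
# Sketch: derive cpu_temp_c from temperature entries using the priority order
# Tctl > "Package id 0" > Tdie > max(Core *) > cpu_thermal, else 0.
def derive_cpu_temp_c(sensors):
    by_label = {s["Label"]: s["TempC"] for s in sensors}
    for label in ("Tctl", "Package id 0", "Tdie", None, "cpu_thermal"):
        if label is None:
            # Step 4: only per-core sensors exposed -> take the hottest core.
            cores = [s["TempC"] for s in sensors if s["Label"].startswith("Core ")]
            if cores:
                return max(cores)
        elif label in by_label:
            return by_label[label]
    return 0.0  # no CPU sensor detectable
```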

macOS

On macOS, temperatures come from powermetrics --samplers smc, which requires root. Two modes:

  • User-shell invocation (mooncake metrics) — no root, no temps. The collector silently returns an empty array rather than prompting for a password (interactive sudo from a metrics call would be terrible UX).
  • Daemon invocation (Spec 18 mooncaked) — the agent daemon runs as root, so temperatures come through automatically. This is the common case for fleet observability.

The parser extracts SMC entries with chip smc and label CPU die, GPU die, CPU heat sink, Battery, etc. cpu_temp_c prefers CPU die, falling back to CPU heat sink.

Apple Silicon caveat: Apple's powermetrics does not expose die temperatures on M-series chips — the SMC sampler returns thermal pressure state only. Expect temperatures to be empty and cpu_temp_c to be 0 even when running as root on an Apple Silicon Mac. The underlying data is available via private IOReport APIs (used by asitop, mactop, stats) but requires cgo and is out of scope for v1.

Each gpus_metrics entry has:

{
  "Index": 0,
  "UsagePct": 87,
  "MemoryUsedMB": 6400,
  "MemoryUsedPct": 78.1,
  "TemperatureC": 72
}

Index matches the corresponding entry in the static gpus fact, so you can correlate name/driver/model from facts with live load from metrics.
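A caller-side join on Index might look like this; the facts-side field names ("Name") are assumptions, not Mooncake's documented facts schema:

```python
# Sketch: correlate static GPU facts with live gpus_metrics by Index.
gpus_facts = [{"Index": 0, "Name": "NVIDIA GeForce RTX 4090"}]   # from facts (assumed shape)
gpus_metrics = [{"Index": 0, "UsagePct": 87, "MemoryUsedMB": 6400,
                 "MemoryUsedPct": 78.1, "TemperatureC": 72}]     # from metrics

live_by_index = {g["Index"]: g for g in gpus_metrics}
combined = [{**fact, **live_by_index.get(fact["Index"], {})} for fact in gpus_facts]
# combined[0] now carries both the static name and the live utilization.
```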

Sampling notes

  • CPU: 100ms sample window on Linux. First read costs ~100ms; subsequent reads within TTL are free.
  • Network: 1s sample window on both Linux and macOS. The collector measures a delta to compute bytes/sec.
  • GPU: NVIDIA only in v1, on Linux. Requires nvidia-smi in PATH. If nvidia-smi is present but no GPUs are visible (driver not loaded), the array is empty rather than missing.
  • macOS per-core CPU: not exposed in v1 (top doesn't print it without cgo). cpu_usage_pct is still accurate; cpu_usage_per_core is empty.
  • Apple Silicon GPU: not in v1. powermetrics needs root and is hostile to parse.
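The network delta measurement described above amounts to two counter snapshots one window apart; this is a sketch, with read_counter standing in for reading the interface byte counters:

```python
import time

# Sketch: compute bytes/sec from a monotonically increasing byte counter
# by taking a delta over a fixed sample window.
def bytes_per_sec(read_counter, window_s=1.0, sleep=time.sleep):
    first = read_counter()
    sleep(window_s)
    return int((read_counter() - first) / window_s)
```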

Polling pattern (daemon / fleet)

When mooncake agent is running, an external client can poll metrics via the get_metrics MCP tool:

{
  "method": "tools/call",
  "params": {
    "name": "get_metrics",
    "arguments": {
      "fields": ["cpu_usage_pct", "gpus_metrics"],
      "refresh": false
    }
  }
}

Set refresh: true to bypass TTL — useful when you've just kicked off a job and want a clean baseline.

The response includes _collected_at with a per-key timestamp so the agent can decide whether the cached value is fresh enough for its decision.
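A caller-side freshness check on those timestamps might look like this; max_age_s is the caller's own staleness budget (an assumption here), not a Mooncake setting:

```python
from datetime import datetime, timezone

# Sketch: decide whether a _collected_at timestamp (RFC3339) is recent enough.
def is_fresh(collected_at: str, max_age_s: float, now=None) -> bool:
    ts = datetime.fromisoformat(collected_at.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return (now - ts).total_seconds() <= max_age_s
```

If a value is stale for the decision at hand, re-poll with refresh: true to force a fresh sample.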