Resource Tracker in Rust: Single-Binary Monitoring

Published on Apr 23, 2026 by Gergely Daroczi.

When we released the Spare Cores Resource Tracker as a Python package last year, the feedback was great -- but two things kept nagging at us.

The first was the CLI wrapper story. The Python tracker works well when you are already inside a Python process, but wrapping an arbitrary binary -- say a compiled C++ simulation or a Rust ML model training application -- means spawning Python just to babysit another process. That felt upside-down.

The second was GPU monitoring. The Python implementation collected GPU metrics by periodically calling out to nvidia-smi pmon via subprocess. At a one-second sampling interval that overhead is tolerable; push it to sub-second and the subprocess spawning starts showing up in the numbers you are trying to measure. We wanted to talk to NVML directly, which is not something you do comfortably from Python without pulling in heavier dependencies.

So we built resource-tracker-rs, a Rust port that compiles down to a single ~2 MB binary with no runtime dependencies beyond a reasonably recent glibc (no more than about 10 years old). Drop the binary onto any Linux machine, point it at your process, and you're done.

Why Rust? #

Honestly, Go would have been the more obvious choice for a "just ship a binary" tool, but Go binaries carry a non-trivial runtime and tend to land in the 10-20 MB range for something like this -- we wanted to stay under 2 MB. Rust gave us that, plus near-zero overhead polling (important when you're measuring a process, not just watching it), and the ability to call into NVML and the AMD GPU libraries directly.

The result is almost feature-equivalent to the Python version, with CPU, memory, disk, and network metrics at both the system level and the per-process-tree level -- with a couple of additions we could not easily back-port to Python without pulling in more dependencies.

Getting Started #

Precompiled binaries are published as GitHub Releases for both x86_64 and arm64. Download, extract, and run:

# auto-guess target architecture
ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/aarch64/arm64/')"

# find most recent release
URL="$(curl -fsSL https://api.github.com/repos/SpareCores/resource-tracker-rs/releases/latest | sed -n "s#.*\"browser_download_url\": \"\\([^\"]*resource-tracker-[^\"]*-linux-${ARCH}.tar.gz\\)\".*#\\1#p" | sed -n '1p')"

# download and extract
curl -fsSL "$URL" | tar -xvzf - resource-tracker

# run
./resource-tracker --help

No Python, no pip, no virtual environment. The glibc (minimum 2.17) requirement means it works out of the box on any modern distro -- Debian, Ubuntu, RHEL, Amazon Linux, and so on.
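
If you are not sure whether a host qualifies, the glibc version is quick to check (a simple sanity check, assuming a glibc-based distro):

# print the glibc version; 2.17 or newer is enough
ldd --version | head -n1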

We also experimented with statically-linked builds using musl instead of glibc for better support of truly minimal environments like Alpine, but GPU monitoring requires dynamically loading the NVIDIA and AMD GPU libraries at runtime, which a statically-linked musl binary cannot do.

CLI Usage #

The most common pattern is the shell-wrapper mode: pass your command (preferably after an explicit -- so its flags are not parsed as the tracker's own) and the tracker monitors the child process tree from start to finish, sampling once per second by default. It then exits with the same exit code as the child process, so it is transparent to CI and job schedulers:

./resource-tracker python train.py --epochs 50
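
The exit-code passthrough is easy to verify with a toy command (the bash invocation below is purely illustrative):

# the tracker exits with the wrapped command's exit code
./resource-tracker -- bash -c 'exit 3'
echo $?   # prints 3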

Each second a JSON line is emitted to stderr (or to a file with --output):

{
  "cpu": {
    "per_core_pct": [7.65, 0.0, 2.53, 0.0, 5.12, 1.0, 6.06, 0.0],
    "process_child_count": null,
    "process_cores_used": null,
    "process_count": 628,
    "process_disk_read_bytes": null,
    "process_disk_write_bytes": null,
    "process_gpu_usage": 0.1,
    "process_gpu_utilized": 1,
    "process_gpu_vram_mib": 811.69,
    "process_rss_mib": null,
    "process_stime_secs": null,
    "process_utime_secs": null,
    "stime_secs": 0.39,
    "utilization_pct": 0.43,
    "utime_secs": 0.47
  },
  "disk": [{...}, {...}, ...],
  "network": [{...}, {...}, ...],
  "gpu": [{...}, {...}, ...],
  "memory": {
    "active_mib": 25233,
    "available_mib": 72990,
    "buffers_mib": 4,
    "cached_mib": 24943,
    "free_mib": 51127,
    "inactive_mib": 12526,
    "swap_total_mib": 0,
    "swap_used_mib": 0,
    "swap_used_pct": 0.0,
    "total_mib": 96313,
    "used_mib": 20239,
    "used_pct": 21.01
  }
}
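
Because the output is newline-delimited JSON, it composes nicely with standard tooling. A rough sketch, assuming jq is installed and the metrics were written to metrics.jsonl via --output:

# peak system memory usage over the run, in percent
jq -s 'map(.memory.used_pct) | max' metrics.jsonl

# average overall CPU utilization, in percent
jq -s 'map(.cpu.utilization_pct) | add / length' metrics.jsonl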

If you prefer tracking an already-running process, pass its PID with --pid. The tracker will walk the full /proc tree and attribute CPU usage from the root PID down through all its descendants, which means multi-process workloads like PyTorch data-loader workers or Spark executors are attributed correctly under a single root.
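
A sketch of attaching to a running job (the pgrep pattern and the output file name are just illustrations):

# find the root PID of an already-running job
PID="$(pgrep -f 'python train.py' | head -n1)"

# attach to it and write metrics to a file instead of stderr
./resource-tracker --pid "$PID" --output metrics.jsonl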

For recurring jobs a small TOML config file next to the job definition is cleaner than repeating flags every time:

[job]
name = "nightly-feature-pipeline"

[tracker]
interval_secs = 5

All CLI flags can also be set via environment variables.
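
For example, assuming the flags map to TRACKER_-prefixed variables like the streaming options shown below, an interval override might look like this (the exact variable name is an assumption -- check --help for the real mapping):

# hypothetical variable name; consult --help for the exact spelling
TRACKER_INTERVAL_SECS=5 ./resource-tracker -- python train.py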

Streaming to Sentinel #

For teams running many batch jobs across multiple machines, tailing a local JSONL file per run does not scale well. The tracker has optional streaming built in: set the SENTINEL_API_TOKEN environment variable and every run is registered with the Spare Cores Sentinel service. Metrics are batched, gzip-compressed, and uploaded to S3 in the background every 60 seconds (configurable via TRACKER_UPLOAD_INTERVAL). On exit, the final batch is flushed inline so nothing is lost.

export SENTINEL_API_TOKEN="your-token-here"
export TRACKER_JOB_NAME="gpu-benchmark"
./resource-tracker python train.py
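
For short-lived jobs it can make sense to flush more often than the 60-second default; assuming the value is given in seconds, as the default suggests:

# upload batches every 15 seconds instead of every 60
export TRACKER_UPLOAD_INTERVAL=15
./resource-tracker python train.py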

From there, Sentinel aggregates the runs centrally and surfaces right-sizing recommendations -- the same idea as the Python tracker's recommend_server() call, but for the whole team and without any per-job instrumentation code.

Next Steps #

The full Usage Guide covers all CLI flags, the TOML config reference, output format details, and more shell-wrapper examples. The source code is on GitHub, as usual, under MPL-2.0.

Thanks to Greneta Solutions and Avram Aelony, who did the heavy lifting on the Rust implementation, for their contributions to the project! 🙇

Stay tuned for more details on the Sentinel service! It is currently in closed beta -- if you want early access, leave a note in the comments or reach out directly.