AI agent enabled CLI tooling

Experiment

This is a Claude-generated document that I have read, but not edited. It captures the essence of a series of conversations about CLI tooling and AI agent tool use, but was not directly written by me.

Making CLIs speak natively to AI agents

There is a version of this story where the command line gets replaced. Where AI agents, armed with APIs and browser tools and function-calling interfaces, simply stop using terminals. Where the CLI becomes a legacy interface — tolerated, not designed for.

murli is a bet against that version.

The command line is not a relic. It is precise, composable, scriptable, auditable, and universal. Every serious piece of software ships one. The question is not whether agents will use CLIs — they already do, constantly, because the CLI is often the only interface that exposes the full capability of a tool. The question is whether those CLIs are any good at being used by agents.

Most of them are not. Not because they were built badly, but because they were built for humans. And humans and agents want the same things from a CLI in theory — speed, clarity, predictability — but express those needs very differently in practice.

murli is a thin Go library that wraps existing CLI frameworks and makes them speak natively to AI agents. It is not a new CLI framework. It is not a protocol. It does not try to replace anything. It adds a layer — output shaping, structured errors, runtime introspection, token efficiency — and gets out of the way.

This post explains why murli exists, what principles guide it, and where it is going.


Where it started

The original problem was simple and specific: agents running CLI tools were producing terrible results.

Not because the agents were bad at reasoning. Because the tools they were calling were built for human eyes. Progress bars that meant nothing to a parser. Error messages written for sympathy, not diagnosis. Help text designed for reading, not querying. Output that mixed data with decoration and dared anything downstream to separate them.

The first version of murli solved the most obvious problem: it looked at whether stdout was connected to a terminal, and if not, it switched to clean JSON. Humans got their formatted tables. Agents got structured data. That single change made a measurable difference.

But it also surfaced the deeper problem. The issue was not just output format. It was that CLIs, as a category, had never been designed with a second audience in mind. Every convention — how help text was written, how errors were phrased, how flags were named, how progress was reported — had been optimised entirely for human consumption. Agents were an afterthought at best.

murli started as a formatting fix and became a philosophy question: what would it mean to build a CLI that is genuinely good for both audiences?


The research

That question turned out to have a lot of recent thinking behind it.

By early 2026, a cluster of projects and posts had converged on essentially the same answer, arriving from different directions.

Cloudflare rebuilt Wrangler — its primary developer CLI — with explicit agent-first principles baked into the schema layer. Dane Knecht, Cloudflare’s CTO, framed the shift plainly: “Increasingly, agents are the primary customer of our APIs.” The practical output was a set of naming conventions enforced at the source: always get, never info; always --json, never --format=json; always --force, never --skip-confirmations. Consistent naming is not aesthetics — it is the difference between an agent that can generalise across tools and one that has to learn each tool from scratch.

Google published a Workspace CLI (gws) built agents-first from the ground up. The engineering post behind it, “You Need to Rewrite Your CLI for AI Agents” by Justin Poehnelt, a Senior Developer Relations Engineer at Google, is the most direct practitioner account of what that means in practice. The repo passed 20,000 stars after the post reached the Hacker News front page. The core insight: agents do not read READMEs. They query schemas, parse output, and infer from structure. A CLI that cannot describe itself at runtime is a CLI that agents will use badly.

Trevin Chow published “10 Principles for Agent-Native CLIs”, the most complete synthesis of the emerging field. It extended an earlier defensive list — stop outputting noise, handle non-TTY, use --json — with a compounding-value layer: capability declaration, typed I/O contracts, profiles for stateful agents, async-aware execution. The principles read as a natural progression from “don’t break agents” to “actively help them.”

RTK (Rust Token Killer) approached the same problem from the cost angle. Its core observation: a typical agent session runs 60 or more shell commands. At ~3,500 raw tokens per command, that is 210,000 tokens of CLI output before the agent has done any reasoning. RTK measured compression rates on real commands — 91.8% noise reduction on cargo test, 80.8% on git status — and made the economic argument explicit: output noise is not an inconvenience, it is a budget problem.

Speakeasy documented the retrofit experience — what it looks like to add agent-native behaviour to an existing CLI rather than building one from scratch. Their retrospective covers the specific tensions: how do you add structured output without breaking scripts that depend on the current format? How do you version an output contract that was never meant to be a contract?

Anthropic published data on the token efficiency of tool design. Its Tool Search Tool — a mechanism for lazy-loading tool definitions rather than injecting all of them upfront — demonstrated an 85% reduction in token usage while maintaining full tool access. The principle transferred directly to CLI design: a tool that can describe its own capabilities on demand is cheaper to use than one that requires its full schema in every context.

Across these sources, five areas of clear agreement emerged:

  • Output decoupling: structured data for machines, formatted output for humans, automatic switching based on context.
  • Self-description: CLIs should be able to describe their own capabilities at runtime without external documentation.
  • Conventional vocabulary: consistent verb and flag naming across tools so agents can generalise.
  • Structured errors: errors should tell an agent what failed, why, and what to do next — not just that something went wrong.
  • Token efficiency: noise in CLI output is a real cost, and good design minimises it.

These five areas, expanded and sharpened, became the ten principles that murli is built around.


The ten principles

These are not rules murli enforces. They are a design framework it enables. The goal is to make following them easy and departing from them visible — not to block anyone from making different choices. A CLI that implements three of these well is better than a CLI that implements none. The principles exist to provide direction, not gates.

1. Two-audience output

Every command produces two renderings: one for human terminals, one for machine consumers. The default is automatic — murli detects whether stdout is a TTY and switches accordingly. Humans get formatted tables, colours, and progress indicators. Agents get clean JSON on stdout, diagnostics on stderr, and nothing mixed between the two.

The rule from Cloudflare’s schema layer applies here: the flag for opting into machine output is always --json, never --format=json, never --output json. Consistent naming across tools is itself a form of documentation.
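
As a rough illustration of the switch described above, the sketch below picks a rendering based on whether stdout looks like a terminal. It is not murli's implementation: the row type, isTerminal, and render are hypothetical names, and the character-device check is a standard-library approximation of TTY detection (it also reports true for /dev/null).

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// row is a hypothetical result record used only for this sketch.
type row struct {
	Name   string `json:"name"`
	Status string `json:"status"`
}

// isTerminal reports whether f is a character device, a stdlib-only
// approximation of "is a TTY".
func isTerminal(f *os.File) bool {
	info, err := f.Stat()
	return err == nil && info.Mode()&os.ModeCharDevice != 0
}

// render returns a padded table for humans or compact JSON for machines.
func render(rows []row, human bool) (string, error) {
	if !human {
		b, err := json.Marshal(rows)
		if err != nil {
			return "", err
		}
		return string(b) + "\n", nil
	}
	var sb strings.Builder
	fmt.Fprintf(&sb, "%-10s %s\n", "NAME", "STATUS")
	for _, r := range rows {
		fmt.Fprintf(&sb, "%-10s %s\n", r.Name, r.Status)
	}
	return sb.String(), nil
}

func main() {
	rows := []row{{"api", "ready"}, {"worker", "degraded"}}
	out, err := render(rows, isTerminal(os.Stdout))
	if err != nil {
		fmt.Fprintln(os.Stderr, err) // diagnostics go to stderr, never stdout
		return
	}
	fmt.Print(out)
}
```

Running the program directly shows the table; piping it through another command shows the JSON form.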

2. Introspectable capability

Agents do not read READMEs. They query the tool. A CLI that cannot describe its own capabilities at runtime forces agents to rely on training data that may be stale, incomplete, or simply wrong.

murli adds per-command schema via --schema automatically. The roadmap adds a describe subcommand that dumps the entire command tree in one pass — auto-mounted, zero engineer effort. An agent running mytool describe gets every subcommand, every flag, every output shape, without ever consulting documentation.
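
A minimal sketch of what a whole-tree dump could look like, assuming a simple recursive command structure. The Command and Flag types, their JSON keys, and the describe function are illustrative assumptions, not murli's actual types or output shape.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Flag and Command are hypothetical shapes for this sketch only.
type Flag struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

type Command struct {
	Name     string    `json:"name"`
	Summary  string    `json:"summary,omitempty"`
	Flags    []Flag    `json:"flags,omitempty"`
	Children []Command `json:"commands,omitempty"`
}

// countCommands walks the tree and counts every node, root included.
func countCommands(c Command) int {
	n := 1
	for _, child := range c.Children {
		n += countCommands(child)
	}
	return n
}

// describe serialises the whole command tree as one JSON document.
func describe(root Command) ([]byte, error) {
	return json.MarshalIndent(root, "", "  ")
}

func main() {
	root := Command{
		Name:    "mytool",
		Summary: "example tool",
		Children: []Command{
			{Name: "get", Summary: "fetch one resource", Flags: []Flag{{Name: "id", Type: "string"}}},
			{Name: "list", Summary: "list resources", Flags: []Flag{{Name: "limit", Type: "integer"}}},
		},
	}
	doc, err := describe(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(string(doc))
}
```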

3. Conventional vocabulary

The verbs and flags an agent encounters on one CLI shape its expectations of every CLI. Consistent vocabulary across tools compounds in value: get not fetch, list not show-all, delete not remove, --json not --format, --force not --skip-confirmations, --quiet not --silent.

murli cannot choose these names for you — they belong to the commands you are writing or wrapping. But it can make the right names easy to register consistently, and in later versions it surfaces advisory warnings when non-conventional names are detected.
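
One way such an advisory could work is a plain lookup from a non-conventional name to its preferred form. The sketch below assumes exactly the pairs listed above and nothing more; the advise function and the message wording are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// conventional maps non-conventional names to the preferred form.
// It holds only the pairs named in the text above.
var conventional = map[string]string{
	"fetch":                "get",
	"show-all":             "list",
	"remove":               "delete",
	"--format":             "--json",
	"--skip-confirmations": "--force",
	"--silent":             "--quiet",
}

// advise returns a warning for a non-conventional name, or "" if the
// name is not in the table.
func advise(name string) string {
	if want, ok := conventional[name]; ok {
		return fmt.Sprintf("advisory: %q is non-conventional; prefer %q", name, want)
	}
	return ""
}

func main() {
	for _, name := range []string{"get", "fetch", "--silent"} {
		if msg := advise(name); msg != "" {
			fmt.Fprintln(os.Stderr, msg) // advisory only: warn, never block
		}
	}
}
```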

4. Typed I/O contracts

Inputs and outputs have a defined shape. Flags are typed. Output structures are declared. An agent should be able to know, from the schema alone, what a command expects and what it returns — without running it first.

murli’s FlagSchema and ReturnSchema are the early expression of this. The roadmap moves them toward JSON Schema draft-2020-12 compatibility, so the output of --schema can be consumed directly by MCP hosts, OpenAI function-calling, and Anthropic tool-use interfaces without translation.
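
To make the JSON Schema direction concrete, here is a sketch that turns a list of typed flags into a draft 2020-12 object schema. The flagSpec type and toJSONSchema function are assumptions made for illustration; they do not reproduce murli's FlagSchema or ReturnSchema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// flagSpec is a hypothetical stand-in for a typed flag declaration.
type flagSpec struct {
	Name     string
	Type     string // JSON Schema type: "string", "integer", "boolean"
	Required bool
	Enum     []string
}

// toJSONSchema describes the flags as properties of one input object.
func toJSONSchema(flags []flagSpec) map[string]any {
	props := map[string]any{}
	required := []string{}
	for _, f := range flags {
		p := map[string]any{"type": f.Type}
		if len(f.Enum) > 0 {
			p["enum"] = f.Enum
		}
		props[f.Name] = p
		if f.Required {
			required = append(required, f.Name)
		}
	}
	return map[string]any{
		"$schema":    "https://json-schema.org/draft/2020-12/schema",
		"type":       "object",
		"properties": props,
		"required":   required,
	}
}

func main() {
	schema := toJSONSchema([]flagSpec{
		{Name: "region", Type: "string", Required: true, Enum: []string{"eu", "us"}},
		{Name: "limit", Type: "integer"},
	})
	doc, err := json.MarshalIndent(schema, "", "  ")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(string(doc))
}
```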

5. Token frugality

Agents pay per token. CLI output that was designed for human reading is expensive for machine consumption. Progress bars, repeated warnings, ANSI codes, verbose status messages — all of it has a cost that compounds across an agent session.

murli addresses this today with consecutive line deduplication in its logger — the simplest and highest-value compression primitive. The roadmap adds structured progress events (typed, not prose), NDJSON streaming for long-running operations, and a --quiet flag that suppresses everything except the final result.
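
Consecutive line deduplication can be sketched in a few lines. The version below collapses each run of identical adjacent lines into one line with a repeat count; the count suffix is an assumption added for illustration, since the text only says that consecutive duplicates are removed.

```go
package main

import "fmt"

// dedupe collapses runs of identical consecutive lines into a single
// line. Runs longer than one get a repeat count appended.
func dedupe(lines []string) []string {
	var out []string
	for i := 0; i < len(lines); {
		j := i
		for j < len(lines) && lines[j] == lines[i] {
			j++
		}
		if n := j - i; n > 1 {
			out = append(out, fmt.Sprintf("%s (repeated %d times)", lines[i], n))
		} else {
			out = append(out, lines[i])
		}
		i = j // jump to the start of the next run
	}
	return out
}

func main() {
	logs := []string{"retrying...", "retrying...", "retrying...", "connected", "retrying..."}
	for _, line := range dedupe(logs) {
		fmt.Println(line)
	}
}
```

Only adjacent repeats are merged, so a line that reappears later is kept and the order of events is preserved.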

6. Actionable structured errors

An error that says “something went wrong” is useless to an agent. An error that says what failed, why, and what to do next is a recoverable state.

murli’s AgentError type provides the baseline: a code, a message, a suggestion, and a recoverability flag. The roadmap adds the fields that turn errors into self-correcting signals: ValidValues (the agent knows what to try instead), RetryAfterMs (the agent knows when to retry), exit codes that distinguish timeout from permission failure from conflict.
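
The shape of such an error might look like the sketch below. The agentError type here is a stand-in written for this example: the field set follows the description above, but the JSON key names and the encode helper are assumptions, not murli's actual definition.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// agentError is an illustrative stand-in, not murli's AgentError.
type agentError struct {
	Code         string   `json:"code"`
	Message      string   `json:"message"`
	Suggestion   string   `json:"suggestion,omitempty"`
	Recoverable  bool     `json:"recoverable"`
	ValidValues  []string `json:"valid_values,omitempty"`   // roadmap field
	RetryAfterMs int      `json:"retry_after_ms,omitempty"` // roadmap field
}

// Error makes agentError usable wherever a Go error is expected.
func (e *agentError) Error() string { return e.Code + ": " + e.Message }

// encode renders the error as a single JSON object.
func encode(e *agentError) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}

func main() {
	e := &agentError{
		Code:        "invalid_region",
		Message:     "unknown region \"mars\"",
		Suggestion:  "pass one of the valid regions",
		Recoverable: true,
		ValidValues: []string{"eu", "us"},
	}
	out, err := encode(e)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(out)
}
```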

7. Safe-by-default mutation

Agents should not accidentally destroy things. Any command that mutates state needs mechanisms that allow an agent to understand what it is about to do before doing it.

The baseline is --yes/--force for bypassing confirmation in non-TTY mode, and a non-interactive guard that fails fast with a recoverable error when confirmation is expected but cannot be obtained. The roadmap adds --dry-run with a typed plan envelope: what operations would run, what would be affected, what the risk level is. Safety metadata in the command schema — ReadOnly, Destructive, RiskTier — gives agents the information to make sensible decisions before invoking.
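
The non-interactive guard reduces to a small decision. The sketch below assumes three inputs (whether the command mutates, whether a human can be prompted, and whether --yes or --force was passed); the guard function and the error text are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errConfirmationRequired is the recoverable failure an agent would
// see instead of a hanging prompt. The wording is illustrative.
var errConfirmationRequired = errors.New("confirmation_required: re-run with --yes or --force")

// guard decides whether a command may proceed without prompting.
func guard(mutating, interactive, confirmed bool) error {
	if !mutating || confirmed || interactive {
		return nil // read-only, pre-approved, or a human can be asked
	}
	return errConfirmationRequired
}

func main() {
	cases := []struct {
		name                             string
		mutating, interactive, confirmed bool
	}{
		{"agent, list", false, false, false},
		{"agent, delete", true, false, false},
		{"agent, delete --yes", true, false, true},
		{"human, delete", true, true, false},
	}
	for _, c := range cases {
		fmt.Printf("%-20s -> %v\n", c.name, guard(c.mutating, c.interactive, c.confirmed))
	}
}
```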

8. Stateful friendliness

Agents are not one-shot callers. They return to the same tool repeatedly, often with the same configuration and different parameters. A CLI that requires eight flags on every invocation is paying a tax on every agent session.

The answer is a profile system: named configurations that persist, discoverable via introspection, selectable with a single --profile flag. murli will auto-mount profile management commands in a later version. The engineer annotates which flags are profile-able; murli handles the storage and recall.

9. Async-aware execution

Not every command completes synchronously. Long-running operations — deployments, migrations, builds — need a way to report progress during execution, not just at the end.

The roadmap adds WriteEvent() for NDJSON streaming to stdout: each intermediate state is one JSON object on one line, parseable as it arrives. Structured progress events replace free-text strings with typed fields: stage, current, total, percent, eta_ms. Context cancellation maps to a structured ExitCancelled response rather than a silent exit.
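
A minimal sketch of NDJSON event streaming, assuming the typed progress fields named above. The progress struct, newProgress, encodeLine, and eventWriter are hypothetical names for this example; only the one-object-per-line format is taken from the text.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"sync"
)

// progress carries typed fields instead of a free-text status string.
type progress struct {
	Stage   string  `json:"stage"`
	Current int     `json:"current"`
	Total   int     `json:"total"`
	Percent float64 `json:"percent"`
	EtaMs   int64   `json:"eta_ms,omitempty"`
}

func newProgress(stage string, current, total int) progress {
	p := progress{Stage: stage, Current: current, Total: total}
	if total > 0 {
		p.Percent = float64(current) * 100 / float64(total)
	}
	return p
}

// encodeLine renders v as one compact JSON object plus a newline.
func encodeLine(v any) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(b) + "\n", nil
}

// eventWriter serialises concurrent writes so lines never interleave.
type eventWriter struct {
	mu sync.Mutex
	w  io.Writer
}

func (e *eventWriter) writeEvent(v any) error {
	line, err := encodeLine(v)
	if err != nil {
		return err
	}
	e.mu.Lock()
	defer e.mu.Unlock()
	_, err = io.WriteString(e.w, line)
	return err
}

func main() {
	ew := &eventWriter{w: os.Stdout}
	for i := 1; i <= 4; i++ {
		if err := ew.writeEvent(newProgress("upload", i, 4)); err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
	}
}
```

A consumer can parse each line as soon as it arrives, without waiting for the command to finish.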

10. Stable contract

CLI output that is consumed by agents is an API. It deserves the same treatment: a version, a commitment to backwards compatibility, and explicit notice when breaking changes occur.

The roadmap adds schema_version and tool_version to every envelope (v0.2), followed by a conformance test suite (v1.0): a Go package that any murli-wrapped CLI can run to verify its output contract is intact. Additive changes are non-breaking; anything that removes or renames a field in the envelope requires a version bump and a migration path.
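
The additive-versus-breaking rule can be checked mechanically. The sketch below compares two sets of envelope field names and reports any that disappeared; removedFields and the field names other than schema_version and tool_version are assumptions for illustration, not part of murli's conformance suite.

```go
package main

import (
	"fmt"
	"sort"
)

// removedFields lists fields present in the old envelope but missing
// from the new one. A non-empty result signals a breaking change; a
// rename shows up as a removal of the old name.
func removedFields(oldFields, newFields []string) []string {
	present := make(map[string]bool, len(newFields))
	for _, f := range newFields {
		present[f] = true
	}
	var removed []string
	for _, f := range oldFields {
		if !present[f] {
			removed = append(removed, f)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	v1 := []string{"schema_version", "tool_version", "data"}
	additive := []string{"schema_version", "tool_version", "data", "warnings"}
	renamed := []string{"schema_version", "tool_version", "result"}

	fmt.Println("additive change removes:", removedFields(v1, additive))
	fmt.Println("rename removes:", removedFields(v1, renamed))
}
```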


How we apply them: enable, not enforce

The ten principles are not a checklist murli runs against your CLI. They are a framework it makes easy to follow.

Some things murli does automatically, requiring nothing from the engineer: TTY detection, output routing, line deduplication, structured error formatting, exit code conventions. These are on by default because there is no reasonable case for turning them off.

Some things murli guides: flag naming conventions, verb vocabulary, which flags to mark as profile-able. These are decisions that belong to the engineer — murli cannot know what delete means in the context of your tool, or whether --region is a per-call or per-session concern. It can surface the decision and provide the convention; it cannot make the choice.

Some things murli scaffolds: dry-run plan envelopes, idempotency key handling, structured progress events, typed examples in metadata. The infrastructure is there. The logic that populates it — what operations a dry run would perform, what a partial success looks like for this command — belongs to the application.

If you want to ignore all of this and just use murli for clean JSON output, that is fine. The library does not gate features behind compliance. It provides the surface; you decide how much of it to use.


What murli is not

Before the roadmap, one boundary worth drawing clearly.

Several of the more ambitious ideas that emerged during murli’s design — job ledgers, artifact lifecycle management, server-sent event streaming, protocol version negotiation over a channel — are MCP concepts. They are excellent ideas in the context of an MCP server. They are the wrong ideas for a CLI.

A CLI is a process. It starts, it does something, it exits. Its interface is flags and arguments in, text out. Its contract is exit codes and stdout. Adding a persistent daemon, a socket listener, or a stateful session to a CLI does not make it a better CLI — it makes it a worse MCP server.

The test murli applies: can this be expressed as a flag, a subcommand, or an output envelope field? If yes, it may belong. If it requires a running process between invocations, it belongs in a different layer.

MCP servers are not competitors to well-designed CLIs. They are a different tool for a different job. A well-designed CLI is, in fact, easier to wrap in an MCP server — or an A2A agent, or an ADK skill — than a poorly designed one. murli’s v1 makes the CLI good. The bridge to those other layers is a separate concern.


Roadmap

The roadmap splits into two chapters.

The first — v0.1 through v1.0 — is the CLI done right. Ten principles, applied cleanly, within the conventions of what a CLI actually is. No protocol emulation. No daemon processes. Just a CLI that is genuinely good for both humans and agents.

The second chapter — v2.x — is where murli begins to grow a considered bridge toward the agent ecosystem. Selected, appropriate borrowings from the MCP and A2A world, always expressed in CLI terms. That chapter does not start until v1.0 is solid.


Chapter 1: CLI done right (v0.1 → v1.0)

v0.1 — Current state (honest baseline)

murli today implements the defensive half of the ten principles. An engineer adds the library, wraps their CLI framework (Cobra, urfave/cli v2 or v3), and gets:

  • TTY auto-detection on stdout with automatic routing of WriteSuccess() and WriteError() to the appropriate format
  • stdout for data, stderr for diagnostics — enforced, not advisory
  • Structured AgentError with code, message, suggestion, and recoverability flag
  • NewUserError and NewToolError constructors with correct exit code semantics
  • Four exit codes: ExitOK (0), ExitUserError (1), ExitToolError (2), ExitPartial (3)
  • Per-command schema via --schema flag and EmitSchema()
  • Metadata, FlagSchema, ReturnSchema populated by the engineer via Annotate()
  • Consecutive log line deduplication in Logger
  • Metadata.Idempotent bool surfaced in schema output

What murli does not yet have: a whole-tool manifest, structured progress events, NDJSON streaming, an expanded exit code taxonomy, richer error fields, output contract versioning, or any of the compounding-value features.


v0.2 — Defensive layer complete

The goal for v0.2 is straightforward: close every gap that the field now considers table-stakes for agent-native CLIs. An engineer adding murli v0.2 to a new project should immediately satisfy the Composio “agent-CLI” criteria without additional work.

Batteries included (on by default):

  • --agent flag auto-registered on every command as an explicit agent-mode override, complementing the existing TTY detection
  • WriteEvent(any) added for NDJSON streaming — one minified JSON object per line to stdout; goroutine-safe; the foundation for async progress
  • Structured progress events emitted to stderr in agent mode: {stage, current, total, percent, eta_ms, message} — typed, not a free string
  • Logger emits {ts, level, msg, fields} NDJSON to stderr in agent mode; human-readable format unchanged for TTY
  • schema_version and tool_version auto-populated in every success, error, and event envelope
  • Exit codes 4–9 added: ExitTimeout, ExitNotFound, ExitPermission, ExitConflict, ExitRateLimited, ExitCancelled
  • Non-interactive guard: in non-TTY mode, mutating commands (marked via Metadata.Mutating: true) fail fast with a structured confirmation_required error rather than hanging on a prompt
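
For orientation, the combined exit code taxonomy might be declared as below. Codes 0 to 3 use the values stated for v0.1; the numbering of 4 to 9 is assumed to follow the order in which the names are listed, and worthRetrying is a hypothetical agent-side policy rather than anything murli provides.

```go
package main

import "fmt"

// ExitCode is an illustrative type; the 4-9 numbering is an assumption.
type ExitCode int

const (
	ExitOK          ExitCode = iota // 0
	ExitUserError                   // 1
	ExitToolError                   // 2
	ExitPartial                     // 3
	ExitTimeout                     // 4 (assumed)
	ExitNotFound                    // 5 (assumed)
	ExitPermission                  // 6 (assumed)
	ExitConflict                    // 7 (assumed)
	ExitRateLimited                 // 8 (assumed)
	ExitCancelled                   // 9 (assumed)
)

// worthRetrying shows how a caller could branch on the code alone:
// transient failures are retried, everything else is not.
func worthRetrying(c ExitCode) bool {
	switch c {
	case ExitTimeout, ExitRateLimited:
		return true
	}
	return false
}

func main() {
	for _, c := range []ExitCode{ExitUserError, ExitTimeout, ExitPermission, ExitRateLimited} {
		fmt.Printf("exit %d retry=%v\n", c, worthRetrying(c))
	}
}
```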

Design choices (murli guides, engineer decides):

  • --json as the conventional explicit flag name — murli recommends it, the engineer registers it; advisory warnings in dev mode when non-conventional names are used
  • Metadata.Mutating bool annotation — the engineer marks which commands mutate state; murli uses this to drive the non-interactive guard

Code required (murli scaffolds, engineer builds):

  • AgentError gains ValidValues []string, RetryAfterMs int, DocURL string, Field string — available to populate; none required

v0.3 — Introspection and contracts

v0.3 completes the self-description story and moves I/O contracts toward ecosystem interoperability.

Batteries included:

  • describe subcommand auto-mounted on the root command — returns the full command tree as a single JSON document; zero engineer effort required; same mechanism as --schema auto-registration today
  • describe output includes a capabilities block auto-populated by murli: {streaming, dry_run, output_formats, schema_version, tool_version} — reflects what the binary actually supports
  • --output flag auto-registered supporting json|ndjson|yaml|text; writer routes accordingly; default is json in agent mode
  • --protocol-version flag auto-registered; murli emits older envelope shapes for older agents; negotiation result included in describe output
  • ANSI codes stripped automatically in non-TTY mode (hardened; currently implicit)

Design choices:

  • Naming convention advisory: dev-mode stderr warning when non-conventional verbs or flag names are detected; suppressed in agent mode; never blocks
  • conventions block in describe output lists the recommended vocabulary — agents can use it; engineers can read it

Code required:

  • FlagSchema gains Env, Sensitive, Persistent, MutuallyExclusiveWith, Enum, Pattern fields; engineer populates via Annotate(); murli surfaces in --schema and describe
  • Metadata.Examples accepts []Example{Command, Description, ExpectedExitCode} structs; engineer writes examples; murli validates format and includes in schema output
  • --schema and describe emit JSON Schema draft-2020-12 for flags and return types when OutputSchema *jsonschema.Schema is populated; engineer provides the schema object; murli serialises it

v0.4 — Safe mutation and async

v0.4 adds the infrastructure for commands that mutate state and operations that take time.

Batteries included:

  • --dry-run flag auto-registered on commands marked Metadata.Mutating: true; WritePlan() helper emits the plan envelope to stdout
  • --yes / --force flag auto-registered on mutating commands with conventional semantics; bypass of the non-interactive guard
  • context.Context plumbed through Writer; SIGINT maps to ExitCancelled with a structured error envelope; in-flight WriteEvent() calls flush before exit
  • SafetyMetadata block in describe output: ReadOnly, Idempotent, Destructive, Reversible, RiskTier — auto-inferred where possible (HTTP-GET-mapped commands = ReadOnly: true); engineer fills the rest

Design choices:

  • Metadata.Safety SafetyMetadata annotation — engineer marks the fields murli cannot infer; advisory lint in dev mode for mutating commands missing RiskTier

Code required:

  • PlanEnvelope type provided: {operations: [], paths_touched: [], risk_tier: ""}; engineer populates the operations list in their handler; WritePlan() handles serialisation and routing
  • --idempotency-key flag available to register on commands that support it; Writer.IdempotencyKey() returns the value for the engineer to pass downstream
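
A sketch of the plan envelope, using the three keys given above. The element type of operations is not specified in the roadmap, so the operation struct (action and target) is an assumption, as are the planEnvelope and encodePlan names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// operation is a hypothetical element type for the operations list.
type operation struct {
	Action string `json:"action"`
	Target string `json:"target"`
}

// planEnvelope mirrors the keys named in the roadmap item.
type planEnvelope struct {
	Operations   []operation `json:"operations"`
	PathsTouched []string    `json:"paths_touched"`
	RiskTier     string      `json:"risk_tier"`
}

// encodePlan serialises a plan, emitting [] rather than null for
// empty lists so consumers can always iterate.
func encodePlan(p planEnvelope) (string, error) {
	if p.Operations == nil {
		p.Operations = []operation{}
	}
	if p.PathsTouched == nil {
		p.PathsTouched = []string{}
	}
	b, err := json.Marshal(p)
	return string(b), err
}

func main() {
	out, err := encodePlan(planEnvelope{
		Operations:   []operation{{Action: "delete", Target: "bucket/logs"}},
		PathsTouched: []string{"bucket/logs"},
		RiskTier:     "high",
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(out)
}
```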

v0.5 — Stateful friendliness

v0.5 adds profiles: the infrastructure that makes repeated agent use cheaper.

Batteries included:

  • profile save|use|list|show|delete subcommands auto-mounted on the root command
  • --profile <name> flag auto-registered at the root level
  • Profile storage at ~/.<cli>/profiles.json; format is stable and human-readable
  • Profiles surfaced in describe output so agents can discover and select them without reading documentation

Design choices:

  • Engineer annotates flags with Profileable: true in FlagSchema; murli stores and restores only those flags; default is false — opt-in, not opt-out

Code required:

Nothing for the profile system itself. If the engineer wants custom profile validation or storage location overrides, hooks are provided.
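
The opt-in rule for profile-able flags can be sketched as a filter applied before saving. The profileValues function, the flag names, and the layout of the saved document are assumptions for illustration; the roadmap only states that profiles live in a JSON file and store flags marked as profile-able.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// profileValues keeps only the flags the engineer marked as
// profile-able. Anything unmarked is dropped (opt-in, not opt-out).
func profileValues(flags map[string]string, profileable map[string]bool) map[string]string {
	out := map[string]string{}
	for name, value := range flags {
		if profileable[name] {
			out[name] = value
		}
	}
	return out
}

func main() {
	flags := map[string]string{"region": "eu", "account": "staging", "id": "42"}
	marked := map[string]bool{"region": true, "account": true}

	// Hypothetical on-disk layout: profile name mapped to saved flags.
	profiles := map[string]map[string]string{
		"staging-eu": profileValues(flags, marked),
	}
	doc, err := json.MarshalIndent(profiles, "", "  ")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(string(doc))
}
```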


v1.0 — Stable contract

v1.0 freezes the output contract and commits to it.

Batteries included:

  • schema_version: "1.0" frozen; subsequent breaking changes require 2.0 and a documented migration path
  • murli/conformance package: a standalone Go test suite any murli-wrapped CLI can run to verify output contract compliance; covers envelope shape, exit code semantics, error field presence, schema validity
  • Golden-file regression tests for every envelope shape across every adapter (Cobra, urfave v2, urfave v3); failures in CI mean a contract break
  • Dev-mode lint summary: a mytool doctor command auto-mounted in dev builds that runs the conformance checks locally and reports missing annotations, non-conventional names, and unset safety metadata
  • AGENTS.md stub auto-generated by describe --agents-md; skeleton only — engineer reviews and extends; sized to stay under ~500 tokens

Design choices:

  • Semver commitment on the output contract is a project decision, not a library feature; murli provides the tooling (conformance suite, schema versioning); the engineer commits to the discipline

Chapter 2: The agent bridge (v2.x)

v2.x does not start until v1.0 is stable. The test for whether something belongs here: can it be expressed as a flag, a subcommand, or an output envelope field? If it requires a running process between invocations, it belongs in a different layer.

The v2.x chapter is exploratory. These are directions, not commitments.

v2.1 — MCP-compatible schema output

The describe command’s output, already in JSON with typed schemas, is adapted to be directly consumable by MCP hosts. An MCP adapter can pipe mytool describe and generate a working MCP server from it with no manual mapping. murli does not become an MCP server; it makes wrapping one trivial.

v2.2 — A2A AgentCard emission

describe --agent-card emits an A2A-compatible AgentCard JSON document from the same metadata that drives describe. The AgentCard format — name, description, capabilities, skills, authentication — maps cleanly onto what murli already tracks. This is a serialisation choice, not a protocol implementation.

v2.3 — Skill co-emission

describe --skill emits a structured skill definition consumable by agent orchestrators (Claude’s skills format, Vercel’s skills format). Generated from the same metadata. Engineers can override sections; murli generates the skeleton. The skill stays under ~5,000 tokens — sized for the constraint, not padded to fill it.

v2.4 — Multi-language specification

murli is a Go library. The principles it implements are language-agnostic. v2.4 publishes a formal specification — not implementations — of the murli conventions as they would apply to Python/Click, Node/Commander, Rust/Clap, and other major CLI frameworks. The Go library remains the reference implementation. Ports follow the spec.


A note on scope

murli’s scope is the CLI. Not the MCP server. Not the agent orchestrator. Not the function-calling interface.

This is a choice, not a limitation. The CLI is the interface that every serious tool ships. It is the interface that already works everywhere, that requires no additional infrastructure, that integrates naturally into every agent loop. Making it better is a high-leverage, low-friction intervention.

The bet murli makes is that a CLI done right — genuinely good for both humans and agents, honest about what it is, disciplined about its output contract — is more useful than a CLI that has grown protocol machinery in an attempt to become something it is not.

Build the CLI well. Let the bridges to other layers be built from that solid foundation.


murli is open source at github.com/allank/murli.