Learn Python Series (#42) - Building CLI Applications


What will I learn?

  • You will learn what makes a good CLI and how it differs from GUI design;
  • why argument parsing libraries exist and what problem they solve;
  • the mental model behind click's decorator approach vs argparse's imperative style;
  • building a multi-command CLI tool with click command groups, context passing, and file validation;
  • professional terminal output with rich: tables, progress bars, directory trees, and styled panels;
  • the dual-output pattern: pretty for humans, JSON for machines.

Requirements

  • A working modern computer running macOS, Windows or Ubuntu;
  • An installed Python 3(.11+) distribution;
  • The ambition to learn Python programming.

Difficulty

  • Intermediate

Curriculum (of the Learn Python Series):

Learn Python Series (#42) - Building CLI Applications

Command-line interfaces are everywhere in development. git, docker, npm, pip - powerful tools accessed through text commands. Building good CLIs requires understanding both technical implementation and user experience.

This episode is about CLI design and implementation - not just parsing arguments, but building tools developers enjoy using.

Nota bene: A good CLI respects conventions, provides helpful errors, and feels intuitive. Bad CLIs are technically correct but frustrating to use.

Why CLI UX matters

GUIs are discoverable. Menus and buttons show what's possible. CLIs aren't - you must know or guess commands.

This makes conventions critical. When a user types mytool --help, they expect help. When they use -v, they expect either "verbose" or "version". Breaking conventions creates friction.

Good CLIs follow the principle of least surprise. Common flags work as expected. Error messages are helpful, not cryptic. Output is readable.

The mental model: CLIs are conversations between user and tool. The user gives commands, the tool responds. Make that conversation natural.

The problem argument parsing solves

Without a parsing library, you handle sys.argv manually:

import sys
args = sys.argv[1:]  # Skip script name

Now what? Is args[0] a command, a flag, or a value? Does -v take a value or is it boolean? How do you handle --config=file.json vs --config file.json?

You end up writing parsing logic, validation, error handling, help text generation. This is undifferentiated work - every CLI needs it, it's not specific to your tool's purpose.
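To feel the pain concretely, here's a sketch of hand-rolling just two options (the option names are arbitrary) — and it still ignores help text, type conversion, and choice validation:

```python
import sys

def parse_args(argv):
    """Naive hand-rolled parser -- just enough to show the drudgery."""
    opts = {"format": "json", "verbose": False}
    positional = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in ("-v", "--verbose"):
            opts["verbose"] = True
        elif arg.startswith("--format="):
            opts["format"] = arg.split("=", 1)[1]
        elif arg == "--format":
            i += 1  # the value is the NEXT token
            if i >= len(argv):
                raise SystemExit("error: --format needs a value")
            opts["format"] = argv[i]
        elif arg.startswith("-"):
            raise SystemExit(f"error: unknown option {arg}")
        else:
            positional.append(arg)
        i += 1
    return positional, opts
```

Twenty lines for two options, and it already has edge cases. Multiply by every flag your tool needs and you see why nobody should write this twice.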

Argument parsing libraries solve this once. You declare what arguments exist, the library handles parsing and validation.

argparse: the standard library approach

Python includes argparse - comprehensive but verbose. You create a parser, add arguments declaratively, then parse:

import argparse

parser = argparse.ArgumentParser(description="Process data")
parser.add_argument("input", help="Input file")
parser.add_argument("--format", choices=["json", "csv"], default="json")
args = parser.parse_args()

The parser handles: parsing sys.argv, validating choices, generating help text, providing error messages.

argparse is powerful but imperative. You build a parser object, configure it, then run it. This works but feels ceremonial for simple tools.
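The same declaration, runnable end to end. Two details worth knowing: parser.format_help() shows the help text argparse generates from the declarations alone, and parse_args() accepts an explicit list, which is how you exercise a parser without touching sys.argv:

```python
import argparse

parser = argparse.ArgumentParser(prog="process", description="Process data")
parser.add_argument("input", help="Input file")
parser.add_argument("--format", choices=["json", "csv"], default="json")

# Help text is generated entirely from the declarations above
help_text = parser.format_help()

# parse_args() takes an explicit argument list -- handy for testing
args = parser.parse_args(["data.txt", "--format", "csv"])
```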

click: the decorator approach

Click takes a different philosophy - decorators turn functions into CLI commands:

import click

@click.command()
@click.argument("input")
@click.option("--format", type=click.Choice(["json", "csv"]), default="json")
def process(input, format):
    """Process data from input file."""
    pass

Same functionality, less boilerplate. The function signature defines the CLI interface. Decorators add behavior.

This feels more Pythonic for simple tools - the code reads like what it does. For complex multi-command CLIs (like git), click's command groups shine.

When to use which

Use argparse when:

  • You're in a project already using it (consistency)
  • You need something in stdlib (no dependencies)
  • You have very complex argument patterns argparse handles better
  • You prefer imperative configuration

Use click when:

  • You're building a new CLI from scratch
  • You want cleaner decorator-based syntax
  • You need command groups (subcommands like git commit, git push)
  • You value developer experience (click is more pleasant to work with)

Both solve the same problem. Click is generally more ergonomic, but argparse is battle-tested and stdlib.

CLI conventions and best practices

Respect these conventions:

Help flags: -h and --help show usage. Always.

Version flag: --version shows version. Use semantic versioning.

Quiet/verbose: -q suppresses output, -v increases it. Often stackable (-vvv for very verbose).

Force flag: -f or --force skips confirmations. Use carefully.

Config files: Support --config path/to/file for complex configuration.

Environment variables: Allow critical options via env vars (API keys, endpoints).

POSIX compatibility: Use - for short flags, -- for long flags. Allow flag bundling (-rf = -r -f).

Breaking these creates confused users. They type --help and nothing happens? Frustration.
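Most of these conventions cost one line each in argparse. A sketch (the tool name and flags are illustrative):

```python
import argparse

parser = argparse.ArgumentParser(prog="mytool")
parser.add_argument("--version", action="version", version="mytool 1.2.0")
parser.add_argument("-v", "--verbose", action="count", default=0,
                    help="increase verbosity (stackable: -vvv)")
parser.add_argument("-q", "--quiet", action="store_true",
                    help="suppress non-essential output")
parser.add_argument("-f", "--force", action="store_true",
                    help="skip confirmation prompts")
parser.add_argument("--config", metavar="PATH", help="path to a config file")

# Bundled short flags and stacking work out of the box
args = parser.parse_args(["-vvv", "--force"])
```

The count action gives you stackable -v for free, and action="version" handles printing and exiting without any code of yours running.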

Output formatting: more than print()

Raw print() works but looks amateur. Professional CLIs format output thoughtfully:

Progress indication: For long operations, show progress bars or spinners. Silent CLIs feel broken.

Color sparingly: Highlight errors (red), success (green), warnings (yellow). Don't rainbow everything - it's distracting.

Tables for data: Aligned columns are readable. The rich library provides beautiful tables with minimal code.

Structured output option: Support --json for machine-readable output. Makes your tool scriptable.

Respect NO_COLOR: Environment variable NO_COLOR disables colors. Respect it - CI environments often set this.

The goal: provide feedback without overwhelming the user.
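A minimal sketch of that last rule (the helper names are my own): wrap styling in one function that checks NO_COLOR and whether stdout is actually a terminal, so pipes and CI logs stay clean:

```python
import os
import sys

def style(text, code):
    """Wrap text in an ANSI escape code, unless colors are suppressed."""
    # Respect NO_COLOR (https://no-color.org), and skip styling when
    # stdout is a pipe or redirect rather than an interactive terminal.
    if os.environ.get("NO_COLOR") or not sys.stdout.isatty():
        return text
    return f"\033[{code}m{text}\033[0m"

def error(text):
    return style(text, "31")    # red

def success(text):
    return style(text, "32")    # green

def warning(text):
    return style(text, "33")    # yellow
```

Libraries like rich do this check for you, but it's worth knowing what's underneath.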

Building a real CLI: click command groups

Let's move beyond toy examples and build a multi-command CLI tool — a file utility that handles multiple operations through subcommands, just like git has commit, push, pull as separate subcommands under one binary.

Click's @click.group() turns a function into a group that owns subcommands. Each subcommand is a separate function decorated with @group.command():

import click
import json
import csv
import hashlib
import os
from pathlib import Path

@click.group()
@click.version_option(version="1.0.0")
@click.option("-v", "--verbose", count=True, help="Increase output verbosity (-vv for debug)")
@click.pass_context
def cli(ctx, verbose):
    """fileutil - a Swiss army knife for file operations."""
    ctx.ensure_object(dict)
    ctx.obj["verbose"] = verbose

@cli.command()
@click.argument("path", type=click.Path(exists=True))
@click.option("--format", "fmt", type=click.Choice(["json", "text"]), default="text")
@click.option("--checksum/--no-checksum", default=False, help="Include SHA-256 hash")
@click.pass_context
def info(ctx, path, fmt, checksum):
    """Show detailed file information."""
    p = Path(path)
    stat = p.stat()
    data = {
        "name": p.name,
        "size_bytes": stat.st_size,
        "size_human": _human_size(stat.st_size),
        "type": "directory" if p.is_dir() else p.suffix or "no extension",
        "permissions": oct(stat.st_mode)[-3:],
    }

    if checksum and p.is_file():
        sha = hashlib.sha256(p.read_bytes()).hexdigest()
        data["sha256"] = sha

    if ctx.obj["verbose"] >= 1:
        data["absolute_path"] = str(p.resolve())
        data["is_symlink"] = p.is_symlink()

    if fmt == "json":
        click.echo(json.dumps(data, indent=2))
    else:
        for key, val in data.items():
            click.echo(f"  {key:.<20s} {val}")

@cli.command()
@click.argument("source", type=click.Path(exists=True))
@click.argument("dest", type=click.Path())
@click.option("--format", "fmt", type=click.Choice(["json", "csv"]),
              required=True, help="Target format")
@click.option("--overwrite", is_flag=True, help="Overwrite existing output file")
def convert(source, dest, fmt, overwrite):
    """Convert data files between JSON and CSV formats."""
    if Path(dest).exists() and not overwrite:
        raise click.ClickException(
            f"Output file '{dest}' already exists. Use --overwrite to replace."
        )

    source_data = Path(source).read_text()

    if fmt == "csv":
        records = json.loads(source_data)
        if not isinstance(records, list) or not records:
            raise click.ClickException("JSON source must be a non-empty list of objects.")
        with open(dest, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=records[0].keys())
            writer.writeheader()
            writer.writerows(records)
    elif fmt == "json":
        with open(source, newline="") as f:
            reader = csv.DictReader(f)
            records = list(reader)
        Path(dest).write_text(json.dumps(records, indent=2))

    click.echo(f"Converted {source} → {dest} ({fmt}, {len(records)} records)")

@cli.command()
@click.argument("directory", type=click.Path(exists=True, file_okay=False))
@click.option("--ext", multiple=True, help="Filter by extension (repeatable: --ext .py --ext .md)")
@click.option("--min-size", type=int, default=0, help="Minimum file size in bytes")
@click.option("--sort", "sort_by", type=click.Choice(["name", "size", "modified"]),
              default="name")
def scan(directory, ext, min_size, sort_by):
    """Scan a directory and report file statistics."""
    root = Path(directory)
    files = [f for f in root.rglob("*") if f.is_file()]

    if ext:
        files = [f for f in files if f.suffix in ext]
    if min_size:
        files = [f for f in files if f.stat().st_size >= min_size]

    sort_keys = {
        "name": lambda f: f.name.lower(),
        "size": lambda f: f.stat().st_size,
        "modified": lambda f: f.stat().st_mtime,
    }
    files.sort(key=sort_keys[sort_by])

    total_size = sum(f.stat().st_size for f in files)
    click.echo(f"Found {len(files)} files ({_human_size(total_size)} total)\n")

    for f in files:
        size = _human_size(f.stat().st_size)
        rel = f.relative_to(root)
        click.echo(f"  {size:>8s}  {rel}")

def _human_size(nbytes):
    for unit in ["B", "KB", "MB", "GB"]:
        if nbytes < 1024:
            return f"{nbytes:.1f} {unit}"
        nbytes /= 1024
    return f"{nbytes:.1f} TB"

if __name__ == "__main__":
    cli()

Notice several things happening here. @click.pass_context threads state (like the verbose flag) from the group to subcommands without global variables. click.Path(exists=True) validates that a file exists before your code even runs — the error message is automatic and clear. The --ext option uses multiple=True, allowing --ext .py --ext .md to collect into a tuple. And click.ClickException provides clean user-facing error messages without stack traces.

Running this tool:

$ python fileutil.py --help
Usage: fileutil.py [OPTIONS] COMMAND [ARGS]...

  fileutil - a Swiss army knife for file operations.

Options:
  --version      Show the version and exit.
  -v, --verbose  Increase output verbosity (-vv for debug)
  --help         Show this message and exit.

Commands:
  convert  Convert data files between JSON and CSV formats.
  info     Show detailed file information.
  scan     Scan a directory and report file statistics.

$ python fileutil.py info --checksum mydata.json
  name.................. mydata.json
  size_bytes............ 4823
  size_human............ 4.7 KB
  type.................. .json
  permissions........... 644
  sha256................ a1b2c3d4e5...

$ python fileutil.py scan ./src --ext .py --sort size
Found 23 files (142.3 KB total)

     0.2 KB  __init__.py
     1.1 KB  config.py
     3.4 KB  utils.py
    12.8 KB  main.py

That's a properly structured CLI tool. The group/subcommand pattern scales to dozens of commands without becoming unwieldy.

Professional output with rich

print() gives you plain text. For CLI tools that humans interact with, plain text often isn't enough. The rich library transforms terminal output from functional to beautiful — with tables, colored text, progress bars, and tree views.

Install it: pip install rich

The core concept: rich's Console object replaces print() with styled output. But the real power is in its high-level components:

import click
from rich.console import Console
from rich.table import Table
from rich.progress import Progress, SpinnerColumn, BarColumn, TextColumn
from rich.panel import Panel
from rich.tree import Tree
from pathlib import Path
import time
import os

console = Console()

# === Tables ===
def show_process_table():
    """Display system info as a formatted table."""
    table = Table(title="Python Environment")
    table.add_column("Property", style="cyan", no_wrap=True)
    table.add_column("Value", style="green")

    import sys, platform
    rows = [
        ("Python version", sys.version.split()[0]),
        ("Platform", platform.system()),
        ("Architecture", platform.machine()),
        ("Executable", sys.executable),
        ("Working dir", os.getcwd()),
        ("PID", str(os.getpid())),
    ]
    for prop, val in rows:
        table.add_row(prop, val)

    console.print(table)

# === Progress bars ===
def process_files(file_list):
    """Process files with a detailed progress bar."""
    with Progress(
        SpinnerColumn(),
        TextColumn("[progress.description]{task.description}"),
        BarColumn(),
        TextColumn("[progress.percentage]{task.percentage:>3.0f}%"),
        TextColumn("({task.completed}/{task.total})"),
        console=console,
    ) as progress:
        task = progress.add_task("Processing files...", total=len(file_list))

        for filepath in file_list:
            progress.update(task, description=f"Processing {filepath.name}...")
            time.sleep(0.1)  # simulate work
            progress.advance(task)

    console.print("[bold green]✓[/] All files processed successfully.")

# === Styled error handling ===
def report_error(message, suggestion=None):
    """Display a formatted error panel."""
    body = f"[bold red]Error:[/] {message}"
    if suggestion:
        body += f"\n[dim]Suggestion: {suggestion}[/]"
    console.print(Panel(body, title="[red]Problem[/]", border_style="red"))

# === Directory trees ===
def show_directory_tree(path, max_depth=2):
    """Display directory structure as a tree."""
    root = Path(path)
    tree = Tree(f"[bold blue]{root.name}/[/]")

    def walk(directory, branch, depth=0):
        if depth >= max_depth:
            return
        try:
            entries = sorted(directory.iterdir(), key=lambda e: (e.is_file(), e.name))
        except PermissionError:
            branch.add("[red]Permission denied[/]")
            return

        for entry in entries:
            if entry.name.startswith("."):
                continue
            if entry.is_dir():
                sub = branch.add(f"[bold blue]{entry.name}/[/]")
                walk(entry, sub, depth + 1)
            else:
                size = _human_size(entry.stat().st_size)
                branch.add(f"{entry.name} [dim]({size})[/]")

    walk(root, tree)
    console.print(tree)

# === Combining rich with click ===
@click.command()
@click.argument("directory", type=click.Path(exists=True, file_okay=False))
@click.option("--depth", default=2, help="Maximum tree depth")
def tree_cmd(directory, depth):
    """Display a directory as a styled tree."""
    show_directory_tree(directory, max_depth=depth)

Rich's markup syntax ([bold red]text[/]) lets you inline styles without complex escape sequences. The Progress class handles all the terminal rewriting — drawing the bar, updating percentages, respecting terminal width. And Panel wraps content in a box that visually separates it from surrounding output.

One particularly powerful pattern: combining --json output for machines with rich output for humans. Same data, different presentation:

@click.command()
@click.option("--json-output", "as_json", is_flag=True, help="Output as JSON")
def status(as_json):
    """Show system status."""
    data = {
        "cpu_count": os.cpu_count(),
        "pid": os.getpid(),
        "cwd": os.getcwd(),
    }

    if as_json:
        import json
        click.echo(json.dumps(data))  # plain JSON, no colors, pipeable
    else:
        table = Table(title="System Status")
        table.add_column("Metric", style="cyan")
        table.add_column("Value", style="green")
        for k, v in data.items():
            table.add_row(k, str(v))
        console.print(table)

Run it as status and you get a pretty table. Pipe it as status --json-output | jq .cpu_count and downstream tools get clean JSON. That's the mark of a CLI built by someone who understands how these tools get used in practice — both interactively and in scripts.

Error handling in CLIs

CLIs should fail gracefully:

Validation errors: Show what's wrong and how to fix it. Not "Error", but "Error: --format must be 'json' or 'csv', got 'xml'".

Exit codes: 0 = success, non-zero = failure. Scripts check exit codes. Be consistent.

Helpful messages: If a file doesn't exist, suggest checking the path. If an API fails, show the HTTP status.

Stack traces: Only in verbose/debug mode. Normal errors should be friendly, not scary.

Good error messages turn failures into learning moments. Bad errors leave users stuck.
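Putting those rules together in one sketch (the run helper is illustrative): a friendly message goes to stderr, the caller gets a consistent exit code, and the traceback only escapes in debug mode:

```python
import sys

def run(path, debug=False):
    """Read a file; return 0 on success, 1 on a user-facing failure."""
    try:
        with open(path) as f:
            f.read()
        return 0
    except FileNotFoundError:
        # Friendly error on stderr; scripts can rely on the exit code
        print(f"error: '{path}' not found. Check the path and try again.",
              file=sys.stderr)
        if debug:
            raise  # full traceback only in debug/verbose mode
        return 1

if __name__ == "__main__":
    sys.exit(run(sys.argv[1]) if len(sys.argv) > 1 else 2)
```

Centralizing exit codes like this keeps them consistent as the tool grows; click's ClickException does essentially the same job with an exit code of 1.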

Bonus deep dive: building a plugin system for your CLI

Here's something most CLI tutorials never show you. Real-world tools like pip, pytest, and flask support plugins — third-party packages that register new subcommands without modifying the core tool. You install a plugin, and suddenly your CLI has new capabilities. How does that work?

The mechanism is Python's entry points system, combined with importlib.metadata for discovery and click's ability to dynamically register commands. Let's build it from scratch.

The architecture: your CLI defines a named entry point group (like "fileutil.plugins"). Any installed Python package can declare that it provides commands for that group. At startup, your CLI scans all installed packages, finds the ones that registered commands, loads them, and adds them as subcommands. Zero configuration. Zero imports. Just install a plugin package and it appears.

First, the plugin loader:

import click
import importlib.metadata
import importlib

PLUGIN_GROUP = "fileutil.plugins"

class PluginLoader:
    """Discover and load CLI plugins from installed packages."""

    def __init__(self, group=PLUGIN_GROUP):
        self.group = group
        self._plugins = {}

    def discover(self):
        """Find all installed plugins via entry points."""
        eps = importlib.metadata.entry_points()

        # Python 3.10+: entry_points() supports .select(); 3.8/3.9 return a dict
        if hasattr(eps, "select"):
            plugin_eps = eps.select(group=self.group)
        else:
            plugin_eps = eps.get(self.group, [])

        for ep in plugin_eps:
            try:
                command = ep.load()  # imports the module and returns the object
                if isinstance(command, click.BaseCommand):
                    self._plugins[ep.name] = command
                else:
                    click.echo(f"Warning: plugin '{ep.name}' is not a click command", err=True)
            except Exception as e:
                click.echo(f"Warning: failed to load plugin '{ep.name}': {e}", err=True)

        return self._plugins

    def register_all(self, group):
        """Add all discovered plugins to a click group."""
        for name, cmd in self._plugins.items():
            group.add_command(cmd, name)

The ep.load() call is where the magic happens. It reads the entry point metadata ("mycommand = myplugin.cli:my_command"), imports myplugin.cli, and returns the my_command object — all through standard Python packaging infrastructure. No file system scanning, no naming conventions, no monkey-patching.
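To demystify ep.load(), you can construct an EntryPoint by hand — it's just three strings (the name and group below are invented; the value points at a real stdlib attribute):

```python
from importlib.metadata import EntryPoint

# name and group are arbitrary labels; value is "module:attribute"
ep = EntryPoint(name="upper", value="string:ascii_uppercase",
                group="demo.plugins")

# load() imports the module and returns the named attribute --
# exactly what happens for entry points declared by installed packages
loaded = ep.load()
```

Installed packages simply ship these three strings in their metadata; discovery via importlib.metadata.entry_points() is just reading them back.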

Now integrate it with our CLI:

@click.group()
@click.version_option(version="1.0.0")
@click.pass_context
def cli(ctx, **kwargs):
    """fileutil - extensible file operations tool."""
    ctx.ensure_object(dict)

# Register built-in commands
cli.add_command(info)
cli.add_command(convert)
cli.add_command(scan)

# Discover and register plugins
loader = PluginLoader()
plugins = loader.discover()
loader.register_all(cli)

if plugins:
    # Optional: show loaded plugins in verbose mode
    @cli.command(hidden=True)
    def plugins_list():
        """List loaded plugins."""
        for name in sorted(plugins):
            click.echo(f"  {name} (from {plugins[name].callback.__module__})")

A plugin author creates a normal Python package. The only special part is the entry point declaration in pyproject.toml:

# pyproject.toml of a plugin package called "fileutil-encrypt"
[project]
name = "fileutil-encrypt"
version = "0.1.0"
dependencies = ["click", "cryptography"]

[project.entry-points."fileutil.plugins"]
encrypt = "fileutil_encrypt.cli:encrypt_cmd"
decrypt = "fileutil_encrypt.cli:decrypt_cmd"

And the plugin's code is just normal click commands:

# fileutil_encrypt/cli.py
import click
from pathlib import Path
from cryptography.fernet import Fernet

@click.command()
@click.argument("path", type=click.Path(exists=True))
@click.option("--key", envvar="FILEUTIL_KEY", help="Encryption key (or set FILEUTIL_KEY)")
def encrypt_cmd(path, key):
    """Encrypt a file using Fernet symmetric encryption."""
    if not key:
        key = Fernet.generate_key().decode()
        click.echo(f"Generated key (save this!): {key}")

    f = Fernet(key.encode() if isinstance(key, str) else key)
    data = Path(path).read_bytes()
    encrypted = f.encrypt(data)
    out_path = path + ".enc"
    Path(out_path).write_bytes(encrypted)
    click.echo(f"Encrypted: {path} → {out_path}")

@click.command()
@click.argument("path", type=click.Path(exists=True))
@click.option("--key", envvar="FILEUTIL_KEY", required=True, help="Decryption key")
def decrypt_cmd(path, key):
    """Decrypt a previously encrypted file."""
    f = Fernet(key.encode())
    data = Path(path).read_bytes()
    decrypted = f.decrypt(data)
    out_path = path.removesuffix(".enc") if path.endswith(".enc") else path + ".dec"
    Path(out_path).write_bytes(decrypted)
    click.echo(f"Decrypted: {path} → {out_path}")

After pip install fileutil-encrypt, the commands appear automatically:

$ python fileutil.py --help
Commands:
  convert  Convert data files between JSON and CSV formats.
  decrypt  Decrypt a previously encrypted file.
  encrypt  Encrypt a file using Fernet symmetric encryption.
  info     Show detailed file information.
  scan     Scan a directory and report file statistics.

No code changes to the core tool. No imports. The plugin registered itself through packaging metadata, and the loader picked it up at runtime.

This is the same mechanism pytest uses for plugins (pytest11 entry point group), flask uses for extensions, and pip uses for its own subcommands. It's production-grade infrastructure that separates core tool maintenance from plugin development — different people, different packages, different release cycles, one unified CLI.

Why does this matter beyond "it's clever"? Because it's a real-world application of several Python concepts working together: packaging and distribution (pyproject.toml), the import system (importlib.metadata), duck typing (any click command works, regardless of what package defined it), and the open-closed principle (the tool is open for extension but closed for modification). Understanding plugin architectures like this is what separates script writers from systems engineers.

What you should remember

In this episode, we covered CLI application development:

  • Why UX matters in CLIs despite being text-based — conventions like --help, -v, and exit codes exist for a reason
  • The problem argument parsing libraries solve (consistent parsing, validation, help generation)
  • argparse's imperative style vs click's decorator approach — both valid, click generally more ergonomic
  • Click command groups for building multi-command tools with shared options and context passing
  • Professional output formatting with rich: tables, progress bars, directory trees, and styled error panels
  • The dual-output pattern: pretty for humans (rich), JSON for machines (--json-output)
  • Plugin architectures using entry points and importlib.metadata for extensible CLIs — the same pattern pytest and flask use
  • Error handling that guides users instead of confusing them

Building CLIs isn't just parsing arguments. It's creating tools that respect user expectations, look professional, and work seamlessly in both interactive and scripted contexts.

If you made it this far, you're doing great. Thanks for your time!

@scipio
