Role: caddy

Caddy reverse proxy on the edge LXC (CT 107, 192.168.1.244). Single static binary, Caddyfile templated from inventory, automatic HTTPS via Let's Encrypt DNS-01 against Cloudflare, no Docker.

Hosts: edge (only).

Phase 5D executed — 2026-05-02

Role landed alongside the edge LXC bring-up. Caddy serving four hosts (requests, dash, n8n, kosync) with a single LE wildcard *.rampancy.cloud. Replaces the hand-clicked NPM install on retired nginx VM 102. See edge-cutover runbook for the cutover procedure and execution-time lessons.

Validation gap

Apply succeeded against edge from WSL on 2026-05-02. The site-wide --check --diff run from CT 104 has not happened as of this docs commit; it is pending the homelab-ansible commit and the CT 104 pull. It is expected to report changed=0 once the code lands on the drift runner.

Architecture

flowchart LR
    Internet[Public client] --> UDM[UDM<br/>port-forward 80/443]
    UDM --> Caddy[Caddy on edge<br/>192.168.1.244]
    Caddy -->|TLS terminate<br/>LE wildcard| n8n[n8n VM<br/>:5678]
    Caddy --> beszel[Beszel hub<br/>:8090]
    Caddy --> seerr[Overseerr<br/>arrstack:5055]
    Caddy --> kosync[korrosync<br/>arrstack:3030]
    Caddy <-.->|ACME DNS-01| CF[Cloudflare API<br/>TXT record]

Cert issuance and renewal happen via Cloudflare DNS-01: Caddy creates a TXT record at _acme-challenge.rampancy.cloud through the Cloudflare API, Let's Encrypt validates it, and Caddy deletes the record. Renewal has no dependency on inbound port 80.
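The record name used by the challenge can be derived from the certificate subject. The helper below is a minimal sketch (the function name `acme_challenge_name` is hypothetical, not part of the role); it shows that the wildcard and the apex are both validated at the same `_acme-challenge` name.

```shell
# Derive the DNS-01 challenge record name for a certificate subject.
# A wildcard subject is validated at its base domain, so a leading "*."
# is dropped before the _acme-challenge label is prepended.
acme_challenge_name() {
    case "$1" in
        '*.'*) printf '_acme-challenge.%s\n' "${1#??}" ;;
        *)     printf '_acme-challenge.%s\n' "$1" ;;
    esac
}
```

During an issuance or renewal, `dig +short TXT "$(acme_challenge_name '*.rampancy.cloud')"` should briefly return the challenge value; outside an active challenge the record is expected to be absent.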

Tasks

| Task | Tag |
|---|---|
| Ensure caddy group + user (system, nologin) | caddy, install |
| Create /etc/caddy, /var/lib/caddy, /var/log/caddy | caddy, install |
| Install pinned binary from roles/caddy/files/caddy-<version> | caddy, install |
| Render systemd unit (no --environ; see Gotchas) | caddy, service |
| Render /etc/caddy/caddy.env with vaulted CF token (no_log: true) | caddy, config |
| Render /etc/caddy/Caddyfile from inventory | caddy, config |
| Validate Caddyfile (with env var passed via environment:) | caddy, config |
| Enable + start caddy.service | caddy, service |

Key variables

Defaults at roles/caddy/defaults/main.yml. Per-host overrides at inventory/host_vars/edge.yml.

| Variable | Source | Default / value |
|---|---|---|
| caddy_version | defaults | 2.11.2 |
| caddy_build_revision | defaults | cs1 (bump when xcaddy plugin set changes; see Binary build flow) |
| caddy_binary_local | defaults | caddy-{{ caddy_version }}-{{ caddy_build_revision }} (in roles/caddy/files/) |
| caddy_install_dir | defaults | /usr/local/bin |
| caddy_config_dir | defaults | /etc/caddy |
| caddy_data_dir | defaults | /var/lib/caddy |
| caddy_acme_email | defaults | newlingd@gmail.com |
| caddy_acme_dns_provider | defaults | cloudflare |
| caddy_listen_alt_ports | defaults | true (bind 8080/8443); flipped to false in host_vars/edge.yml post-cutover |
| caddy_cf_dns_token_var | defaults | vault_caddy_cf_dns_token |
| caddy_crowdsec_enabled | defaults | false; flipped per-host. Gates the bouncer block in Caddyfile.j2 and the CROWDSEC_BOUNCER_API_KEY line in caddy.env.j2. |
| caddy_crowdsec_bouncer_key_var | defaults | vault_caddy_crowdsec_bouncer_key |
| caddy_crowdsec_lapi_url | defaults | http://127.0.0.1:8080, the stock LAPI listen address. The agent's local_api_credentials.yaml is hardcoded to this by the installer; moving LAPI requires re-templating that too, so don't optimise for hypothetical alt-port collisions. |
| caddy_zone | host_vars | rampancy.cloud |
| caddy_proxy_hosts | host_vars | list of {name, fqdn, upstream_scheme, upstream_host, upstream_port} |
| caddy_matrix_enabled | defaults | false; flipped to true on edge for Phase 6E. Gates an apex {{ caddy_zone }} block (well-known matrix delegation) plus a matrix.{{ caddy_zone }} block in the template, neither expressible via caddy_proxy_hosts (apex needs its own cert; matrix subdomain needs path-routed CrowdSec exclusion on /_matrix/federation/*). |
| caddy_matrix_upstream | host_vars | 192.168.1.243:81 on edge; single upstream because matrix_federation_public_port: 443 in the matrix-deploy vars.yml collapses Traefik's federation entrypoint into the web entrypoint, so client + federation share the host-bind port. |

The vaulted CF API token (vault_caddy_cf_dns_token) is scoped to Zone:Read + DNS:Edit on rampancy.cloud and nothing else.

Binary build flow

The binary is not built on the target host. Pre-built on WSL via xcaddy, version-pinned, committed to roles/caddy/files/:

mkdir -p ~/build/caddy && cd ~/build/caddy
docker run --rm -v "$PWD:/output" -w /output caddy:2.11.2-builder \
    xcaddy build v2.11.2 \
        --with github.com/caddy-dns/cloudflare \
        --with github.com/hslatman/caddy-crowdsec-bouncer/http \
        --output caddy
./caddy version                          # confirm v2.11.2
./caddy list-modules | grep cloudflare   # confirm dns.providers.cloudflare
./caddy list-modules | grep crowdsec     # confirm http.handlers.crowdsec + crowdsec app

cp caddy ~/homelab-ansible/roles/caddy/files/caddy-2.11.2-cs1

Why pre-built rather than xcaddy on edge:

  - Reproducibility: the committed binary is bit-identical across deploys
  - Speed: xcaddy build on a 1 GiB LXC is slow or OOMs during go build
  - Footprint: no Go toolchain on edge (~500 MB saved)

The -cs1 filename suffix is caddy_build_revision — bumped when the xcaddy plugin set changes (e.g. adding the CrowdSec bouncer module for Phase 7D). Caddy upstream version stays accurate; the suffix is the audit trail for plugin-set changes. Ansible's copy module would also detect content changes via checksum, but a new filename makes the rebuild visible in git diff.

To bump Caddy: rebuild via xcaddy with a new version, drop the binary into roles/caddy/files/, bump caddy_version in defaults/main.yml, re-apply. The role's Install caddy binary task will detect the change and notify the Restart caddy handler.

To change the plugin set without bumping Caddy: rebuild with the new --with flags, drop the new file into roles/caddy/files/, bump caddy_build_revision in defaults/main.yml. Old binary stays in tree for one revision (handy for rollback).
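Before re-applying after either kind of bump, it can help to confirm that the file name the role will look for actually exists in the tree. This is a small sketch, assuming the naming scheme from caddy_binary_local; the function names are hypothetical and the files directory is passed in by the caller.

```shell
# Mirror of the caddy_binary_local default: caddy-<version>-<build revision>.
caddy_binary_name() {
    printf 'caddy-%s-%s\n' "$1" "$2"
}

# Succeeds only if the expected binary is present in the given files directory.
# Args: version build_revision files_dir
caddy_binary_present() {
    [ -f "$3/$(caddy_binary_name "$1" "$2")" ]
}
```

For example, `caddy_binary_present 2.11.2 cs1 ~/homelab-ansible/roles/caddy/files || echo "binary missing"` would flag a bumped caddy_build_revision whose rebuilt binary was never copied in.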

Caddyfile template

roles/caddy/templates/Caddyfile.j2 produces (with caddy_crowdsec_enabled: true on edge for Phase 7D):

{
    email newlingd@gmail.com
    acme_dns cloudflare {env.CADDY_CLOUDFLARE_API_TOKEN}

    # CrowdSec bouncer. order is required -- the directive isn't in
    # Caddy's standard order list. api_key uses {$ENV} (parse-time
    # substitution) -- {env.X} ends up as a literal in this module's
    # loaded config (verified via /config/ admin API).
    order crowdsec first
    crowdsec {
        api_url http://127.0.0.1:8080
        api_key {$CROWDSEC_BOUNCER_API_KEY}
        ticker_interval 15s
    }
}

*.rampancy.cloud {
    # HTTP access log -- site-block `log`, not global. Goes to journal.
    log {
        output stdout
        format json
    }

    @requests host requests.rampancy.cloud
    handle @requests {
        crowdsec
        reverse_proxy http://192.168.1.252:5055
    }
    # ... one matcher per host ...
    handle {
        respond "Host not configured" 404
    }
}

The *.rampancy.cloud block + per-host matchers is the modern idiom for "one wildcard cert covers many hosts." Each matcher (@name host fqdn) routes on the request's Host header (the host matcher does not inspect SNI); a final handle { respond ... 404 } catches unconfigured subdomains. Caddy issues a single LE wildcard cert for *.rampancy.cloud rather than one per host.

The crowdsec directive in each handle block is gated on caddy_crowdsec_enabled in the Jinja template — flip it false in host_vars/edge.yml to disable enforcement without touching the engine. See the crowdsec_engine role page for the engine side.
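To make the per-host shape concrete, the sketch below prints one matcher + handle pair from the same fields a caddy_proxy_hosts entry carries, with the crowdsec line gated on a flag. It is only an illustration in shell: the role's actual Jinja template is not reproduced here, and `render_host_block` is a hypothetical name.

```shell
# Print one matcher + handle block for the wildcard site, tab-indented.
# Args: name fqdn upstream_scheme upstream_host upstream_port crowdsec_enabled
# crowdsec_enabled is the literal string "true" to include the directive.
render_host_block() {
    printf '\t@%s host %s\n' "$1" "$2"
    printf '\thandle @%s {\n' "$1"
    if [ "$6" = "true" ]; then
        printf '\t\tcrowdsec\n'
    fi
    printf '\t\treverse_proxy %s://%s:%s\n' "$3" "$4" "$5"
    printf '\t}\n'
}
```

Calling `render_host_block requests requests.rampancy.cloud http 192.168.1.252 5055 true` prints a block equivalent to the @requests example above; passing false as the last argument omits the crowdsec line and leaves the proxying unchanged.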

The template is indented with tabs to match caddy fmt canonical style. If caddy fmt /etc/caddy/Caddyfile reports differences after a template change, mirror the changes back into the Jinja template.

Adding a new proxied host

  1. Add an entry to caddy_proxy_hosts in inventory/host_vars/edge.yml:

    - name: newservice
      fqdn: newservice.rampancy.cloud
      upstream_scheme: http
      upstream_host: 192.168.1.xxx
      upstream_port: 1234
    
  2. Add the corresponding CNAME in Cloudflare DNS (point to apex rampancy.cloud, gray-cloud, i.e. DNS only and not proxied).

  3. Re-apply: ansible-playbook playbooks/edge.yml --limit edge --tags caddy.

The Caddyfile reload is non-disruptive (no connection drop). The wildcard cert already covers the new hostname — no additional ACME flow.
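Whether a new name really falls under the existing wildcard is easy to check mechanically: a wildcard certificate covers exactly one label in front of the zone, so the apex and nested subdomains are not covered. A minimal sketch, with the hypothetical helper name `covered_by_wildcard`:

```shell
# Succeeds if host is exactly one label under zone, i.e. covered by *.zone.
# Args: host zone
covered_by_wildcard() {
    case "$1" in
        *."$2") _label=${1%."$2"} ;;   # strip the ".zone" suffix
        *)      return 1 ;;            # not under this zone at all
    esac
    case "$_label" in
        ''|*.*) return 1 ;;            # empty label or more than one label
        *)      return 0 ;;
    esac
}
```

`covered_by_wildcard newservice.rampancy.cloud rampancy.cloud` succeeds, while the same check for rampancy.cloud itself or for a.b.rampancy.cloud fails, which is the case where a separate site block and certificate would be needed.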

Gotchas captured during execution

Phase 6E: caddy_matrix_enabled special-case isn't reusable via caddy_proxy_hosts

Matrix routing has two characteristics that don't fit the caddy_proxy_hosts data model:

  1. Apex site block: rampancy.cloud (no subdomain prefix) needs its own LE cert; the wildcard *.rampancy.cloud doesn't cover the apex. Has to be a sibling Caddy site block, not a @host matcher inside the wildcard block.
  2. Path-routed CrowdSec exclusion: federation traffic to /_matrix/federation/* is server-to-server (other homeservers connecting). Putting CrowdSec on it risks blocking legitimate federation peers whose IPs land on community blocklists. CrowdSec on /_matrix/federation/* is off; on every other path it's on. This requires path-based matchers, not the single-reverse_proxy-per-host shape caddy_proxy_hosts enforces.

The role's template adds both blocks behind the caddy_matrix_enabled gate, kept out of the standard wildcard handler. If a similar future service has the same shape (apex serving statics + a subdomain with path-routed middleware exclusion), copy this pattern rather than trying to generalise caddy_proxy_hosts.
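The path rule itself is small. The sketch below approximates the decision in shell so the intended split is unambiguous; it is not the role's template, and `matrix_path_policy` is a hypothetical name. The glob mirrors a Caddy path matcher of /_matrix/federation/*.

```shell
# Decide whether a request path on the matrix subdomain should pass through
# the CrowdSec handler ("crowdsec") or skip it ("bypass").
matrix_path_policy() {
    case "$1" in
        /_matrix/federation/*) echo "bypass" ;;
        *)                     echo "crowdsec" ;;
    esac
}
```

Under this rule `/_matrix/federation/v1/version` bypasses the bouncer, while client traffic such as `/_matrix/client/v3/sync` still goes through it.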

Caddy upstream systemd unit ships --environ flag — leaks env vars to journalctl

Caddy's official systemd unit includes --environ on ExecStart. With EnvironmentFile=/etc/caddy/caddy.env containing the CF API token, the token gets printed to journalctl on every restart. Detected during the 5D cutover; the role's systemd template drops --environ for this reason.

If you need to debug env-var resolution, run Caddy manually with --environ once (e.g. sudo -u caddy CADDY_CLOUDFLARE_API_TOKEN=test caddy run --environ --config /etc/caddy/Caddyfile) rather than baking it into the systemd unit.
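A quick guard against the flag creeping back in (for example after copying the upstream unit) is to grep the rendered unit file. This is a sketch with a hypothetical function name; the unit path in the usage note is an assumption, since the install location is not stated here.

```shell
# Succeeds if any ExecStart line in the given unit file passes --environ.
# Arg: path to a systemd unit file
unit_has_environ() {
    grep -Eq '^ExecStart=.*[[:space:]]--environ([[:space:]]|$)' "$1"
}
```

For instance, `unit_has_environ /etc/systemd/system/caddy.service && echo "--environ present: env vars will reach the journal"`.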

caddy validate doesn't see EnvironmentFile

systemd's EnvironmentFile= directive only applies to processes that systemd itself starts for the unit (ExecStart, ExecStop, ExecReload). Ad-hoc caddy validate invocations from a shell or from Ansible don't pick up caddy.env, so {env.CADDY_CLOUDFLARE_API_TOKEN} in the Caddyfile resolves to empty and Caddy errors out with API token '' appears invalid. The same applies to CROWDSEC_BOUNCER_API_KEY once the bouncer is enabled; this was first hit during Phase 7D on 2026-05-04 with crowdsec API key must not be empty.

The role's Validate Caddyfile task passes both tokens explicitly via Ansible's environment: parameter:

- name: Validate Caddyfile
  ansible.builtin.command: >
    {{ caddy_install_dir }}/caddy validate
    --config {{ caddy_config_dir }}/Caddyfile
    --adapter caddyfile
  environment:
    CADDY_CLOUDFLARE_API_TOKEN: "{{ lookup('vars', caddy_cf_dns_token_var) }}"
    CROWDSEC_BOUNCER_API_KEY: "{{ lookup('vars', caddy_crowdsec_bouncer_key_var) if caddy_crowdsec_enabled else '' }}"
  changed_when: false
  check_mode: false
  when: not ansible_check_mode

Why a separate command task rather than the template module's validate: option: a first-run --check --diff would fail before the binary is in place. Keeping validation separate (and gated on when: not ansible_check_mode) keeps the role idempotent across check-mode and real runs.
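For an ad-hoc validate outside Ansible, the same effect can be had by exporting the env file's contents into the command's environment. The wrapper below is a minimal sketch with a hypothetical name; it assumes caddy.env contains only plain KEY=value lines, since systemd's EnvironmentFile syntax is not identical to shell syntax and values with shell metacharacters would not survive sourcing.

```shell
# Run a command with the variables from a KEY=value file exported to it.
# The function body is a subshell, so nothing leaks into the calling shell.
# Args: env_file command [args...]
run_with_env_file() (
    env_file=$1
    shift
    set -a            # export every variable assigned while sourcing
    . "$env_file"     # pass a path containing a slash so PATH is not searched
    set +a
    exec "$@"
)
```

Run from a root shell (the env file holds secrets and is presumably not world-readable): `run_with_env_file /etc/caddy/caddy.env /usr/local/bin/caddy validate --config /etc/caddy/Caddyfile --adapter caddyfile`.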

{env.X} works for some directives, {$X} is the safe choice for module config

Caddy substitutes {env.X} at runtime, but only for directives that explicitly support it (e.g. the ACME acme_dns argument does; the crowdsec module's api_key field does not). The crowdsec module reads its config before Caddy's runtime substitution pass fires, so {env.CROWDSEC_BOUNCER_API_KEY} ends up as the literal string in the loaded config — verified via curl localhost:2019/config/apps/crowdsec/ showing the literal {env.X} text.

The fix: use {$X} (parse-time substitution by Caddy's caddyfile adapter), which expands before the module sees its config. The Caddyfile template uses {$CROWDSEC_BOUNCER_API_KEY} for this reason. The CF token stays on {env.X} because the ACME directive supports it and runtime substitution is preferred there (no token in the on-disk JSON config dump).
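The admin-API check described above can be wrapped into a small filter so it is repeatable after template changes. A sketch, with a hypothetical function name; it only looks for the literal placeholder text and makes no attempt to parse the JSON.

```shell
# Reads config JSON on stdin; succeeds if a literal {env.*} placeholder
# is still present, meaning the value was never substituted.
has_literal_env_placeholder() {
    grep -qF '{env.'
}
```

Usage against the crowdsec app only: `curl -s localhost:2019/config/apps/crowdsec/ | has_literal_env_placeholder && echo "api_key was not substituted"`. Running it against the whole config would also flag the CF token, where the {env.X} placeholder is intentional.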

no_log: true on the env-file render task

The env-file template task renders the CF API token. With --diff enabled (the default for site-wide drift runs), Ansible would print the rendered template to stdout — including the token. no_log: true suppresses the diff output. Trade-off: you can't see what changed in the env file from a drift run; if you need to debug, run the task manually with -v and inspect the file directly.

Alt-port mode (Phase 1 parallel-run)

caddy_listen_alt_ports: true in defaults makes Caddy bind 8080/8443 instead of 80/443. Used during the cutover so Caddy could parallel-run alongside NPM (which still owned 80/443 on .249 until the UDM swap). Flipped to false in host_vars/edge.yml for production binding. Leave the toggle in defaults — useful for any future migration that wants the same parallel-run pattern.
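One plausible way for the toggle to reach Caddy is through the http_port and https_port global options. The sketch below shows that mapping in shell; how the role's template actually implements the switch is not shown here, so treat both the mapping and the name `render_port_options` as assumptions.

```shell
# Print global-option lines for the chosen port mode.
# "true" -> alternate ports 8080/8443; anything else -> no output,
# which leaves Caddy on its default 80/443.
render_port_options() {
    if [ "$1" = "true" ]; then
        printf '\thttp_port 8080\n'
        printf '\thttps_port 8443\n'
    fi
}
```

With the toggle on, the two lines would sit inside the global options block next to email and acme_dns; with it off, nothing is emitted and the production binding applies.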

Validation path post-deploy

For ongoing drift, the user's pattern is ansible-playbook playbooks/site.yml --check --diff from CT 104 — see Drift Detection and Ansible homelab-ansible CLAUDE.md. The first-time apply was from WSL (vault + keys present, edge brand-new). Once homelab-ansible commits land on CT 104, drift will validate the role end-to-end.