Role: caddy

Caddy reverse proxy on the edge LXC (CT 107, 192.168.1.244). Single static binary, Caddyfile templated from inventory, automatic HTTPS via Let's Encrypt DNS-01 against Cloudflare, no Docker.

Hosts: edge (only).

Phase 5D executed — 2026-05-02

The role landed alongside the edge LXC bring-up. Caddy serves four hosts (requests, dash, n8n, kosync) under a single Let's Encrypt wildcard certificate for *.rampancy.cloud, replacing the hand-clicked Nginx Proxy Manager (NPM) install on the retired nginx VM 102. See the edge-cutover runbook for the cutover procedure and execution-time lessons.

Validation gap

The apply succeeded against edge from WSL on 2026-05-02. The site-wide --check --diff from CT 104 has not yet run as of this docs commit; it is pending the homelab-ansible commit and the CT 104 pull. It is expected to report changed=0 once the code lands on the drift runner.

Architecture

flowchart LR
    Internet[Public client] --> UDM[UDM<br/>port-forward 80/443]
    UDM --> Caddy[Caddy on edge<br/>192.168.1.244]
    Caddy -->|TLS terminate<br/>LE wildcard| n8n[n8n VM<br/>:5678]
    Caddy --> beszel[Beszel hub<br/>:8090]
    Caddy --> seerr[Overseerr<br/>arrstack:5055]
    Caddy --> kosync[korrosync<br/>arrstack:3030]
    Caddy <-.->|ACME DNS-01| CF[Cloudflare API<br/>TXT record]

Cert issuance and renewal happen via Cloudflare DNS-01: Caddy creates a TXT record at _acme-challenge.rampancy.cloud through the Cloudflare API, Let's Encrypt validates it, and Caddy deletes the record. Renewal has no dependency on inbound port 80.
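
As a small illustration of where the challenge record lives, the sketch below derives the TXT record name for a certificate subject. The function name `acme_challenge_name` is made up for this example; the only behaviour it encodes is that a wildcard subject is validated at its base domain.

```shell
#!/bin/sh
# Derive the DNS-01 challenge record name for a certificate subject.
# A wildcard subject (*.example.org) is validated at the base domain,
# so a leading "*." is stripped before prefixing _acme-challenge.
# (acme_challenge_name is a hypothetical helper, not part of the role.)
acme_challenge_name() {
    subject=$1
    printf '_acme-challenge.%s\n' "${subject#\*.}"
}

acme_challenge_name '*.rampancy.cloud'   # prints _acme-challenge.rampancy.cloud
```

While an issuance is in flight, `dig +short TXT "$(acme_challenge_name '*.rampancy.cloud')"` should show the record; outside that window an empty answer is expected, since Caddy deletes the record after validation.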

Tasks

Task	Tag
Ensure caddy group + user (system, nologin)	`caddy`, `install`
Create `/etc/caddy`, `/var/lib/caddy`, `/var/log/caddy`	`caddy`, `install`
Install pinned binary from `roles/caddy/files/caddy-<version>`	`caddy`, `install`
Render systemd unit (no `--environ` — see Gotchas)	`caddy`, `service`
Render `/etc/caddy/caddy.env` with vaulted CF token (`no_log: true`)	`caddy`, `config`
Render `/etc/caddy/Caddyfile` from inventory	`caddy`, `config`
Validate Caddyfile (with env var passed via `environment:`)	`caddy`, `config`
Enable + start `caddy.service`	`caddy`, `service`

Key variables

Defaults at roles/caddy/defaults/main.yml. Per-host overrides at inventory/host_vars/edge.yml.

Variable	Source	Default / value
`caddy_version`	defaults	`2.11.2`
`caddy_build_revision`	defaults	`cs1` (bump when xcaddy plugin set changes; see Binary build flow)
`caddy_binary_local`	defaults	`caddy-{{ caddy_version }}-{{ caddy_build_revision }}` (in `roles/caddy/files/`)
`caddy_install_dir`	defaults	`/usr/local/bin`
`caddy_config_dir`	defaults	`/etc/caddy`
`caddy_data_dir`	defaults	`/var/lib/caddy`
`caddy_acme_email`	defaults	`newlingd@gmail.com`
`caddy_acme_dns_provider`	defaults	`cloudflare`
`caddy_listen_alt_ports`	defaults	`true` (bind 8080/8443) — flipped to `false` in `host_vars/edge.yml` post-cutover
`caddy_cf_dns_token_var`	defaults	`vault_caddy_cf_dns_token`
`caddy_crowdsec_enabled`	defaults	`false` — flipped per-host. Gates the bouncer block in `Caddyfile.j2` and the `CROWDSEC_BOUNCER_API_KEY` line in `caddy.env.j2`.
`caddy_crowdsec_bouncer_key_var`	defaults	`vault_caddy_crowdsec_bouncer_key`
`caddy_crowdsec_lapi_url`	defaults	`http://127.0.0.1:8080` — stock LAPI listen address. The agent's `local_api_credentials.yaml` is hardcoded to this by the installer; moving LAPI requires re-templating that too, so don't optimise for hypothetical alt-port collisions.
`caddy_zone`	host_vars	`rampancy.cloud`
`caddy_proxy_hosts`	host_vars	list of `{name, fqdn, upstream_scheme, upstream_host, upstream_port}`
`caddy_matrix_enabled`	defaults	`false` — flipped to `true` on edge for Phase 6E. Gates an apex `{{ caddy_zone }}` block (well-known matrix delegation) plus a `matrix.{{ caddy_zone }}` block in the template, neither expressible via `caddy_proxy_hosts` (apex needs its own cert; matrix subdomain needs path-routed CrowdSec exclusion on `/_matrix/federation/*`).
`caddy_matrix_upstream`	host_vars	`192.168.1.243:81` on edge — single upstream because `matrix_federation_public_port: 443` in the matrix-deploy vars.yml collapses Traefik's federation entrypoint into the web entrypoint, so client + federation share the host-bind port.

The vaulted CF API token (vault_caddy_cf_dns_token) is scoped to Zone:Read + DNS:Edit on rampancy.cloud and nothing else.

Binary build flow

The binary is not built on the target host. Pre-built on WSL via xcaddy, version-pinned, committed to roles/caddy/files/:

mkdir -p ~/build/caddy && cd ~/build/caddy
docker run --rm -v "$PWD:/output" -w /output caddy:2.11.2-builder \
    xcaddy build v2.11.2 \
        --with github.com/caddy-dns/cloudflare \
        --with github.com/hslatman/caddy-crowdsec-bouncer/http \
        --output caddy
./caddy version                          # confirm v2.11.2
./caddy list-modules | grep cloudflare   # confirm dns.providers.cloudflare
./caddy list-modules | grep crowdsec     # confirm http.handlers.crowdsec + crowdsec app

cp caddy ~/homelab-ansible/roles/caddy/files/caddy-2.11.2-cs1

Why pre-built rather than xcaddy on edge:

- Reproducibility: the committed binary is bit-identical across deploys
- Speed: an xcaddy build on a 1 GiB LXC is slow or OOMs during go build
- Footprint: no Go toolchain on edge (~500 MB saved)

The -cs1 filename suffix is caddy_build_revision — bumped when the xcaddy plugin set changes (e.g. adding the CrowdSec bouncer module for Phase 7D). Caddy upstream version stays accurate; the suffix is the audit trail for plugin-set changes. Ansible's copy module would also detect content changes via checksum, but a new filename makes the rebuild visible in git diff.

To bump Caddy: rebuild via xcaddy with a new version, drop the binary into roles/caddy/files/, bump caddy_version in defaults/main.yml, re-apply. The role's Install caddy binary task will detect the change and notify the Restart caddy handler.

To change the plugin set without bumping Caddy: rebuild with the new --with flags, drop the new file into roles/caddy/files/, bump caddy_build_revision in defaults/main.yml. Old binary stays in tree for one revision (handy for rollback).
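
The naming convention and the post-build module checks above could be wrapped in two small helpers before committing a new binary. This is only a sketch: `caddy_binary_name` and `require_modules` are hypothetical names, and the module check assumes `caddy list-modules` prints one module ID per line.

```shell
#!/bin/sh
# Compose the committed filename from version + build revision,
# mirroring the caddy_binary_local default (caddy-<version>-<revision>).
caddy_binary_name() {
    printf 'caddy-%s-%s\n' "$1" "$2"
}

# Read `caddy list-modules` output on stdin and fail unless every
# module ID passed as an argument appears somewhere in it.
require_modules() {
    modules=$(cat)
    for wanted in "$@"; do
        printf '%s\n' "$modules" | grep -qF "$wanted" || {
            echo "missing module: $wanted" >&2
            return 1
        }
    done
}
```

A possible use from the build directory: `./caddy list-modules | require_modules dns.providers.cloudflare http.handlers.crowdsec && cp caddy ~/homelab-ansible/roles/caddy/files/"$(caddy_binary_name 2.11.2 cs1)"`, so a binary missing a plugin never reaches the role's files directory.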

Caddyfile template

roles/caddy/templates/Caddyfile.j2 produces (with caddy_crowdsec_enabled: true on edge for Phase 7D):

{
    email newlingd@gmail.com
    acme_dns cloudflare {env.CADDY_CLOUDFLARE_API_TOKEN}

    # CrowdSec bouncer. order is required -- the directive isn't in
    # Caddy's standard order list. api_key uses {$ENV} (parse-time
    # substitution) -- {env.X} ends up as a literal in this module's
    # loaded config (verified via /config/ admin API).
    order crowdsec first
    crowdsec {
        api_url http://127.0.0.1:8080
        api_key {$CROWDSEC_BOUNCER_API_KEY}
        ticker_interval 15s
    }
}

*.rampancy.cloud {
    # HTTP access log -- site-block `log`, not global. Goes to journal.
    log {
        output stdout
        format json
    }

    @requests host requests.rampancy.cloud
    handle @requests {
        crowdsec
        reverse_proxy http://192.168.1.252:5055
    }
    # ... one matcher per host ...
    handle {
        respond "Host not configured" 404
    }
}

The *.rampancy.cloud block with per-host matchers is the modern idiom for "one wildcard cert covers many hosts." Each matcher (@name host fqdn) routes on the request's Host header, not on TLS SNI; a final handle { respond ... 404 } catches unconfigured subdomains. Caddy issues a single LE wildcard cert for *.rampancy.cloud rather than one per host.
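
To see the per-hostname routing and the 404 catch-all in action without relying on public DNS, a probe along these lines may help. It is a sketch only: `resolve_arg` and `probe_host` are hypothetical helper names, and it assumes curl is available and the edge IP is reachable from where it runs.

```shell
#!/bin/sh
# Build the value for curl's --resolve option: HOST:PORT:ADDRESS.
# Port defaults to 443 when the third argument is omitted.
resolve_arg() {
    printf '%s:%s:%s\n' "$1" "${3:-443}" "$2"
}

# Print the HTTP status Caddy returns for a hostname, pinning the
# connection to the given IP so public DNS is bypassed.
# Usage: probe_host FQDN IP [PORT]
probe_host() {
    port=${3:-443}
    curl -sS -o /dev/null -w '%{http_code}\n' \
        --resolve "$(resolve_arg "$1" "$2" "$port")" "https://$1:$port/"
}
```

For example, `probe_host requests.rampancy.cloud 192.168.1.244` should return whatever the upstream answers, while a name with no matcher (say `probe_host nosuchhost.rampancy.cloud 192.168.1.244`) should come back as 404 from the final handle block.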

The crowdsec directive in each handle block is gated on caddy_crowdsec_enabled in the Jinja template — flip it false in host_vars/edge.yml to disable enforcement without touching the engine. See the crowdsec_engine role page for the engine side.

The template is indented with tabs to match the canonical caddy fmt style. If caddy fmt /etc/caddy/Caddyfile reports differences after a template change, mirror them back into the Jinja template.

Adding a new proxied host

Add an entry to caddy_proxy_hosts in inventory/host_vars/edge.yml:

- name: newservice
  fqdn: newservice.rampancy.cloud
  upstream_scheme: http
  upstream_host: 192.168.1.xxx
  upstream_port: 1234

Add the corresponding CNAME in Cloudflare DNS (point it to the apex rampancy.cloud, gray-cloud, i.e. DNS only with the Cloudflare proxy off).
Re-apply: ansible-playbook playbooks/edge.yml --limit edge --tags caddy.

The Caddyfile reload is non-disruptive (no connection drop). The wildcard cert already covers the new hostname — no additional ACME flow.

Gotchas captured during execution

Phase 6E: `caddy_matrix_enabled` special-case isn't reusable via `caddy_proxy_hosts`

Matrix routing has two characteristics that don't fit the caddy_proxy_hosts data model:

Apex site block — rampancy.cloud (no subdomain prefix) needs its own LE cert; the wildcard *.rampancy.cloud doesn't cover the apex. Has to be a sibling Caddy site block, not a @host matcher inside the wildcard block.
Path-routed CrowdSec exclusion — federation traffic to /_matrix/federation/* is server-to-server (other homeservers connecting). Putting CrowdSec on it risks blocking legitimate federation peers whose IPs land on community blocklists. CrowdSec on /_matrix/federation/* is off; on every other path it's on. This requires path-based matchers, not the single-reverse_proxy-per-host shape caddy_proxy_hosts enforces.

The role's template adds both blocks behind the caddy_matrix_enabled gate, kept out of the standard wildcard handler. If a similar future service has the same shape (apex serving statics + a subdomain with path-routed middleware exclusion), copy this pattern rather than trying to generalise caddy_proxy_hosts.

Caddy upstream systemd unit ships `--environ` flag: leaks env vars to journalctl

Caddy's official systemd unit includes --environ on ExecStart. With EnvironmentFile=/etc/caddy/caddy.env containing the CF API token, the token gets printed to journalctl on every restart. Detected during the 5D cutover; the role's systemd template drops --environ for this reason.

If you need to debug env-var resolution, run Caddy manually with --environ once (e.g. sudo -u caddy CADDY_CLOUDFLARE_API_TOKEN=test caddy run --environ --config /etc/caddy/Caddyfile) rather than baking it into the systemd unit.
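
A quick guard against the flag creeping back in (for instance after a package-supplied unit overwrites the templated one) could look like the following. `unit_omits_environ` is a hypothetical helper name; it only inspects text piped into it.

```shell
#!/bin/sh
# Read a systemd unit on stdin; succeed only if no ExecStart line
# carries the --environ flag.
unit_omits_environ() {
    ! grep -E '^ExecStart=' | grep -q -- '--environ'
}
```

On edge this might be used as `systemctl cat caddy.service | unit_omits_environ || echo "unit would print env vars to the journal"`.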

`caddy validate` doesn't see `EnvironmentFile`

systemd's EnvironmentFile= directive only applies to the commands systemd itself runs for the unit (ExecStart, ExecStop, ExecReload). Ad-hoc caddy validate invocations don't pick up caddy.env, so the {env.CADDY_CLOUDFLARE_API_TOKEN} in the Caddyfile resolves to an empty string and Caddy errors out with API token '' appears invalid. The same applies to CROWDSEC_BOUNCER_API_KEY once the bouncer is enabled; this was first hit during Phase 7D on 2026-05-04, with the error crowdsec API key must not be empty.

The role's Validate Caddyfile task passes both tokens explicitly via Ansible's environment: parameter:

- name: Validate Caddyfile
  ansible.builtin.command: >
    {{ caddy_install_dir }}/caddy validate
    --config {{ caddy_config_dir }}/Caddyfile
    --adapter caddyfile
  environment:
    CADDY_CLOUDFLARE_API_TOKEN: "{{ lookup('vars', caddy_cf_dns_token_var) }}"
    CROWDSEC_BOUNCER_API_KEY: "{{ lookup('vars', caddy_crowdsec_bouncer_key_var) if caddy_crowdsec_enabled else '' }}"
  changed_when: false
  check_mode: false
  when: not ansible_check_mode

Why a separate command task rather than template.validate: a first-run --check --diff would fail before the binary is in place. Keeping validation separate (and gated with when: not ansible_check_mode) keeps the role idempotent across check-mode and real runs.
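
For ad-hoc validation outside Ansible, one option is to load the env file into a subshell before calling caddy. The helper below is a sketch with a made-up name (`with_env_file`), and it assumes caddy.env contains only plain KEY=value lines that are also valid shell assignments.

```shell
#!/bin/sh
# Run a command with every KEY=value line of an env file exported.
# The work happens in a subshell, so nothing leaks into the caller.
# Assumption: the file holds plain KEY=value lines that are valid
# shell assignments (no systemd-specific quoting or escapes).
with_env_file() {
    env_file=$1
    shift
    (
        set -a            # export every variable assigned from here on
        . "$env_file"
        set +a
        "$@"
    )
}
```

Run as a user that can read the file, this would be invoked as `with_env_file /etc/caddy/caddy.env /usr/local/bin/caddy validate --config /etc/caddy/Caddyfile --adapter caddyfile`. Unlike `--environ`, it prints nothing about the environment, so the token stays out of the terminal scrollback.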

`{env.X}` works for some directives, `{$X}` is the safe choice for module config

Caddy substitutes {env.X} at runtime, but only for directives that explicitly support it (e.g. the ACME acme_dns argument does; the crowdsec module's api_key field does not). The crowdsec module reads its config before Caddy's runtime substitution pass fires, so {env.CROWDSEC_BOUNCER_API_KEY} ends up as the literal string in the loaded config — verified via curl localhost:2019/config/apps/crowdsec/ showing the literal {env.X} text.

The fix: use {$X} (parse-time substitution by Caddy's caddyfile adapter), which expands before the module sees its config. The Caddyfile template uses {$CROWDSEC_BOUNCER_API_KEY} for this reason. The CF token stays on {env.X} because the ACME directive supports it and runtime substitution is preferred there (no token in the on-disk JSON config dump).
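
The admin-API check described above can be turned into a repeatable test. The sketch below uses a hypothetical helper, `has_unexpanded_env`, that looks for a literal {env.NAME} placeholder in whatever JSON is piped into it.

```shell
#!/bin/sh
# Read a Caddy JSON config fragment on stdin; succeed if a literal
# {env.NAME} placeholder survived into it.
has_unexpanded_env() {
    grep -q '{env\.[A-Za-z_][A-Za-z0-9_]*}'
}
```

Scoped to the crowdsec app, as in `curl -s localhost:2019/config/apps/crowdsec/ | has_unexpanded_env && echo "api_key was not expanded"`, a hit indicates the module received the placeholder text rather than the key. Running it against the whole config is less useful, because the CF token intentionally stays as {env.X} there.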

`no_log: true` on the env-file render task

The env-file template task renders the CF API token. With --diff enabled (the default for site-wide drift runs), Ansible would print the rendered template to stdout — including the token. no_log: true suppresses the diff output. Trade-off: you can't see what changed in the env file from a drift run; if you need to debug, run the task manually with -v and inspect the file directly.

Alt-port mode (Phase 1 parallel-run)

caddy_listen_alt_ports: true in defaults makes Caddy bind 8080/8443 instead of 80/443. Used during the cutover so Caddy could parallel-run alongside NPM (which still owned 80/443 on .249 until the UDM swap). Flipped to false in host_vars/edge.yml for production binding. Leave the toggle in defaults — useful for any future migration that wants the same parallel-run pattern.

Validation path post-deploy

For ongoing drift detection, the established pattern is ansible-playbook playbooks/site.yml --check --diff from CT 104; see Drift Detection and Ansible homelab-ansible CLAUDE.md. The first-time apply was from WSL (vault + keys present, edge brand-new). Once the homelab-ansible commits land on CT 104, the drift run will validate the role end-to-end.
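
If the drift run is ever scripted, the changed=0 expectation can be asserted mechanically. The helper below is a sketch under that assumption; `recap_is_clean` is a hypothetical name, and it simply scans ansible-playbook output for a non-zero changed= count.

```shell
#!/bin/sh
# Read ansible-playbook output on stdin; succeed only if no line
# reports a non-zero changed= count. Empty input also counts as
# clean, so pair this with the playbook's own exit status.
recap_is_clean() {
    ! grep -Eq 'changed=[1-9]'
}
```

One way to use it while still seeing the run: `ansible-playbook playbooks/site.yml --check --diff | tee /tmp/drift.log | recap_is_clean || echo "drift detected"`. The log path here is illustrative only.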

edge-cutover runbook — cutover procedure + lessons
Roadmap §Phase 5D
Network overview — public-services map
crowdsec_engine role — Phase 7D engine side; pairs with the bouncer module loaded into the Caddy binary
Accepted risk: edge security gap until CrowdSec — why no orange-cloud
Roadmap §Phase 7D — CrowdSec on edge — closes the gap
Caddy upstream docs · caddy-dns/cloudflare module