Runbook: CrowdSec bouncer end-to-end validation
Proves the hslatman Caddy bouncer module is actually bouncing — not just that the engine is running. Use after the crowdsec_engine bootstrap and after flipping caddy_crowdsec_enabled: true in inventory/host_vars/edge.yml.
Active proof must run from outside the LAN
Hairpin NAT on the UDM rewrites the source IP for any LAN-internal request to the public hostname (requests.rampancy.cloud etc.) — Caddy sees client_ip: 192.168.1.1 instead of your real public IP. The bouncer correctly allows that LAN-side traffic, so a curl from WSL/CT104 will always show "not blocking" even when everything is wired right. Use a cellular phone (WiFi off) or a remote SSH host for the active test.
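When reading access logs, a quick way to tell a hairpinned request from a real outside one is to check whether the logged client_ip is an RFC 1918 address. The following is a minimal sketch only: is_private_ipv4 and classify_client_ip are hypothetical helper names (not part of Caddy or CrowdSec), and 203.0.113.7 is a documentation address standing in for a real public IP.

```shell
#!/bin/sh
# is_private_ipv4 ADDR: succeed (exit 0) when ADDR is in an RFC 1918 range
# (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). Hypothetical helper.
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# classify_client_ip ADDR: say whether a logged client_ip is usable as
# evidence for the active proof.
classify_client_ip() {
  if is_private_ipv4 "$1"; then
    echo "$1 is LAN-side (likely hairpinned); test from outside the LAN"
  else
    echo "$1 is public; suitable for the active proof"
  fi
}

classify_client_ip 192.168.1.1
classify_client_ip 203.0.113.7
```

If the address Caddy logs for your test request classifies as LAN-side, the request never left the LAN and says nothing about the bouncer.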
Prerequisites
- crowdsec_engine role applied to edge, crowdsec.service active
- caddy_crowdsec_enabled: true in inventory/host_vars/edge.yml, applied
- Bouncer-enabled Caddy binary in roles/caddy/files/ (built per caddy.md §Binary build flow with the --with github.com/hslatman/caddy-crowdsec-bouncer/http flag)
- SSH access to edge as root
- A device on cellular (or any network whose egress IP is not your home WAN IP)
Step 1 - Read-only sanity (drift-safe, run from edge)
Confirm engine + bouncer + inflow before touching the active path.
ssh root@192.168.1.244
# Engine + Caddy both running.
systemctl is-active crowdsec caddy
# Expected: active / active
# Bouncer is registered and pulling decisions every ~15s.
cscli bouncers list
# Expected: caddy-edge, Valid ✔️, Last API pull within ~30s, Type caddy-cs-bouncer
# Engine has decisions to enforce (community CAPI pull on first start).
cscli decisions list | head
# Expected: a non-empty list eventually (CAPI ships ~thousands by default).
# On a fresh install, allow up to 10 min for the first community pull.
# Caddy bouncer module loaded.
/usr/local/bin/caddy list-modules | grep crowdsec
# Expected: admin.api.crowdsec, crowdsec, http.handlers.crowdsec
Hold point: if any of those fail, stop. Don't proceed to active proof.
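The hold point can be made mechanical by piping the Step 1 commands through small checks. This is a sketch under two assumptions: systemctl is-active prints one state per line, and caddy list-modules prints one module ID per line. The names all_active and has_crowdsec_handler are hypothetical helpers, and the printf lines are stand-in input so the block runs anywhere.

```shell
#!/bin/sh
# all_active: read unit states on stdin; succeed only if at least one line
# was read and every line is exactly "active".
all_active() (
  seen=0
  while IFS= read -r state; do
    seen=1
    [ "$state" = "active" ] || exit 1
  done
  [ "$seen" -eq 1 ]
)

# has_crowdsec_handler: read a module listing on stdin; succeed when the
# http.handlers.crowdsec module ID appears in it.
has_crowdsec_handler() {
  grep -q -F 'http.handlers.crowdsec'
}

# On edge, the real inputs would be:
#   systemctl is-active crowdsec caddy | all_active
#   /usr/local/bin/caddy list-modules | has_crowdsec_handler
# Stand-in input for a dry demonstration:
if printf 'active\nactive\n' | all_active; then
  echo "services: ok"
else
  echo "services: HOLD"
fi
if printf 'crowdsec\nhttp.handlers.crowdsec\n' | has_crowdsec_handler; then
  echo "module: ok"
else
  echo "module: HOLD"
fi
```

The cscli checks are left manual here because their output is a formatted table that is easier to judge by eye.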
Step 2 - Capture your phone's cellular egress IP
On the phone (WiFi off, cellular on):
- Visit https://api.ipify.org (just shows the IP) or any whatsmyip site
- Note it (call it $PHONE_IP)
- Confirm it's plausibly a cellular range — NOT your home WAN IP
Sanity-load https://requests.rampancy.cloud — should reach Overseerr's login screen normally.
Step 3 - Add a test decision (5m TTL, from WSL)
# PHONE_IP must be set in the WSL shell first; the double quotes expand it locally before ssh runs.
ssh root@192.168.1.244 "cscli decisions add --ip $PHONE_IP --duration 5m --reason 'phase-7d-validation'"
# Expected: "Decision successfully added"
ssh root@192.168.1.244 "cscli decisions list --ip $PHONE_IP"
# Expected: one entry, type ban, ~5m remaining
5-minute TTL gives breathing room — the streaming bouncer ticker is 15s, so the decision needs at least one poll cycle to land in Caddy's cache.
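The headroom is simple integer arithmetic: the number of complete poll cycles that fit inside the decision's lifetime. A small sketch, with polls_within_ttl as a hypothetical helper name:

```shell
#!/bin/sh
# polls_within_ttl TTL_SECONDS TICKER_SECONDS: print how many complete
# bouncer poll cycles fit inside a decision's lifetime (integer division).
polls_within_ttl() {
  echo $(( $1 / $2 ))
}

polls_within_ttl 300 15   # 5m TTL, 15s ticker: prints 20
polls_within_ttl 20 15    # a 20s TTL would leave room for one poll: prints 1
```

With twenty cycles available, a missed poll or a slow page reload does not cost you the test window.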
Step 4 - Confirm the block (on phone)
Wait ~20s for the bouncer poll, then on the phone reload https://requests.rampancy.cloud.
Expected: the browser shows a blank page (Firefox); curl -i shows a 403 status line. The bouncer's default deny response is 403 Forbidden with no body.
If still loading normally after 30s, see troubleshooting below.
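If the active test runs from a remote SSH host instead of a phone (with that host's egress IP banned in Step 3), the wait-and-reload can be scripted. This is a sketch, not a tested part of the runbook: fetch_status and wait_for_status are hypothetical helpers, and the 6 tries at 5s spacing are chosen only to match the 30s window above.

```shell
#!/bin/sh
# fetch_status URL: print the HTTP status code for URL.
# curl prints 000 when the connection itself fails.
fetch_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# wait_for_status WANT TRIES DELAY CMD [ARGS...]: run CMD up to TRIES times,
# DELAY seconds apart, until it prints WANT. Exits 0 on a match, 1 otherwise.
wait_for_status() (
  want=$1
  tries=$2
  delay=$3
  shift 3
  i=1
  while [ "$i" -le "$tries" ]; do
    got=$("$@")
    echo "attempt $i: $got"
    if [ "$got" = "$want" ]; then
      exit 0
    fi
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  exit 1
)

# From the off-LAN host whose IP was banned:
#   wait_for_status 403 6 5 fetch_status https://requests.rampancy.cloud
```

Taking the fetch command as an argument keeps the loop testable with a stub and lets the same loop confirm the return to 200 in Step 7.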
Step 5 - Confirm the deny event landed in Caddy logs
On edge:
ssh root@192.168.1.244 "journalctl -u caddy --since '2 minutes ago' --no-pager | grep -E 'client_ip.*$PHONE_IP|crowdsec.*$PHONE_IP' | head"
# Expected: at least one access-log line with the phone IP and status 403,
# or a crowdsec logger line referencing the IP.
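A stricter filter than the regex above can match the IP as a fixed string, so the dots are literal and 203.0.113.7 does not also match 203.0.113.70. The sketch below assumes Caddy's access log is in its JSON format with compact keys such as "client_ip":"..." and "status":403; denied_for_ip is a hypothetical helper and the sample lines are made up for illustration.

```shell
#!/bin/sh
# denied_for_ip IP: read access-log lines on stdin and print those that
# record a 403 for exactly this client IP. grep -F treats the pattern as a
# fixed string, so dots in the address are not regex wildcards.
denied_for_ip() {
  grep -F "\"client_ip\":\"$1\"" | grep -F '"status":403'
}

# Real use, from WSL:
#   ssh root@192.168.1.244 "journalctl -u caddy --since '2 minutes ago' --no-pager" \
#     | denied_for_ip "$PHONE_IP"
# Illustrative sample lines (not real log output):
printf '%s\n' \
  '{"request":{"client_ip":"203.0.113.7"},"status":403}' \
  '{"request":{"client_ip":"203.0.113.7"},"status":200}' \
  '{"request":{"client_ip":"203.0.113.70"},"status":403}' \
  | denied_for_ip 203.0.113.7
```

Only the first sample line passes the filter; the second has the wrong status and the third a different address.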
Step 6 - Expire the decision
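No command is recorded for this step. One plausible form, assuming the intent is to delete the test decision rather than wait out the 5m TTL, uses cscli decisions delete against the same IP. Treat this as a sketch: expire_decision and the SSH_CMD dry-run switch are hypothetical conveniences, and 203.0.113.7 is a documentation address standing in for $PHONE_IP.

```shell
#!/bin/sh
# expire_decision IP: remove the test ban for IP on edge.
# SSH_CMD defaults to ssh; set SSH_CMD=echo to print the remote command
# instead of running it.
expire_decision() {
  "${SSH_CMD:-ssh}" root@192.168.1.244 "cscli decisions delete --ip $1"
}

# Dry run: prints the target and the remote command, touches nothing.
SSH_CMD=echo expire_decision 203.0.113.7

# Real run, from WSL:
#   expire_decision "$PHONE_IP"
```

Re-running the cscli decisions list --ip check from Step 3 afterwards should no longer show the entry.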
Step 7 - Confirm restored access (on phone)
Wait ~20s for the next bouncer poll (decisions are streamed as deltas, including deletions), then reload on the phone.
Expected: Overseerr loads normally again.
Step 8 - Soft-fail behaviour (optional, riskier)
To prove the bouncer fails open when the engine is unreachable (so a CrowdSec outage doesn't black-hole the public apps), first confirm that no decision is blocking your phone IP, then stop the engine:
ssh root@192.168.1.244 "systemctl stop crowdsec"
# On phone: reload requests.rampancy.cloud -- expected: still 200 (Caddy logs
# warn the bouncer can't reach LAPI; module fails open per hslatman default).
ssh root@192.168.1.244 "systemctl start crowdsec"
If the phone gets 5xx with engine stopped, enable_hard_fails got flipped on somewhere — check the Caddyfile global block.
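Because this step leaves the engine down until someone starts it again, it can help to wrap the stop and start around the test window so the restart is not forgotten. A minimal sketch meant to run on edge as root; with_engine_stopped and the CTL_CMD dry-run switch are hypothetical, and an interrupt (Ctrl-C) during the window would still skip the restart.

```shell
#!/bin/sh
# with_engine_stopped CMD [ARGS...]: stop crowdsec, run CMD, start crowdsec
# again, and return CMD's exit status. CTL_CMD defaults to systemctl; set
# CTL_CMD=echo to print the service actions instead of performing them.
with_engine_stopped() (
  ctl=${CTL_CMD:-systemctl}
  "$ctl" stop crowdsec || exit 1
  "$@"
  rc=$?
  "$ctl" start crowdsec
  exit "$rc"
)

# Dry run: shows the order of operations without touching the service.
CTL_CMD=echo with_engine_stopped echo "reload on the phone now"

# Real run on edge, giving a 60s window for the phone reload:
#   with_engine_stopped sleep 60
```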
Troubleshooting
| Symptom | Likely cause | Diagnostic |
|---|---|---|
| Phone reload still 200 after 30s | Bouncer stream cache hasn't picked up the decision | ssh root@edge cscli bouncers list — check Last API pull is recent. If it isn't, restart crowdsec.service |
| Phone reload still 200 even after cscli bouncers list shows recent pull | Caddy client_ip isn't what you think | Check journal access log: journalctl -u caddy --since '1 min ago' --no-pager \| grep client_ip. If client_ip ≠ $PHONE_IP, your phone is on a different egress (corporate VPN? CGNAT?); re-capture |
| All requests blocked (LAN included) | A decision exists for 192.168.1.1 (the UDM hairpin source) | cscli decisions list \| grep 192.168.1.1; delete it |
| caddy validate fails with "crowdsec API key must not be empty" | caddy_crowdsec_bouncer_key_var resolved to empty (vault key missing) | Check vault_caddy_crowdsec_bouncer_key exists in vault: ansible-vault view inventory/group_vars/all/vault.yml \| grep crowdsec |
| Caddy errors with unknown module: http.handlers.crowdsec on reload | Running Caddy is the OLD binary; new binary is on disk but the process didn't restart | ssh root@edge systemctl restart caddy (full restart, not reload; a reload only pushes config into the running process, which doesn't have the module) |
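For the "client_ip isn't what you think" row, tallying the distinct client_ip values in recent access logs shows at a glance which source addresses Caddy is actually seeing. A sketch assuming JSON access logs with a compact "client_ip":"..." key; seen_client_ips is a hypothetical helper and the sample lines are made up.

```shell
#!/bin/sh
# seen_client_ips: read access-log lines on stdin and print each distinct
# client_ip with its request count, most frequent first.
seen_client_ips() {
  grep -o '"client_ip":"[^"]*"' | cut -d'"' -f4 | sort | uniq -c | sort -rn
}

# Real use, from WSL:
#   ssh root@192.168.1.244 "journalctl -u caddy --since '1 min ago' --no-pager" \
#     | seen_client_ips
# Illustrative sample lines (not real log output):
printf '%s\n' \
  '{"request":{"client_ip":"192.168.1.1"},"status":200}' \
  '{"request":{"client_ip":"192.168.1.1"},"status":200}' \
  '{"request":{"client_ip":"203.0.113.7"},"status":403}' \
  | seen_client_ips
```

A list dominated by 192.168.1.1 means the test traffic is being hairpinned; an unexpected public address means the phone is leaving via a different egress than the one captured in Step 2.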
Lessons from the 2026-05-04 run
The first execution caught six bugs / gotchas, all now codified into the role + this runbook + a feedback memory on hairpin NAT.
- packagecloud any/any URL gotcha. The upstream-documented workaround for the broken trixie repo is https://packagecloud.io/crowdsec/crowdsec/any (path component = any) + suite any. NOT crowdsec/crowdsec/debian + suite any. The /debian distro path rejects the any suite (returns HTTP 422 / "repository not signed"). Verified empirically: /any/dists/any/InRelease → 302 (works); /debian/dists/any/InRelease → 422.
- Don't move LAPI off port 8080 without re-templating the agent. The CrowdSec agent's /etc/crowdsec/local_api_credentials.yaml is wired to 127.0.0.1:8080 by the installer. Overriding LAPI's listen_uri via config.yaml.local without also re-templating the agent's credentials file leaves the agent unable to authenticate, and the engine fails to start. Lesson: don't optimise for hypothetical alt-port collisions; stay on stock unless there's a real reason.
- {env.X} doesn't substitute in the crowdsec module's api_key. Caddy's runtime env-var substitution doesn't fire for this field because the module reads its config before that pass. Use {$X} (parse-time substitution) instead. Verified by querying the running Caddy via the /config/apps/crowdsec/ admin API and seeing the literal string {env.CROWDSEC_BOUNCER_API_KEY} in the loaded config.
- caddy validate needs every env var passed via Ansible's environment:. EnvironmentFile is systemd-only; ad-hoc validate doesn't see it. The validate task in roles/caddy/tasks/main.yml already passed CADDY_CLOUDFLARE_API_TOKEN; needed to add CROWDSEC_BOUNCER_API_KEY too.
- Handler-failure cascade leaves binaries unsynced. When the Restart crowdsec handler failed (due to the LAPI bug), Ansible aborted subsequent handlers, including caddy's Restart caddy. The new Caddy binary on disk wasn't loaded by the running process, so the next reload tried to load a Caddyfile with the crowdsec directive into a Caddy instance without the module → "unknown module" error. Fix: systemctl restart caddy (full restart loads the new binary). Future-proof: when a binary's content changes, prefer Restart over Reload, and check that handlers actually fired with --check --diff before assuming clean state.
- HTTP access logging needs site-block log, not global log. Caddy's global log directive only configures the default logger (the one used for the default log target); HTTP server access logs require log inside the site block. Without it, request flow is invisible, which is critical for diagnosing bouncer behaviour.
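The {env.X} lesson lends itself to a pre-deploy lint: scan the rendered Caddyfile for api_key lines that still use the runtime placeholder form. This is a sketch only; runtime_placeholder_keys is a hypothetical helper, and the Caddyfile fragment is an illustrative stand-in (the api_url value is assumed from the stock 127.0.0.1:8080 LAPI address), not the real template from the role.

```shell
#!/bin/sh
# runtime_placeholder_keys: read a Caddyfile on stdin and print, with line
# numbers, any api_key line using a runtime {env.X} placeholder. Exits 0
# when at least one such line is found, 1 when the file is clean.
runtime_placeholder_keys() {
  grep -n -E '^[[:space:]]*api_key[[:space:]]+\{env\.[^}]+\}'
}

# Illustrative fragment using the form that does NOT get substituted.
if runtime_placeholder_keys <<'EOF'
{
    crowdsec {
        api_url http://127.0.0.1:8080
        api_key {env.CROWDSEC_BOUNCER_API_KEY}
    }
}
EOF
then
  echo "found runtime placeholder; switch to the parse-time form"
else
  echo "api_key placeholders look fine"
fi
```

The heredoc delimiter is quoted so the shell leaves the braces and dollar signs alone; a line written in the parse-time {$X} form does not match the pattern.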
The big one: hairpin NAT made the bouncer look broken when it wasn't. Spent ~30 min testing the bounce from WSL with cscli decisions add for my WAN IP, saw nothing happen, chased four false leads (binary version, env-var syntax, streaming-vs-live mode, restart vs reload). Finally added access logging and saw client_ip: 192.168.1.1 — the UDM's LAN IP, not my WAN IP. UDM hairpin rewrites source on internal-loop traffic. The bouncer was correctly allowing LAN traffic the entire time. Always validate edge bouncers from outside the LAN. Captured as a feedback memory so future-me doesn't repeat this.