Accepted Risks

Risks the homelab knowingly carries — tradeoffs made deliberately rather than oversights. Each entry records what the risk is, what happens if it's realised, why it's been accepted, and what signal would cause a revisit.

This register is not a TODO list. Items here have been considered and left as-is because the cost of remediation is judged to exceed the expected loss. Move an item to the roadmap when conditions change.

Media library has no backup

Description. The /stash/rodneystash/ media library (Movies, TV Shows, Downloads, Music) sits on the stash ZFS pool with RAIDZ1 redundancy, but is not backed up anywhere else. The PBS backup job covers VM/CT rootfs and configs — not bulk media.

Impact if realised. Loss of the stash pool beyond RAIDZ1 tolerance (second-drive failure during rebuild; accidental destroy; fire/theft) means the media library is gone. Re-acquisition via arrstack is possible but time-consuming and dependent on source availability at the time.

Why accepted.

  • Media is conceptually reproducible — arrstack can re-acquire from sources.
  • At ~12 TB stored, no affordable offsite target exists in the current budget. The QNAP TS-269L is 2-bay and itself constrained (see below).
  • Daily incremental to any on-site target would still lose to a whole-site failure — marginal value beyond RAIDZ1.
  • RAIDZ1 on 6 matched PM1633a SAS SSDs handles the realistic single-drive failure case.

Date accepted. Pre-2026 (codified 2026-04-24). Existing arrangement since pool creation.

Review trigger.

  • stash utilisation sustained above 80% — RAIDZ1 rebuild time becomes a real concern.
  • An affordable ≥20 TB offsite target becomes viable (cold-storage pricing shift, or a larger NAS on hand).
  • A real media-loss event — at home or reported across the arrstack community — showing re-acquisition isn't practical.
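
The 80% utilisation trigger above could be checked mechanically. The following is a minimal sketch, assuming the standard OpenZFS `zpool list -H -o name,capacity` command is available on the host; the helper names (`parse_capacity`, `sustained_above`, `read_pool_usage`) are hypothetical, and "sustained" is interpreted here as "every collected sample exceeds the threshold", which is an assumption rather than a definition from this register.

```python
import subprocess
from typing import Dict, List

THRESHOLD_PCT = 80  # review trigger stated in this register


def parse_capacity(zpool_output: str) -> Dict[str, int]:
    """Parse `zpool list -H -o name,capacity` output into {pool: percent_used}."""
    usage = {}
    for line in zpool_output.splitlines():
        if not line.strip():
            continue
        # -H prints tab-separated fields with no header, e.g. "stash\t83%"
        name, capacity = line.split()
        usage[name] = int(capacity.rstrip("%"))
    return usage


def sustained_above(samples: List[int], threshold: int = THRESHOLD_PCT) -> bool:
    """True only if there is at least one sample and all samples exceed the threshold."""
    return bool(samples) and all(sample > threshold for sample in samples)


def read_pool_usage() -> Dict[str, int]:
    """Run zpool on the local host and return current utilisation per pool."""
    result = subprocess.run(
        ["zpool", "list", "-H", "-o", "name,capacity"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_capacity(result.stdout)
```

One way to use this would be to record `read_pool_usage()["stash"]` once a day and pass the last week of readings to `sustained_above`, so a brief spike during a large download does not trip the trigger.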

QNAP TS-269L firmware EOL

Description. The backup NAS at 192.168.1.253 runs QTS 4.3.4 — the final firmware for this model, unpatched by QNAP since 2020. It serves as NFS source for the PBS nas-primary datastore and CIFS target for legacy nasbackup vzdump pushes.

Impact if realised.

  • Exploiting a CVE in QTS 4.3.4 over a LAN-reachable path would require the attacker to compromise another LAN host first, but the NAS would then be an easy foothold.
  • Hardware failure (Atom D2701 is 2012-era) would lose the PBS datastore and the vzdump target. No production data (proxfold stash) depends on the NAS.

Why accepted.

  • LAN-only behind the UDM; no WAN exposure. Attack path requires prior LAN compromise.
  • The box is dedicated to backup-target duty; no services exposed beyond NFS/SMB on the management LAN.
  • TS-269L is too underpowered to host PBS directly (Atom D2701, ≤3 GB RAM). Replacement would be a new appliance, not a firmware bump.
  • Replacement isn't urgent while the device functions.

Date accepted. 2026-04-23 (Phase 5A scoping — decision to use the NAS as PBS backend rather than delay for replacement).

Review trigger.

  • Hardware failure.
  • A new QTS 4.3.4 CVE with a working LAN-reachable exploit.
  • A suitable modern replacement at acceptable price (e.g. small-form x86 NAS appliance with ≥2 bays and current firmware support).

Edge security gap until CrowdSec

Closed 2026-05-04 — Phase 7D landed

The CrowdSec engine and the hslatman Caddy bouncer module are live on edge (CT 107). End-to-end validation from a phone on a cellular connection confirmed the behaviour: a banned IP received 403 and, once the ban was removed, returned to 200. The UDM firewall is no longer the only edge-protection layer in front of Caddy: a behavioural IPS with a federated reputation feed now sits in the request path. See crowdsec_engine role and crowdsec-validation runbook.

7E (Pocket-ID SSO for Proxmox + PBS) is a separate concern — admin-UI auth, not edge protection — and remains scoped but not yet executed. It doesn't reopen this risk.

Original description (kept for history). After Phase 5D cut over from NPM to Caddy on the edge LXC, the four publicly exposed hostnames (requests, dash, n8n, kosync) hit Caddy directly via the WAN port-forward. CF orange-cloud was attempted but reverted: Universal SSL is disabled at the zone, so flipping to orange caused immediate TLS handshake failures because no edge cert existed. Net effect: until Phase 7D (CrowdSec on edge) landed, the only edge-protection layer in front of Caddy was the UDM firewall.

Original impact assessment. A determined scanner or brute-force tool would reach Caddy directly. Each backing app has its own auth (n8n login, Overseerr login, Beszel admin, kosync HTTP basic on a few endpoints), so impact ≈ "app-level credential strength" rather than "no auth at all." DDoS volumetric load landing on the home WAN connection rather than CF's edge was the residual concern.

Date accepted. 2026-05-02 (Phase 5D cutover; CF orange revert decision same day).

Date closed. 2026-05-04.

MatrixRTC port-forwards bypass edge LXC + CrowdSec

Description. Phase 6E.4 opened five UDM port-forwards landing directly on VM 111 (matrix, 192.168.1.243):

Name                     WAN port      Protocol
matrix-rtc-ice-tcp       7881          TCP
matrix-rtc-ice-udp-mux   7882          UDP
matrix-rtc-turn-udp      3479          UDP
matrix-rtc-turn-tcp      5350          TCP
matrix-rtc-turn-relay    30000-30020   UDP

These bypass the edge LXC (CT 107) and its CrowdSec coverage. Real-time WebRTC media needs a direct UDP-friendly path to the LiveKit SFU, and Caddy is TCP/HTTP oriented, so reverse-proxying the UDP mux and relay range through it isn't viable. The forwards follow spantaleev's documented reference deployment, and comparable LiveKit-on-self-hosted-Matrix setups use the same pattern.

Impact assessment. Bounded by LiveKit's signed-JWT auth: every WebRTC connection requires a valid JWT minted by our lk-jwt-service, which in turn authenticates the requesting Matrix user via Tuwunel's normal auth path. A scanner that finds these ports is rejected at the WebRTC handshake because it cannot present a valid JWT. No data exposed, no anonymous access.
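
To illustrate why the JWT gate bounds the exposure, here is a minimal standard-library sketch of HMAC-signed (HS256) token minting and verification. It is not the lk-jwt-service or LiveKit implementation: the function names, claim names, and secret are hypothetical, and it assumes the tokens are HS256-signed with a shared secret. The point is only that a caller without the secret cannot produce a token that verifies.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def mint_token(claims: dict, secret: bytes) -> str:
    """Build header.payload.signature for the given claims (hypothetical minting side)."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    ]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature and expiry are valid, else None."""
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None
        expected = _sign(header_b64 + "." + payload_b64, secret)
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        # Covers wrong segment count, bad base64, bad JSON, and non-ASCII input.
        return None
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return None
    return claims
```

A port scanner sends no token at all, and a guessed or forged token fails the signature check, so in both cases `verify_token` returns `None` and the handshake would not proceed.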

The bypass means that scanning or volumetric UDP traffic against those ports hits VM 111 directly rather than CT 107. The UDM firewall, the only layer in front, has no equivalent of CrowdSec's reputation feed. The realistic worst case is bandwidth and CPU consumption on VM 111 from junk traffic on the WAN ports.

Mitigations available if needed later.

  • Drop the UDP relay range (30000-30020) and force all media through TURN-TLS on 5350. Costs ~30-100ms latency per call but eliminates 21 UDP-exposed ports.
  • External TURN service (Twilio, etc.) — moves the WAN exposure off the home network at the cost of a third-party dependency and ~$5/mo.
  • Coturn on a VPS — same trade-off as external TURN but self-hosted relay.

None of these are needed at our current scale (closed-membership, small-group calls).
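
The "21 UDP-exposed ports" figure follows from the inclusive range 30000-30020. As a quick sanity check, the sketch below counts WAN-exposed ports from the forwards listed in the description; the `Forward` type and `exposed_ports` helper are hypothetical names used only for this illustration.

```python
from typing import List, NamedTuple, Optional


class Forward(NamedTuple):
    name: str
    first_port: int
    last_port: int  # inclusive
    protocol: str


# The five Phase 6E.4 forwards, as listed in this entry.
FORWARDS = [
    Forward("matrix-rtc-ice-tcp", 7881, 7881, "TCP"),
    Forward("matrix-rtc-ice-udp-mux", 7882, 7882, "UDP"),
    Forward("matrix-rtc-turn-udp", 3479, 3479, "UDP"),
    Forward("matrix-rtc-turn-tcp", 5350, 5350, "TCP"),
    Forward("matrix-rtc-turn-relay", 30000, 30020, "UDP"),
]


def exposed_ports(forwards: List[Forward], protocol: Optional[str] = None) -> int:
    """Count exposed ports, optionally restricted to one protocol."""
    return sum(
        f.last_port - f.first_port + 1
        for f in forwards
        if protocol is None or f.protocol == protocol
    )


# Effect of the first mitigation: drop the relay range entirely.
without_relay = [f for f in FORWARDS if f.name != "matrix-rtc-turn-relay"]

print(exposed_ports(FORWARDS, "UDP"))       # 23
print(exposed_ports(without_relay, "UDP"))  # 2
```

Dropping the relay range takes the UDP exposure from 23 ports to 2, which is the 21-port reduction quoted in the first mitigation.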

Date accepted. 2026-05-22 (Phase 6E.4 close).

Review trigger.

  • Sustained junk traffic volume on the RTC ports (visible in UDM logs or VM 111 metrics).
  • A LiveKit CVE that bypasses JWT auth.
  • A Matrix specification change that makes TURN-TLS-only deployments viable for Element Call (would let us drop the wide UDP exposure).