Home Assistant setup — Phase 6B¶

Stand up HAOS as a sealed Proxmox VM (deliberate departure from the Debian + apt + Ansible baseline that covers every other guest), integrate the Hue V2 bridge + Tapo P110M via Matter + Bambu A1 Mini via HACS, front via Caddy edge at home.rampancy.cloud, build a starter dashboard + automations.

Status: 6B.1 + 6B.3 executed 2026-05-23; 6B.2 deferred (user driving hands-on); 6B.4 partial

Scaffold-drift findings from the 2026-05-23 run have been folded back into the body of this runbook so it's correct for re-runs. The Lessons from the 2026-05-23 run appendix at the bottom captures the same findings in narrative form.

Stages¶

Stage	Scope	Hold points
6B.1	HAOS VM stand-up	After `qm importdisk` — verify the resulting disk-volume name before `qm set scsi0`; before declaring HA reachable
6B.2	Core integrations (HACS + Matter + Hue + Bambu)	Before flipping the A1 Mini to Developer Mode (changes the trust posture of the printer); before adding HACS-installed integrations
6B.3	Edge integration	Before adding `trusted_proxies` to HA — without it the connection breaks rather than degrades; verify cellular access before declaring done
6B.4	Dashboard + automations + docs sweep	None high-risk

Cross-phase decisions¶

Install method: manual qm create + qcow2 import, not the community-scripts HAOS installer. Manual matches existing precedent (n8n VM 108, forgejo CT 109), keeps the HAOS version explicitly pinned in this runbook (the script pulls latest), and avoids bash <(curl ...) as root on proxfold. The script's correctness wins (Q35, OVMF, pre-enrolled-keys=0, EFI disk) are captured by writing the qm create invocation explicitly with all flags inline.
HAOS version pin: 17.3 (updated from scaffold's 17.2 at execution time on 2026-05-23). Pin to whatever's current stable on the day of execution — install-time pinning, not ongoing. HAOS Supervisor self-updates HA Core + add-ons + the OS layer afterward. The pin protects the cold-start state for rebuild-kit reproducibility and dodges known fresh-install regressions. Ongoing pinning is intentionally not adopted — Supervisor's update cadence is well-behaved and the maintenance overhead of policing it isn't worth it.
No Ansible coverage of the VM. HAOS doesn't take cloud-init, has no SSH or apt, and the common / security / auto_updates / beszel_agent roles all assume Debian. host_vars/hass.yml exists only for the Caddy vhost and as a documentation anchor. The VM is not added to inventory/hosts.yml — no playbook should target it.
Tapo via tplink integration primary, Matter as fallback. Revised from scaffold's Matter-primary at the 2026-05-23 sanity-check: community + upstream HA-core issues (core#149847 — Matter P110M power endpoints not enumerating, core#112639 — Matter Server update broke P110M, recurrent "Something went wrong" pairing failures) show Matter pairing for the P110M is flaky and energy-endpoint enumeration incomplete via Matter — only the total-energy sensors enumerate; voltage/current/frequency are missing. python-kasa (driving the tplink integration) now handles KLAP locally without cloud creds, so the original rationale for preferring Matter (avoid cloud-credential sync) no longer holds. Use Matter only if tplink pairing fails on the specific firmware shipped on the unit.
HACS as managed dependency. Required for ha-bambulab (no Core-integration path). Documented as an additional update cadence outside HA Core. Bambu's own Home Assistant integration story is in flux; HACS keeps us on the actively-maintained greghesp/ha-bambulab codebase.
Wazuh agent deferred to Phase 7B. BeardedTinker's HAOS rule pack integrates API-side and doesn't require an in-VM agent.

Pre-flight gates¶

VMID 110 free (qm list on proxfold)
192.168.1.241 free (no DHCP lease, no static reservation in UDM)
local-zfs has > 35 GB free (32 GB disk + EFI overhead)
HAOS qcow2 download verified — pin to current stable release at execution time (haos_ova-<version>.qcow2.xz from the HAOS releases page)
PBS daily job's namespace covers root (it does — confirmed in pbs role)
Hue V2 bridge IP known (http://<bridge-ip>/description.xml returns <modelNumber>BSB002</modelNumber>)
Tapo P110M powered on the same VLAN as HA (LAN-1 / vmbr0); plug LED solid
Bambu A1 Mini accessible from the HA VM's subnet (no VLAN boundary)
Discord webhook URL for #homelab-ops available in vault as vault_discord_webhook_homelab_ops (already stored — used by PVE 9 / Beszel / drift detection)

Stage 6B.1 — HAOS VM stand-up¶

Download the qcow2¶

# on proxfold
HAOS_VERSION=17.3   # pin to current stable at execution time
cd /var/lib/vz/template/iso

wget "https://github.com/home-assistant/operating-system/releases/download/${HAOS_VERSION}/haos_ova-${HAOS_VERSION}.qcow2.xz"

# HAOS does NOT publish per-asset .sha256 sidecars (nor a SHASUMS file) — TLS-only
# trust is the upstream precedent (community-scripts + ProxmoxVE-helper-scripts both
# do TLS-only too). Record the local sha256 for rebuild-kit reproducibility:
sha256sum "haos_ova-${HAOS_VERSION}.qcow2.xz"

unxz "haos_ova-${HAOS_VERSION}.qcow2.xz"
ls -lh "haos_ova-${HAOS_VERSION}.qcow2"

Create the VM shell¶

# on proxfold
qm create 110 \
  --name hass \
  --machine q35 \
  --bios ovmf \
  --cpu host --cores 2 --sockets 1 \
  --memory 4096 \
  --net0 virtio,bridge=vmbr0,firewall=0 \
  --ostype l26 \
  --onboot 1 \
  --agent enabled=1 \
  --efidisk0 local-zfs:0,efitype=4m,pre-enrolled-keys=0

pre-enrolled-keys=0 is the critical flag — HAOS does not sign its bootloader, and Pre-Enroll Keys = ON puts the VM into a boot loop that's diagnostically opaque (no console error, just resets).

Import the qcow2 + attach as scsi0¶

qm importdisk 110 "/var/lib/vz/template/iso/haos_ova-${HAOS_VERSION}.qcow2" local-zfs
# → Note the resulting volume name from output, typically vm-110-disk-1
#   (vm-110-disk-0 is the EFI disk created above)

qm set 110 \
  --scsihw virtio-scsi-single \
  --scsi0 local-zfs:vm-110-disk-1,discard=on,ssd=1 \
  --boot order=scsi0

# Defensive — the HAOS OVA qcow2 already ships with a 32 GiB virtual size, so on
# 17.x this is effectively a no-op. Keep it; cheap insurance if upstream ever
# changes the default.
qm disk resize 110 scsi0 32G

Hold point — verify config before first boot¶

qm config 110 | grep -E "machine|bios|efidisk|scsi0|boot:"
# Expected:
#   bios: ovmf
#   boot: order=scsi0
#   efidisk0: local-zfs:vm-110-disk-0,efitype=4m,pre-enrolled-keys=0,...
#   machine: q35
#   scsi0: local-zfs:vm-110-disk-1,discard=on,size=32G,ssd=1

First boot + onboarding¶

qm start 110
# Watch the console (Proxmox UI → VM 110 → Console) for first boot. On local-zfs
# HAOS 17.3 went from `qm start` to HTTP-200 ready in ~90 seconds on the 2026-05-23
# run; budget up to ~5 minutes on slower storage or first-ever HA-Core-image pull.

When the console shows Welcome to Home Assistant, hit http://<dhcp-assigned-ip>:8123 from a browser. Find the DHCP-assigned IP via UDM Network UI (Clients filtered by hostname homeassistant) or:

# on proxfold
qm guest cmd 110 network-get-interfaces 2&gt;/dev/null | jq -r '.[] | .["ip-addresses"][]?["ip-address"]' | grep '^192'

Run through the HA onboarding wizard (create owner account → location → unit prefs → analytics opt-out) but stop before the integrations step — we'll do those in 6B.2.

Set static IP¶

HAOS does not consume cloud-init or /etc/network/interfaces. Set the static IP from the HA UI:

Settings → System → Network → IPv4 → Static — set address 192.168.1.241/24, gateway 192.168.1.1, DNS 192.168.1.1. Save and wait for HAOS to apply (the UI will reconnect on the new address).

Hold point — declare 6B.1 done¶

# from any LAN host
# (use GET, not HEAD — HA returns 405 Method Not Allowed on HEAD /)
curl -s -o /dev/null -w "%{http_code}\n" "http://192.168.1.241:8123/"   # expect: 200

ssh root@192.168.1.250 "qm agent 110 ping && echo guest-agent-OK"
# (the older `qm guest ping <vmid>` syntax is not valid in PVE 9 — `qm agent <vmid> ping` is)

PBS job picks up the new VM that night at 02:00 ACST. Verify the next morning via PBS UI.

Stage 6B.2 — Core integrations¶

HACS bootstrap¶

Install the Studio Code Server add-on (Settings → Add-ons → Add-on Store → Studio Code Server → Install + Start). Open it; in the integrated terminal:

wget -O - https://get.hacs.xyz | bash -

Restart Home Assistant (Settings → System → Restart). After restart: Settings → Devices & Services → Add Integration → search "HACS" → follow the GitHub-OAuth flow.

Tapo P110M — `tplink` integration primary¶

Settings → Devices & Services → Add Integration → TP-Link. The integration auto-discovers Tapo plugs on the LAN via mDNS; if the P110M doesn't appear, add it by IP. Local-network KLAP is handled by the bundled python-kasa — no cloud account required. The integration enumerates switch + power-sensor cluster (energy total, voltage, current, frequency).

If tplink discovery fails or the specific firmware on the unit doesn't speak the local protocol, fall back to Matter (heads-up that energy-endpoint coverage via Matter is incomplete — only the cumulative-energy sensor enumerates; voltage/current/frequency don't):

Install the Matter Server add-on (Settings → Add-ons → Add-on Store → Matter Server → Install + Start)
Settings → Devices & Services → Add Integration → Matter
Hold the P110M button for ~5s until the LED pulses orange (commissioning mode)
In HA: Add Device (Matter) → scan QR code from the plug's bottom sticker (or enter the manual setup code printed below the QR)
Wait for commissioning (~30s)

Hue V2 bridge¶

Should auto-discover via mDNS — appear as a notification in Settings → Devices & Services. Click Configure → press the link button on the bridge → confirm. Bulbs + rooms + scenes import automatically. The V2 bridge's event-stream endpoint is used (push, not polling) — confirm by triggering a bulb from the Hue app and watching for the event in HA's Developer Tools → Events.

A1 Mini — flip to LAN Mode + Developer Mode¶

Two separate toggles on the printer:

Settings → Network → LAN Mode — toggle on, note the access code displayed
Settings → Network → Developer Mode — toggle on (this is what permits MQTT, in addition to LAN Mode)

Without both, MQTT writes are blocked under firmware ≥ 01.05.x and ha-bambulab falls back to read-only.

`ha-bambulab` via HACS¶

HACS → Integrations → search Bambu Lab → Download → restart HA. Then Settings → Devices & Services → Add Integration → Bambu Lab → enter:

Printer IP (LAN address from the A1's network screen)
Access code (from LAN Mode toggle)
Serial (from printer's About screen)

Choose LAN Mode (Local MQTT) for the connection mode. The integration adds ~30 entities (printer state, AMS slots if equipped, chamber temp, current print, etc.).

Hold point — declare 6B.2 done¶

HACS shows installed and connected to GitHub
P110M paired via tplink (or Matter fallback) and toggling from HA UI confirms physical state change on the plug
Hue bridge integration shows all bulbs; toggling from Hue app updates HA within < 1s
ha-bambulab integration shows printer as Idle or Ready; print-state changes propagate

Stage 6B.3 — Edge integration¶

Add the vhost to `host_vars/edge.yml`¶

In ~/homelab-ansible/inventory/host_vars/edge.yml, append home.rampancy.cloud to the caddy_vhosts list, mirroring the existing git.rampancy.cloud pattern (upstream 192.168.1.241:8123, no auth — HA handles auth itself).

# on CT 104
cd ~/homelab-ansible
ansible-playbook playbooks/edge.yml --check --diff --limit edge   # preview
ansible-playbook playbooks/edge.yml --limit edge                  # apply

Add Cloudflare DNS record¶

home.rampancy.cloud CNAME → rampancy.cloud (apex). The wildcard LE cert covers it; CrowdSec coverage automatic via the existing wildcard handler.

Configure HA's reverse-proxy trust¶

In HA: Settings → Add-ons → File Editor → install + start (or use Studio Code Server again). Edit /config/configuration.yaml, add:

http:
  use_x_forwarded_for: true
  trusted_proxies:
    - 192.168.1.244   # edge / Caddy

Without this, HA refuses the proxied connection — the symptom is HTTP 400 "Bad Request" from any request bearing an X-Forwarded-For header (with the server log carrying the "A request from a reverse proxy was received from [edge IP], but your HTTP integration is not set up for reverse proxies" message). The Caddy side returns clean (200 to its own request, via: 1.1 Caddy); the 400 surfaces only at the browser. Easy to misread as a Caddy/edge config error.

Restart HA (Settings → System → Restart). Then verify the restart actually took effect before moving on to the Caddy apply — ha core check returning "Command completed successfully" is necessary but not sufficient (it validates the file; it doesn't prove the running HA Core picked it up):

# on proxfold — confirm HA Core's container start time is newer than configuration.yaml mtime
ssh root@192.168.1.250 'qm guest exec 110 --timeout 10 -- /bin/sh -c "docker inspect homeassistant --format \"{{.State.StartedAt}}\""'
ssh root@192.168.1.250 'qm guest exec 110 --timeout 10 -- /bin/sh -c "stat -c %y /mnt/data/supervisor/homeassistant/configuration.yaml"'
# StartedAt must be > the config mtime. If not, restart didn't fire — re-restart via
# UI or drive directly: `qm guest exec 110 -- /bin/sh -c "ha core restart"`

Hold point — declare 6B.3 done¶

# from any LAN host
curl -sI "https://home.rampancy.cloud/" | head -3
# expect: HTTP/2 200 (or 302 to /auth/authorize), valid LE cert, no x-frame-options error

End-to-end validation from cellular (replicating the edge-cutover and crowdsec-validation pattern): disable Wi-Fi on phone → open the HA iOS app → connect via https://home.rampancy.cloud → confirm full UI loads.

Stage 6B.4 — Dashboard + automations + close-out¶

Discord notification target¶

Add to configuration.yaml:

notify:
  - name: homelab_ops
    platform: rest
    resource: !secret discord_webhook_homelab_ops
    method: POST_JSON
    data:
      username: "Home Assistant"
      content: "{{ message }}"

The data / data_template split syntax was deprecated in HA years ago — current HA puts templates directly inside data: and the engine handles them transparently.

Store the webhook URL in HA's secrets.yaml (lifted from vault_discord_webhook_homelab_ops). Mirrors the PVE 9 / ZED / Beszel webhook pattern.

Automations¶

Sunset → Hue lights on: built-in Sun trigger + Hue scene activate (use Hue's own scenes rather than per-bulb sets — keeps Hue app and HA aligned).
Print complete → Discord: trigger on sensor.a1_mini_current_stage transition to idle from a printing state; action notify.homelab_ops with print name + duration.
Print failed → Discord: trigger on sensor.a1_mini_print_error not-empty; same notify target.

Dashboard¶

One Lovelace view, three cards:

Lights — Hue rooms + scenes (mushroom-light cards if HACS Mushroom is added; otherwise built-in light cards)
Printer — A1 Mini state, chamber temp, current print progress, P110M energy draw (printer is plugged into the P110M, so its power consumption tracks with print state)
System — HA Core version, last backup time, Supervisor status

Close-out — docs sweep¶

Per the sync-docs pattern (any environment change is a docs update — see arrstack CLAUDE.md):

roadmap.md — flip Phase 6B sub-stages to - [x], replace the section header with !!! success "Completed YYYY-MM-DD"
changelog.md — dated entry under current month
This runbook — append Lessons from the <date> run section
hosts/hass/index.md — new page, mirrors hosts/forgejo/index.md shape (sealed-appliance caveat called out, references this runbook)
services/home-assistant.md — new page, surface the integration list + dashboard structure
mkdocs.yml — add nav entries for the two new docs above
reference/accepted-risks.md — new entry for HAOS opacity to drift detection (see roadmap §6B note for canonical wording)
ansible/roles/ — no role page additions (HAOS is not Ansible-managed); host_vars/edge.yml change is captured by the existing edge role page if vhost list is enumerated there

Lessons from the 2026-05-23 run¶

Stages 6B.1 and 6B.3 executed cleanly on 2026-05-23; 6B.2 was deferred to a hands-on user session. Nine scaffold-drift findings were caught between the pre-execution sanity check and the run itself, and all were folded back into the body of this runbook above so it's correct for re-runs. They're summarised here in narrative form for future-reader context.

Pre-execution sanity-check findings (3)¶

HAOS pin was one release stale. Scaffold pinned 17.2 (2026-04-07); current stable on the day of execution was 17.3 (2026-05-06 — pure security release, kernel 6.12.85 for CVE-2026-31431, no breaking changes). Bumped the pin at execution time. Lesson for the runbook: state "current stable on the day of execution" rather than a literal version that ages.
Tapo P110M Matter path was shakier than the scaffold implied. Sanity-check turned up HA core#149847 (Matter P110M power endpoints not enumerating as sensors), core#112639 (Matter Server update broke P110M connection), and recurrent community "Something went wrong" pairing reports. Even when Matter works, voltage/current/frequency don't enumerate — only cumulative energy. Meanwhile python-kasa (driving the tplink integration) now handles KLAP locally without cloud creds, removing the original rationale for preferring Matter. Reversed the fallback order before execution.
Scaffold's Discord notify YAML used deprecated split syntax. The data: / data_template: split was deprecated in HA years ago; current HA puts templates directly inside data:. Would have thrown a config warning at minimum on first restart. Unified before execution.

Execution-time findings (6)¶

HAOS publishes no .sha256 sidecars (nor a SHASUMS file). Scaffold's "verify against the .sha256 sidecar" step was impossible — the GitHub release lists only the raw asset files, nothing else. TLS-only download is the upstream precedent (community-scripts and ProxmoxVE-helper-scripts both do TLS-only too). Runbook now records the local sha256 post-download for rebuild-kit reproducibility instead.
Hold-point HTTP probe used curl -sI (HEAD) — HA only allows GET on / and returns 405. Looked like a failure on first read of the output; was just a wrong-method response. Fixed to curl -s -o /dev/null -w "%{http_code}\n".
qm guest ping <vmid> is not a valid PVE 9.2 command. Correct syntax is qm agent <vmid> ping. Scaffold likely carried over an older PVE-version syntax.
qm disk resize 110 scsi0 32G is a no-op on the HAOS OVA. The qcow2 already ships with a 32 GiB virtual size, so the resize doesn't grow anything on 17.x. Kept the step as defensive insurance (cheap, and forward-compatible if upstream ever ships a different default) but documented its no-op nature.
HAOS first boot from qm start to HTTP-200 ready was ~90 s. The scaffold said "~5 minutes." Faster on local-zfs than the round-number conservative estimate suggested. Updated the runbook to give the observed value with the conservative budget as the upper bound.
HA Core restart didn't actually fire when the trusted_proxies edit was made via File Editor + UI "Restart Home Assistant." Symptom: end-to-end through Caddy returned HTTP 400 "Bad Request" for any X-Forwarded-For-bearing request, with a clean 200 response on a HEAD via Caddy that confused the picture (HEAD returns 405 from HA's GET-only / regardless of proxy trust). Root cause: the UI restart click was missed; ha core check reported the config valid but docker inspect homeassistant --format '{{.State.StartedAt}}' showed HA Core was still running from the initial onboarding boot, ~4 hours earlier. Drove ha core restart via qm guest exec directly to fix. Runbook now requires verifying StartedAt > configuration.yaml mtime before declaring 6B.3 done.

Things that worked exactly as scaffolded¶

pre-enrolled-keys=0 on the EFI disk — HAOS still doesn't sign its bootloader, omitting this flag still puts the VM in a boot loop. Captured correctly.
Q35 / OVMF / virtio-scsi-single / agent enabled=1 — all current.
The Caddy vhost addition pattern (append to caddy_proxy_hosts in host_vars/edge.yml, push, pull on CT 104, --check --diff, apply) was clean: ok=33 changed=2, single-task render + reload handler, 22 s playbook runtime, validate task passed before reload.
CrowdSec coverage on the new vhost was automatic via the existing wildcard handler in the Caddyfile template ({% if caddy_crowdsec_enabled %} block per-vhost).
LE wildcard cert auto-covered home.rampancy.cloud — no per-vhost cert work.

Per-LAN hairpin: bouncer testing only meaningful from cellular¶

LAN-side curl against https://home.rampancy.cloud/ was reflected back through the UDM, which rewrites the source IP to the router (Caddy access log records client_ip: 192.168.1.1). That masks any CrowdSec bouncer behaviour — a banned IP test from LAN would appear "not blocked" purely because the request never reaches Caddy from the actual client IP. End-to-end validation was done from cellular per the existing edge-cutover and crowdsec-validation pattern, where the request traverses the real WAN path.

Deferred to a later session¶

6B.2 — Core integrations (HACS, Hue, Tapo, Bambu): user driving hands-on. Will likely surface additional findings on the integration side, particularly the A1 Mini LAN Mode + Developer Mode toggle order and ha-bambulab HACS install path. Append a 6B.2 follow-up section here when run.
6B.4 — Dashboard, automations, host page, service page, accepted-risks entry: blocked on 6B.2.