Home Assistant setup — Phase 6B¶
Stand up HAOS as a sealed Proxmox VM (a deliberate departure from the Debian + apt + Ansible baseline that covers every other guest), integrate the Hue V2 bridge, the Tapo P110M (via the `tplink` integration, Matter as fallback), and the Bambu A1 Mini (via HACS), front it via the Caddy edge at home.rampancy.cloud, and build a starter dashboard + automations.
Status: 6B.1 + 6B.3 executed 2026-05-23; 6B.2 deferred (user driving hands-on); 6B.4 partial
Scaffold-drift findings from the 2026-05-23 run have been folded back into the body of this runbook so it's correct for re-runs. The Lessons from the 2026-05-23 run appendix at the bottom captures the same findings in narrative form.
Stages¶
| Stage | Scope | Hold points |
|---|---|---|
| 6B.1 | HAOS VM stand-up | After qm importdisk — verify the resulting disk-volume name before qm set scsi0; before declaring HA reachable |
| 6B.2 | Core integrations (HACS + Matter + Hue + Bambu) | Before flipping the A1 Mini to Developer Mode (changes the trust posture of the printer); before adding HACS-installed integrations |
| 6B.3 | Edge integration | Before adding trusted_proxies to HA — without it the connection breaks rather than degrades; verify cellular access before declaring done |
| 6B.4 | Dashboard + automations + docs sweep | None high-risk |
Cross-phase decisions¶
- Install method: manual `qm create` + qcow2 import, not the community-scripts HAOS installer. Manual matches existing precedent (n8n VM 108, forgejo CT 109), keeps the HAOS version explicitly pinned in this runbook (the script pulls `latest`), and avoids `bash <(curl ...)` as root on proxfold. The script's correctness wins (Q35, OVMF, `pre-enrolled-keys=0`, EFI disk) are captured by writing the `qm create` invocation explicitly with all flags inline.
- HAOS version pin: 17.3 (updated from scaffold's 17.2 at execution time on 2026-05-23). Pin to whatever's current stable on the day of execution: install-time pinning, not ongoing. HAOS Supervisor self-updates HA Core + add-ons + the OS layer afterward. The pin protects the cold-start state for rebuild-kit reproducibility and dodges known fresh-install regressions. Ongoing pinning is intentionally not adopted; Supervisor's update cadence is well-behaved and the maintenance overhead of policing it isn't worth it.
- No Ansible coverage of the VM. HAOS doesn't take cloud-init, has no SSH or apt, and the `common`/`security`/`auto_updates`/`beszel_agent` roles all assume Debian. `host_vars/hass.yml` exists only for the Caddy vhost and as a documentation anchor. The VM is not added to `inventory/hosts.yml`; no playbook should target it.
- Tapo via `tplink` integration primary, Matter as fallback. Revised from scaffold's Matter-primary at the 2026-05-23 sanity-check: community + upstream HA-core issues (core#149847: Matter P110M power endpoints not enumerating; core#112639: Matter Server update broke P110M; recurrent "Something went wrong" pairing failures) show that Matter pairing for the P110M is flaky and that energy-endpoint enumeration via Matter is incomplete. Only the total-energy sensors enumerate; voltage/current/frequency are missing. `python-kasa` (driving the `tplink` integration) now handles KLAP locally without cloud creds, so the original rationale for preferring Matter (avoid cloud-credential sync) no longer holds. Use Matter only if `tplink` pairing fails on the specific firmware shipped on the unit.
- HACS as managed dependency. Required for `ha-bambulab` (no Core-integration path). Documented as an additional update cadence outside HA Core. Bambu's own Home Assistant integration story is in flux; HACS keeps us on the actively-maintained `greghesp/ha-bambulab` codebase.
- Wazuh agent deferred to Phase 7B. BeardedTinker's HAOS rule pack integrates API-side and doesn't require an in-VM agent.
Pre-flight gates¶
- VMID 110 free (`qm list` on proxfold)
- 192.168.1.241 free (no DHCP lease, no static reservation in UDM)
- `local-zfs` has > 35 GB free (32 GB disk + EFI overhead)
- HAOS qcow2 download verified: pin to current stable release at execution time (`haos_ova-<version>.qcow2.xz` from the HAOS releases page)
- PBS daily job's namespace covers root (it does; confirmed in pbs role)
- Hue V2 bridge IP known (`http://<bridge-ip>/description.xml` returns `<modelNumber>BSB002</modelNumber>`)
- Tapo P110M powered on the same VLAN as HA (LAN-1 / vmbr0); plug LED solid
- Bambu A1 Mini accessible from the HA VM's subnet (no VLAN boundary)
- Discord webhook URL for `#homelab-ops` available in vault as `vault_discord_webhook_homelab_ops` (already stored; used by PVE 9 / Beszel / drift detection)
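The VMID gate can be checked mechanically rather than by eye. A minimal sketch, assuming the usual `qm list` layout (one header row, VMID in the first column); `vmid_is_free` and the sample rows are illustrative, not part of the homelab tooling:

```shell
# vmid_is_free VMID
# Reads `qm list`-style output on stdin (header row, then one row per guest
# with the VMID in the first column). Exit 0 if VMID is unused, 1 if taken.
vmid_is_free() {
  awk -v id="$1" 'NR > 1 && $1 == id { taken = 1 } END { exit taken ? 1 : 0 }'
}

# Sample data standing in for real `qm list` output (hypothetical values).
sample_qm_list='      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       108 n8n                  running    4096              32.00 1234'

if printf '%s\n' "$sample_qm_list" | vmid_is_free 110; then
  echo "VMID 110 is free"
else
  echo "VMID 110 is already in use"
fi
```

On proxfold the real input would be `qm list | vmid_is_free 110`. VMIDs are shared between VMs and containers, so piping `pct list` through the same function covers the CT side as well.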
Stage 6B.1 — HAOS VM stand-up¶
Download the qcow2¶
# on proxfold
HAOS_VERSION=17.3 # pin to current stable at execution time
cd /var/lib/vz/template/iso
wget "https://github.com/home-assistant/operating-system/releases/download/${HAOS_VERSION}/haos_ova-${HAOS_VERSION}.qcow2.xz"
# HAOS does NOT publish per-asset .sha256 sidecars (nor a SHASUMS file) — TLS-only
# trust is the upstream precedent (community-scripts + ProxmoxVE-helper-scripts both
# do TLS-only too). Record the local sha256 for rebuild-kit reproducibility:
sha256sum "haos_ova-${HAOS_VERSION}.qcow2.xz"
unxz "haos_ova-${HAOS_VERSION}.qcow2.xz"
ls -lh "haos_ova-${HAOS_VERSION}.qcow2"
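Because there is no upstream checksum to compare against, the locally recorded hash is only useful if a rebuild actually re-checks it. A minimal sketch of a record-then-verify helper, assuming a plain-text ledger in `sha256sum` format; the function name and ledger path are hypothetical:

```shell
# record_or_verify FILE LEDGER
# First run: append "<sha256>  <basename>" to LEDGER and print "recorded".
# Later runs: compare FILE against the recorded hash and print "match",
# or print "MISMATCH" and return 1.
record_or_verify() {
  file=$1 ledger=$2
  name=$(basename "$file")
  actual=$(sha256sum "$file" | awk '{ print $1 }')
  recorded=$(awk -v n="$name" '$2 == n { print $1; exit }' "$ledger" 2>/dev/null)
  if [ -z "$recorded" ]; then
    printf '%s  %s\n' "$actual" "$name" >> "$ledger"
    echo "recorded"
  elif [ "$recorded" = "$actual" ]; then
    echo "match"
  else
    echo "MISMATCH"
    return 1
  fi
}

# Demo against a throwaway stand-in file (not a real HAOS image).
demo_dir=$(mktemp -d)
printf 'stand-in for the qcow2.xz\n' > "$demo_dir/haos_ova-17.3.qcow2.xz"
record_or_verify "$demo_dir/haos_ova-17.3.qcow2.xz" "$demo_dir/haos.sha256"  # first run
record_or_verify "$demo_dir/haos_ova-17.3.qcow2.xz" "$demo_dir/haos.sha256"  # re-run
```

The hash is taken over the `.xz` as downloaded, so the check has to run before `unxz` replaces the file.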
Create the VM shell¶
# on proxfold
qm create 110 \
  --name hass \
  --machine q35 \
  --bios ovmf \
  --cpu host --cores 2 --sockets 1 \
  --memory 4096 \
  --net0 virtio,bridge=vmbr0,firewall=0 \
  --ostype l26 \
  --onboot 1 \
  --agent enabled=1 \
  --efidisk0 local-zfs:0,efitype=4m,pre-enrolled-keys=0
pre-enrolled-keys=0 is the critical flag — HAOS does not sign its bootloader, and Pre-Enroll Keys = ON puts the VM into a boot loop that's diagnostically opaque (no console error, just resets).
Import the qcow2 + attach as scsi0¶
qm importdisk 110 "/var/lib/vz/template/iso/haos_ova-${HAOS_VERSION}.qcow2" local-zfs
# → Note the resulting volume name from output, typically vm-110-disk-1
# (vm-110-disk-0 is the EFI disk created above)
qm set 110 \
  --scsihw virtio-scsi-single \
  --scsi0 local-zfs:vm-110-disk-1,discard=on,ssd=1 \
  --boot order=scsi0
# Defensive — the HAOS OVA qcow2 already ships with a 32 GiB virtual size, so on
# 17.x this is effectively a no-op. Keep it; cheap insurance if upstream ever
# changes the default.
qm disk resize 110 scsi0 32G
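The volume name does not have to be read off the `qm importdisk` output by eye. An imported disk sits in the VM config as an `unusedN:` entry until it is attached, so it can be read back from `qm config`. A minimal sketch; `imported_volume` and the sample config are illustrative:

```shell
# imported_volume
# Reads `qm config <vmid>` output on stdin and prints the volume behind the
# first unusedN entry, which is where an imported disk lands until attached.
imported_volume() {
  awk -F': ' '/^unused[0-9]+: / { print $2; exit }'
}

# Sample config as it might look between `qm importdisk` and `qm set`.
sample_config='bios: ovmf
efidisk0: local-zfs:vm-110-disk-0,efitype=4m,pre-enrolled-keys=0
machine: q35
unused0: local-zfs:vm-110-disk-1'

vol=$(printf '%s\n' "$sample_config" | imported_volume)
echo "would attach: --scsi0 ${vol},discard=on,ssd=1"
```

On proxfold the input would be `qm config 110 | imported_volume`, run between the import and the `qm set` call. An empty result means the import did not register and `qm set` should not be run.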
Hold point — verify config before first boot¶
qm config 110 | grep -E "machine|bios|efidisk|scsi0|boot:"
# Expected:
# bios: ovmf
# boot: order=scsi0
# efidisk0: local-zfs:vm-110-disk-0,efitype=4m,pre-enrolled-keys=0,...
# machine: q35
# scsi0: local-zfs:vm-110-disk-1,discard=on,size=32G,ssd=1
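The same expectations can be asserted rather than compared visually. A minimal sketch that reports whichever expected setting is absent; `check_haos_config` is a hypothetical helper and the sample config is illustrative:

```shell
# check_haos_config
# Reads `qm config <vmid>` output on stdin, prints one "missing: ..." line per
# expected setting that is absent, and returns 0 only if all are present.
check_haos_config() {
  cfg=$(cat)
  missing=0
  for want in 'bios: ovmf' 'machine: q35' 'boot: order=scsi0' \
              'pre-enrolled-keys=0' 'scsi0: '; do
    if ! printf '%s\n' "$cfg" | grep -qF -- "$want"; then
      echo "missing: $want"
      missing=1
    fi
  done
  return "$missing"
}

# Sample of a correct config (values mirror the expected output in this runbook).
good_config='bios: ovmf
boot: order=scsi0
efidisk0: local-zfs:vm-110-disk-0,efitype=4m,pre-enrolled-keys=0
machine: q35
scsi0: local-zfs:vm-110-disk-1,discard=on,size=32G,ssd=1'

printf '%s\n' "$good_config" | check_haos_config && echo "config OK"
```

On proxfold: `qm config 110 | check_haos_config`. A `missing: pre-enrolled-keys=0` line is the one to stop on, since that is the boot-loop case.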
First boot + onboarding¶
qm start 110
# Watch the console (Proxmox UI → VM 110 → Console) for first boot. On local-zfs
# HAOS 17.3 went from `qm start` to HTTP-200 ready in ~90 seconds on the 2026-05-23
# run; budget up to ~5 minutes on slower storage or first-ever HA-Core-image pull.
When the console shows Welcome to Home Assistant, hit http://<dhcp-assigned-ip>:8123 from a browser. Find the DHCP-assigned IP via UDM Network UI (Clients filtered by hostname homeassistant) or:
# on proxfold
qm guest cmd 110 network-get-interfaces 2>/dev/null | jq -r '.[] | .["ip-addresses"][]?["ip-address"]' | grep '^192'
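Once the address is known, the wait for HA to come up can be scripted instead of watched. A minimal polling sketch, assuming `curl` is available on the host running it; `wait_for_http` is a hypothetical helper, and its 300 s default mirrors the ~5 minute budget above:

```shell
# wait_for_http URL [TIMEOUT_S] [INTERVAL_S]
# Polls URL with GET until it answers 200 or the timeout elapses.
# Uses GET rather than HEAD because HA answers HEAD / with 405.
wait_for_http() {
  url=$1 timeout=${2:-300} interval=${3:-5}
  waited=0
  code=000
  while [ "$waited" -le "$timeout" ]; do
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url" || true)
    if [ "$code" = "200" ]; then
      echo "ready after ~${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "not ready after ${timeout}s (last status: ${code})"
  return 1
}

# Usage (substitute the DHCP-assigned address found above):
#   wait_for_http "http://<dhcp-assigned-ip>:8123/" 300 5
```

The reported time counts sleep intervals only, so it slightly understates the real wall-clock wait.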
Run through the HA onboarding wizard (create owner account → location → unit prefs → analytics opt-out) but stop before the integrations step — we'll do those in 6B.2.
Set static IP¶
HAOS does not consume cloud-init or /etc/network/interfaces. Set the static IP from the HA UI:
Settings → System → Network → IPv4 → Static — set address 192.168.1.241/24, gateway 192.168.1.1, DNS 192.168.1.1. Save and wait for HAOS to apply (the UI will reconnect on the new address).
Hold point — declare 6B.1 done¶
# from any LAN host
# (use GET, not HEAD — HA returns 405 Method Not Allowed on HEAD /)
curl -s -o /dev/null -w "%{http_code}\n" "http://192.168.1.241:8123/" # expect: 200
ssh root@192.168.1.250 "qm agent 110 ping && echo guest-agent-OK"
# (the older `qm guest ping <vmid>` syntax is not valid in PVE 9 — `qm agent <vmid> ping` is)
PBS job picks up the new VM that night at 02:00 ACST. Verify the next morning via PBS UI.
Stage 6B.2 — Core integrations¶
HACS bootstrap¶
Install the Studio Code Server add-on (Settings → Add-ons → Add-on Store → Studio Code Server → Install + Start). Open it; in the integrated terminal:
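The terminal step relies on HACS's download script. A minimal sketch, assuming the installer published at `https://get.hacs.xyz` is still the supported path at execution time (check the HACS docs first; the function name is illustrative):

```shell
# Assumption: HACS's published installer endpoint. Confirm it against the HACS
# docs before running, since this pipes a remote script into a shell.
HACS_INSTALLER_URL="https://get.hacs.xyz"

# install_hacs
# Downloads the installer and runs it. The script is expected to place HACS
# under /config/custom_components/hacs.
install_hacs() {
  wget -O - "$HACS_INSTALLER_URL" | bash -
}

# Run inside the add-on's terminal, not on proxfold:
#   install_hacs
```

This is the same pipe-to-shell pattern the install-method decision avoids on proxfold; here the blast radius is the HA config directory inside the sealed VM.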
Restart Home Assistant (Settings → System → Restart). After restart: Settings → Devices & Services → Add Integration → search "HACS" → follow the GitHub-OAuth flow.
Tapo P110M — tplink integration primary¶
Settings → Devices & Services → Add Integration → TP-Link. The integration normally discovers Tapo plugs on the same LAN on its own (python-kasa probes by UDP broadcast rather than mDNS, so the plug must share HA's broadcast domain); if the P110M doesn't appear, add it by IP. Local-network KLAP is handled by the bundled python-kasa, with no cloud account required. The integration enumerates the switch plus the power-sensor cluster (energy total, voltage, current, frequency).
If tplink discovery fails or the specific firmware on the unit doesn't speak the local protocol, fall back to Matter (heads-up that energy-endpoint coverage via Matter is incomplete — only the cumulative-energy sensor enumerates; voltage/current/frequency don't):
- Install the Matter Server add-on (Settings → Add-ons → Add-on Store → Matter Server → Install + Start)
- Settings → Devices & Services → Add Integration → Matter
- Hold the P110M button for ~5s until the LED pulses orange (commissioning mode)
- In HA: Add Device (Matter) → scan QR code from the plug's bottom sticker (or enter the manual setup code printed below the QR)
- Wait for commissioning (~30s)
Hue V2 bridge¶
The bridge should auto-discover via mDNS and appear as a notification in Settings → Devices & Services. Click Configure → press the link button on the bridge → confirm. Bulbs + rooms + scenes import automatically. The V2 bridge's event-stream endpoint is used (push, not polling); confirm by triggering a bulb from the Hue app and watching for the event in HA's Developer Tools → Events.
A1 Mini — flip to LAN Mode + Developer Mode¶
Two separate toggles on the printer:
- Settings → Network → LAN Mode — toggle on, note the access code displayed
- Settings → Network → Developer Mode — toggle on (this is what permits MQTT, in addition to LAN Mode)
Without both, MQTT writes are blocked under firmware ≥ 01.05.x and ha-bambulab falls back to read-only.
ha-bambulab via HACS¶
HACS → Integrations → search Bambu Lab → Download → restart HA. Then Settings → Devices & Services → Add Integration → Bambu Lab → enter:
- Printer IP (LAN address from the A1's network screen)
- Access code (from LAN Mode toggle)
- Serial (from printer's About screen)
Choose LAN Mode (Local MQTT) for the connection mode. The integration adds ~30 entities (printer state, AMS slots if equipped, chamber temp, current print, etc.).
Hold point — declare 6B.2 done¶
- HACS shows installed and connected to GitHub
- P110M paired via `tplink` (or Matter fallback) and toggling from HA UI confirms physical state change on the plug
- Hue bridge integration shows all bulbs; toggling from Hue app updates HA in under 1 s
- `ha-bambulab` integration shows printer as `Idle` or `Ready`; print-state changes propagate
Stage 6B.3 — Edge integration¶
Add the vhost to host_vars/edge.yml¶
In ~/homelab-ansible/inventory/host_vars/edge.yml, append home.rampancy.cloud to the caddy_vhosts list, mirroring the existing git.rampancy.cloud pattern (upstream 192.168.1.241:8123, no auth — HA handles auth itself).
# on CT 104
cd ~/homelab-ansible
ansible-playbook playbooks/edge.yml --check --diff --limit edge # preview
ansible-playbook playbooks/edge.yml --limit edge # apply
Add Cloudflare DNS record¶
home.rampancy.cloud CNAME → rampancy.cloud (apex). The wildcard LE cert covers it, and CrowdSec coverage is automatic via the existing wildcard handler.
Configure HA's reverse-proxy trust¶
In HA: Settings → Add-ons → File Editor → install + start (or use Studio Code Server again). Edit /config/configuration.yaml, add:
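A minimal sketch of that block, using HA's documented `http:` options (`use_x_forwarded_for`, `trusted_proxies`). It is wrapped in a small shell helper so the proxy address is an explicit input; `render_http_block` and `192.0.2.10` are placeholders, and the real value is the edge host's LAN address as HA sees it:

```shell
# render_http_block PROXY_IP
# Prints the http: fragment for configuration.yaml with PROXY_IP as the only
# trusted proxy. PROXY_IP is the address HA sees Caddy connecting from.
render_http_block() {
  cat <<EOF
http:
  use_x_forwarded_for: true
  trusted_proxies:
    - $1
EOF
}

# 192.0.2.10 is a documentation placeholder, not the real edge address.
render_http_block 192.0.2.10
```

Paste the printed fragment into `/config/configuration.yaml` at top level. If an `http:` key already exists, merge the two options into it rather than adding a second `http:` key.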
Without this, HA refuses the proxied connection — the symptom is HTTP 400 "Bad Request" from any request bearing an X-Forwarded-For header (with the server log carrying the "A request from a reverse proxy was received from [edge IP], but your HTTP integration is not set up for reverse proxies" message). The Caddy side returns clean (200 to its own request, via: 1.1 Caddy); the 400 surfaces only at the browser. Easy to misread as a Caddy/edge config error.
Restart HA (Settings → System → Restart). Then verify the restart actually took effect before moving on to the Caddy apply — ha core check returning "Command completed successfully" is necessary but not sufficient (it validates the file; it doesn't prove the running HA Core picked it up):
# on proxfold — confirm HA Core's container start time is newer than configuration.yaml mtime
ssh root@192.168.1.250 'qm guest exec 110 --timeout 10 -- /bin/sh -c "docker inspect homeassistant --format \"{{.State.StartedAt}}\""'
ssh root@192.168.1.250 'qm guest exec 110 --timeout 10 -- /bin/sh -c "stat -c %y /mnt/data/supervisor/homeassistant/configuration.yaml"'
# StartedAt must be > the config mtime. If not, restart didn't fire — re-restart via
# UI or drive directly: `qm guest exec 110 -- /bin/sh -c "ha core restart"`
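Comparing the two timestamps by eye is error-prone because they arrive in different formats (RFC 3339 from `docker inspect`, local time with offset from `stat -c %y`). A minimal sketch that normalises both to epoch seconds, assuming GNU `date` on the host doing the comparison; `restarted_after_edit` and the sample values are illustrative:

```shell
# restarted_after_edit STARTED_AT CONFIG_MTIME
# STARTED_AT is the container start time as printed by `docker inspect`;
# CONFIG_MTIME is the file mtime as printed by `stat -c %y`.
# Returns 0 if the container started after the file was last modified,
# 1 if not, 2 if either timestamp cannot be parsed.
# Assumes GNU date (for -d); BusyBox date may not parse these formats.
restarted_after_edit() {
  started=$(date -d "$1" +%s) || return 2
  edited=$(date -d "$2" +%s) || return 2
  [ "$started" -gt "$edited" ]
}

# Illustrative values: Core started 04:10 UTC, file edited 08:05 UTC the same day.
if restarted_after_edit "2026-05-23T04:10:00.123456789Z" \
                        "2026-05-23 08:05:00.000000000 +0000"; then
  echo "restart took effect"
else
  echo "HA Core is still running the old config"
fi
```

With these sample values the second branch fires, which is the failure mode this check exists to catch. `qm guest exec` wraps command output in JSON, so the two strings need extracting from its `out-data` field before being passed in.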
Hold point — declare 6B.3 done¶
# from any LAN host
curl -sI "https://home.rampancy.cloud/" | head -3
# expect: HTTP/2 200 (or 302 to /auth/authorize), valid LE cert, no x-frame-options error
End-to-end validation from cellular (replicating the edge-cutover and crowdsec-validation pattern): disable Wi-Fi on phone → open the HA iOS app → connect via https://home.rampancy.cloud → confirm full UI loads.
Stage 6B.4 — Dashboard + automations + close-out¶
Discord notification target¶
Add to configuration.yaml:
notify:
  - name: homelab_ops
    platform: rest
    resource: !secret discord_webhook_homelab_ops
    method: POST_JSON
    data:
      username: "Home Assistant"
      content: "{{ message }}"
The data / data_template split syntax was deprecated in HA years ago — current HA puts templates directly inside data: and the engine handles them transparently.
Store the webhook URL in HA's secrets.yaml (lifted from vault_discord_webhook_homelab_ops). Mirrors the PVE 9 / ZED / Beszel webhook pattern.
Automations¶
- Sunset → Hue lights on: built-in Sun trigger + Hue scene activate (use Hue's own scenes rather than per-bulb sets; this keeps the Hue app and HA aligned).
- Print complete → Discord: trigger on `sensor.a1_mini_current_stage` transition to `idle` from a printing state; action `notify.homelab_ops` with print name + duration.
- Print failed → Discord: trigger on `sensor.a1_mini_print_error` not-empty; same notify target.
Dashboard¶
One Lovelace view, three cards:
- Lights: Hue rooms + scenes (mushroom-light cards if HACS Mushroom is added; otherwise built-in `light` cards)
- Printer: A1 Mini state, chamber temp, current print progress, P110M energy draw (printer is plugged into the P110M, so its power consumption tracks with print state)
- System: HA Core version, last backup time, Supervisor status
Close-out — docs sweep¶
Per the sync-docs pattern (any environment change is a docs update — see arrstack CLAUDE.md):
- `roadmap.md`: flip Phase 6B sub-stages to `- [x]`, replace the section header with `!!! success "Completed YYYY-MM-DD"`
- `changelog.md`: dated entry under current month
- This runbook: append Lessons from the <date> run section
- `hosts/hass/index.md`: new page, mirrors `hosts/forgejo/index.md` shape (sealed-appliance caveat called out, references this runbook)
- `services/home-assistant.md`: new page, surface the integration list + dashboard structure
- `mkdocs.yml`: add nav entries for the two new docs above
- `reference/accepted-risks.md`: new entry for HAOS opacity to drift detection (see roadmap §6B note for canonical wording)
- `ansible/roles/`: no role page additions (HAOS is not Ansible-managed); `host_vars/edge.yml` change is captured by the existing edge role page if vhost list is enumerated there
Lessons from the 2026-05-23 run¶
Stages 6B.1 and 6B.3 executed cleanly on 2026-05-23; 6B.2 was deferred to a hands-on user session. Nine scaffold-drift findings were caught between the pre-execution sanity check and the run itself, and all were folded back into the body of this runbook above so it's correct for re-runs. They're summarised here in narrative form for future-reader context.
Pre-execution sanity-check findings (3)¶
- HAOS pin was one release stale. Scaffold pinned 17.2 (2026-04-07); current stable on the day of execution was 17.3 (2026-05-06; pure security release, kernel 6.12.85 for CVE-2026-31431, no breaking changes). Bumped the pin at execution time. Lesson for the runbook: state "current stable on the day of execution" rather than a literal version that ages.
- Tapo P110M Matter path was shakier than the scaffold implied. Sanity-check turned up HA core#149847 (Matter P110M power endpoints not enumerating as sensors), core#112639 (Matter Server update broke P110M connection), and recurrent community "Something went wrong" pairing reports. Even when Matter works, voltage/current/frequency don't enumerate, only cumulative energy. Meanwhile `python-kasa` (driving the `tplink` integration) now handles KLAP locally without cloud creds, removing the original rationale for preferring Matter. Reversed the fallback order before execution.
- Scaffold's Discord notify YAML used deprecated split syntax. The `data:` / `data_template:` split was deprecated in HA years ago; current HA puts templates directly inside `data:`. Would have thrown a config warning at minimum on first restart. Unified before execution.
Execution-time findings (6)¶
- HAOS publishes no `.sha256` sidecars (nor a SHASUMS file). Scaffold's "verify against the .sha256 sidecar" step was impossible: the GitHub release lists only the raw asset files, nothing else. TLS-only download is the upstream precedent (community-scripts and ProxmoxVE-helper-scripts both do TLS-only too). Runbook now records the local sha256 post-download for rebuild-kit reproducibility instead.
- Hold-point HTTP probe used `curl -sI` (HEAD), but HA only allows GET on `/` and returns 405. Looked like a failure on first read of the output; was just a wrong-method response. Fixed to `curl -s -o /dev/null -w "%{http_code}\n"`.
- `qm guest ping <vmid>` is not a valid PVE 9.2 command. Correct syntax is `qm agent <vmid> ping`. Scaffold likely carried over an older PVE-version syntax.
- `qm disk resize 110 scsi0 32G` is a no-op on the HAOS OVA. The qcow2 already ships with a 32 GiB virtual size, so the resize doesn't grow anything on 17.x. Kept the step as defensive insurance (cheap, and forward-compatible if upstream ever ships a different default) but documented its no-op nature.
- HAOS first boot from `qm start` to HTTP-200 ready was ~90 s. The scaffold said "~5 minutes." Faster on local-zfs than the round-number conservative estimate suggested. Updated the runbook to give the observed value with the conservative budget as the upper bound.
- HA Core restart didn't actually fire when the trusted_proxies edit was made via File Editor + UI "Restart Home Assistant." Symptom: end-to-end through Caddy returned HTTP 400 "Bad Request" for any X-Forwarded-For-bearing request, with a clean 200 response on a HEAD via Caddy that confused the picture (HEAD returns 405 from HA's GET-only `/` regardless of proxy trust). Root cause: the UI restart click was missed; `ha core check` reported the config valid but `docker inspect homeassistant --format '{{.State.StartedAt}}'` showed HA Core was still running from the initial onboarding boot, ~4 hours earlier. Drove `ha core restart` via `qm guest exec` directly to fix. Runbook now requires verifying `StartedAt > configuration.yaml mtime` before declaring 6B.3 done.
Things that worked exactly as scaffolded¶
- `pre-enrolled-keys=0` on the EFI disk: HAOS still doesn't sign its bootloader, and omitting this flag still puts the VM in a boot loop. Captured correctly.
- Q35 / OVMF / virtio-scsi-single / `agent enabled=1`: all current.
- The Caddy vhost addition pattern (append to `caddy_proxy_hosts` in `host_vars/edge.yml`, push, pull on CT 104, `--check --diff`, apply) was clean: `ok=33 changed=2`, single-task render + reload handler, 22 s playbook runtime, validate task passed before reload.
- CrowdSec coverage on the new vhost was automatic via the existing wildcard handler in the Caddyfile template (`{% if caddy_crowdsec_enabled %}` block per-vhost).
- LE wildcard cert auto-covered `home.rampancy.cloud`, so no per-vhost cert work was needed.
Per-LAN hairpin: bouncer testing only meaningful from cellular¶
LAN-side curl against https://home.rampancy.cloud/ was reflected back through the UDM, which rewrites the source IP to the router (Caddy access log records client_ip: 192.168.1.1). That masks any CrowdSec bouncer behaviour — a banned IP test from LAN would appear "not blocked" purely because the request never reaches Caddy from the actual client IP. End-to-end validation was done from cellular per the existing edge-cutover and crowdsec-validation pattern, where the request traverses the real WAN path.
Deferred to a later session¶
- 6B.2, Core integrations (HACS, Hue, Tapo, Bambu): user driving hands-on. Will likely surface additional findings on the integration side, particularly the A1 Mini LAN Mode + Developer Mode toggle order and `ha-bambulab` HACS install path. Append a 6B.2 follow-up section here when run.
- 6B.4, Dashboard, automations, host page, service page, accepted-risks entry: blocked on 6B.2.