Skip to content

Home Assistant setup — Phase 6B

Stand up HAOS as a sealed Proxmox VM (deliberate departure from the Debian + apt + Ansible baseline that covers every other guest), integrate the Hue V2 bridge + Tapo P110M via Matter + Bambu A1 Mini via HACS, front via Caddy edge at home.rampancy.cloud, build a starter dashboard + automations.

Status: 6B.1 + 6B.3 executed 2026-05-23; 6B.2 deferred (user driving hands-on); 6B.4 partial

Scaffold-drift findings from the 2026-05-23 run have been folded back into the body of this runbook so it's correct for re-runs. The Lessons from the 2026-05-23 run appendix at the bottom captures the same findings in narrative form.

Stages

Stage Scope Hold points
6B.1 HAOS VM stand-up After qm importdisk — verify the resulting disk-volume name before qm set scsi0; before declaring HA reachable
6B.2 Core integrations (HACS + Matter + Hue + Bambu) Before flipping the A1 Mini to Developer Mode (changes the trust posture of the printer); before adding HACS-installed integrations
6B.3 Edge integration Before adding trusted_proxies to HA — without it the connection breaks rather than degrades; verify cellular access before declaring done
6B.4 Dashboard + automations + docs sweep None high-risk

Cross-phase decisions

  • Install method: manual qm create + qcow2 import, not the community-scripts HAOS installer. Manual matches existing precedent (n8n VM 108, forgejo CT 109), keeps the HAOS version explicitly pinned in this runbook (the script pulls latest), and avoids bash <(curl ...) as root on proxfold. The script's correctness wins (Q35, OVMF, pre-enrolled-keys=0, EFI disk) are captured by writing the qm create invocation explicitly with all flags inline.
  • HAOS version pin: 17.3 (updated from scaffold's 17.2 at execution time on 2026-05-23). Pin to whatever's current stable on the day of execution — install-time pinning, not ongoing. HAOS Supervisor self-updates HA Core + add-ons + the OS layer afterward. The pin protects the cold-start state for rebuild-kit reproducibility and dodges known fresh-install regressions. Ongoing pinning is intentionally not adopted — Supervisor's update cadence is well-behaved and the maintenance overhead of policing it isn't worth it.
  • No Ansible coverage of the VM. HAOS doesn't take cloud-init, has no SSH or apt, and the common / security / auto_updates / beszel_agent roles all assume Debian. host_vars/hass.yml exists only for the Caddy vhost and as a documentation anchor. The VM is not added to inventory/hosts.yml — no playbook should target it.
  • Tapo via tplink integration primary, Matter as fallback. Revised from scaffold's Matter-primary at the 2026-05-23 sanity-check: community + upstream HA-core issues (core#149847 — Matter P110M power endpoints not enumerating, core#112639 — Matter Server update broke P110M, recurrent "Something went wrong" pairing failures) show Matter pairing for the P110M is flaky and energy-endpoint enumeration incomplete via Matter — only the total-energy sensors enumerate; voltage/current/frequency are missing. python-kasa (driving the tplink integration) now handles KLAP locally without cloud creds, so the original rationale for preferring Matter (avoid cloud-credential sync) no longer holds. Use Matter only if tplink pairing fails on the specific firmware shipped on the unit.
  • HACS as managed dependency. Required for ha-bambulab (no Core-integration path). Documented as an additional update cadence outside HA Core. Bambu's own Home Assistant integration story is in flux; HACS keeps us on the actively-maintained greghesp/ha-bambulab codebase.
  • Wazuh agent deferred to Phase 7B. BeardedTinker's HAOS rule pack integrates API-side and doesn't require an in-VM agent.

Pre-flight gates

  • VMID 110 free (qm list on proxfold)
  • 192.168.1.241 free (no DHCP lease, no static reservation in UDM)
  • local-zfs has > 35 GB free (32 GB disk + EFI overhead)
  • HAOS qcow2 download verified — pin to current stable release at execution time (haos_ova-&lt;version&gt;.qcow2.xz from the HAOS releases page)
  • PBS daily job's namespace covers root (it does — confirmed in pbs role)
  • Hue V2 bridge IP known (http://&lt;bridge-ip&gt;/description.xml returns &lt;modelNumber&gt;BSB002&lt;/modelNumber&gt;)
  • Tapo P110M powered on the same VLAN as HA (LAN-1 / vmbr0); plug LED solid
  • Bambu A1 Mini accessible from the HA VM's subnet (no VLAN boundary)
  • Discord webhook URL for #homelab-ops available in vault as vault_discord_webhook_homelab_ops (already stored — used by PVE 9 / Beszel / drift detection)

Stage 6B.1 — HAOS VM stand-up

Download the qcow2

# on proxfold
HAOS_VERSION=17.3   # pin to current stable at execution time
cd /var/lib/vz/template/iso

wget "https://github.com/home-assistant/operating-system/releases/download/${HAOS_VERSION}/haos_ova-${HAOS_VERSION}.qcow2.xz"

# HAOS does NOT publish per-asset .sha256 sidecars (nor a SHASUMS file) — TLS-only
# trust is the upstream precedent (community-scripts + ProxmoxVE-helper-scripts both
# do TLS-only too). Record the local sha256 for rebuild-kit reproducibility:
sha256sum "haos_ova-${HAOS_VERSION}.qcow2.xz"

unxz "haos_ova-${HAOS_VERSION}.qcow2.xz"
ls -lh "haos_ova-${HAOS_VERSION}.qcow2"

Create the VM shell

# on proxfold
qm create 110 \
  --name hass \
  --machine q35 \
  --bios ovmf \
  --cpu host --cores 2 --sockets 1 \
  --memory 4096 \
  --net0 virtio,bridge=vmbr0,firewall=0 \
  --ostype l26 \
  --onboot 1 \
  --agent enabled=1 \
  --efidisk0 local-zfs:0,efitype=4m,pre-enrolled-keys=0

pre-enrolled-keys=0 is the critical flag — HAOS does not sign its bootloader, and Pre-Enroll Keys = ON puts the VM into a boot loop that's diagnostically opaque (no console error, just resets).

Import the qcow2 + attach as scsi0

qm importdisk 110 "/var/lib/vz/template/iso/haos_ova-${HAOS_VERSION}.qcow2" local-zfs
# → Note the resulting volume name from output, typically vm-110-disk-1
#   (vm-110-disk-0 is the EFI disk created above)

qm set 110 \
  --scsihw virtio-scsi-single \
  --scsi0 local-zfs:vm-110-disk-1,discard=on,ssd=1 \
  --boot order=scsi0

# Defensive — the HAOS OVA qcow2 already ships with a 32 GiB virtual size, so on
# 17.x this is effectively a no-op. Keep it; cheap insurance if upstream ever
# changes the default.
qm disk resize 110 scsi0 32G

Hold point — verify config before first boot

qm config 110 | grep -E "machine|bios|efidisk|scsi0|boot:"
# Expected:
#   bios: ovmf
#   boot: order=scsi0
#   efidisk0: local-zfs:vm-110-disk-0,efitype=4m,pre-enrolled-keys=0,...
#   machine: q35
#   scsi0: local-zfs:vm-110-disk-1,discard=on,size=32G,ssd=1

First boot + onboarding

qm start 110
# Watch the console (Proxmox UI → VM 110 → Console) for first boot. On local-zfs
# HAOS 17.3 went from `qm start` to HTTP-200 ready in ~90 seconds on the 2026-05-23
# run; budget up to ~5 minutes on slower storage or first-ever HA-Core-image pull.

When the console shows Welcome to Home Assistant, hit http://&lt;dhcp-assigned-ip&gt;:8123 from a browser. Find the DHCP-assigned IP via UDM Network UI (Clients filtered by hostname homeassistant) or:

# on proxfold
qm guest cmd 110 network-get-interfaces 2&gt;/dev/null | jq -r '.[] | .["ip-addresses"][]?["ip-address"]' | grep '^192'

Run through the HA onboarding wizard (create owner account → location → unit prefs → analytics opt-out) but stop before the integrations step — we'll do those in 6B.2.

Set static IP

HAOS does not consume cloud-init or /etc/network/interfaces. Set the static IP from the HA UI:

Settings → System → Network → IPv4 → Static — set address 192.168.1.241/24, gateway 192.168.1.1, DNS 192.168.1.1. Save and wait for HAOS to apply (the UI will reconnect on the new address).

Hold point — declare 6B.1 done

# from any LAN host
# (use GET, not HEAD — HA returns 405 Method Not Allowed on HEAD /)
curl -s -o /dev/null -w "%{http_code}\n" "http://192.168.1.241:8123/"   # expect: 200

ssh root@192.168.1.250 "qm agent 110 ping && echo guest-agent-OK"
# (the older `qm guest ping <vmid>` syntax is not valid in PVE 9 — `qm agent <vmid> ping` is)

PBS job picks up the new VM that night at 02:00 ACST. Verify the next morning via PBS UI.

Stage 6B.2 — Core integrations

HACS bootstrap

Install the Studio Code Server add-on (Settings → Add-ons → Add-on Store → Studio Code Server → Install + Start). Open it; in the integrated terminal:

wget -O - https://get.hacs.xyz | bash -

Restart Home Assistant (Settings → System → Restart). After restart: Settings → Devices & Services → Add Integration → search "HACS" → follow the GitHub-OAuth flow.

Settings → Devices & Services → Add Integration → TP-Link. The integration auto-discovers Tapo plugs on the LAN via mDNS; if the P110M doesn't appear, add it by IP. Local-network KLAP is handled by the bundled python-kasa — no cloud account required. The integration enumerates switch + power-sensor cluster (energy total, voltage, current, frequency).

If tplink discovery fails or the specific firmware on the unit doesn't speak the local protocol, fall back to Matter (heads-up that energy-endpoint coverage via Matter is incomplete — only the cumulative-energy sensor enumerates; voltage/current/frequency don't):

  1. Install the Matter Server add-on (Settings → Add-ons → Add-on Store → Matter Server → Install + Start)
  2. Settings → Devices & Services → Add Integration → Matter
  3. Hold the P110M button for ~5s until the LED pulses orange (commissioning mode)
  4. In HA: Add Device (Matter) → scan QR code from the plug's bottom sticker (or enter the manual setup code printed below the QR)
  5. Wait for commissioning (~30s)

Hue V2 bridge

Should auto-discover via mDNS — appear as a notification in Settings → Devices & Services. Click Configure → press the link button on the bridge → confirm. Bulbs + rooms + scenes import automatically. The V2 bridge's event-stream endpoint is used (push, not polling) — confirm by triggering a bulb from the Hue app and watching for the event in HA's Developer Tools → Events.

A1 Mini — flip to LAN Mode + Developer Mode

Two separate toggles on the printer:

  1. Settings → Network → LAN Mode — toggle on, note the access code displayed
  2. Settings → Network → Developer Mode — toggle on (this is what permits MQTT, in addition to LAN Mode)

Without both, MQTT writes are blocked under firmware ≥ 01.05.x and ha-bambulab falls back to read-only.

ha-bambulab via HACS

HACS → Integrations → search Bambu Lab → Download → restart HA. Then Settings → Devices & Services → Add Integration → Bambu Lab → enter:

  • Printer IP (LAN address from the A1's network screen)
  • Access code (from LAN Mode toggle)
  • Serial (from printer's About screen)

Choose LAN Mode (Local MQTT) for the connection mode. The integration adds ~30 entities (printer state, AMS slots if equipped, chamber temp, current print, etc.).

Hold point — declare 6B.2 done

  • HACS shows installed and connected to GitHub
  • P110M paired via tplink (or Matter fallback) and toggling from HA UI confirms physical state change on the plug
  • Hue bridge integration shows all bulbs; toggling from Hue app updates HA within < 1s
  • ha-bambulab integration shows printer as Idle or Ready; print-state changes propagate

Stage 6B.3 — Edge integration

Add the vhost to host_vars/edge.yml

In ~/homelab-ansible/inventory/host_vars/edge.yml, append home.rampancy.cloud to the caddy_vhosts list, mirroring the existing git.rampancy.cloud pattern (upstream 192.168.1.241:8123, no auth — HA handles auth itself).

# on CT 104
cd ~/homelab-ansible
ansible-playbook playbooks/edge.yml --check --diff --limit edge   # preview
ansible-playbook playbooks/edge.yml --limit edge                  # apply

Add Cloudflare DNS record

home.rampancy.cloud CNAME → rampancy.cloud (apex). The wildcard LE cert covers it; CrowdSec coverage automatic via the existing wildcard handler.

Configure HA's reverse-proxy trust

In HA: Settings → Add-ons → File Editor → install + start (or use Studio Code Server again). Edit /config/configuration.yaml, add:

http:
  use_x_forwarded_for: true
  trusted_proxies:
    - 192.168.1.244   # edge / Caddy

Without this, HA refuses the proxied connection — the symptom is HTTP 400 "Bad Request" from any request bearing an X-Forwarded-For header (with the server log carrying the "A request from a reverse proxy was received from [edge IP], but your HTTP integration is not set up for reverse proxies" message). The Caddy side returns clean (200 to its own request, via: 1.1 Caddy); the 400 surfaces only at the browser. Easy to misread as a Caddy/edge config error.

Restart HA (Settings → System → Restart). Then verify the restart actually took effect before moving on to the Caddy applyha core check returning "Command completed successfully" is necessary but not sufficient (it validates the file; it doesn't prove the running HA Core picked it up):

# on proxfold — confirm HA Core's container start time is newer than configuration.yaml mtime
ssh root@192.168.1.250 'qm guest exec 110 --timeout 10 -- /bin/sh -c "docker inspect homeassistant --format \"{{.State.StartedAt}}\""'
ssh root@192.168.1.250 'qm guest exec 110 --timeout 10 -- /bin/sh -c "stat -c %y /mnt/data/supervisor/homeassistant/configuration.yaml"'
# StartedAt must be > the config mtime. If not, restart didn't fire — re-restart via
# UI or drive directly: `qm guest exec 110 -- /bin/sh -c "ha core restart"`

Hold point — declare 6B.3 done

# from any LAN host
curl -sI "https://home.rampancy.cloud/" | head -3
# expect: HTTP/2 200 (or 302 to /auth/authorize), valid LE cert, no x-frame-options error

End-to-end validation from cellular (replicating the edge-cutover and crowdsec-validation pattern): disable Wi-Fi on phone → open the HA iOS app → connect via https://home.rampancy.cloud → confirm full UI loads.

Stage 6B.4 — Dashboard + automations + close-out

Discord notification target

Add to configuration.yaml:

notify:
  - name: homelab_ops
    platform: rest
    resource: !secret discord_webhook_homelab_ops
    method: POST_JSON
    data:
      username: "Home Assistant"
      content: "{{ message }}"

The data / data_template split syntax was deprecated in HA years ago — current HA puts templates directly inside data: and the engine handles them transparently.

Store the webhook URL in HA's secrets.yaml (lifted from vault_discord_webhook_homelab_ops). Mirrors the PVE 9 / ZED / Beszel webhook pattern.

Automations

  • Sunset → Hue lights on: built-in Sun trigger + Hue scene activate (use Hue's own scenes rather than per-bulb sets — keeps Hue app and HA aligned).
  • Print complete → Discord: trigger on sensor.a1_mini_current_stage transition to idle from a printing state; action notify.homelab_ops with print name + duration.
  • Print failed → Discord: trigger on sensor.a1_mini_print_error not-empty; same notify target.

Dashboard

One Lovelace view, three cards:

  1. Lights — Hue rooms + scenes (mushroom-light cards if HACS Mushroom is added; otherwise built-in light cards)
  2. Printer — A1 Mini state, chamber temp, current print progress, P110M energy draw (printer is plugged into the P110M, so its power consumption tracks with print state)
  3. System — HA Core version, last backup time, Supervisor status

Close-out — docs sweep

Per the sync-docs pattern (any environment change is a docs update — see arrstack CLAUDE.md):

  • roadmap.md — flip Phase 6B sub-stages to - [x], replace the section header with !!! success "Completed YYYY-MM-DD"
  • changelog.md — dated entry under current month
  • This runbook — append Lessons from the <date> run section
  • hosts/hass/index.md — new page, mirrors hosts/forgejo/index.md shape (sealed-appliance caveat called out, references this runbook)
  • services/home-assistant.md — new page, surface the integration list + dashboard structure
  • mkdocs.yml — add nav entries for the two new docs above
  • reference/accepted-risks.md — new entry for HAOS opacity to drift detection (see roadmap §6B note for canonical wording)
  • ansible/roles/ — no role page additions (HAOS is not Ansible-managed); host_vars/edge.yml change is captured by the existing edge role page if vhost list is enumerated there

Lessons from the 2026-05-23 run

Stages 6B.1 and 6B.3 executed cleanly on 2026-05-23; 6B.2 was deferred to a hands-on user session. Nine scaffold-drift findings were caught between the pre-execution sanity check and the run itself, and all were folded back into the body of this runbook above so it's correct for re-runs. They're summarised here in narrative form for future-reader context.

Pre-execution sanity-check findings (3)

  • HAOS pin was one release stale. Scaffold pinned 17.2 (2026-04-07); current stable on the day of execution was 17.3 (2026-05-06 — pure security release, kernel 6.12.85 for CVE-2026-31431, no breaking changes). Bumped the pin at execution time. Lesson for the runbook: state "current stable on the day of execution" rather than a literal version that ages.
  • Tapo P110M Matter path was shakier than the scaffold implied. Sanity-check turned up HA core#149847 (Matter P110M power endpoints not enumerating as sensors), core#112639 (Matter Server update broke P110M connection), and recurrent community "Something went wrong" pairing reports. Even when Matter works, voltage/current/frequency don't enumerate — only cumulative energy. Meanwhile python-kasa (driving the tplink integration) now handles KLAP locally without cloud creds, removing the original rationale for preferring Matter. Reversed the fallback order before execution.
  • Scaffold's Discord notify YAML used deprecated split syntax. The data: / data_template: split was deprecated in HA years ago; current HA puts templates directly inside data:. Would have thrown a config warning at minimum on first restart. Unified before execution.

Execution-time findings (6)

  • HAOS publishes no .sha256 sidecars (nor a SHASUMS file). Scaffold's "verify against the .sha256 sidecar" step was impossible — the GitHub release lists only the raw asset files, nothing else. TLS-only download is the upstream precedent (community-scripts and ProxmoxVE-helper-scripts both do TLS-only too). Runbook now records the local sha256 post-download for rebuild-kit reproducibility instead.
  • Hold-point HTTP probe used curl -sI (HEAD) — HA only allows GET on / and returns 405. Looked like a failure on first read of the output; was just a wrong-method response. Fixed to curl -s -o /dev/null -w "%{http_code}\n".
  • qm guest ping <vmid> is not a valid PVE 9.2 command. Correct syntax is qm agent <vmid> ping. Scaffold likely carried over an older PVE-version syntax.
  • qm disk resize 110 scsi0 32G is a no-op on the HAOS OVA. The qcow2 already ships with a 32 GiB virtual size, so the resize doesn't grow anything on 17.x. Kept the step as defensive insurance (cheap, and forward-compatible if upstream ever ships a different default) but documented its no-op nature.
  • HAOS first boot from qm start to HTTP-200 ready was ~90 s. The scaffold said "~5 minutes." Faster on local-zfs than the round-number conservative estimate suggested. Updated the runbook to give the observed value with the conservative budget as the upper bound.
  • HA Core restart didn't actually fire when the trusted_proxies edit was made via File Editor + UI "Restart Home Assistant." Symptom: end-to-end through Caddy returned HTTP 400 "Bad Request" for any X-Forwarded-For-bearing request, with a clean 200 response on a HEAD via Caddy that confused the picture (HEAD returns 405 from HA's GET-only / regardless of proxy trust). Root cause: the UI restart click was missed; ha core check reported the config valid but docker inspect homeassistant --format '{{.State.StartedAt}}' showed HA Core was still running from the initial onboarding boot, ~4 hours earlier. Drove ha core restart via qm guest exec directly to fix. Runbook now requires verifying StartedAt > configuration.yaml mtime before declaring 6B.3 done.

Things that worked exactly as scaffolded

  • pre-enrolled-keys=0 on the EFI disk — HAOS still doesn't sign its bootloader, omitting this flag still puts the VM in a boot loop. Captured correctly.
  • Q35 / OVMF / virtio-scsi-single / agent enabled=1 — all current.
  • The Caddy vhost addition pattern (append to caddy_proxy_hosts in host_vars/edge.yml, push, pull on CT 104, --check --diff, apply) was clean: ok=33 changed=2, single-task render + reload handler, 22 s playbook runtime, validate task passed before reload.
  • CrowdSec coverage on the new vhost was automatic via the existing wildcard handler in the Caddyfile template ({% if caddy_crowdsec_enabled %} block per-vhost).
  • LE wildcard cert auto-covered home.rampancy.cloud — no per-vhost cert work.

Per-LAN hairpin: bouncer testing only meaningful from cellular

LAN-side curl against https://home.rampancy.cloud/ was reflected back through the UDM, which rewrites the source IP to the router (Caddy access log records client_ip: 192.168.1.1). That masks any CrowdSec bouncer behaviour — a banned IP test from LAN would appear "not blocked" purely because the request never reaches Caddy from the actual client IP. End-to-end validation was done from cellular per the existing edge-cutover and crowdsec-validation pattern, where the request traverses the real WAN path.

Deferred to a later session

  • 6B.2 — Core integrations (HACS, Hue, Tapo, Bambu): user driving hands-on. Will likely surface additional findings on the integration side, particularly the A1 Mini LAN Mode + Developer Mode toggle order and ha-bambulab HACS install path. Append a 6B.2 follow-up section here when run.
  • 6B.4 — Dashboard, automations, host page, service page, accepted-risks entry: blocked on 6B.2.