Matrix maintenance: keeping the homeserver current

Runbook for keeping the Matrix homeserver up to date. It is triggered monthly by the `matrix_deploy_notifier` role, which POSTs pending upstream commits to `#homelab-updates` on the first Monday of each month but never pulls or applies them. Applying updates is operator-driven, following upstream's maintenance-upgrading-services.md and the Phase 6E lessons appendix.

What updates and how

| Layer | Mechanism | Cadence | Notes |
| --- | --- | --- | --- |
| Matrix VM OS (Debian 13 trixie) | `auto_updates` role (unattended-upgrades, security-only, manual reboot) + needrestart | Daily, automatic | See auto-updates role. `auto_updates_notify_discord: true` on this host → `#homelab-ops` ping when a reboot is pending |
| spantaleev playbook on CT 104 | Manual `git pull` of `/root/matrix-deploy/` | Monthly fetch + on-demand | The `matrix_deploy_notifier` timer fetches and notifies; you decide when to pull and apply |
| Docker images (Tuwunel, Traefik, LiveKit, …) | Manual: version-pinned in `vars.yml`; advance on apply | On-demand, tied to a playbook apply | Spantaleev pins image tags so the 9-container compatibility matrix doesn't drift; `:latest` is never used here |

Standing apply procedure (triggered by a `#homelab-updates` notification)

Run from a terminal on CT 104 (192.168.1.245). All commands assume root.

1. Read the upstream changelog

The Discord embed lists pending commit subjects but not the full context. Open matrix-docker-ansible-deploy CHANGELOG.md and skim entries covering the date range of the pending commits.

What to look for:

- Sections marked "Breaking change" / "Upgrading from x.y to x.z" (these are why the migration-validation gate exists)
- Image version bumps for components you actively use (Tuwunel, Element Call / LiveKit, Element Web)
- Removed defaults or changed variable semantics

If you see a breaking change that affects your config, plan the variable update before running the playbook — the playbook will refuse to run until you do.
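Part of that skim can be mechanised. The following is a minimal sketch, not a replacement for reading the entries: it assumes breaking entries use the phrases quoted above, and `flag_changelog_risks` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print changelog lines (with line numbers) that mention
# a breaking change or an upgrade path, so they can be read in full.
# Reads the changelog text on stdin. The patterns are a starting point only.
flag_changelog_risks() {
  # grep exits 1 when nothing matches; "|| true" keeps that from
  # aborting a caller that runs under "set -e".
  grep -n -i -E 'breaking change|upgrading from' || true
}

# Usage, assuming the notifier's fetch has run and the checkout tracks an
# upstream branch, so the not-yet-pulled changelog can be read without pulling:
#   git -C /root/matrix-deploy show '@{u}:CHANGELOG.md' | flag_changelog_risks
```

An empty result only means the two phrases did not appear; removed defaults and image bumps still need a human read.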

2. Pull the playbook + refresh galaxy roles

cd /root/matrix-deploy
git pull
rm -rf roles/galaxy
ansible-galaxy install -r requirements.yml -p roles/galaxy/ --force

Deleting `roles/galaxy` and reinstalling with `--force` is the upstream-recommended pattern. `just update` is equivalent if you prefer; both end at the same state.

3. Acknowledge any migration breakpoint

grep matrix_playbook_migration_expected_version \
  /root/matrix-deploy/roles/custom/matrix-base/defaults/main.yml
grep matrix_playbook_migration_validated_version \
  /root/matrix-deploy/inventory/host_vars/matrix.rampancy.cloud/vars.yml

If the two values differ, the playbook will fail with a message linking to the changelog entry for the new expected version. Read the linked entry, adapt your vars if needed, then bump matrix_playbook_migration_validated_version in inventory/host_vars/matrix.rampancy.cloud/vars.yml to match.

If the values already match, you can skip this step.
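The comparison can also be scripted. This is a rough sketch under one assumption that may not hold: both files define the variable as a literal value on a single line. If upstream computes the expected version from a template, compare the `grep` output by eye instead. `yaml_scalar` and `check_migration_gate` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a top-level "key: value" line.
# Assumes a single-line literal value without spaces; quotes are stripped.
yaml_scalar() {
  local key=$1 file=$2
  awk -v key="$key" -v sq="'" '
    $1 == key ":" { v = $2; gsub("[\"" sq "]", "", v); print v; exit }
  ' "$file"
}

# Compare the expected version (role defaults) with the validated version
# (host vars). Returns 0 on match, 1 on mismatch, 2 if a value is unreadable.
check_migration_gate() {
  local expected validated
  expected=$(yaml_scalar matrix_playbook_migration_expected_version "$1")
  validated=$(yaml_scalar matrix_playbook_migration_validated_version "$2")
  if [ -z "$expected" ] || [ -z "$validated" ]; then
    echo "could not read one of the versions" >&2
    return 2
  fi
  if [ "$expected" = "$validated" ]; then
    echo "match: $expected"
  else
    echo "mismatch: expected=$expected validated=$validated"
    return 1
  fi
}

# Usage, with the two paths from the grep commands above:
#   check_migration_gate \
#     /root/matrix-deploy/roles/custom/matrix-base/defaults/main.yml \
#     /root/matrix-deploy/inventory/host_vars/matrix.rampancy.cloud/vars.yml
```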

4. Apply

cd /root/matrix-deploy
ansible-playbook -i inventory/hosts setup.yml \
  --tags=install-all,start \
  --vault-password-file /root/.vault_pass \
  2>&1 | tee /tmp/matrix-apply-$(date +%Y%m%d).log

Always pipe to tee + keep stderr

A bare pipe forwards only stdout, so stderr never reaches the log. The `2>&1 | tee` pattern captures both streams; inside a script, add `set -o pipefail` so the pipeline's exit status comes from `ansible-playbook` rather than from `tee`. The Phase 6E run hit two cases where a silenced error would have masked a downstream cascade.
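Inside a script, the pattern can be wrapped so the caller still sees the command's own exit status. A minimal sketch for bash; `run_logged` is a hypothetical name, and the subshell is there only so that `pipefail` does not leak into the rest of the script.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command, copy stdout and stderr to a log file
# via tee, and return the command's exit status instead of tee's.
run_logged() {
  local log=$1
  shift
  # pipefail makes the pipeline fail if the command fails, even though
  # tee (the last stage) succeeds. The subshell keeps the option local.
  ( set -o pipefail; "$@" 2>&1 | tee "$log" )
}

# Usage, mirroring the apply command above:
#   run_logged "/tmp/matrix-apply-$(date +%Y%m%d).log" \
#     ansible-playbook -i inventory/hosts setup.yml \
#       --tags=install-all,start --vault-password-file /root/.vault_pass \
#     || echo "apply failed, see the log" >&2
```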

`install-all,start` is the right tag set for routine upgrades: per spantaleev's changelog it is 2–5× faster than `setup-all,ensure-matrix-users-created,start`. Use `setup-all,…` only when you've removed a component from `vars.yml`. The spantaleev playbook deactivates Tuwunel admins on `setup-all`, so don't reach for it casually.

5. Verify

# From CT 104, SSH to VM 111 to check container state
ssh darcyn@192.168.1.243 \
  'sudo docker ps --filter name=matrix- --format "table {{.Names}}\t{{.Status}}"'

# Quick health probe
curl -fsS https://matrix.rampancy.cloud/_matrix/client/versions | jq '.versions[0]'

# Federation tester from outside (open this in a browser)
echo "https://federationtester.matrix.org/#rampancy.cloud"

All matrix-* containers should be Up (healthy) or Up <duration>. The federation tester should return all-green.
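For a scripted check, the status column can be filtered instead of read by eye. A minimal sketch, assuming tab-separated `name<TAB>status` lines (the same `--format` string as above without the `table` prefix) and the usual Docker status wording such as `Up 3 hours (healthy)` or `Restarting (1) 5 seconds ago`; `flag_unhealthy` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read "name<TAB>status" lines on stdin and print the
# containers whose status does not start with "Up " or is marked unhealthy.
flag_unhealthy() {
  awk -F'\t' '$2 !~ /^Up / || $2 ~ /\(unhealthy\)/ { print $1 ": " $2 }'
}

# Usage from CT 104 (no output means every listed container looks fine):
#   ssh darcyn@192.168.1.243 \
#     'sudo docker ps --filter name=matrix- --format "{{.Names}}\t{{.Status}}"' \
#     | flag_unhealthy
```

Plain `docker ps` lists only running containers, so a container that has exited will be missing from the input rather than flagged; compare the count against what you expect if that matters.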

6. Clean up

Move the apply log somewhere durable if anything notable came out of it; otherwise it's fine to leave in /tmp/ (cleared on reboot).

What if the playbook fails mid-apply

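Spantaleev's `setup.yml` is idempotent, so re-running after fixing the reported issue is the standard recovery. Ansible starts again from the first task rather than resuming, but tasks that already converged report no change, so the re-run effectively continues from the point of failure.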

Common failure shapes:

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| "Detected breaking change … validated version is X, expected version is Y" | New migration breakpoint upstream | Read the linked CHANGELOG entry, adapt vars, bump `matrix_playbook_migration_validated_version` to Y, re-run |
| Pull stalls / image not found | Registry blip or removed image tag | Re-run; if persistent, check the relevant `matrix_*_version` var against the image tag actually on the registry |
| `M_SENDER_IGNORED` in Tuwunel logs after apply | `matrix_tuwunel_config_allowed_remote_server_names` lost `rampancy.cloud` | Re-add `rampancy.cloud` to the allowlist (the variable's name says "remote" but the implementation applies to all senders, including the local server). See Phase 6E lessons #1 |

For other failures, check journalctl -u matrix-tuwunel -f on VM 111 (ssh darcyn@192.168.1.243) and consult the matrix-setup runbook's Lessons appendix before going wider.

Skipping a month

Skipping a `#homelab-updates` notification is fine if you've read the changelog and there's nothing security-relevant. The notification fires again next month with the cumulative pending list. Because the timer only fetches, nothing is applied in the meantime and the working tree on CT 104 stays as you left it.

If you're going to skip for >3 months, consider running through the changelog deliberately rather than letting it pile up — large jumps multiply the chance of hitting a migration breakpoint you'd rather see in isolation.
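To see how large the jump has become before deciding, the backlog can be measured without pulling. A minimal sketch, assuming the checkout tracks an upstream branch; `pending_summary` is a hypothetical name, and the function only fetches.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how far a checkout is behind its upstream
# branch. Fetches only; never merges, pulls, or touches the working tree.
pending_summary() {
  local repo=$1 count oldest
  git -C "$repo" fetch --quiet || return 1
  # Commits reachable from the upstream branch but not from HEAD.
  count=$(git -C "$repo" rev-list --count 'HEAD..@{u}') || return 1
  if [ "$count" -eq 0 ]; then
    echo "up to date"
    return 0
  fi
  # --reverse lists oldest first, so the first line is the oldest pending commit.
  oldest=$(git -C "$repo" log --reverse --date=short --format=%cd 'HEAD..@{u}' | head -n 1)
  echo "$count pending commit(s), oldest from $oldest"
}

# Usage on CT 104:
#   pending_summary /root/matrix-deploy
```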

matrix_deploy_notifier role — the timer that triggers this runbook
Matrix service page — homeserver details, admin tasks
Matrix setup runbook — Phase 6E bring-up, lessons appendix
auto-updates role — Debian package updates for the matrix VM
spantaleev maintenance-upgrading-services.md — upstream cadence + rationale