# Matrix maintenance: keeping the homeserver current
Runbook for keeping the Matrix homeserver up to date. It is triggered monthly by the `matrix_deploy_notifier` role, which POSTs the list of pending upstream commits to #homelab-updates on the first Monday of each month but never pulls or applies anything. Applying updates is operator-driven, following upstream's `maintenance-upgrading-services.md` and the Phase 6E lessons appendix.
## What updates and how
| Layer | Mechanism | Cadence | Notes |
|---|---|---|---|
| Matrix VM OS (Debian 13 trixie) | `auto_updates` role (unattended-upgrades, security-only, manual reboot) + needrestart | Daily, automatic | See the auto-updates role. `auto_updates_notify_discord: true` is set on this host, so #homelab-ops gets a ping when a reboot is pending |
| spantaleev playbook on CT 104 | Manual `git pull` of `/root/matrix-deploy/` | Monthly fetch + on-demand | The `matrix_deploy_notifier` timer fetches and notifies; you decide when to pull and apply |
| Docker images (Tuwunel, Traefik, LiveKit, …) | Manual: version-pinned in `vars.yml`, advanced on apply | On-demand, tied to a playbook apply | Spantaleev pins image tags so the 9-container compatibility matrix doesn't drift; `:latest` is never used here |
## Standing apply procedure (triggered by a #homelab-updates notification)
Run from a terminal on CT 104 (192.168.1.245). All commands assume root.
### 1. Read the upstream changelog
The Discord embed lists the subjects of pending commits but not their full context. Open matrix-docker-ansible-deploy's `CHANGELOG.md` and skim the entries covering the date range of the pending commits.
What to look for:

- Sections marked "Breaking change" / "Upgrading from x.y to x.z"; these are why the migration-validation gate exists
- Image version bumps for components you actively use (Tuwunel, Element Call / LiveKit, Element Web)
- Removed defaults or changed variable semantics
If you see a breaking change that affects your config, plan the variable update before running the playbook: the playbook will refuse to run until your vars match what the new version expects.
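If you'd rather triage from the terminal, the sketch below narrows a local copy of the changelog to the "Breaking change" / "Upgrading from" lines dated on or after a cut-off. It is a hypothetical helper (`changelog_flags` is not part of the playbook) and assumes entries sit under `# YYYY-MM-DD` headings; check that against the current file before trusting its silence.

```bash
#!/usr/bin/env bash
# Print breaking-change / upgrade notes from a local CHANGELOG.md, limited to
# entries dated on or after a given day.
# ASSUMPTIONS (hypothetical helper, not an upstream tool):
#   - entries sit under headings of the form "# YYYY-MM-DD"
#   - the two patterns below cover the wording you care about
changelog_flags() {
  local changelog=$1   # path to CHANGELOG.md
  local since=$2       # YYYY-MM-DD; ISO dates compare correctly as strings
  awk -v since="$since" '
    # Remember which dated entry we are currently inside.
    /^# [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/ { date = $2; next }
    # Report matching lines only from entries on or after the cut-off.
    date >= since && tolower($0) ~ /breaking change|upgrading from/ {
      print date ": " FNR ": " $0
    }
  ' "$changelog"
}
```

For example, after the notifier's fetch you could run `git -C /root/matrix-deploy show '@{u}:CHANGELOG.md' > /tmp/CHANGELOG.md` and then `changelog_flags /tmp/CHANGELOG.md YYYY-MM-DD` with the date of your last apply. It only narrows the reading list; image bumps and changed variable semantics still need a human skim.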
### 2. Pull the playbook + refresh galaxy roles
```bash
cd /root/matrix-deploy
git pull
rm -rf roles/galaxy
ansible-galaxy install -r requirements.yml -p roles/galaxy/ --force
```
Removing `roles/galaxy` and reinstalling with `--force` is the upstream-recommended pattern. `just update` is equivalent if you prefer it; both end in the same state.
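A quick way to confirm the reinstall actually produced every role is to compare `requirements.yml` against `roles/galaxy/`. This is a minimal sketch, not an upstream tool: `check_galaxy_roles` is a hypothetical name, and it assumes each `requirements.yml` entry has a `name:` key matching its directory under `roles/galaxy/`.

```bash
#!/usr/bin/env bash
# Sanity-check that every role listed in requirements.yml landed in roles/galaxy/.
# ASSUMPTIONS (hypothetical helper): each requirements.yml entry carries a
# "name:" key, and ansible-galaxy installs it into roles/galaxy/<name>.
check_galaxy_roles() {
  local repo=${1:-/root/matrix-deploy}
  local missing=0 role
  # Extract every "name:" value, dropping indentation, list dashes and quotes.
  for role in $(sed -n 's/^[[:space:]-]*name:[[:space:]]*//p' "$repo/requirements.yml" | tr -d "\"'"); do
    if [ ! -d "$repo/roles/galaxy/$role" ]; then
      echo "missing: $role"
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_galaxy_roles` right after `ansible-galaxy install`; it prints one `missing:` line per absent role and returns non-zero if any are missing.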
### 3. Acknowledge any migration breakpoint
```bash
grep matrix_playbook_migration_expected_version \
  /root/matrix-deploy/roles/custom/matrix-base/defaults/main.yml
grep matrix_playbook_migration_validated_version \
  /root/matrix-deploy/inventory/host_vars/matrix.rampancy.cloud/vars.yml
```
If the two values differ, the playbook will fail with a message linking to the changelog entry for the new expected version. Read the linked entry, adapt your vars if needed, then bump `matrix_playbook_migration_validated_version` in `inventory/host_vars/matrix.rampancy.cloud/vars.yml` to match.
If the values already match, you can skip this step.
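The two greps above can be folded into a single pass/fail check. A minimal sketch, assuming both settings are plain one-line `key: value` assignments (quoted or not, with no trailing comment); `yaml_scalar` and `migration_gate` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# ASSUMPTION: each file holds a simple one-line "key: value" assignment.
yaml_scalar() {
  # Print the value of a top-level "key: value" line, minus quotes and trailing spaces.
  sed -n "s/^$1:[[:space:]]*//p" "$2" | head -n 1 | tr -d "\"'" | sed 's/[[:space:]]*$//'
}

# Compare the playbook's expected migration version with the one you've validated.
migration_gate() {
  local repo=${1:-/root/matrix-deploy}
  local expected validated
  expected=$(yaml_scalar matrix_playbook_migration_expected_version \
    "$repo/roles/custom/matrix-base/defaults/main.yml")
  validated=$(yaml_scalar matrix_playbook_migration_validated_version \
    "$repo/inventory/host_vars/matrix.rampancy.cloud/vars.yml")
  if [ -n "$expected" ] && [ "$expected" = "$validated" ]; then
    echo "ok: validated version matches expected ($expected)"
  else
    echo "breakpoint: expected '$expected', validated '$validated'"
    return 1
  fi
}
```

`migration_gate` returns 0 when the values match (skip this step) and 1 otherwise, printing both values so you know which CHANGELOG entry to read.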
### 4. Apply
```bash
cd /root/matrix-deploy
ansible-playbook -i inventory/hosts setup.yml \
  --tags=install-all,start \
  --vault-password-file /root/.vault_pass \
  2>&1 | tee /tmp/matrix-apply-$(date +%Y%m%d).log
```
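**Always pipe to `tee` and keep stderr.** A bare `| tee` captures only stdout, so errors reach your terminal but never the log. Use `2>&1 | tee`, and inside a script add `set -o pipefail` as well: without it the pipeline's exit status is `tee`'s (almost always 0) rather than `ansible-playbook`'s, so a failed run looks successful. The Phase 6E run hit two cases where a silenced error would have masked a downstream cascade.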
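The same advice as a reusable wrapper, for when the apply runs from a script rather than an interactive shell. `run_logged` is a hypothetical name; the sketch needs bash (for `pipefail`) and enables it only inside a subshell so the caller's shell options are left alone.

```bash
#!/usr/bin/env bash
# Run a command with stdout and stderr both teed into a log file, and return
# the command's own exit status rather than tee's. Hypothetical helper name.
run_logged() {
  local log=$1
  shift
  (
    # Without pipefail the pipeline's status would be tee's, which is almost always 0.
    set -o pipefail
    "$@" 2>&1 | tee "$log"
  )
}
```

For example: `run_logged "/tmp/matrix-apply-$(date +%Y%m%d).log" ansible-playbook -i inventory/hosts setup.yml --tags=install-all,start --vault-password-file /root/.vault_pass`, then check `$?`.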
`install-all,start` is the right tag set for routine upgrades; per spantaleev's changelog it is 2–5× faster than `setup-all,ensure-matrix-users-created,start`. The `install-*` tags skip uninstall tasks, so use `setup-all,…` only when you've removed a component from `vars.yml`. Even then, don't reach for it casually: on `setup-all` the spantaleev playbook deactivates Tuwunel admins.
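Whether you've removed a component is easy to lose track of between monthly applies. A sketch that compares the enabled components in two copies of `vars.yml`, assuming components are switched on by top-level `*_enabled: true` lines (the playbook's usual convention, but check yours) and that you keep a copy of `vars.yml` from the previous apply; both helper names are hypothetical.

```bash
#!/usr/bin/env bash
# ASSUMPTION (hypothetical helpers): components are toggled by top-level
# "<something>_enabled: true" lines in vars.yml.
enabled_components() {
  # Print the sorted, de-duplicated names of every "*_enabled: true" key.
  sed -n 's/^\([A-Za-z0-9_]*_enabled\):[[:space:]]*true[[:space:]]*$/\1/p' "$1" | sort -u
}

# List components enabled in the OLD vars.yml but not in the NEW one. The routine
# install-all,start run won't clean these up, so any output means setup-all.
removed_components() {
  local old new
  old=$(mktemp)
  new=$(mktemp)
  enabled_components "$1" > "$old"
  enabled_components "$2" > "$new"
  comm -23 "$old" "$new"   # lines present only in the old set
  rm -f "$old" "$new"
}
```

If `removed_components /root/vars.yml.last-apply inventory/host_vars/matrix.rampancy.cloud/vars.yml` prints anything (the first path is a hypothetical snapshot location), you're in the `setup-all` case; empty output means `install-all,start` is enough.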
### 5. Verify
```bash
# From CT 104, SSH to VM 111 to check container state
# (--all so that exited containers are listed too)
ssh darcyn@192.168.1.243 \
  'sudo docker ps --all --filter name=matrix- --format "table {{.Names}}\t{{.Status}}"'

# Quick health probe (-f makes HTTP errors fail loudly instead of printing nothing)
curl -fsS https://matrix.rampancy.cloud/_matrix/client/versions | jq '.versions[0]'

# Federation tester from outside (open this in a browser)
echo "https://federationtester.matrix.org/#rampancy.cloud"
```
All `matrix-*` containers should show `Up <duration> (healthy)` or plain `Up <duration>`. The federation tester should return all-green.
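To make the container check pass/fail instead of eyeballed, you can drop the `table` prefix so docker prints tab-separated `name<TAB>status` lines and filter those. A minimal sketch with a hypothetical helper name; it treats anything whose status doesn't start with `Up`, or that reports `(unhealthy)`, as a problem.

```bash
#!/usr/bin/env bash
# Flag containers that are not "Up", or that report "(unhealthy)".
# Hypothetical helper: expects "name<TAB>status" lines on stdin.
check_container_status() {
  awk -F '\t' '
    # Anything not running, or running but failing its healthcheck, is a problem.
    $2 !~ /^Up/ || $2 ~ /\(unhealthy\)/ { print "NOT OK: " $1 " (" $2 ")"; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}
```

For example: `ssh darcyn@192.168.1.243 'sudo docker ps --all --filter name=matrix- --format "{{.Names}}\t{{.Status}}"' | check_container_status`. Passing `--all` matters here: without it, exited containers aren't listed at all and the check passes vacuously.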
### 6. Clean up
Move the apply log somewhere durable if anything notable came out of it; otherwise it's fine to leave it in `/tmp/` (cleared on reboot).
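One way to make "anything notable" concrete is to keep the log only when ansible's PLAY RECAP reports a failed or unreachable host, or when ansible printed a `[WARNING]`. A sketch under those assumptions; the helper names and the `/root/matrix-apply-logs` destination are hypothetical.

```bash
#!/usr/bin/env bash
# Notable = any recap line with failed>0 or unreachable>0, or a [WARNING] line.
# Relies on ansible's "host : ok=N changed=N unreachable=N failed=N ..." recap format.
log_is_notable() {
  grep -Eq '(unreachable|failed)=[1-9]|^\[WARNING\]' "$1"
}

# Move a notable log to a durable directory; leave a clean one in place.
keep_if_notable() {
  local log=$1
  local dest=${2:-/root/matrix-apply-logs}   # hypothetical durable location
  if log_is_notable "$log"; then
    mkdir -p "$dest" && mv "$log" "$dest/" && echo "kept: $dest/$(basename "$log")"
  fi
}
```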
## What if the playbook fails mid-apply
Spantaleev's `setup.yml` is idempotent. Re-running after fixing the reported issue is the standard recovery: tasks that already completed report `ok` and change nothing, so the run effectively picks up where it failed.
Common failure shapes:
| Symptom | Likely cause | Fix |
|---|---|---|
| "Detected breaking change … validated version is X, expected version is Y" | New migration breakpoint upstream | Read the linked CHANGELOG entry, adapt vars, bump `matrix_playbook_migration_validated_version` to Y, re-run |
| Pull stalls / image not found | Registry blip or removed image tag | Re-run; if persistent, check the relevant `matrix_*_version` var against the image tag actually on the registry |
| `M_SENDER_IGNORED` in Tuwunel logs after apply | `matrix_tuwunel_config_allowed_remote_server_names` no longer includes `rampancy.cloud` | Re-add `rampancy.cloud` to the allowlist (the variable's name says "remote", but the implementation applies it to all senders, including the local server). See Phase 6E lessons #1 |
For other failures, check `journalctl -u matrix-tuwunel -f` on VM 111 (`ssh darcyn@192.168.1.243`) and consult the matrix-setup runbook's Lessons appendix before going wider.
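The failure table can be turned into a first-pass triage over an apply log or a Tuwunel journal dump. A sketch with a hypothetical helper name; the match strings come from the Symptom column, except `manifest unknown`, which is an assumption about how a missing tag shows up in a registry error.

```bash
#!/usr/bin/env bash
# Map known symptoms in a log file to their likely cause (hypothetical helper).
# Returns 0 if at least one known symptom was found, 1 otherwise.
triage_log() {
  local log=$1
  local found=1
  if grep -q 'Detected breaking change' "$log"; then
    echo "migration breakpoint: read the linked CHANGELOG entry, adapt vars, bump matrix_playbook_migration_validated_version"
    found=0
  fi
  # "manifest unknown" is an assumed registry error string for a missing tag.
  if grep -Eqi 'image not found|manifest unknown' "$log"; then
    echo "image pull: re-run; if persistent, compare the matrix_*_version var with the tags on the registry"
    found=0
  fi
  if grep -q 'M_SENDER_IGNORED' "$log"; then
    echo "sender ignored: re-add rampancy.cloud to matrix_tuwunel_config_allowed_remote_server_names"
    found=0
  fi
  [ "$found" -eq 0 ] || echo "no known symptom; see the matrix-setup Lessons appendix"
  return "$found"
}
```

For example, `ssh darcyn@192.168.1.243 'sudo journalctl -u matrix-tuwunel --since "1 hour ago" --no-pager' > /tmp/tuwunel.log` followed by `triage_log /tmp/tuwunel.log` (drop `sudo` if `darcyn` can already read the journal).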
## Skipping a month
Skipping a #homelab-updates notification is fine if you've read the changelog and there's nothing security-relevant. The notification fires again next month with the cumulative pending list. The timer only fetches, so skipping never changes what's deployed; the pending list just grows.
If you're going to skip for more than 3 months, consider working through the changelog deliberately rather than letting it pile up: a large jump raises the chance of hitting a migration breakpoint tangled up with other changes, when you'd rather meet it in isolation.
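To put a number on "how far behind", compare the date of your checked-out commit with the newest fetched upstream commit. A sketch: `months_between` is a hypothetical helper that counts calendar months between two `YYYY-MM-DD` dates and ignores the day of the month.

```bash
#!/usr/bin/env bash
# Calendar months from date $1 to date $2 (both YYYY-MM-DD; day of month ignored).
months_between() {
  local y1 m1 y2 m2
  y1=${1%%-*}; m1=${1#*-}; m1=${m1%%-*}
  y2=${2%%-*}; m2=${2#*-}; m2=${m2%%-*}
  # Strip a leading zero so "08" and "09" aren't parsed as (invalid) octal.
  m1=${m1#0}; m2=${m2#0}
  echo $(( (y2 * 12 + m2) - (y1 * 12 + m1) ))
}
```

For example: `months_between "$(git -C /root/matrix-deploy log -1 --format=%cs HEAD)" "$(git -C /root/matrix-deploy log -1 --format=%cs '@{u}')"`. A result above 3 is the point where a deliberate changelog read-through pays off.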
## Related
- matrix_deploy_notifier role — the timer that triggers this runbook
- Matrix service page — homeserver details, admin tasks
- Matrix setup runbook — Phase 6E bring-up, lessons appendix
- auto-updates role — Debian package updates for the matrix VM
- spantaleev maintenance-upgrading-services.md — upstream cadence + rationale