Backup & Restore¶
Architecture (post-Phase 5A + 5E)¶
Two scheduled jobs, both targeting PBS CT 105 (nas-primary datastore on the QNAP TS-269L via NFSv3):
| Job | Captures | Schedule | Mechanism | Retention |
|---|---|---|---|---|
| pbs-daily | All guests (LXC + VM, all /etc/pve/lxc/* and /etc/pve/qemu-server/*) | 02:00 daily | vzdump → PBS via PVE storage nas-primary | global prune-job: 7d / 4w / 6m |
| pbs-host-backup.timer | proxfold host config (/etc, /root, /var/lib/pve-cluster) | 02:30 daily | proxmox-backup-client systemd timer (separate user, namespace host/proxfold) | namespace-scoped prune-job: 14d / 8w / 12m / 2y |
The original CIFS path (nasbackup, mounted at /mnt/pve/nasbackup) is still registered as PVE storage but no scheduled job writes to it. It's kept for ad-hoc vzdump --storage nasbackup pushes (e.g. pre-decom snapshots, see VM 102 retirement 2026-05-03).
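The retention shorthand in the table maps onto the keep-* options that the prune-job in the bootstrap section spells out. A minimal sketch of that mapping, assuming the d / w / m / y suffixes mean daily, weekly, monthly and yearly; the helper name retention_flags is hypothetical and not part of any Proxmox tool:

```shell
# Translate retention shorthand such as "14d / 8w / 12m / 2y" into keep-* flags.
# retention_flags is a hypothetical helper; it only prints the flags.
retention_flags() {
    local spec="$1" token count unit out=""
    # Turn the "/" separators into spaces, then walk the remaining tokens.
    for token in $(printf '%s' "$spec" | tr '/' ' '); do
        count="${token%?}"           # everything except the last character
        unit="${token#"$count"}"     # the last character (unit letter)
        case "$count" in
            ''|*[!0-9]*) echo "bad count in '$token'" >&2; return 1 ;;
        esac
        case "$unit" in
            d) out="$out --keep-daily $count" ;;
            w) out="$out --keep-weekly $count" ;;
            m) out="$out --keep-monthly $count" ;;
            y) out="$out --keep-yearly $count" ;;
            *) echo "unknown unit in '$token'" >&2; return 1 ;;
        esac
    done
    echo "${out# }"                  # drop the leading space
}
```

Calling `retention_flags '14d / 8w / 12m / 2y'` prints the same four flags used by the namespace-scoped prune-job.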
What ZFS data is not backed up
LXC ZFS mount points are excluded from vzdump — /stash media on CT 100 isn't captured by pbs-daily, only the OS + app configs. The host's data pool (stash, including all media) is not backed up anywhere; that's accepted risk on a homelab media server. The 4C ZFS boot mirror covers physical-disk failure for rpool; pbs-host-backup.timer covers config-level corruption on the host OS.
Manual backup (ad-hoc)¶
# All guests to PBS (matches the scheduled job, useful pre-change)
vzdump 100 101 104 105 106 107 108 109 --storage nas-primary --mode snapshot \
--notes-template "Manual backup - {{guestname}}"
# Single guest to CIFS (pre-decom or break-glass)
vzdump 102 --storage nasbackup --compress zstd --mode snapshot \
--notes-template "Pre-decom snapshot"
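The hard-coded ID list in the first command drifts whenever a guest is added or retired. A possible sketch that derives the list from the config directories instead, assuming every guest has an <id>.conf file there; guest_ids is a hypothetical helper name:

```shell
# Print the numeric guest IDs found as <id>.conf in the given directories,
# sorted ascending, one per line. guest_ids is a hypothetical helper.
guest_ids() {
    local dir conf
    for dir in "$@"; do
        for conf in "$dir"/*.conf; do
            [ -e "$conf" ] || continue      # an unmatched glob stays literal
            conf="${conf##*/}"              # strip the directory
            conf="${conf%.conf}"            # strip the extension
            case "$conf" in
                ''|*[!0-9]*) continue ;;    # skip non-numeric names
            esac
            echo "$conf"
        done
    done | sort -n
}

# Usage on the PVE host (not executed here):
#   vzdump $(guest_ids /etc/pve/lxc /etc/pve/qemu-server) \
#       --storage nas-primary --mode snapshot
```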
Verify backups¶
# PBS-side guest snapshots
ssh root@192.168.1.246 'proxmox-backup-manager task list --limit 10'
ssh root@192.168.1.246 'proxmox-backup-client snapshot list ct/<id>'
# Host-backup snapshots (Phase 5E)
ssh root@192.168.1.246 'proxmox-backup-client snapshot list --ns host/proxfold host/proxfold'
# CIFS dump dir
ls -lh /mnt/pve/nasbackup/dump/
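Listing snapshots shows that backups exist, not that the newest one is recent. A small sketch of a freshness check, assuming the snapshot times are available as epoch seconds, one per line; one way to get them might be `--output-format json` piped through jq, where both jq and a `backup-time` field name are assumptions. is_fresh is a hypothetical helper:

```shell
# Read epoch timestamps (one per line) on stdin and succeed only if the newest
# one is at most MAX_HOURS old relative to NOW (defaults to the current time).
# is_fresh is a hypothetical helper name.
is_fresh() {
    local max_hours="$1" now="${2:-$(date +%s)}" ts newest=0
    while read -r ts || [ -n "$ts" ]; do
        case "$ts" in ''|*[!0-9]*) continue ;; esac   # ignore non-numeric lines
        [ "$ts" -gt "$newest" ] && newest="$ts"
    done
    [ "$newest" -gt 0 ] && [ $(( now - newest )) -le $(( max_hours * 3600 )) ]
}

# Usage (not executed here), alerting if nothing landed in the last 26 hours:
#   <command printing epoch timestamps> | is_fresh 26 || echo "backup is stale"
```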
Restore (guest)¶
List available snapshots¶
PBS UI at https://192.168.1.246:8007 (datastore nas-primary) is the easiest path. CLI equivalents:
# From proxfold (uses the registered PVE storage)
pvesm list nas-primary
# Or directly from PBS
ssh root@192.168.1.246 'proxmox-backup-client snapshot list ct/100'
Restore an LXC container from PBS¶
# Replace <volid> with the listing entry, e.g. nas-primary:backup/ct/100/2026-05-06T16:00:00Z
pct restore <vmid> <volid> --storage local-zfs
Restore a VM from PBS¶
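A minimal sketch of the VM case, assuming it mirrors the LXC restore above. qmrestore takes the archive first and the VMID second, the reverse of pct restore. The VM volid in the usage comment is illustrative only, and restore_cmd is a hypothetical helper that prints the command rather than running it:

```shell
# Print the restore command matching a PBS volid. Argument order differs:
# pct restore takes <vmid> <volid>, qmrestore takes <volid> <vmid>.
# restore_cmd is a hypothetical helper; it prints, it does not restore.
restore_cmd() {
    local volid="$1" vmid="$2" storage="${3:-local-zfs}"
    case "$volid" in
        *:backup/ct/*) echo "pct restore $vmid $volid --storage $storage" ;;
        *:backup/vm/*) echo "qmrestore $volid $vmid --storage $storage" ;;
        *) echo "unrecognised volid: $volid" >&2; return 1 ;;
    esac
}

# Usage (illustrative volid, not executed here):
#   restore_cmd nas-primary:backup/vm/101/2026-05-06T16:00:00Z 101
#   qmrestore <volid> <vmid> --storage local-zfs
```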
Restore from the legacy CIFS path¶
Only relevant for backups predating Phase 5A or pre-decom snapshots intentionally pushed to nasbackup:
ls /mnt/pve/nasbackup/dump/
pct restore <vmid> /mnt/pve/nasbackup/dump/<file>.tar.zst --storage local-zfs
qmrestore /mnt/pve/nasbackup/dump/<file>.vma.zst <vmid> --storage local-zfs
Note
local-zfs has been the boot-drive-backed storage pool on proxfold since Phase 4C (2026-04-22). For historical restores taken against the pre-4C single-drive LVM install, pass --storage local-zfs anyway: the flag overrides the storage recorded in the backup, so the restored volumes are created on local-zfs instead of the original LVM-backed storage.
Re-add ZFS mount points to Plex LXC after restore¶
The vzdump backup does not include Proxmox-level mount point configuration — these must be re-added manually after restoring the Plex LXC:
pct set 100 -mp0 /stash,mp=/stash
pct set 100 -mp1 /stash/plex-data,mp=/stash/plex-data
# Quote the value: an unquoted ';' ends the command and the shell would try to run 'cifs'
pct set 100 -features 'mount=nfs;cifs'
Note
The stash/plex-data ZFS dataset (100G quota) must also exist with correct ownership (999:996). This is codified by the plex role — ansible-playbook playbooks/plex.yml creates the dataset via delegate_to: proxfold and sets the mount points. Only re-add mp0/mp1 manually if the role is unavailable (cold-start before CT104 restore).
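When the role is unavailable, it may help to check which mount points are already present before re-adding them. A sketch that reads `pct config` output and prints only the missing `pct set` commands, assuming the config lists them as `mp0:` and `mp1:` lines; missing_mounts is a hypothetical helper:

```shell
# Read `pct config <id>` output on stdin and print the `pct set` command for
# each expected mount point that is absent. missing_mounts is hypothetical.
missing_mounts() {
    local vmid="$1" config
    config=$(cat)
    printf '%s\n' "$config" | grep -q '^mp0:' \
        || echo "pct set $vmid -mp0 /stash,mp=/stash"
    printf '%s\n' "$config" | grep -q '^mp1:' \
        || echo "pct set $vmid -mp1 /stash/plex-data,mp=/stash/plex-data"
}

# Usage (not executed here):
#   pct config 100 | missing_mounts 100
```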
Start and verify after restore¶
# Start containers and VMs
pct start 100
qm start 101
qm start 102   # only when restoring VM 102 (retired 2026-05-03) from its pre-decom snapshot
pct start 104
# Verify Plex can see media
pct exec 100 -- ls /mnt/plex/Movies/ | head -5
# Verify arrstack NFS mount
qm guest exec 101 -- df -h /stash
# Verify Docker containers are running
qm guest exec 101 -- docker ps
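The `qm guest exec` checks depend on the guest agent, which may not answer immediately after `qm start`. A generic retry wrapper is one way to cover that gap. retry is a hypothetical helper, and using `qm guest cmd 101 ping` as the readiness probe is an assumption about this setup:

```shell
# Run a command until it succeeds, up to ATTEMPTS times, sleeping DELAY seconds
# between tries. retry is a hypothetical helper name.
retry() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Usage (not executed here): wait up to a minute for the agent, then verify.
#   retry 12 5 qm guest cmd 101 ping && qm guest exec 101 -- df -h /stash
```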
Restart services¶
# Plex LXC
pct reboot 100
# Arrstack VM
qm reboot 101
# Individual Docker containers (from inside arrstack)
docker restart sonarr radarr qbittorrent
# Redeploy full stack — push to GitHub, Dockhand picks up the change
Host-level file backup (Phase 5E)¶
Daily file-level backup of proxfold host config to PBS via proxmox-backup-client. Closes the gap that vzdump leaves: guests are captured by pbs-daily, but the host's own /etc, /root, and /var/lib/pve-cluster are not.
Architecture summary¶
- Client side: roles/proxmox/tasks/host_backup.yml renders a credentials env file, a wrapper script, a oneshot systemd service, and a daily timer. Activated when pbs_host_backup is defined in host_vars and vault_pbs_host_token_secret exists in vault.
- PBS side: separate user host-backup@pbs with token host-backup@pbs!proxfold, scoped via ACL to namespace host/proxfold on the nas-primary datastore. Namespace-scoped prune-job applies its own (longer) retention.
- Schedule: 02:30 daily. Vzdump kickoff is 02:00, recent runs complete in ~5 min, PBS prune is 03:00. 02:30 is clear of both.
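The schedule reasoning is simple clock arithmetic, worked through here with the figures quoted in the list (02:00 kickoff, roughly 5 minutes of runtime, 02:30 host backup, 03:00 prune). The helper names to_min and gap_after are hypothetical:

```shell
# Convert HH:MM to minutes since midnight. A leading zero is stripped so that
# values such as "08" are not misread as octal in shell arithmetic.
to_min() {
    local h="${1%%:*}" m="${1##*:}"
    h="${h#0}"; m="${m#0}"
    echo $(( ${h:-0} * 60 + ${m:-0} ))
}

# Minutes between the end of one job (START plus DURATION minutes) and the
# start of the next job.
gap_after() {
    local start="$1" duration="$2" next="$3"
    echo $(( $(to_min "$next") - $(to_min "$start") - duration ))
}

gap_after 02:00 5 02:30   # prints 25: vzdump finishes 25 min before the host backup
gap_after 02:30 0 03:00   # prints 30: host backup starts 30 min before the prune
```

If vzdump runtimes grow towards 25 minutes, the first gap closes and the 02:30 slot would need revisiting.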
Bootstrap (one-shot, manual)¶
Run once after the homelab-ansible code lands but before the role activates. Steps (1)–(5) run on PBS CT 105 (192.168.1.246), steps (6)–(8) on the WSL/CT104 control node.
# (1) On PBS CT 105 — create user
ssh root@192.168.1.246
proxmox-backup-manager user create host-backup@pbs --comment 'proxfold host file-level backup'
# Set a throwaway password when prompted; real auth is the token below.
# (2) Generate the API token (value shown ONCE)
proxmox-backup-manager user generate-token host-backup@pbs proxfold
# Capture the `value` field IMMEDIATELY — there is no retrieval path.
# Do NOT echo it to scrollback. Pipe to /dev/shm or copy directly into the
# vault-append script. See feedback memory: never_view_vault_to_scrollback.
# (3) Create the namespace
# Quote the repository: an unquoted '!' triggers history expansion in interactive bash.
proxmox-backup-client namespace create host/proxfold \
--repository 'host-backup@pbs!proxfold@127.0.0.1:nas-primary'
# Will prompt for the token value — paste the same one captured in (2).
# (4) Grant DatastoreBackup on the namespace to BOTH the user and the token
# auth-ids. The "token inherits from user" pattern documented in older versions
# of pbs.md is wrong — see [pbs role doc](../ansible/roles/pbs.md#gotchas-captured-during-execution).
# Without the token grant, the timer fails with "missing permissions
# 'Datastore.Backup'" even though `user permissions host-backup@pbs` looks fine.
proxmox-backup-manager acl update /datastore/nas-primary/host/proxfold \
DatastoreBackup --auth-id host-backup@pbs
proxmox-backup-manager acl update /datastore/nas-primary/host/proxfold \
DatastoreBackup --auth-id 'host-backup@pbs!proxfold'
# Verify BOTH resolve to a non-empty dict (an empty {} means the grant is missing):
proxmox-backup-manager user permissions host-backup@pbs \
--path /datastore/nas-primary/host/proxfold --output-format json
proxmox-backup-manager user permissions 'host-backup@pbs!proxfold' \
--path /datastore/nas-primary/host/proxfold --output-format json
# (5) Namespace-scoped prune-job — longer retention than the global pbs-daily prune.
# Patterned after the official PBS prune example, tuned for homelab.
proxmox-backup-manager prune-job create nas-primary-host-prune \
--store nas-primary --ns host/proxfold --schedule '03:15' \
--keep-daily 14 --keep-weekly 8 --keep-monthly 12 --keep-yearly 2
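The verification in step (4) can be scripted so that an empty result fails loudly. A sketch that treats the JSON purely as text, so it assumes nothing about the shape of a non-empty permissions result; acl_present is a hypothetical helper:

```shell
# Succeed only if the JSON on stdin is something other than an empty object.
# Whitespace is stripped and the text compared, so no JSON parser is needed.
# acl_present is a hypothetical helper name.
acl_present() {
    local compact
    compact=$(tr -d '[:space:]')
    [ -n "$compact" ] && [ "$compact" != '{}' ]
}

# Usage on PBS CT 105 (not executed here):
#   for id in 'host-backup@pbs' 'host-backup@pbs!proxfold'; do
#       proxmox-backup-manager user permissions "$id" \
#           --path /datastore/nas-primary/host/proxfold --output-format json \
#           | acl_present || echo "no ACL for $id"
#   done
```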
Then on the control node:
# (6) Append vault entry — append-only pattern, no plaintext to scrollback.
# The token value from step (2) goes into vault_pbs_host_token_secret.
cd ~/homelab-ansible
TMP=$(mktemp -p /dev/shm vault-edit.XXXXXX)
ansible-vault decrypt --output "$TMP" group_vars/all/vault.yml
printf '\nvault_pbs_host_token_secret: %s\n' '<paste-token-value-here>' >> "$TMP"
ansible-vault encrypt --output group_vars/all/vault.yml "$TMP"
shred -u "$TMP"
# Verify the encrypt round-tripped (decrypts cleanly without dumping plaintext):
ansible-vault view group_vars/all/vault.yml > /dev/null && echo "vault decrypts OK"
# Don't echo the variable's value. Activation in step (7) confirms the entry
# parsed correctly: pre-step (6) the host_backup include is gated off; post-(6)
# `--check --diff` shows the new tasks evaluating against proxfold.
# (7) First role run — only the host_backup tasks
ansible-playbook playbooks/site.yml --limit proxfold --tags host_backup --diff
# (8) Smoke test the timer
ssh root@192.168.1.250 'systemctl start pbs-host-backup.service'
ssh root@192.168.1.250 'systemctl status pbs-host-backup.service'
# Verify the snapshot landed:
ssh root@192.168.1.246 \
'proxmox-backup-client snapshot list --ns host/proxfold host/proxfold'
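Pasting the token into the printf line of step (6) leaves it in shell history and scrollback. One alternative is to read the value from stdin with echo disabled. This is a sketch under the assumption that the vault file takes a plain `key: value` line; append_secret is a hypothetical helper:

```shell
# Append "KEY: value" to FILE, reading the value from stdin so it never appears
# on the command line or in shell history. On a terminal, echo is switched off
# while the value is typed or pasted. append_secret is a hypothetical helper.
append_secret() {
    local file="$1" key="$2" secret=""
    if [ -t 0 ]; then
        printf 'value for %s: ' "$key" >&2
        stty -echo                  # hide the input
        IFS= read -r secret
        stty echo
        echo >&2
    else
        IFS= read -r secret
    fi
    [ -n "$secret" ] || { echo "empty value, nothing written" >&2; return 1; }
    printf '\n%s: %s\n' "$key" "$secret" >> "$file"
}

# Usage in step (6), replacing the printf line (not executed here):
#   append_secret "$TMP" vault_pbs_host_token_secret
```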
Restore (host configs)¶
# List available snapshots
ssh root@192.168.1.250
proxmox-backup-client snapshot list --ns host/proxfold host/proxfold
# Mount a snapshot read-only (FUSE)
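mkdir -p /mnt/restore
proxmox-backup-client mount --ns host/proxfold host/proxfold/<snapshot> etc.pxar /mnt/restore
# Selective restore: DO NOT blanket-copy /etc; pick the specific files
cp /mnt/restore/network/interfaces /etc/network/interfaces
# /etc/pve/* (e.g. storage.cfg) is not in etc.pxar because the pmxcfs mount is skipped.
# Recover it through pve-cluster.pxar, see "What's actually in the snapshot".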
# Unmount when done
fusermount -u /mnt/restore
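A selective copy overwrites the live file with no way back. The sketch below keeps a timestamped copy of whatever it replaces and does nothing when the two files already match. restore_file and the `.pre-restore.<timestamp>` suffix are hypothetical choices, not part of the documented procedure:

```shell
# Copy one file out of a mounted snapshot, keeping a timestamped copy of the
# file it replaces. restore_file is a hypothetical helper name.
restore_file() {
    local src="$1" dst="$2"
    [ -f "$src" ] || { echo "not in snapshot: $src" >&2; return 1; }
    if [ -f "$dst" ]; then
        cmp -s "$src" "$dst" && return 0      # identical, nothing to do
        cp -p "$dst" "$dst.pre-restore.$(date +%Y%m%d%H%M%S)" || return 1
    fi
    cp -p "$src" "$dst"
}

# Usage (not executed here):
#   restore_file /mnt/restore/network/interfaces /etc/network/interfaces
```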
Host backups are not bare-metal restore
These backups capture files, not a bootable image. Recovering a dead proxfold means: reinstall PVE, run the homelab-ansible bootstrap (rebuild runbook), then selectively restore from the snapshot. The 4C ZFS boot mirror is the bare-metal protection; this is the config-level protection.
What's actually in the snapshot¶
proxmox-backup-client doesn't traverse mount points by default, so /etc/pve (the pmxcfs FUSE mount) is not captured by etc.pxar. That's fine — /etc/pve is a synthesised view; the source of truth is /var/lib/pve-cluster/config.db, captured by pve-cluster.pxar. Restore path: install PVE on a fresh host, restore pve-cluster.pxar to /var/lib/pve-cluster, restart pve-cluster.service, and /etc/pve repopulates from the DB. If you ever want the live FUSE view captured directly, add --all-file-systems to the wrapper script — but that's redundant given the DB is the source of truth.
Lessons from the 2026-05-06 run¶
- proxmox-backup-manager user generate-token doesn't accept --output-format. The CLI rejects it with "schema does not allow additional properties". Use proxmox-backup-debug api create /access/users/<userid>/token/<name> --output-format json instead; it has the same effect and returns the token value as JSON for clean parsing.
- proxmox-backup-manager namespace doesn't exist (caught earlier in the housemate-access runbook too; same memory). Create namespaces via proxmox-backup-debug api create /admin/datastore/<ds>/namespace --name <leaf> [--parent <p>]. Top-level first, then nest. The CLI panics in text_table.rs on the empty-result render after success; this is cosmetic, the operation succeeded.
- ACL needs to land on BOTH user AND token auth-ids; see step (4) above. Caused a "missing permissions 'Datastore.Backup'" failure on the first timer fire even though user permissions host-backup@pbs looked correct. Revealed a latent bug in the pbs role (only granting on user); patched same cycle: an "Ensure datastore ACLs for PBS client TOKEN auth-id" task was added, gated on vault_pbs_token_id is defined, idempotent against the live state. The host-backup bootstrap above doesn't use the role (it's a separate user/token), so step (4) keeps the manual two-grant pattern.
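The "top-level first, then nest" rule for namespaces can be sketched as a helper that expands a path such as host/proxfold into one API call per level, using the command form quoted above. ns_create_cmds is a hypothetical helper that only prints the commands:

```shell
# Print one `proxmox-backup-debug api create` call per namespace level, parents
# first, for a path such as host/proxfold. ns_create_cmds is hypothetical.
ns_create_cmds() {
    local datastore="$1" ns="$2" parent="" leaf
    local base="proxmox-backup-debug api create /admin/datastore/$datastore/namespace"
    # Split the path on "/" and walk it from the top level down.
    for leaf in $(printf '%s' "$ns" | tr '/' ' '); do
        if [ -z "$parent" ]; then
            echo "$base --name $leaf"
            parent="$leaf"
        else
            echo "$base --name $leaf --parent $parent"
            parent="$parent/$leaf"
        fi
    done
}

# Usage (not executed here): review the output, then run it on PBS CT 105.
#   ns_create_cmds nas-primary host/proxfold
```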
Quarterly drill¶
The verify-job (Phase 5A, sun 04:00) checks chunk integrity but not that the documented restore path works. Once a quarter, mount the latest host/proxfold snapshot to /tmp/restore-test and diff a known-stable file (e.g. /etc/network/interfaces) against the live host. Files under /etc/pve such as storage.cfg are not in etc.pxar (the pmxcfs mount is skipped), so they cannot be checked this way; they live in config.db inside pve-cluster.pxar. The drill surfaces silent regressions in the snapshot pipeline.
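The drill can be reduced to a pass/fail check. A sketch that compares files under the mounted snapshot with the live tree, with paths given relative to /etc (the root of etc.pxar); drill_diff is a hypothetical helper:

```shell
# Compare files under a mounted snapshot with their live counterparts. Paths
# are relative to both roots. Returns non-zero if any file differs or is
# missing on either side. drill_diff is a hypothetical helper name.
drill_diff() {
    local snap_root="$1" live_root="$2" rel rc=0
    shift 2
    for rel in "$@"; do
        if cmp -s "$snap_root/$rel" "$live_root/$rel"; then
            echo "OK    $rel"
        else
            echo "DIFF  $rel"
            rc=1
        fi
    done
    return "$rc"
}

# Usage after mounting etc.pxar at /tmp/restore-test (not executed here):
#   drill_diff /tmp/restore-test /etc network/interfaces
```

A DIFF on a file that was intentionally changed since the snapshot is expected; the drill is only meaningful for files known to be stable.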
Config file locations¶
Proxmox stores guest configurations at:
- /etc/pve/lxc/: LXC container configs
- /etc/pve/qemu-server/: VM configs
Docker Compose is managed via Dockhand (Git-backed, from the homelab-ansible repo stacks/arrstack/). App configs are under /opt/mediaserver/ on the arrstack VM.