Hash snapshot IDs at the storage boundary; make snapshot list resilient

Two related changes: the first stops snapshot metadata from leaking
through the unencrypted bytes the destination store sees, and the
second makes snapshot list robust when that store is unavailable.

First, every remote storage path that previously embedded a human
snapshot ID (e.g. metadata/heraklion_berlin.sneak.fs.photos.2026.
catalog_2026-06-24T07:00:15Z/...) now uses the hashed remote key:

  RemoteSnapshotKey(id) = hex(SHA256(SHA256("vaultik|" + id)))

Applied at:

  * uploadSnapshotArtifacts (snapshot create write path)
  * the manifest.json.zst snapshot_id field (the manifest is
    unencrypted, so the human ID would otherwise be readable by
    anyone with bucket-list permission)
  * cleanupIncompleteSnapshots metadata-existence probe
  * snapshot restore / verify (downloadSnapshotDB,
    loadVerificationData)
  * downloadManifestByKey, deleteRemoteSnapshotByKey
  * CleanupLocalSnapshots reconciliation
  * the locally-driven removal paths (RemoveSnapshot,
    RemoveAllSnapshots, confirmAndExecutePurge)

The local index database keeps human IDs everywhere — the hash is a
boundary translation, not a rename. A directory listing of the
backup destination now looks like
"metadata/<64-hex>/{db.zst.age,manifest.json.zst}" with no host,
snapshot-name, or timestamp information visible.
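
As an illustration of the derivation above, here is a minimal Go
sketch. The function name remoteSnapshotKeySketch and the example ID
are hypothetical, and the sketch assumes the inner SHA256 digest is
fed to the outer hash as raw bytes rather than as hex text; the
formula in this message does not settle that detail, so the real
helper may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// remoteSnapshotKeySketch mirrors hex(SHA256(SHA256("vaultik|" + id))).
// Assumption: the inner digest is passed on as raw bytes, not hex.
func remoteSnapshotKeySketch(id string) string {
	inner := sha256.Sum256([]byte("vaultik|" + id))
	outer := sha256.Sum256(inner[:])
	return hex.EncodeToString(outer[:])
}

func main() {
	// Hypothetical human-readable snapshot ID, kept only in the local index.
	id := "examplehost_2026-06-24T07:00:15Z"
	key := remoteSnapshotKeySketch(id)

	// Only the 64-character hex key ever appears in remote paths.
	fmt.Printf("metadata/%s/db.zst.age\n", key)
	fmt.Printf("metadata/%s/manifest.json.zst\n", key)
}
```

Because the translation happens only where a storage path is built,
callers can keep passing human IDs around and hash at the last moment.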

Second, snapshot list no longer fails just because remote storage is
unreachable, and only consults the remote when the local machine can
plausibly decrypt:

  * Listing is always driven by the local index database — that's
    what holds the human IDs, timestamps, and per-snapshot stats
    that the table actually shows.
  * If no age secret key is configured, we skip remote listing
    entirely (the box is treated as a write-only backup machine, so
    there's no value in showing it remote-only keys it could never
    restore).
  * If a key IS configured, we try the remote listing; failures
    (volume unmounted, permission denied, network error) downgrade
    to a warning instead of aborting the command.
  * When the remote listing succeeds, we cross-reference by hashing
    each local human ID and diffing against the returned key set.
    Local-only snapshots get the existing "stale local record"
    cleanup hint; remote-only keys are surfaced as a single
    "NOTE: N remote snapshot(s) found in backup destination store
    but not in local database" line.
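
The decision flow in the list above can be sketched roughly as
follows. All names here (remoteKeysForList, crossReference,
remoteKeySketch, errUnreachable) are hypothetical stand-ins, not the
functions touched by this commit, and the hash helper assumes the
inner digest is passed on as raw bytes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sort"
)

// errUnreachable simulates an unmounted or unreachable destination.
var errUnreachable = errors.New("destination unreachable")

// remoteKeySketch is an assumed form of the double-SHA256 derivation.
func remoteKeySketch(id string) string {
	inner := sha256.Sum256([]byte("vaultik|" + id))
	outer := sha256.Sum256(inner[:])
	return hex.EncodeToString(outer[:])
}

// remoteKeysForList skips the remote when no secret key is configured
// and turns a listing failure into a warning instead of an error.
func remoteKeysForList(hasSecretKey bool, list func() ([]string, error)) (keys []string, consulted bool, warning string) {
	if !hasSecretKey {
		return nil, false, ""
	}
	got, err := list()
	if err != nil {
		return nil, false, fmt.Sprintf("WARNING: remote listing failed: %v", err)
	}
	return got, true, ""
}

// crossReference hashes each local ID and diffs against the remote key set.
func crossReference(localIDs, remoteKeys []string) (localOnly, remoteOnly []string) {
	remoteSet := make(map[string]bool, len(remoteKeys))
	for _, k := range remoteKeys {
		remoteSet[k] = true
	}
	matched := make(map[string]bool, len(localIDs))
	for _, id := range localIDs {
		k := remoteKeySketch(id)
		if remoteSet[k] {
			matched[k] = true
		} else {
			localOnly = append(localOnly, id)
		}
	}
	for k := range remoteSet {
		if !matched[k] {
			remoteOnly = append(remoteOnly, k)
		}
	}
	sort.Strings(remoteOnly)
	return localOnly, remoteOnly
}

func main() {
	local := []string{"alpha", "beta"}
	list := func() ([]string, error) {
		return []string{remoteKeySketch("alpha"), remoteKeySketch("gamma")}, nil
	}
	if keys, consulted, _ := remoteKeysForList(true, list); consulted {
		localOnly, remoteOnly := crossReference(local, keys)
		fmt.Println("stale local records:", localOnly)
		fmt.Printf("NOTE: %d remote snapshot(s) found in backup destination store but not in local database\n", len(remoteOnly))
	}
	_, _, warning := remoteKeysForList(true, func() ([]string, error) { return nil, errUnreachable })
	fmt.Println(warning)
}
```

Remote-only entries can only be reported as opaque keys, since the
hash cannot be reversed into a human ID.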

FileStorer construction also no longer does an eager mkdir — the
basePath is recorded and the directory is created lazily on first
write. A missing or unmounted destination during `snapshot list`
should NOT block the command, and now it doesn't.
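
A rough sketch of the lazy-directory idea, under stated assumptions:
lazyFileStorer, its Put method, and the demo helper are hypothetical
and far simpler than the real FileStorer; they only show that
construction touches nothing on disk and that the first write creates
the directories it needs.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// lazyFileStorer records basePath without creating it (hypothetical type).
type lazyFileStorer struct {
	basePath string
}

// newLazyFileStorer performs no filesystem access at all.
func newLazyFileStorer(basePath string) *lazyFileStorer {
	return &lazyFileStorer{basePath: basePath}
}

// Put creates any missing parent directories just before writing.
func (s *lazyFileStorer) Put(key string, data []byte) error {
	full := filepath.Join(s.basePath, filepath.FromSlash(key))
	if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
		return fmt.Errorf("creating directory for %s: %w", key, err)
	}
	return os.WriteFile(full, data, 0o600)
}

// demo reports whether basePath exists after construction and after a write.
func demo() (existsBefore, existsAfter bool, err error) {
	tmp, err := os.MkdirTemp("", "lazy-storer-")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(tmp)

	base := filepath.Join(tmp, "destination")
	s := newLazyFileStorer(base)

	_, statErr := os.Stat(base)
	existsBefore = statErr == nil

	if err := s.Put("metadata/example/manifest.json.zst", []byte("demo")); err != nil {
		return existsBefore, false, err
	}
	_, statErr = os.Stat(base)
	existsAfter = statErr == nil
	return existsBefore, existsAfter, nil
}

func main() {
	before, after, err := demo()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("basePath exists after construction:", before)
	fmt.Println("basePath exists after first write:", after)
}
```

With this shape, read-only commands can build the storer against a
missing destination and fail only if they actually try to read from it.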

RemoveAllSnapshots is rewritten to drive deletion from the local
index instead of from a remote listing, hashing each local ID to
find the corresponding remote key. Orphan remote keys (no matching
local snapshot) are handled separately and only deleted when
--remote is set. Existing tests are updated to hash storage paths
through the new RemoteSnapshotKey helper.

The hash format is a hard pre-1.0 break: existing remote snapshots
written under the human-ID path scheme are no longer readable; they
need to be either re-uploaded under the new scheme or manually
renamed. There is no fallback path, in keeping with the project
policy of "no migrations pre-1.0."
2026-06-26 01:54:35 +02:00
parent a84b911155
commit fd759a921a
9 changed files with 328 additions and 237 deletions

@@ -314,10 +314,17 @@ func (sm *SnapshotManager) prepareExportDB(ctx context.Context, dbPath, snapshot
 return finalData, tempDBPath, nil
 }
-// uploadSnapshotArtifacts uploads the database backup and blob manifest to S3
+// uploadSnapshotArtifacts uploads the database backup and blob manifest
+// to remote storage at metadata/<remote-key>/, where remote-key is the
+// double-SHA256 derivation of the snapshot ID (see RemoteSnapshotKey).
+// We never write the human-readable snapshot ID into any unencrypted
+// part of remote storage so a listing of the destination bucket leaks
+// no host, configuration, or scheduling information.
 func (sm *SnapshotManager) uploadSnapshotArtifacts(ctx context.Context, snapshotID string, dbData, manifestData []byte) error {
+remoteKey := RemoteSnapshotKey(snapshotID)
 // Upload database backup (compressed and encrypted)
-dbKey := fmt.Sprintf("metadata/%s/db.zst.age", snapshotID)
+dbKey := fmt.Sprintf("metadata/%s/db.zst.age", remoteKey)
 dbUploadStart := time.Now()
 if err := sm.storage.Put(ctx, dbKey, bytes.NewReader(dbData)); err != nil {
@@ -332,7 +339,7 @@ func (sm *SnapshotManager) uploadSnapshotArtifacts(ctx context.Context, snapshot
 "speed", humanize.SI(dbUploadSpeed, "bps"))
 // Upload blob manifest (compressed only, not encrypted)
-manifestKey := fmt.Sprintf("metadata/%s/manifest.json.zst", snapshotID)
+manifestKey := fmt.Sprintf("metadata/%s/manifest.json.zst", remoteKey)
 manifestUploadStart := time.Now()
 if err := sm.storage.Put(ctx, manifestKey, bytes.NewReader(manifestData)); err != nil {
 return fmt.Errorf("uploading blob manifest: %w", err)
@@ -607,9 +614,11 @@ func (sm *SnapshotManager) generateBlobManifest(ctx context.Context, dbPath stri
 }
 }
-// Create manifest
+// Create manifest. SnapshotID in the unencrypted manifest is the
+// double-SHA256 remote key, not the human ID, so the public bytes
+// don't reveal hostname/snapshot-name/timestamp metadata.
 manifest := &Manifest{
-SnapshotID: snapshotID,
+SnapshotID: RemoteSnapshotKey(snapshotID),
 Timestamp: time.Now().UTC().Format(time.RFC3339),
 BlobCount: len(blobs),
 TotalCompressedSize: totalCompressedSize,
@@ -680,8 +689,9 @@ func (sm *SnapshotManager) CleanupIncompleteSnapshots(ctx context.Context, hostn
 // Check each incomplete snapshot for metadata in storage
 for _, snapshot := range incompleteSnapshots {
-// Check if metadata exists in storage
-metadataKey := fmt.Sprintf("metadata/%s/db.zst", snapshot.ID)
+// Check if metadata exists in storage (paths use the hashed
+// remote key so we don't leak host info to the listing).
+metadataKey := fmt.Sprintf("metadata/%s/db.zst", RemoteSnapshotKey(snapshot.ID.String()))
 _, err := sm.storage.Stat(ctx, metadataKey)
 if err != nil {