vaultik/internal/snapshot/remotekey.go
sneak fd759a921a Hash snapshot IDs at the storage boundary; make snapshot list resilient
Two related changes, both addressing leakage and brittleness around
the public bytes the destination store sees.

First, every remote storage path that previously embedded a human
snapshot ID (e.g. metadata/heraklion_berlin.sneak.fs.photos.2026.
catalog_2026-06-24T07:00:15Z/...) now uses the hashed remote key:

  RemoteSnapshotKey(id) = hex(SHA256(SHA256("vaultik|" + id)))

Applied at:

  * uploadSnapshotArtifacts (snapshot create write path)
  * the manifest.json.zst snapshot_id field — manifest is
    unencrypted, so the human ID would otherwise be readable to
    anyone with bucket-list permission
  * cleanupIncompleteSnapshots metadata-existence probe
  * snapshot restore / verify (downloadSnapshotDB,
    loadVerificationData)
  * downloadManifestByKey, deleteRemoteSnapshotByKey
  * CleanupLocalSnapshots reconciliation
  * the locally-driven removal paths (RemoveSnapshot,
    RemoveAllSnapshots, confirmAndExecutePurge)

The local index database keeps human IDs everywhere — the hash is a
boundary translation, not a rename. A directory listing of the
backup destination now looks like
"metadata/<64-hex>/{db.zst.age,manifest.json.zst}" with no host,
snapshot-name, or timestamp information visible.
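
A minimal sketch of the boundary translation, assuming a hypothetical
helper named remoteMetadataPath (not part of this commit) and a
made-up human ID. The hash function restates the formula above; the
helper only illustrates how a storage path could be derived from it.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
)

// remoteSnapshotKey restates the formula from the commit message:
// hex(SHA256(SHA256("vaultik|" + id))).
func remoteSnapshotKey(id string) string {
	first := sha256.Sum256([]byte("vaultik|" + id))
	second := sha256.Sum256(first[:])
	return hex.EncodeToString(second[:])
}

// remoteMetadataPath is a hypothetical helper: it builds the storage
// path of one metadata artifact from a human snapshot ID, so the human
// ID itself never appears in the path.
func remoteMetadataPath(id, artifact string) string {
	return path.Join("metadata", remoteSnapshotKey(id), artifact)
}

func main() {
	id := "examplehost_photos_2026-06-24T07:00:15Z" // made-up human ID
	fmt.Println(remoteMetadataPath(id, "db.zst.age"))
	fmt.Println(remoteMetadataPath(id, "manifest.json.zst"))
}
```

Both printed paths share the same 64-character hex directory, and
neither contains the host, snapshot name, or timestamp.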

Second, snapshot list no longer fails just because remote storage is
unreachable, and only consults the remote when the local machine can
plausibly decrypt:

  * Listing is always driven by the local index database — that's
    what holds the human IDs, timestamps, and per-snapshot stats
    that the table actually shows.
  * If no age secret key is configured, we skip remote listing
    entirely (the box is treated as a write-only backup machine —
    there's no value showing it remote-only keys it could never
    restore).
  * If a key IS configured, we try the remote listing; failures
    (volume unmounted, permission denied, network error) downgrade
    to a warning instead of aborting the command.
  * When the remote listing succeeds, we cross-reference by hashing
    each local human ID and diffing against the returned key set.
    Local-only snapshots get the existing "stale local record"
    cleanup hint; remote-only keys are surfaced as a single
    "NOTE: N remote snapshot(s) found in backup destination store
    but not in local database" line.
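
The cross-reference step in the last bullet can be sketched roughly
as follows. The function name crossReference and its signature are
assumptions made for illustration; the real command reads the IDs
from the local index and the keys from the storage backend.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// remoteSnapshotKey restates hex(SHA256(SHA256("vaultik|" + id))).
func remoteSnapshotKey(id string) string {
	first := sha256.Sum256([]byte("vaultik|" + id))
	second := sha256.Sum256(first[:])
	return hex.EncodeToString(second[:])
}

// crossReference (hypothetical) hashes each local human ID and diffs
// the result against the remote key set. It returns the local IDs with
// no remote counterpart and the number of remote keys unknown locally.
func crossReference(localIDs, remoteKeys []string) (localOnly []string, remoteOnly int) {
	remote := make(map[string]bool, len(remoteKeys))
	for _, k := range remoteKeys {
		remote[k] = true
	}
	localKeys := make(map[string]bool, len(localIDs))
	for _, id := range localIDs {
		key := remoteSnapshotKey(id)
		localKeys[key] = true
		if !remote[key] {
			localOnly = append(localOnly, id)
		}
	}
	for k := range remote {
		if !localKeys[k] {
			remoteOnly++
		}
	}
	return localOnly, remoteOnly
}

func main() {
	local := []string{"snap-a", "snap-b"} // made-up human IDs
	remote := []string{remoteSnapshotKey("snap-a"), remoteSnapshotKey("snap-c")}

	localOnly, remoteOnly := crossReference(local, remote)
	fmt.Println("stale local records:", localOnly)
	if remoteOnly > 0 {
		fmt.Printf("NOTE: %d remote snapshot(s) found in backup destination store but not in local database\n", remoteOnly)
	}
}
```

Remote-only keys can only be counted, not named, because the hash
cannot be turned back into a human ID.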

FileStorer construction also no longer does an eager mkdir — the
basePath is recorded and the directory is created lazily on first
write. A missing or unmounted destination during `snapshot list`
should NOT block the command, and now it doesn't.

RemoveAllSnapshots is rewritten to drive deletion from the local
index instead of from a remote listing, hashing each local ID to
find the corresponding remote key. Orphan remote keys (no matching
local snapshot) are handled separately and only deleted when
--remote is set. Existing tests are updated to hash storage paths
through the new RemoteSnapshotKey helper.
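
One way to picture the locally driven removal is the sketch below.
planRemoteDeletes and its includeOrphans parameter are hypothetical
names, and the sketch assumes that keys derived from local IDs are
always part of the plan, which the text above does not spell out.
Only the orphan handling (deleted solely when --remote is set) is
taken directly from this commit message.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// remoteSnapshotKey restates hex(SHA256(SHA256("vaultik|" + id))).
func remoteSnapshotKey(id string) string {
	first := sha256.Sum256([]byte("vaultik|" + id))
	second := sha256.Sum256(first[:])
	return hex.EncodeToString(second[:])
}

// planRemoteDeletes (hypothetical) returns the remote keys to delete.
// Keys derived from local IDs come first, in local order (assumption:
// they are always included). Orphan keys from the remote listing are
// appended only when includeOrphans is true, mirroring --remote.
func planRemoteDeletes(localIDs, remoteKeys []string, includeOrphans bool) []string {
	plan := make([]string, 0, len(localIDs))
	seen := make(map[string]bool, len(localIDs))
	for _, id := range localIDs {
		key := remoteSnapshotKey(id)
		if !seen[key] {
			seen[key] = true
			plan = append(plan, key)
		}
	}
	if includeOrphans {
		for _, key := range remoteKeys {
			if !seen[key] {
				seen[key] = true
				plan = append(plan, key)
			}
		}
	}
	return plan
}

func main() {
	local := []string{"snap-a", "snap-b"} // made-up human IDs
	remote := []string{remoteSnapshotKey("snap-a"), "orphan-key"}

	fmt.Println("without --remote:", len(planRemoteDeletes(local, remote, false)), "keys")
	fmt.Println("with --remote:   ", len(planRemoteDeletes(local, remote, true)), "keys")
}
```

Because the plan starts from the local index, the remote listing is
needed only to discover orphans.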

The hash format is a hard pre-1.0 break: existing remote snapshots
written under the human-ID path scheme are no longer readable; they
need to be either re-uploaded under the new scheme or manually
renamed. There is no fallback path, matching the project policy of
"no migrations pre-1.0."
2026-06-26 01:54:35 +02:00

41 lines
1.6 KiB
Go

package snapshot

import (
	"crypto/sha256"
	"encoding/hex"
)

// remoteKeyPrefix is mixed into the snapshot ID hash so the resulting
// hex digest is domain-separated from any other "double SHA256 of a
// string" identifier the user might also use. Keeping this stable is a
// hard compatibility requirement: changing it invalidates every
// existing snapshot's remote storage path.
const remoteKeyPrefix = "vaultik|"

// RemoteSnapshotKey returns the storage-side identifier for a snapshot
// given its human snapshot ID. It is hex(SHA256(SHA256(prefix + id))).
// The two SHA256 rounds match Bitcoin's "hash256" convention so the
// output looks like a 64-character hex blob with no exploitable
// structure visible to a remote observer.
//
// We use this in three places:
//
//   - the "metadata/<remote-key>/..." subdirectory on the storage
//     backend so a directory listing of the bucket / file:// dest
//     doesn't reveal hostnames, configured snapshot names, or backup
//     timestamps;
//   - the `snapshot_id` field of the unencrypted manifest.json.zst
//     for the same reason;
//   - any code path that needs to translate a known local snapshot ID
//     into the path it would occupy on remote storage.
//
// The human ID stays the user-visible handle everywhere else (local
// database joins, CLI arguments, summary lines, log fields). That is
// safe because the human ID never reaches the public bytes as long as
// this function gates every storage-path construction.
func RemoteSnapshotKey(snapshotID string) string {
	first := sha256.Sum256([]byte(remoteKeyPrefix + snapshotID))
	second := sha256.Sum256(first[:])
	return hex.EncodeToString(second[:])
}