88 Commits

Author SHA1 Message Date
aa3e8f081b Merge fix/info-and-doc-drift
All checks were successful
check / check (push) Successful in 2m4s
2026-06-24 08:55:04 +02:00
1f22b9c603 Collapse snapshot prune into vaultik prune; auto-clean on removal
The CLI had two commands named "prune" doing different jobs (local
DB orphan cleanup vs. remote blob garbage collection), which was
confusing and forced a manual two-step workflow after deleting any
snapshot.

Single user-facing prune surface is now `vaultik prune`, which calls
PruneDatabase (local orphan cleanup) then PruneBlobs (remote unref
blob GC). Snapshot deletion paths (snapshot remove, snapshot remove
--all, snapshot purge) auto-run CleanupOrphanedData inline so the
local index database doesn't accumulate ghost rows after every
removal — the user observed ~39k orphaned files and 2 orphaned blobs
after a remove --all because that cleanup was previously a separate
opt-in command. `snapshot prune` is removed.

Also addresses the doc/help-string drift the user audit caught:

  * cli/prune.go help text used to reference a non-existent
    `vaultik purge` command.
  * cli/config.go get/set short/long examples were S3-specific
    (s3.bucket) when the primary storage configuration is
    storage_url.
  * vaultik/info.go printed S3 Bucket/Endpoint/Region labels
    unconditionally; for file:// or rclone:// users those rows
    were empty. The Storage Configuration block now prints the
    storer's Type+Location first, the storage_url string when set,
    and only emits S3 rows that are actually populated.
  * vaultik/info.go's "Run 'vaultik prune --remote'" hint
    referenced a flag that doesn't exist.
  * vaultik/blobcache.go's doc comment claimed LRU eviction, which
    is no longer the restore-time policy (the sweeper drives
    eviction; LRU is the safety-net fallback when maxBytes is
    finite).
  * README.md listed `vaultik restore`, `vaultik snapshot prune`,
    and an `s3.bucket` example, all out of date.

README's roadmap section is rewritten with concrete pre-1.0 items
(security audit, error-condition tests, parallel blob downloads,
restart of interrupted restore, …) so the next-steps surface
matches what the project actually still needs.

The cleanup calls are guarded against a nil SnapshotManager so
tests that construct a bare Vaultik struct continue to work.
2026-06-24 08:55:00 +02:00
60abeb636a Merge test/restore-locality-and-readat
All checks were successful
check / check (push) Successful in 2m17s
2026-06-24 08:33:22 +02:00
7ae49a1b2c Stream blobs to disk and restore files in blob-locality order
Three coordinated changes drop restore wall-clock by orders of
magnitude on real-world snapshots and bring memory use back under
control:

  * Streaming download into the disk cache. New
    blobDiskCache.PutFromReader takes an io.Reader and io.Copy's it
    straight into the cache file. The old downloadBlob path did
    io.ReadAll on the decrypted plaintext stream — for a 50 GB blob
    that meant 50 GB in RAM before the cache write. The whole chain
    (Storage.Get → age.Decrypt → zstd.NewReader → io.Copy) is now
    fully streaming; peak RAM per blob is bounded by ~64 KB of
    internal age/zstd buffers plus the io.Copy buffer.

  * Chunk extraction via ReadAt. After a blob is on disk, restore
    reads chunks via blobDiskCache.ReadAt(hash, offset, length) so
    only the chunk's bytes ever touch RAM. The previous code path
    called blobCache.Get for every cache-hit chunk, which read the
    entire blob (e.g. 10 GB) from disk into a []byte just to slice
    out a few KB, costing ~900 ms per cache hit on the user's photo
    snapshot.

  * Locality-aware restore ordering. New restorePlan indexes
    file→blob_set and blob→file_set at restore start, then drives
    the loop so that every file whose full blob set is on disk is
    drained before any new blob downloads. After the drain queue
    empties, the planner picks the pending file with the smallest
    uncached-blob count, downloads those blobs, and drains again.
    A sweep is forced right before each download so the just-
    completed blob is evicted before the next one is Put, keeping
    peak disk-cache occupancy at 1 for path-order-adversarial
    snapshots.
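
The locality-aware ordering in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the restorePlan code itself: `planRestore` and its map-of-slices input are hypothetical names, eviction is not modelled, and ties are broken by file name only to keep the output deterministic.

```go
package main

import "sort"

// planRestore is a hypothetical stand-in for the planner described above.
// needs maps a file name to the IDs of the blobs that hold its chunks.
// It returns the order in which files are restored and the order in which
// blobs are downloaded.
func planRestore(needs map[string][]string) (order, downloads []string) {
	cached := map[string]bool{}
	pending := make([]string, 0, len(needs))
	for name := range needs {
		pending = append(pending, name)
	}
	sort.Strings(pending)

	// uncached lists the distinct blobs a file still needs downloaded.
	uncached := func(name string) []string {
		var out []string
		seen := map[string]bool{}
		for _, b := range needs[name] {
			if !cached[b] && !seen[b] {
				seen[b] = true
				out = append(out, b)
			}
		}
		return out
	}

	for len(pending) > 0 {
		// Drain: restore every file whose full blob set is already cached.
		var rest []string
		for _, name := range pending {
			if len(uncached(name)) == 0 {
				order = append(order, name)
			} else {
				rest = append(rest, name)
			}
		}
		pending = rest
		if len(pending) == 0 {
			break
		}
		// Pick the pending file with the fewest uncached blobs.
		best := pending[0]
		for _, name := range pending[1:] {
			if len(uncached(name)) < len(uncached(best)) {
				best = name
			}
		}
		// "Download" its missing blobs; the next pass drains it.
		for _, b := range uncached(best) {
			cached[b] = true
			downloads = append(downloads, b)
		}
	}
	return order, downloads
}
```

With files `1.jpg` and `3.jpg` sharing blob `x` and `2.jpg` needing blob `y`, the sketch restores `1.jpg` and `3.jpg` before fetching `y`, so each blob is downloaded once even though path order interleaves them.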

The restore hot path also moves onto a restoreSession struct so
restoreFile/restoreRegularFile/etc. take only the per-call file
argument instead of threading 9+ parameters through every signature.
The new BlobRepository.GetAll method lets the session build a single
blob-id → blob-hash map at start instead of doing one DB query per
chunk.

TestRestoreLocalityAndReadAt passes: peak_len ≤ 1, get_calls = 0,
readat_calls > 0, every blob fetched exactly once.
2026-06-24 08:33:22 +02:00
a92b1a82ad Add failing test for restore blob-cache locality and ReadAt usage
Captures three behaviors the restore hot path must exhibit but
currently doesn't, all under one test:

  * Peak blob disk cache occupancy ≤ 1. Smart restore ordering should
    drain every file referencing the currently-cached blob before
    downloading the next one, so the sweeper can free each blob the
    moment its file set is exhausted.
  * Every remote blob fetched exactly once (counter on a wrapping
    Storer). Already true today; the test pins it so neither future
    cache-eviction nor reorder regressions can introduce
    re-downloads.
  * blobDiskCache.Get is never called during restore — chunk
    extraction must go through ReadAt so we never read the whole
    blob from disk to slice out a few KB. The 10 GB
    photo-snapshot --debug output showed ~900 ms per cache-hit chunk
    extract; ReadAt should bring that to sub-millisecond.

Adds Get/ReadAt call counters and a peak-Len tracker to
blobDiskCache, plus an internal restoreCacheObserver hook on Vaultik
so the test can capture the production cache instance without
exporting unexported types.

Currently fails with peak_len=3, get_calls=24, readat_calls=0. The
fix follows in subsequent commits.
2026-06-17 08:14:55 +02:00
39d5d21d48 Revert "Merge fix/restore-cache-readat"
All checks were successful
check / check (push) Successful in 4s
This reverts commit 44c9008e7e, reversing
changes made to b55d5763ad.
2026-06-17 08:01:56 +02:00
44c9008e7e Merge fix/restore-cache-readat
All checks were successful
check / check (push) Successful in 2m2s
2026-06-17 07:58:01 +02:00
8036d93914 Read chunks from cached blobs via ReadAt instead of full-blob Get
Restore's per-chunk loop called blobCache.Get(blobHash) and sliced the
returned []byte to extract the chunk it actually wanted. Get reads the
entire blob from disk into memory — so for a 10 GB blob, every chunk
extraction was a 10 GB ReadFile to get back a few KB. With ~40k files
needing ~600ms per cache hit, that alone was burning ~6 hours of
wall-clock on a real restore.

Hot loop now:
  - If the blob isn't cached: download (full plaintext into memory),
    Put to disk cache, satisfy this chunk from the in-memory buffer.
  - If it's cached: blobCache.ReadAt(hash, offset, length) — reads
    only the chunk's bytes from the on-disk blob file.

ReadAt was already implemented on blobDiskCache; restore just wasn't
using it.
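
For illustration, a positioned read of one chunk from an on-disk blob file might look like the sketch below. `readChunk` and `roundTrip` are hypothetical helpers, not the blobDiskCache API; the only assumption is that the cached blob is a plain file that can be opened by path.

```go
package main

import (
	"fmt"
	"os"
)

// readChunk returns length bytes starting at offset from the file at path.
// Only those bytes are read into memory, regardless of the file's size.
func readChunk(path string, offset int64, length int) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	buf := make([]byte, length)
	// ReadAt returns a non-nil error whenever it reads fewer than len(buf) bytes.
	if _, err := f.ReadAt(buf, offset); err != nil {
		return nil, fmt.Errorf("read %d bytes at offset %d: %w", length, offset, err)
	}
	return buf, nil
}

// roundTrip writes blob to a temporary file and reads one chunk back.
// It exists only to make the sketch easy to exercise.
func roundTrip(blob []byte, offset int64, length int) ([]byte, error) {
	f, err := os.CreateTemp("", "blob-*")
	if err != nil {
		return nil, err
	}
	defer os.Remove(f.Name())
	if _, err := f.Write(blob); err != nil {
		f.Close()
		return nil, err
	}
	if err := f.Close(); err != nil {
		return nil, err
	}
	return readChunk(f.Name(), offset, length)
}
```

Calling `roundTrip([]byte("headerCHUNKtrailer"), 6, 5)` yields the five bytes `CHUNK` without reading the header or trailer into the result buffer.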

Debug timings from the user's photo-catalog restore showed
ms_cache_gets dominating every cache-hit file at 500-1000ms. With
ReadAt those should drop to sub-millisecond and the visible throughput
should be bound by single-stream blob download + decrypt/decompress
rather than disk-read amplification.
2026-06-17 07:58:01 +02:00
b55d5763ad Merge refactor/restore-progress-output
All checks were successful
check / check (push) Successful in 2m46s
2026-06-17 07:54:18 +02:00
53febb48d2 Replace restore progress bars with periodic ui.Progress lines
Restore and verify no longer use schollz/progressbar. Instead they emit
a periodic status line every 15 s via ui.Writer.Progress, matching the
cadence and shape of the snapshot create scanner output. The lines
include files done, byte counts, throughput in bits/sec, elapsed,
absolute ETA, and remaining duration — same conventions as snapshot
create. The progressbar dependency, the newProgressBar/isTerminal
helpers, and the unused printfStderr helper are removed; go.mod loses
schollz/progressbar plus its colorstring and uniseg transitive deps.

Adds --debug timing instrumentation throughout the restore hot path so
the next slow-restore report can pinpoint which stage is the
bottleneck. Per-file: file-chunks query, output Create, per-chunk blob
DB lookups, cache get/put, blob download, chunk write, sweeper call.
Per-blob-download: fetch-setup (Get + Stat) vs read+decrypt+decompress
vs close-and-verify. FetchBlob splits the Storage.Get and Storage.Stat
round-trips so an expensive size-stat is visible separately.
2026-06-17 07:54:14 +02:00
d55ddc5914 Merge test/restore-sweeper
All checks were successful
check / check (push) Successful in 2m8s
2026-06-17 07:20:10 +02:00
d9319dc0fb Add integration test for restore sweeper
Writes 30 random 1 MB files plus 10 duplicates (40 files, 30 MB of
unique content), backs them up with a 10 MB blob_size_limit, then
restores through a counting storer that records every Get per key.
Each blob on disk must be downloaded exactly once during restore — a
re-download would mean the sweeper evicted a blob whose chunks were
still referenced by an unrestored file, and zero downloads would mean
the cache silently stopped being consulted.

The duplicates exercise the dedup path: the sweeper has to keep each
blob alive until every file (original AND duplicate) that references
any of its chunks has been restored.
2026-06-17 07:20:07 +02:00
af330f2777 Merge fix/restore-blob-cache-eviction
All checks were successful
check / check (push) Successful in 1m57s
2026-06-17 07:15:26 +02:00
683fb0b103 Replace LRU eviction in restore with reference-counted sweeper
Restore previously capped the blob disk cache at 4× the configured
blob_size_limit (so 40 GB by default). With large or heavily-deduped
snapshots a chunk-by-chunk file walk could blow past that cap and
trigger LRU eviction of blobs that were still needed by later files,
forcing repeated re-downloads — observed during a real restore as
single-stream throughput collapsing to under 1 MB/s.

Restore now allocates the cache with no practical size cap and drives
eviction explicitly:

  * An in-memory set of restored file IDs accumulates as files finish.
  * Every blob_size_limit/100 bytes of restored data (≈100 sweeps per
    blob's worth of writes) the sweeper iterates the cache. For each
    cached blob it queries the snapshot's local SQLite DB for every
    file that references any chunk in the blob and deletes the cache
    entry only when every such file is already in the restored set.
  * blobStillNeeded returns true on any error so an unreadable DB
    never causes premature eviction.

The cache itself gains Delete(key) and Keys() so the sweeper can drive
removal without touching internal LRU state.
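
The eviction rule above could be sketched as below. This is an illustration under stated assumptions: the cache is modelled as a plain map, `fileLookup` stands in for the SQLite query, and `sweep` and `lookupFromMap` are hypothetical names. Only `blobStillNeeded` and its fail-safe behaviour come from the commit message.

```go
package main

import (
	"fmt"
	"sort"
)

// fileLookup returns the IDs of every file that references any chunk in a blob.
// In the real code this would be a database query; here it is an assumption.
type fileLookup func(blob string) ([]int, error)

// blobStillNeeded reports whether any referencing file is not yet restored.
// Any lookup error counts as "still needed" so a failed query never evicts.
func blobStillNeeded(blob string, restored map[int]bool, lookup fileLookup) bool {
	files, err := lookup(blob)
	if err != nil {
		return true
	}
	for _, id := range files {
		if !restored[id] {
			return true
		}
	}
	return false
}

// sweep deletes every cached blob that no unrestored file needs and
// returns the evicted keys in sorted order.
func sweep(cache map[string]bool, restored map[int]bool, lookup fileLookup) []string {
	keys := make([]string, 0, len(cache))
	for k := range cache {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var evicted []string
	for _, k := range keys {
		if !blobStillNeeded(k, restored, lookup) {
			delete(cache, k)
			evicted = append(evicted, k)
		}
	}
	return evicted
}

// lookupFromMap builds a fileLookup from an in-memory table and fails
// for blobs it does not know, which exercises the error path.
func lookupFromMap(refs map[string][]int) fileLookup {
	return func(blob string) ([]int, error) {
		files, ok := refs[blob]
		if !ok {
			return nil, fmt.Errorf("no reference data for blob %s", blob)
		}
		return files, nil
	}
}
```

With blobs `b1` (files 1 and 2), `b2` (files 2 and 3), an unknown `b3`, and files 1 and 2 restored, a sweep evicts only `b1`: `b2` still has an unrestored file and `b3` is kept because its lookup failed.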
2026-06-17 07:15:22 +02:00
cf8a527d35 Merge fix/output-style-banner-errors
All checks were successful
check / check (push) Successful in 2m9s
2026-06-17 06:56:38 +02:00
a63c729fbc Print banner before cobra parsing; route arg errors through ui.Error
Two output-style fixes plus a quiet-mode correction.

Banner: a manual scan of os.Args in CLIEntry decides whether to suppress
the banner (--quiet/-q/--cron), then prints it before cobra parses any
arguments. This makes the banner appear even when cobra rejects bad args
("requires at least 2 arg(s)") and on --help — paths that previously
skipped PersistentPreRun entirely. The cobra-side hook plumbing (sync.Once,
PersistentPreRun, custom HelpFunc) is removed.
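
A pre-parse scan of this kind might look like the following sketch. `bannerSuppressed` is a hypothetical name, and the sketch only matches the three exact spellings listed above; forms such as `--quiet=true` or combined short flags are not handled.

```go
package main

// bannerSuppressed reports whether the raw argument list contains one of
// the flags that should silence the startup banner. It runs before any
// flag parsing, so it only recognises exact spellings.
func bannerSuppressed(args []string) bool {
	for _, a := range args {
		switch a {
		case "--quiet", "-q", "--cron":
			return true
		}
	}
	return false
}
```

A caller would pass `os.Args[1:]` and print the banner only when the function returns false, which works even if the later argument parse fails.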

Errors: rootCmd.SilenceErrors = true so cobra no longer prints its own
"Error: <msg>" line. Any error returned from Execute() goes through
ui.New(os.Stderr).Error(...), giving the documented "🛑 ERROR: <msg>"
format. A new helper cli.ReportError() formats errors from goroutine
paths that can't return through cobra's normal return chain; every
CLI command's fx-goroutine error path now calls it alongside the
existing structured log.Error so both channels record the failure.

Quiet mode: previously --quiet/--cron swapped Vaultik.UI to io.Discard,
which silenced Warning and Error messages too — contradicting the
documented "suppresses non-error output" semantics. ui.Writer now has
a SetQuiet flag that drops Begin/Complete/Info/Notice/Detail/Progress/
Banner only; Warning and Error always emit.
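
As a rough sketch of the quiet-flag behaviour, assuming a much smaller writer than the real ui.Writer: the `Writer` type below has only three message kinds, its prefixes are simplified, and `render` is a hypothetical helper for trying it out.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Writer is a reduced, hypothetical version of a user-facing output writer.
type Writer struct {
	out   io.Writer
	quiet bool
}

// SetQuiet toggles suppression of non-error output.
func (w *Writer) SetQuiet(q bool) { w.quiet = q }

// Info is dropped entirely in quiet mode.
func (w *Writer) Info(msg string) {
	if w.quiet {
		return
	}
	fmt.Fprintf(w.out, "》 %s\n", msg)
}

// Warning always emits, quiet or not.
func (w *Writer) Warning(msg string) { fmt.Fprintf(w.out, "Warning: %s\n", msg) }

// Error always emits, quiet or not.
func (w *Writer) Error(msg string) { fmt.Fprintf(w.out, "🛑 ERROR: %s\n", msg) }

// render emits one message of each kind and returns what was written.
func render(quiet bool) string {
	var sb strings.Builder
	w := &Writer{out: &sb}
	w.SetQuiet(quiet)
	w.Info("Scanning files.")
	w.Warning("2 files skipped.")
	w.Error("Upload failed.")
	return sb.String()
}
```

The point of the design is that quiet mode is a filter inside the writer rather than a swap of the destination, so warnings and errors keep their normal path.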

Also folds in restore.go cleanups the audit flagged: the hardcoded
"WARNING:" prefix on the failed-files block now uses ui.Warning +
ui.Detail, the post-restore "Restored N files" line uses ui.Complete,
and the "No files found to restore" branch emits both log.Warn and
ui.Warning so structured logs continue to capture it under --verbose.
2026-06-17 06:56:34 +02:00
a1065d4f1f Merge feature/snapshot-ls-delta-column
All checks were successful
check / check (push) Successful in 2m37s
2026-06-17 06:34:02 +02:00
0e9c96c8b5 Add uncompressed-size and new-chunk-size columns to snapshot list
The remote snapshot table now shows the total plaintext size of all
chunks referenced by each snapshot, plus the plaintext size of chunks
newly referenced by that snapshot (chunks not in any earlier completed
snapshot known to the local DB). The latter is the marginal data
introduced by each backup — useful for spotting which snapshots
actually added bytes vs. dedup'd against prior state.

Both new columns are computed from the local database only. Snapshots
that exist in remote storage but not in the local DB show
"<remote only>" in those cells; their COMPRESSED SIZE column still
reflects the value fetched from the remote manifest.
2026-06-17 06:33:59 +02:00
cafae65f61 Merge refactor/snapshot-restore
All checks were successful
check / check (push) Successful in 2m40s
2026-06-17 06:27:53 +02:00
7a0d5bfd73 Move restore to snapshot restore subcommand
Renames the top-level `restore` command to `vaultik snapshot restore`
for consistency with `vaultik snapshot create`. The factory follows the
sibling pattern (newSnapshotRestoreCommand) and its file is renamed to
snapshot_restore.go to match.
2026-06-17 06:27:44 +02:00
8d1c8982d7 Merge feature/remote-nuke 2026-06-17 06:21:21 +02:00
e75367c594 Add 'vaultik remote nuke', rename Processing→Backing up, bits/sec rates
remote nuke: new subcommand that deletes every snapshot's metadata and
every blob from remote storage, leaving the bucket prefix empty.
Requires --force.

User-facing 'Processing' is now 'Backing up' everywhere it referred to
the chunking/upload phase. Files summary line says 'backed up' instead
of 'processed'.

ui.Speed now formats bytes/sec input as bits/sec output (bit/s, Kbit/s,
Mbit/s, Gbit/s). Network transfer rates are conventionally expressed
in bits — the per-blob heartbeat now matches the per-snapshot summary
line which has always been bits/sec.
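
The bytes-to-bits conversion can be sketched as follows. `formatSpeed` is a hypothetical name, and the decimal (1000-based) steps and one-decimal formatting are assumptions; the commit only fixes the unit names.

```go
package main

import "fmt"

// formatSpeed takes a rate in bytes per second and renders it in bits per
// second, scaling through bit/s, Kbit/s, Mbit/s and Gbit/s.
func formatSpeed(bytesPerSec float64) string {
	bits := bytesPerSec * 8
	units := []string{"bit/s", "Kbit/s", "Mbit/s", "Gbit/s"}
	i := 0
	// Step up one unit per factor of 1000, stopping at the largest unit.
	for bits >= 1000 && i < len(units)-1 {
		bits /= 1000
		i++
	}
	return fmt.Sprintf("%.1f %s", bits, units[i])
}
```

For example, 125000 bytes/sec is 1,000,000 bits/sec, which this sketch renders as `1.0 Mbit/s`.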
2026-06-17 06:21:21 +02:00
64c69cd8e3 Merge fix/dedup-only-snapshot-restore
All checks were successful
check / check (push) Successful in 1m58s
2026-06-17 06:05:52 +02:00
132f7149ca Populate snapshot_blobs for dedup-referenced blobs at completion
The bug: fully-deduplicated snapshots (every chunk already in storage
from a prior run) had an empty snapshot_blobs table. The metadata-
export pipeline then dropped all blob/blob_chunks rows from the
exported database, leaving file_chunks references to chunks whose
blobs were no longer recorded. Restore fails on every file with
"chunk X not found in any blob".

Fix: at CompleteSnapshot time, run an INSERT OR IGNORE that links
every blob holding a chunk referenced by this snapshot's files into
snapshot_blobs. New blobs uploaded during the snapshot are already
recorded (no-op for them); dedup-referenced blobs are added.

The cleanup query in deleteOrphanedBlobs already restricts to
snapshot_blobs entries for the current snapshot — so once
snapshot_blobs is correctly populated, the exported database
contains the full set of blob/blob_chunks rows needed for restore.

Regression test: TestDedupOnlySnapshotRestores creates two
identical snapshots (the second uploads zero new blobs) and
restores the second. Without the fix, restore fails on every file.
2026-06-17 06:05:52 +02:00
f1ce085972 Merge fix/restore-fail-fast 2026-06-17 06:02:15 +02:00
d8edf90fac Restore fails fast on first error; --skip-errors is now global
restore aborts on the first per-file failure by default, surfacing
the file path and the underlying error and suggesting --skip-errors
to continue past failures.

--skip-errors moved from a 'snapshot create' subcommand flag to a
top-level persistent flag on the root command. It applies to both
snapshot create and restore. Old 'vaultik snapshot create
--skip-errors' still works because persistent flags are inherited.
2026-06-17 06:02:15 +02:00
301ea217e8 Merge fix/banner-everywhere
All checks were successful
check / check (push) Successful in 2m4s
2026-06-17 05:57:21 +02:00
9f537b9c4c Print startup banner on every invocation (except -q / --cron)
Adds maybePrintBanner() called from three cobra hooks:
  - PersistentPreRun on root: covers every subcommand invocation
  - Custom HelpFunc on root: covers --help and group-level help
  - Run on root: covers bare 'vaultik' with no subcommand

bannerOnce sync.Once ensures the banner prints exactly once per
process regardless of which hook(s) fire.
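
The once-only guard can be illustrated with a small sketch. `bannerPrinter` and its `emit` callback are hypothetical; the banner text is a placeholder, and the three calls stand in for the three hooks listed above.

```go
package main

import "sync"

// bannerPrinter prints a banner at most once, however many hooks call it.
type bannerPrinter struct {
	once  sync.Once
	quiet bool
	emit  func(string) // destination for the banner line (assumption)
}

// maybePrint is safe to call from every hook; only the first call emits.
func (b *bannerPrinter) maybePrint() {
	if b.quiet {
		return
	}
	b.once.Do(func() {
		b.emit("vaultik starting up") // placeholder banner text
	})
}
```

Because sync.Once also guards concurrent callers, the hooks do not need to coordinate with each other about which one fires first.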

Removed the duplicate banner-print from fx setupGlobals; that hook
still handles the --cron/--quiet UI swap for the rest of the output.
2026-06-17 05:57:21 +02:00
cf5b643bee Merge fix/banner-always-shown 2026-06-17 05:54:48 +02:00
3113014b58 Print banner when vaultik is invoked with no subcommand
Cobra's default 'no subcommand → print help' path bypasses fx, so
the startup banner never ran for bare 'vaultik'. Add a Run handler
on the root command that prints the banner and then calls Help.

Extracted the banner-printing logic into writeStartupBanner() so
both this path and the fx setupGlobals hook share one implementation.
2026-06-17 05:54:48 +02:00
706284d590 Merge feature/banner-bold-newline
All checks were successful
check / check (push) Successful in 1m55s
2026-06-17 05:52:03 +02:00
75564a504e Bold the startup banner on TTY; blank line after banner 2026-06-17 05:52:03 +02:00
edd3e5fdb2 Merge feature/snapshot-summary-indent 2026-06-17 05:51:02 +02:00
d5796bd6c1 Indent snapshot summary details; add Finished message; fix 'to process'
- New ui.Detail method for indented continuation lines under a
  preceding Complete (visually same as Progress: "  》" in white).
- Snapshot summary lines (Files/Data/Storage/Upload/Duration) are
  now Detail lines indented under "Created snapshot X.".
- Local index database prune complete result lines (incomplete
  snapshots, orphaned files/chunks/blobs) are also Detail lines
  under a clean Complete header.
- "Files: ... to process" → "Files: ... processed" (they have been
  processed by the time we emit the summary).
- "Data: ... (... to process)" → "Data: ... (... processed)".
- ui.Writer now tracks warning and error counts emitted; Vaultik
  prints "Finished successfully." or "Finished (with N warnings)."
  as the final line of CreateSnapshot.
2026-06-17 05:51:02 +02:00
90e855ef99 Merge fix/progress-eta-format 2026-06-17 05:44:48 +02:00
2185421c01 Reformat progress lines and prune output
Progress lines now use the form:
  ..., <subject> elapsed: <dur>, <subject> ETA: <time> (est remain <dur>).

ui.Time formats same-day times as HH:MM:SS and other-day times as
YYYY-MM-DD HH:MM:SS, with no timezone suffix (local time is implied).
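
A sketch of that same-day rule, assuming the caller passes the reference time explicitly: `formatTime` and `mustParse` are hypothetical names, and the comparison is done in the location of the time being formatted.

```go
package main

import "time"

// formatTime renders t as HH:MM:SS when it falls on the same calendar day
// as now, and as YYYY-MM-DD HH:MM:SS otherwise. No zone suffix is printed.
func formatTime(t, now time.Time) string {
	now = now.In(t.Location())
	ty, tm, td := t.Date()
	ny, nm, nd := now.Date()
	if ty == ny && tm == nm && td == nd {
		return t.Format("15:04:05")
	}
	return t.Format("2006-01-02 15:04:05")
}

// mustParse is a small helper for trying the sketch out; it panics on
// malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse("2006-01-02 15:04:05", s)
	if err != nil {
		panic(err)
	}
	return t
}
```

An ETA later on the current day prints as a bare clock time, while one that crosses midnight carries its date.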

The local-index-database prune complete line now shows remaining
counts for each category:
  ... 1 incomplete snapshots removed (3 remain), 3783 orphaned files
  removed (42 remain), ...
2026-06-17 05:44:48 +02:00
ce0d7b45a1 Merge fix/commit-date-format
All checks were successful
check / check (push) Successful in 2m1s
2026-06-17 05:39:11 +02:00
1266a263fc Add author/homepage/license to version + banner; date format fixes
- globals.go: add Homepage and License constants.
- version command: show author, homepage, license, build date.
- Startup banner reformatted to:
    vaultik X by Author (commit Y, built on Z) starting up at T.
    https://sneak.berlin/go/vaultik
- Commit date now formatted as YYYY-MM-DD (called "build date" in
  user-facing output, since the binary was at least compiled once on
  the date of commit). Makefile/Dockerfile use git --format=%cs.
  goreleaser slices its RFC3339 .CommitDate template var to 10 chars.
2026-06-17 05:39:11 +02:00
70632e4353 Merge fix/error-emoji
All checks were successful
check / check (push) Successful in 2m3s
2026-06-17 04:35:29 +02:00
77b9d943e4 Use 🛑 (red octagonal stop sign) for ERROR prefix
The previous prefix glyph is a thin black-and-white cross that gets lost against terminal
backgrounds and the ANSI red text. 🛑 is a solid red octagon that
reads unmistakably as 'stop/error' at a glance, even when the user
isn't reading the line carefully.
2026-06-17 04:35:28 +02:00
fc4d0d6dc7 Merge feature/ui-error-warning-emoji 2026-06-17 04:33:55 +02:00
22227aa0c5 Add emoji prefixes to Warning and Error output 2026-06-17 04:33:55 +02:00
9cb14d143d Merge fix/clean-startup-errors 2026-06-17 04:32:05 +02:00
00d4b36e35 Introduce internal/ui package and rewrite user-facing output
All user-facing output now goes through a single ui.Writer with a
uniform style:

  》 (white)     for begin / info / notice
  》 (green)     for complete / success
  Warning:      for warnings (orange)
  ERROR:        for errors (red)
    》          (indented) for progress heartbeats

Color is enabled when stdout is a TTY and NO_COLOR is unset.

Standards:
- Complete-sentence messages with fully qualified terms ("backup
  destination store", "local index database", "snapshot source
  files enumeration").
- Every Complete has a matching Begin.
- Natural verb tense conveys state ("Uploading" -> "Uploaded"). The
  words "begin"/"complete" never appear in message bodies; the marker
  color carries that information.
- ETA means clock time, not duration. Progress lines say "estimated
  remaining time (<dur>), finish at <time>" with both labeled.

Adds globals.CommitDate (populated by Makefile/Dockerfile/goreleaser
via ldflags from `git show -s --format=%cI HEAD`) and a startup banner
printed once per invocation.

Strips fx call-chain noise from startup errors so users see the actual
underlying error (e.g. "creating base path: mkdir /Volumes/BACKUPS:
permission denied" instead of three layers of "could not build
arguments for function ...").

README documents the output style and the ui package conventions.
2026-06-17 04:32:05 +02:00
8de8f8e5cc Strip fx call-chain noise from startup errors; clarify file:// error 2026-06-17 03:58:50 +02:00
6e6e107243 Merge fix/upload-progress-labels
All checks were successful
check / check (push) Successful in 2m12s
2026-06-17 02:29:25 +02:00
6bb6f7c8a8 Make blob upload progress heartbeat unambiguous (vs snapshot progress) 2026-06-17 02:29:25 +02:00
8e55d2f970 Merge feature/upload-progress-output 2026-06-17 02:27:23 +02:00
b0747657e3 Print upload start line and 15s heartbeat during blob upload
Long-running uploads (multi-GB blobs over slow links) previously
produced silence between the start of the upload and the "Blob
stored" line at the end. Now we print:

  Uploading blob: <hash> (<size>)

before the upload starts, and a heartbeat line at most every 15s:

  uploading <hash>: <done>/<total> (NN%), <speed>/sec, <elapsed> elapsed, ETA <eta>

This gives the user visible progress on large uploads, especially
over SMB or remote storage where 10+ second stalls are normal.
2026-06-17 02:27:23 +02:00
2a9718855c Merge fix/usability-improvements
All checks were successful
check / check (push) Successful in 2m21s
2026-06-17 01:41:09 +02:00
485f3296d9 Fix config-not-found errors, dev-build hint, unify output writer
ResolveConfigPath now stats explicit paths from --config and
$VAULTIK_CONFIG and produces an actionable error naming the bad
path and suggesting 'vaultik config init' (with the right path
in the --config case). The default-search failure message lists
the paths it tried.

The scanner no longer hard-codes os.Stdout vs io.Discard based on
EnableProgress. ScannerConfig and ScannerParams take an explicit
Output io.Writer, and the Vaultik caller passes v.Stdout — which
itself is set to io.Discard in --cron mode. One knob controls
both scanner-level and Vaultik-level user-facing output.

The version command prints a hint when Version == "dev" telling
the user this is a development build without embedded version
metadata.
2026-06-17 01:41:09 +02:00
adf73c5413 Merge fix/macos-fda-error-message
All checks were successful
check / check (push) Successful in 2m5s
2026-06-16 05:20:33 -07:00
8959741c90 Add actionable permission-error message with macOS Full Disk Access hint
When the scanner hits a permission-denied error (TCC-protected
directories on macOS without Full Disk Access, or any other EPERM),
the error now names the offending path and includes platform-specific
remediation instructions. On macOS it points the user at System
Settings -> Privacy & Security -> Full Disk Access. On other
platforms it suggests --skip-errors.

The error wraps os.ErrPermission so errors.Is still works for callers
that care about the underlying error.
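
The wrapping pattern might look like the sketch below. `permissionError` and `isPermission` are hypothetical names and the message wording is illustrative; the platform is passed in as a string so the behaviour can be exercised on any OS.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// permissionError builds an actionable error for an unreadable path.
// goos is the target platform name, for example the value of runtime.GOOS.
func permissionError(path, goos string) error {
	hint := "re-run with --skip-errors to continue past unreadable paths"
	if goos == "darwin" {
		hint = "grant Full Disk Access in System Settings -> Privacy & Security -> Full Disk Access"
	}
	// %w keeps os.ErrPermission in the chain so errors.Is still matches.
	return fmt.Errorf("permission denied reading %s (%s): %w", path, hint, os.ErrPermission)
}

// isPermission reports whether err wraps os.ErrPermission.
func isPermission(err error) bool {
	return errors.Is(err, os.ErrPermission)
}
```

Callers that only care about the class of failure can keep using `errors.Is`, while users see the path and the remediation hint in the message.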

README quickstart and snapshot create docs now mention the macOS FDA
requirement.
2026-06-16 05:20:33 -07:00
e534746cf3 Merge docs/private-key-filename
Some checks failed
check / check (push) Failing after 6s
2026-06-10 11:44:58 -07:00
5397b37c13 Use vaultik_backup_private_key.txt filename in keygen examples 2026-06-10 11:44:58 -07:00
2df2792a75 Merge docs/shell-completion 2026-06-10 11:44:05 -07:00
4fe568f803 Document shell completion in README 2026-06-10 11:44:05 -07:00
27e85f01f2 Merge feature/vanity-import-readme
All checks were successful
check / check (push) Successful in 2m36s
2026-06-10 11:37:42 -07:00
d479bfcd52 Adopt sneak.berlin/go/vaultik vanity import path, README overhaul
Module path changed from git.eeqj.de/sneak/vaultik to
sneak.berlin/go/vaultik (vanity redirect). All imports, ldflags,
Dockerfile, goreleaser config, and docs updated. App data/config
directories now use plain "vaultik" instead of the reverse-DNS name.

README:
- New copy-pasteable quickstart at top: go install, config init,
  age keypair, config set for key + file:// destination, home backup
- All command names in command details are code-quoted
- config set/get gained sequence index support (age_recipients.0)
  so lists are settable from the CLI
- Dockerfile build is CGO_ENABLED=0 to match the pure-Go build
2026-06-10 11:37:23 -07:00
cb16d6869f Add previously-untracked snapshot removal and verify tests
These test files existed locally and ran in the suite but were never
committed due to the old .gitignore 'vaultik' pattern matching the
internal/vaultik/ directory.
2026-06-10 11:24:10 -07:00
ff85f1e4f8 Merge feature/config-subcommands 2026-06-10 11:23:47 -07:00
b2e160944f Move init to 'config init', add config edit/get/set subcommands
The config command group manages the config file:
  config init  - write default config (moved from top-level init)
  config edit  - open the config in $EDITOR (falls back to vi)
  config get   - print a value by dotted YAML path (s3.bucket)
  config set   - set a scalar value by dotted YAML path

get/set operate on the yaml.Node tree so comments and formatting in
the config file are preserved across edits. set creates intermediate
maps as needed.
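
The dotted-path lookup and the creation of intermediate maps can be sketched on plain nested maps. This is a swapped-in simplification: the commit operates on a yaml.Node tree to preserve comments, whereas `getPath` and `setPath` below are hypothetical helpers over `map[string]any`.

```go
package main

import (
	"fmt"
	"strings"
)

// getPath walks a dotted path such as "s3.bucket" through nested maps.
func getPath(root map[string]any, path string) (any, bool) {
	var cur any = root
	for _, key := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		cur, ok = m[key]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// setPath stores value at a dotted path, creating intermediate maps as
// needed. It refuses to descend through an existing non-map value.
func setPath(root map[string]any, path string, value any) error {
	keys := strings.Split(path, ".")
	cur := root
	for _, key := range keys[:len(keys)-1] {
		next, ok := cur[key]
		if !ok {
			child := map[string]any{}
			cur[key] = child
			cur = child
			continue
		}
		child, ok := next.(map[string]any)
		if !ok {
			return fmt.Errorf("%q is not a map", key)
		}
		cur = child
	}
	cur[keys[len(keys)-1]] = value
	return nil
}
```

Setting `s3.bucket` on an empty map creates the `s3` map first, and a later attempt to set `s3.bucket.name` fails because `bucket` already holds a scalar.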
2026-06-10 11:23:47 -07:00
307867f59e Merge feature/exclude-list-refinement
All checks were successful
check / check (push) Successful in 2m21s
2026-06-10 11:12:50 -07:00
9d12d500fa Refine default exclude list: keep .docker config, add never-backup paths
Removed /.docker (small, contains registry auth config worth keeping)
and /Library/Parallels (small app support; the actual VM disks live in
~/Parallels) from the default excludes.

Added recommended excludes for data that should never be backed up:
- Language/toolchain caches (npm, cargo, rustup, go modules, maven,
  vagrant boxes, node_modules, __pycache__, .venv)
- VM disk images (Parallels, VMware Fusion, VirtualBox, OrbStack, UTM)
- Downloaded LLM models (ollama, LM Studio)
- Cloud-synced storage (~/Library/CloudStorage, iCloud Drive) — synced
  elsewhere, and dataless placeholder files would be force-downloaded
- Android SDK and emulator images
2026-06-10 11:12:50 -07:00
2e2bf01130 Merge feature/default-config-excludes 2026-06-10 11:10:00 -07:00
e9687c68b7 Integrate macOS backup exclude lists into default config template
The init-generated config now ships with a comprehensive home snapshot
exclude list (caches, trash, cloud-synced data, rebuildable app state,
device backups) derived from a battle-tested rsync backup script, plus
an apps snapshot for /Applications excluding Apple-redownloadable apps
(Safari, GarageBand, iWork, iMovie) and large third-party installs.

Obsolete pre-Catalina app entries (Dashboard, iTunes, DVD Player, etc.)
were dropped — OS apps live in /System/Applications on modern macOS and
never appear in /Applications.

Adds a test asserting the template parses as valid YAML with the
expected snapshot structure.
2026-06-10 11:10:00 -07:00
a8970a87fc Merge feature/init-config 2026-06-10 11:01:33 -07:00
e6ee488d9d Add 'vaultik init' command and quickstart section in README
New init command writes a default config file with commented
explanations for every setting. Uses XDG config directory via
github.com/adrg/xdg for platform-appropriate paths:
  macOS: ~/Library/Application Support/vaultik/config.yml
  Linux: ~/.config/vaultik/config.yml
  root:  /etc/vaultik/config.yml

Config resolution now searches the XDG path before /etc/vaultik/.
Refuses to overwrite an existing file. Created with 0600 permissions.

README quickstart rewritten as a single copy-pasteable shell block
walking through install, keygen, init, edit, first backup, verify,
and cron setup.
2026-06-10 11:01:29 -07:00
2e2b02a056 Merge fix/cron-silence-list-sideffect-gitignore
All checks were successful
check / check (push) Successful in 1m18s
2026-06-09 13:45:54 -04:00
0b95cb4308 Fix --cron silence, add snapshot cleanup, fix .gitignore
--cron now sets Vaultik.Stdout to io.Discard so all user-facing output
is suppressed, not just the scanner progress. Errors still go to stderr
via the structured logger.

snapshot list now warns when local snapshot records have no matching
remote metadata, and suggests 'vaultik snapshot cleanup' instead of
silently deleting them.

snapshot cleanup is a new subcommand that explicitly removes stale
local snapshot records. syncWithRemote (used by purge) still does
this automatically since purge is already destructive.

.gitignore changed from 'vaultik' to '/vaultik' so it only matches
the binary at the repo root, not the internal/vaultik/ directory.
2026-06-09 13:45:54 -04:00
4a3e61f8e1 Merge docs/limitations-section
All checks were successful
check / check (push) Successful in 1m19s
2026-06-09 13:38:32 -04:00
6fbcac0cd8 Add limitations section to README 2026-06-09 13:38:32 -04:00
34f73f72d8 Merge feature/keep-newer-than 2026-06-09 13:22:24 -04:00
ee240faa32 Add --keep-newer-than flag for rolling retention window
snapshot create --prune now accepts --keep-newer-than <duration> (e.g.
4w, 30d, 6mo) to keep a rolling window of snapshots instead of only
the latest. Supports d/w/mo/y units and combinations (2w3d).
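
A parser for durations of this shape might look like the following sketch. `parseRetention` is a hypothetical name, and treating a month as 30 days and a year as 365 days is an assumption; the commit does not say how those units are sized.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseRetention parses strings such as "30d", "4w", "6mo" or "2w3d"
// into a duration by summing each number-unit pair.
func parseRetention(s string) (time.Duration, error) {
	if s == "" {
		return 0, errors.New("empty duration")
	}
	const day = 24 * time.Hour
	var total time.Duration
	for i := 0; i < len(s); {
		// Read the run of digits.
		start := i
		for i < len(s) && s[i] >= '0' && s[i] <= '9' {
			i++
		}
		if i == start {
			return 0, fmt.Errorf("expected a number at %q", s[start:])
		}
		n, err := strconv.Atoi(s[start:i])
		if err != nil {
			return 0, err
		}
		// Read the unit; "mo" is checked first because it is two letters.
		var unit time.Duration
		switch rest := s[i:]; {
		case strings.HasPrefix(rest, "mo"):
			unit, i = 30*day, i+2 // assumption: 30-day month
		case strings.HasPrefix(rest, "d"):
			unit, i = day, i+1
		case strings.HasPrefix(rest, "w"):
			unit, i = 7*day, i+1
		case strings.HasPrefix(rest, "y"):
			unit, i = 365*day, i+1 // assumption: 365-day year
		default:
			return 0, fmt.Errorf("unknown unit at %q", rest)
		}
		total += time.Duration(n) * unit
	}
	return total, nil
}
```

Under these assumptions `2w3d` comes out as 17 days, and a cutoff time would be the current time minus the parsed duration.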

Without --keep-newer-than, --prune still defaults to keep-latest-only.
2026-06-09 13:22:24 -04:00
f719ab3adc Merge docs/consolidate-readme 2026-06-09 12:57:33 -04:00
1a8baf7491 Consolidate docs: rewrite README as primary reference, remove TODO.md
README now covers: storage backends (s3/file/rclone), all CLI commands
with full flag docs, configuration reference table, architecture overview,
roadmap (post-1.0 only), and development workflow.

TODO.md removed — completed items dropped, remaining roadmap items
merged into README.

ARCHITECTURE.md updated: correct snapshot ID format, storage.Storer
instead of s3.Client, binary SQLite export instead of SQL dump.
2026-06-09 12:57:33 -04:00
7d5d3fa598 Merge test/e2e-symlinks-dirs-perms: backup symlinks, empty dirs, permissions 2026-06-09 12:47:22 -04:00
ac5d2f4a0d Back up symlinks, empty directories, and file permissions
Scanner now records symlinks (with their target) and directories
during the walk phase instead of skipping them. processFileStreaming
detects non-regular entries and writes the DB record without chunking.

The e2e test (TestEndToEndFileStorage) now verifies:
- Symlink target preserved through backup→restore
- Empty directory survives round-trip
- File permissions (0600) restored correctly
2026-06-09 12:47:18 -04:00
b250ddfa94 Merge docs/development-workflow
All checks were successful
check / check (push) Successful in 5s
2026-06-09 12:38:00 -04:00
fe3ad13a91 Document development workflow in README, fix Go version requirement 2026-06-09 12:38:00 -04:00
ebd6619638 Route scanner output through writer, fix S3 error handling, improve error messages
All checks were successful
check / check (push) Successful in 2m38s
Scanner now writes all user-facing output to an io.Writer (os.Stdout
when progress is enabled, io.Discard in --cron mode). This fixes the
long-standing issue where --cron still printed progress lines.

S3 HeadObject now properly distinguishes not-found from other errors
instead of swallowing all errors as not-found.

Config/CLI error messages include actionable hints (where to find the
config, how to generate keys, what storage options exist).
2026-06-09 12:31:50 -04:00
20d3a9ac8c Remove unused shortHostname helper
All checks were successful
check / check (push) Successful in 2m24s
Was added when PurgeSnapshots needed hostname-aware name parsing.
After adopting parseSnapshotName(snapshotID) from origin, the
helper has no callers.
2026-05-02 03:20:56 +02:00
0889cf2804 Merge origin/main: resolve conflicts in CLI surface, --prune, helpers
- Adopt origin's SnapshotPurgeOptions naming and PurgeSnapshotsWithOptions
  method, but extend with Names []string (repeatable --snapshot flag) and
  Quiet bool for use by --prune.
- Adopt origin's parseSnapshotName helper.
- Fold the duplicate post-backup prune block into one runPostBackupPrune
  call that filters retention to the snapshot names just backed up.
- Keep the shallow-verify timestamp parsing fix and the dead deep-verify
  branch removal; use origin's printVerifyHeader/verifyManifestBlobsExist
  helper extraction.
- Drop top-level vaultik purge and verify (duplicates of snapshot purge
  and snapshot verify).
- Drop the resurrected daemon block from info.go (config fields no
  longer exist).
- Combine Makefile targets: gofmt -l for fmt-check, -race for tests,
  release/release-snapshot/docker/hooks/deps/test-coverage all included.
2026-05-02 02:56:51 +02:00
f9ebb4bf25 Add release process via goreleaser, restructure Make targets
make targets each do one thing now: lint, fmt, fmt-check, test. Use
'make check' for combined lint + fmt-check + test (the standard
pre-commit gate).

Release builds are pure-Go (CGO_ENABLED=0) cross-compiling to
linux/darwin × amd64/arm64.
2026-05-01 07:07:23 +02:00
9f2d722734 Refresh docs: remove PROCESS.md, fix snapshot ID format, document pre-1.0 migration policy 2026-05-01 07:07:18 +02:00
6821215b0e Fix CLI semantics: exit codes, --prune, dedup, deep-verify 2026-05-01 07:04:37 +02:00
f97a1dc2eb Remove daemon mode references and unused config fields
The --daemon flag, BackupInterval, FullScanInterval, MinTimeBetweenRun
config fields, and DirtyPath model were placeholders for a never-shipped
daemon mode and have been removed. Daemon mode is out of scope for 1.0.
2026-05-01 06:19:50 +02:00
18c14d1507 Move schema_migrations table creation into 000.sql with INTEGER version column (#58)
All checks were successful
check / check (push) Successful in 2m25s
Closes #57

Adopts the [pixa migration pattern](sneak/pixa#36) for schema management. Replaces the monolithic `schema.sql` embed with a numbered migration system.

## Changes

### New: `schema/000.sql` — Bootstrap migration
- Creates `schema_migrations` table with `INTEGER PRIMARY KEY` version column
- Self-contained: includes both `CREATE TABLE IF NOT EXISTS` and `INSERT OR IGNORE` for version 0
- Go code does zero INSERTs for bootstrap — just reads and executes 000.sql

### Renamed: `schema.sql` → `schema/001.sql` — Initial schema migration
- Full Vaultik schema (files, chunks, blobs, snapshots, uploads, all indexes)
- Updated header comment to identify it as migration 001

### Removed: `schema/008_uploads.sql`
- Redundant — the uploads table with its current schema was already in the main schema file
- The 008 file had a stale/different schema (TIMESTAMP instead of INTEGER, missing snapshot_id FK)

### Rewritten: `database.go` — Migration engine
- `//go:embed schema/*.sql` replaces `//go:embed schema.sql`
- `bootstrapMigrationsTable()`: checks if `schema_migrations` table exists, applies 000.sql if missing
- `applyMigrations()`: iterates through numbered .sql files, checks `schema_migrations` for each version, applies and records pending ones
- `collectMigrations()`: reads embedded schema dir, returns sorted filenames
- `ParseMigrationVersion()`: extracts numeric version from filenames like `001.sql` or `001_description.sql` (exported for testing)
- Old `createSchema()` removed entirely
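
The filename parsing and ordering described in the list above could look roughly like the sketch below. It is an illustration only: `parseVersion` and `pendingMigrations` are hypothetical names, the applied set is a plain map rather than a query against `schema_migrations`, and the real `ParseMigrationVersion` may differ in signature and error handling.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseVersion extracts the leading number from names such as
// "001.sql" or "001_description.sql".
func parseVersion(name string) (int, error) {
	base := strings.TrimSuffix(name, ".sql")
	if base == name {
		return 0, fmt.Errorf("%q is not a .sql file", name)
	}
	if i := strings.IndexByte(base, '_'); i >= 0 {
		base = base[:i]
	}
	v, err := strconv.Atoi(base)
	if err != nil || v < 0 {
		return 0, fmt.Errorf("%q has no valid numeric version", name)
	}
	return v, nil
}

// pendingMigrations returns the names whose version is not yet in
// applied, sorted by ascending version.
func pendingMigrations(names []string, applied map[int]bool) ([]string, error) {
	type migration struct {
		version int
		name    string
	}
	var todo []migration
	for _, name := range names {
		v, err := parseVersion(name)
		if err != nil {
			return nil, err
		}
		if !applied[v] {
			todo = append(todo, migration{v, name})
		}
	}
	sort.Slice(todo, func(i, j int) bool { return todo[i].version < todo[j].version })
	out := make([]string, 0, len(todo))
	for _, m := range todo {
		out = append(out, m.name)
	}
	return out, nil
}

func main() {
	names := []string{"001.sql", "000.sql", "002_add_index.sql"}
	todo, err := pendingMigrations(names, map[int]bool{0: true})
	fmt.Println(todo, err)
}
```

Sorting by the parsed integer rather than by filename keeps the order stable even if a later migration is named without zero padding.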

### Updated: `database_test.go`
- Verifies `schema_migrations` table exists alongside other core tables

## Verification

`docker build .` passes — formatting, linting, all tests green.

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #58
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-30 21:41:11 +02:00
100 changed files with 5515 additions and 2084 deletions

2
.gitignore vendored

@@ -1,5 +1,5 @@
# Binary
vaultik
/vaultik
# Test artifacts
*.out

56
.goreleaser.yaml Normal file

@@ -0,0 +1,56 @@
version: 2
project_name: vaultik
before:
  hooks:
    - go mod tidy
builds:
  - id: vaultik
    main: ./cmd/vaultik
    binary: vaultik
    env:
      - CGO_ENABLED=0
    goos:
      - linux
      - darwin
    goarch:
      - amd64
      - arm64
    ldflags:
      - -s -w
      - -X 'sneak.berlin/go/vaultik/internal/globals.Version={{ .Version }}'
      - -X 'sneak.berlin/go/vaultik/internal/globals.Commit={{ .Commit }}'
      - -X 'sneak.berlin/go/vaultik/internal/globals.CommitDate={{ slice .CommitDate 0 10 }}'
archives:
  - id: default
    name_template: "{{ .ProjectName }}_{{ .Version }}_{{ .Os }}_{{ .Arch }}"
    formats:
      - tar.gz
    files:
      - LICENSE
      - README.md
checksum:
  name_template: "checksums.txt"
  algorithm: sha256
snapshot:
  version_template: "{{ incpatch .Version }}-next"
changelog:
  sort: asc
  use: git
  filters:
    exclude:
      - "^docs:"
      - "^test:"
      - "^chore:"
      - "Merge pull request"
      - "Merge branch"
release:
  draft: true
  prerelease: auto

View File

@@ -38,10 +38,9 @@ Version: 2025-06-08
1. Before committing, tests must pass (`make test`), linting must pass
(`make lint`), and code must be formatted (`make fmt`). For go, those
makefile targets should use `go fmt` and `go test -v ./...` and
`golangci-lint run`. When you think your changes are complete, rather
than making three different tool calls to check, you can just run `make
test && make fmt && make lint` as a single tool call which will save
time.
`golangci-lint run`. Each Makefile target does exactly one thing — to
run lint + fmt-check + test together (the standard pre-commit gate),
use `make check`.
2. Always write a `Makefile` with the default target being `test`, and with
a `fmt` target that formats the code. The `test` target should run all
@@ -103,3 +102,9 @@ Version: 2025-06-08
build files are acceptable in the root, but source code and other files
should be organized in appropriate subdirectories.
13. Pre-1.0: NEVER write database migrations. There are no live databases
anywhere — every user's local index can be rebuilt from a fresh full
backup. When the schema changes, just change `schema.sql` (and any code
that touches the affected tables). The local index is disposable until
1.0 ships and is tagged.

View File

@@ -53,8 +53,8 @@ The database tracks five primary entities and their relationships:
### Entity Descriptions
#### File (`database.File`)
Represents a file or directory in the backup system. Stores metadata needed for restoration:
- Path, mtime
Represents a file, directory, or symlink in the backup system. Stores metadata needed for restoration:
- Path, source_path (for restore path stripping), mtime
- Size, mode, ownership (uid, gid)
- Symlink target (if applicable)
@@ -95,7 +95,7 @@ Maps chunks to their position within blobs:
#### Snapshot (`database.Snapshot`)
Represents a point-in-time backup:
- `ID`: Format is `{hostname}-{YYYYMMDD}-{HHMMSS}Z`
- `ID`: Format is `{hostname}_{snapshot-name}_{RFC3339}` (e.g. `server1_home_2025-06-01T12:00:00Z`)
- Tracks file count, chunk count, blob count, sizes, compression ratio
- `CompletedAt`: Null until snapshot finishes successfully
@@ -127,7 +127,7 @@ fx.New(
config.Module, // 5. Config
database.Module, // 6. Database + Repositories
log.Module, // 7. Logger initialization
s3.Module, // 8. S3 client
storage.Module, // 8. Storage backend (S3/file/rclone)
snapshot.Module, // 9. SnapshotManager + ScannerFactory
fx.Provide(vaultik.New), // 10. Vaultik orchestrator
)
@@ -161,7 +161,7 @@ type Vaultik struct {
Config *config.Config
DB *database.DB
Repositories *database.Repositories
S3Client *s3.Client
Storage storage.Storer
ScannerFactory snapshot.ScannerFactory
SnapshotManager *snapshot.SnapshotManager
Shutdowner fx.Shutdowner
@@ -341,12 +341,11 @@ CreateSnapshot(opts)
└─► SnapshotManager.ExportSnapshotMetadata()
├─► Copy database to temp file
├─► Clean to only current snapshot data
├─► Dump to SQL
├─► Compress with zstd
├─► Clean to only current snapshot data (VACUUM)
├─► Compress binary SQLite with zstd
├─► Encrypt with age
├─► Upload db.zst.age to S3
└─► Upload manifest.json.zst to S3
├─► Upload db.zst.age to storage
└─► Upload manifest.json.zst to storage
```
## Deduplication Strategy
@@ -368,8 +367,8 @@ bucket/
└── metadata/
└── {snapshot-id}/
├── db.zst.age # Encrypted database dump
└── manifest.json.zst # Blob list (for verification)
├── db.zst.age # Encrypted binary SQLite database
└── manifest.json.zst # Blob list (for pruning/verification)
```
## Thread Safety

View File

@@ -41,8 +41,8 @@ COPY . .
# Run tests
RUN make test
# Build with CGO enabled (required for mattn/go-sqlite3)
RUN CGO_ENABLED=1 go build -ldflags "-X 'git.eeqj.de/sneak/vaultik/internal/globals.Version=${VERSION}' -X 'git.eeqj.de/sneak/vaultik/internal/globals.Commit=$(git rev-parse HEAD 2>/dev/null || echo unknown)'" -o /vaultik ./cmd/vaultik
# Build (pure Go, no CGO required since we use modernc.org/sqlite)
RUN CGO_ENABLED=0 go build -ldflags "-X 'sneak.berlin/go/vaultik/internal/globals.Version=${VERSION}' -X 'sneak.berlin/go/vaultik/internal/globals.Commit=$(git rev-parse HEAD 2>/dev/null || echo unknown)' -X 'sneak.berlin/go/vaultik/internal/globals.CommitDate=$(git show -s --format=%cs HEAD 2>/dev/null || echo unknown)'" -o /vaultik ./cmd/vaultik
# Runtime stage
# alpine:3.21, 2026-02-25

View File

@@ -1,49 +1,59 @@
.PHONY: test fmt lint fmt-check check build clean all docker hooks
.PHONY: all check test lint fmt fmt-check build clean deps test-coverage test-integration local install release release-snapshot docker hooks
# Version number
VERSION := 0.0.1
VERSION := 1.0.0-rc.1
# Build variables
GIT_REVISION := $(shell git rev-parse HEAD 2>/dev/null || echo "unknown")
GIT_COMMIT_DATE := $(shell git show -s --format=%cs HEAD 2>/dev/null || echo "unknown")
# Linker flags
LDFLAGS := -X 'git.eeqj.de/sneak/vaultik/internal/globals.Version=$(VERSION)' \
-X 'git.eeqj.de/sneak/vaultik/internal/globals.Commit=$(GIT_REVISION)'
LDFLAGS := -X 'sneak.berlin/go/vaultik/internal/globals.Version=$(VERSION)' \
-X 'sneak.berlin/go/vaultik/internal/globals.Commit=$(GIT_REVISION)' \
-X 'sneak.berlin/go/vaultik/internal/globals.CommitDate=$(GIT_COMMIT_DATE)'
# Default target
all: vaultik
# Run tests
# Combined pre-commit/CI gate: lint, format check, then tests.
check: lint fmt-check test
# Run tests only.
test:
go test -race -timeout 30s ./...
# Check if code is formatted (read-only)
# Check if code is formatted (read-only).
fmt-check:
@test -z "$$(gofmt -l .)" || (echo "Files not formatted:" && gofmt -l . && exit 1)
# Format code
# Format code.
fmt:
go fmt ./...
# Run linter
# Run linter only.
lint:
golangci-lint run ./...
# Build binary
# Build binary.
vaultik: internal/*/*.go cmd/vaultik/*.go
go build -ldflags "$(LDFLAGS)" -o $@ ./cmd/vaultik
# Clean build artifacts
# Clean build artifacts.
clean:
rm -f vaultik
go clean
# Run tests with coverage
# Install dependencies.
deps:
go mod download
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
# Run tests with coverage.
test-coverage:
go test -v -coverprofile=coverage.out ./...
go tool cover -html=coverage.out -o coverage.html
# Run integration tests
# Run integration tests.
test-integration:
go test -v -tags=integration ./...
@@ -54,14 +64,19 @@ local:
install: vaultik
cp ./vaultik $(HOME)/bin/
# Run all checks (formatting, linting, tests) without modifying files
check: fmt-check lint test
# Build and publish release artifacts (linux/darwin × amd64/arm64) via goreleaser.
release:
goreleaser release --clean
# Build Docker image
# Dry-run a release build without publishing or tagging.
release-snapshot:
goreleaser release --clean --snapshot
# Build Docker image.
docker:
docker build -t vaultik .
# Install pre-commit hook
# Install pre-commit hook.
hooks:
@printf '#!/bin/sh\nset -e\n' > .git/hooks/pre-commit
@printf 'go mod tidy\ngo fmt ./...\ngit diff --exit-code -- go.mod go.sum || { echo "go mod tidy changed files; please stage and retry"; exit 1; }\n' >> .git/hooks/pre-commit

View File

@@ -1,556 +0,0 @@
# Vaultik Snapshot Creation Process
This document describes the lifecycle of objects during snapshot creation, with a focus on database transactions and foreign key constraints.
## Database Schema Overview
### Tables and Foreign Key Dependencies
```
┌─────────────────────────────────────────────────────────────────────────┐
│                            FOREIGN KEY GRAPH                            │
│                                                                         │
│ snapshots ◄────── snapshot_files ────────► files                        │
│    │                                         │                          │
│    └───────── snapshot_blobs ────────► blobs │                          │
│                                          │   │                          │
│                                          │   ├──► file_chunks ◄── chunks│
│                                          │   │                    ▲     │
│                                          │   └──► chunk_files ────┘     │
│                                          │                              │
│                                          └──► blob_chunks ─────────────┘│
│                                                                         │
│ uploads ───────► blobs.blob_hash                                        │
│     └──────────► snapshots.id                                           │
└─────────────────────────────────────────────────────────────────────────┘
```
### Critical Constraint: `chunks` Must Exist First
These tables reference `chunks.chunk_hash` **without CASCADE**:
- `file_chunks.chunk_hash` → `chunks.chunk_hash`
- `chunk_files.chunk_hash` → `chunks.chunk_hash`
- `blob_chunks.chunk_hash` → `chunks.chunk_hash`
**Implication**: A chunk record MUST be committed to the database BEFORE any of these referencing records can be created.
### Order of Operations Required by Schema
```
1. snapshots (created first, before scan)
2. blobs (created when packer starts new blob)
3. chunks (created during file processing)
4. blob_chunks (created immediately after chunk added to packer)
5. files (created after file fully chunked)
6. file_chunks (created with file record)
7. chunk_files (created with file record)
8. snapshot_files (created with file record)
9. snapshot_blobs (created after blob uploaded)
10. uploads (created after blob uploaded)
```
---
## Snapshot Creation Phases
### Phase 0: Initialization
**Actions:**
1. Snapshot record created in database (Transaction T0)
2. Known files loaded into memory from `files` table
3. Known chunks loaded into memory from `chunks` table
**Transactions:**
```
T0: INSERT INTO snapshots (id, hostname, ...) VALUES (...)
COMMIT
```
---
### Phase 1: Scan Directory
**Actions:**
1. Walk filesystem directory tree
2. For each file, compare against in-memory `knownFiles` map
3. Classify files as: unchanged, new, or modified
4. Collect unchanged file IDs for later association
5. Collect new/modified files for processing
**Transactions:**
```
(None during scan - all in-memory)
```
---
### Phase 1b: Associate Unchanged Files
**Actions:**
1. For unchanged files, add entries to `snapshot_files` table
2. Done in batches of 1000
**Transactions:**
```
For each batch of 1000 file IDs:
  T: BEGIN
     INSERT INTO snapshot_files (snapshot_id, file_id) VALUES (?, ?)
     ... (up to 1000 inserts)
     COMMIT
```
---
### Phase 2: Process Files
For each file that needs processing:
#### Step 2a: Open and Chunk File
**Location:** `processFileStreaming()`
For each chunk produced by content-defined chunking:
##### Step 2a-1: Check Chunk Existence
```go
chunkExists := s.chunkExists(chunk.Hash) // In-memory lookup
```
##### Step 2a-2: Create Chunk Record (if new)
```go
// TRANSACTION: Create chunk in database
err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
    dbChunk := &database.Chunk{ChunkHash: chunk.Hash, Size: chunk.Size}
    return s.repos.Chunks.Create(txCtx, tx, dbChunk)
})
// COMMIT immediately after WithTx returns

// Update in-memory cache
s.addKnownChunk(chunk.Hash)
```
**Transaction:**
```
T_chunk: BEGIN
INSERT INTO chunks (chunk_hash, size) VALUES (?, ?)
COMMIT
```
##### Step 2a-3: Add Chunk to Packer
```go
s.packer.AddChunk(&blob.ChunkRef{Hash: chunk.Hash, Data: chunk.Data})
```
**Inside packer.AddChunk → addChunkToCurrentBlob():**
```go
// TRANSACTION: Create blob_chunks record IMMEDIATELY
if p.repos != nil {
    blobChunk := &database.BlobChunk{
        BlobID:    p.currentBlob.id,
        ChunkHash: chunk.Hash,
        Offset:    offset,
        Length:    chunkSize,
    }
    err := p.repos.WithTx(context.Background(), func(ctx context.Context, tx *sql.Tx) error {
        return p.repos.BlobChunks.Create(ctx, tx, blobChunk)
    })
    // COMMIT immediately
}
```
**Transaction:**
```
T_blob_chunk: BEGIN
INSERT INTO blob_chunks (blob_id, chunk_hash, offset, length) VALUES (?, ?, ?, ?)
COMMIT
```
**⚠️ CRITICAL DEPENDENCY**: This transaction requires `chunks.chunk_hash` to exist (FK constraint).
The chunk MUST be committed in Step 2a-2 BEFORE this can succeed.
---
#### Step 2b: Blob Size Limit Handling
If adding a chunk would exceed blob size limit:
```go
if err == blob.ErrBlobSizeLimitExceeded {
    if err := s.packer.FinalizeBlob(); err != nil { ... }
    // Retry adding the chunk
    if err := s.packer.AddChunk(...); err != nil { ... }
}
```
**FinalizeBlob() transactions:**
```
T_blob_finish: BEGIN
UPDATE blobs SET blob_hash=?, uncompressed_size=?, compressed_size=?, finished_ts=? WHERE id=?
COMMIT
```
Then blob handler is called (handleBlobReady):
```
(Upload to S3 - no transaction)

T_blob_uploaded: BEGIN
  UPDATE blobs SET uploaded_ts=? WHERE id=?
  INSERT INTO snapshot_blobs (snapshot_id, blob_id, blob_hash) VALUES (?, ?, ?)
  INSERT INTO uploads (blob_hash, snapshot_id, uploaded_at, size, duration_ms) VALUES (?, ?, ?, ?, ?)
COMMIT
```
---
#### Step 2c: Queue File for Batch Insertion
After all chunks for a file are processed:
```go
// Build file data (in-memory, no DB)
fileChunks := make([]database.FileChunk, len(chunks))
chunkFiles := make([]database.ChunkFile, len(chunks))
// Queue for batch insertion
return s.addPendingFile(ctx, pendingFileData{
file: fileToProcess.File,
fileChunks: fileChunks,
chunkFiles: chunkFiles,
})
```
**No transaction yet** - just adds to `pendingFiles` slice.
If `len(pendingFiles) >= fileBatchSize (100)`, triggers `flushPendingFiles()`.
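The queue-then-flush behaviour described above can be sketched in isolation. This is a simplified illustration, not the scanner's code: the `batcher` type and its `flush` callback are hypothetical, and the callback stands in for the single database transaction that writes the whole batch. Only the threshold of 100 comes from the text.

```go
package main

import "fmt"

// fileBatchSize is the threshold quoted in the text.
const fileBatchSize = 100

// pendingFile is a stand-in for pendingFileData.
type pendingFile struct {
	path string
}

// batcher (hypothetical) queues files in memory and hands them to
// flush in groups; flush represents one transaction per batch.
type batcher struct {
	pending []pendingFile
	flush   func([]pendingFile) error
}

// add queues one file and flushes once the batch is full.
func (b *batcher) add(f pendingFile) error {
	b.pending = append(b.pending, f)
	if len(b.pending) < fileBatchSize {
		return nil
	}
	return b.flushAll()
}

// flushAll writes whatever is queued; it is also called once at the
// end of processing to handle a final partial batch.
func (b *batcher) flushAll() error {
	if len(b.pending) == 0 {
		return nil
	}
	batch := b.pending
	b.pending = nil
	return b.flush(batch)
}

func main() {
	b := &batcher{flush: func(batch []pendingFile) error {
		fmt.Println("flushing", len(batch), "files")
		return nil
	}}
	for i := 0; i < 250; i++ {
		if err := b.add(pendingFile{path: fmt.Sprintf("file-%d", i)}); err != nil {
			panic(err)
		}
	}
	if err := b.flushAll(); err != nil {
		panic(err)
	}
}
```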
---
### Step 2d: Flush Pending Files
**Location:** `flushPendingFiles()` - called when batch is full or at end of processing
```go
return s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
    for _, data := range files {
        // 1. Create file record
        s.repos.Files.Create(txCtx, tx, data.file) // INSERT OR REPLACE

        // 2. Delete old associations
        s.repos.FileChunks.DeleteByFileID(txCtx, tx, data.file.ID)
        s.repos.ChunkFiles.DeleteByFileID(txCtx, tx, data.file.ID)

        // 3. Create file_chunks records
        for _, fc := range data.fileChunks {
            s.repos.FileChunks.Create(txCtx, tx, &fc) // FK: chunks.chunk_hash
        }

        // 4. Create chunk_files records
        for _, cf := range data.chunkFiles {
            s.repos.ChunkFiles.Create(txCtx, tx, &cf) // FK: chunks.chunk_hash
        }

        // 5. Add file to snapshot
        s.repos.Snapshots.AddFileByID(txCtx, tx, s.snapshotID, data.file.ID)
    }
    return nil
})
// COMMIT (all or nothing for the batch)
```
**Transaction:**
```
T_files_batch: BEGIN
  -- For each file in batch:
  INSERT OR REPLACE INTO files (...) VALUES (...)
  DELETE FROM file_chunks WHERE file_id = ?
  DELETE FROM chunk_files WHERE file_id = ?
  INSERT INTO file_chunks (file_id, idx, chunk_hash) VALUES (?, ?, ?)    -- FK: chunks
  INSERT INTO chunk_files (chunk_hash, file_id, ...) VALUES (?, ?, ...)  -- FK: chunks
  INSERT INTO snapshot_files (snapshot_id, file_id) VALUES (?, ?)
  -- Repeat for each file
COMMIT
```
**⚠️ CRITICAL DEPENDENCY**: `file_chunks` and `chunk_files` require `chunks.chunk_hash` to exist.
---
### Phase 2 End: Final Flush
```go
// Flush any remaining pending files
if err := s.flushAllPending(ctx); err != nil { ... }
// Final packer flush
s.packer.Flush()
```
---
## The Current Bug
### Problem
The current code attempts to batch file insertions, but `file_chunks` and `chunk_files` have foreign keys to `chunks.chunk_hash`. The batched file flush tries to insert these records, but if the chunks haven't been committed yet, the FK constraint fails.
### Why It's Happening
Looking at the sequence:
1. Process file A, chunk X
2. Create chunk X in DB (Transaction commits)
3. Add chunk X to packer
4. Packer creates blob_chunks for chunk X (needs chunk X - OK, committed in step 2)
5. Queue file A with chunk references
6. Process file B, chunk Y
7. Create chunk Y in DB (Transaction commits)
8. ... etc ...
9. At end: flushPendingFiles()
10. Insert file_chunks for file A referencing chunk X (chunk X committed - should work)
The chunks ARE being created individually. But something is going wrong.
### Actual Issue
Wait - let me re-read the code. The issue is:
In `processFileStreaming`, when we queue file data:
```go
fileChunks[i] = database.FileChunk{
    FileID:    fileToProcess.File.ID,
    Idx:       ci.fileChunk.Idx,
    ChunkHash: ci.fileChunk.ChunkHash,
}
```
The `FileID` is set, but `fileToProcess.File.ID` might be empty at this point because the file record hasn't been created yet!
Looking at `checkFileInMemory`:
```go
// For new files:
if !exists {
return file, true // file.ID is empty string!
}
// For existing files:
file.ID = existingFile.ID // Reuse existing ID
```
**For NEW files, `file.ID` is empty!**
Then in `flushPendingFiles`:
```go
s.repos.Files.Create(txCtx, tx, data.file) // This generates/uses the ID
```
But `data.fileChunks` was built with the EMPTY ID!
### The Real Problem
For new files:
1. `checkFileInMemory` creates file record with empty ID
2. `processFileStreaming` queues file_chunks with empty `FileID`
3. `flushPendingFiles` creates file (generates ID), but file_chunks still have empty `FileID`
Wait, but `Files.Create` should be INSERT OR REPLACE by path, and the file struct should get updated... Let me check.
Actually, looking more carefully at the code path - the file IS created first in the flush, but the `fileChunks` slice was already built with the old (possibly empty) ID. The ID isn't updated after the file is created.
Hmm, but looking at the current code:
```go
fileChunks[i] = database.FileChunk{
FileID: fileToProcess.File.ID, // This uses the ID from the File struct
```
And in `checkFileInMemory` for new files, we create a file struct but don't set the ID. However, looking at the database repository, `Files.Create` should be doing `INSERT OR REPLACE` and the ID should be pre-generated...
Let me check if IDs are being generated. Looking at the File struct usage, it seems like UUIDs should be generated somewhere...
Actually, looking at the test failures again:
```
creating file chunk: inserting file_chunk: constraint failed: FOREIGN KEY constraint failed (787)
```
Error 787 is SQLite's foreign key constraint error. The failing FK is on `file_chunks.chunk_hash → chunks.chunk_hash`.
So the chunks ARE NOT in the database when we try to insert file_chunks. Let me trace through more carefully...
---
## Transaction Timing Issue
The problem is transaction visibility in SQLite.
Each `WithTx` creates a new transaction that commits at the end. But with batched file insertion:
1. Chunk transactions commit one at a time
2. File batch transaction runs later
If chunks are being inserted but something goes wrong with transaction isolation, the file batch might not see them.
But actually SQLite in WAL mode should have SERIALIZABLE isolation by default, so committed transactions should be visible.
Let me check if the in-memory cache is masking a database problem...
Actually, wait. Let me re-check the current broken code more carefully. The issue might be simpler.
---
## Current Code Flow Analysis
Looking at `processFileStreaming` in the current broken state:
```go
// For each chunk:
if !chunkExists {
    err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
        dbChunk := &database.Chunk{ChunkHash: chunk.Hash, Size: chunk.Size}
        return s.repos.Chunks.Create(txCtx, tx, dbChunk)
    })
    // ... check error ...
    s.addKnownChunk(chunk.Hash)
}

// ... add to packer (creates blob_chunks) ...

// Collect chunk info for file
chunks = append(chunks, chunkInfo{...})
```
Then at end of function:
```go
// Queue file for batch insertion
return s.addPendingFile(ctx, pendingFileData{
file: fileToProcess.File,
fileChunks: fileChunks,
chunkFiles: chunkFiles,
})
```
At end of `processPhase`:
```go
if err := s.flushAllPending(ctx); err != nil { ... }
```
The chunks are being created one-by-one with individual transactions. By the time `flushPendingFiles` runs, all chunk transactions should have committed.
Unless... there's a bug in how the chunks are being referenced. Let me check if the chunk_hash values are correct.
Or... maybe the test database is being recreated between operations somehow?
Actually, let me check the test setup. Maybe the issue is specific to the test environment.
---
## Summary of Object Lifecycle
| Object | When Created | Transaction | Dependencies |
|--------|--------------|-------------|--------------|
| snapshot | Before scan | Individual tx | None |
| blob | When packer needs new blob | Individual tx | None |
| chunk | During file chunking (each chunk) | Individual tx | None |
| blob_chunks | Immediately after adding chunk to packer | Individual tx | chunks, blobs |
| files | Batched at end of processing | Batch tx | None |
| file_chunks | With file (batched) | Batch tx | files, chunks |
| chunk_files | With file (batched) | Batch tx | files, chunks |
| snapshot_files | With file (batched) | Batch tx | snapshots, files |
| snapshot_blobs | After blob upload | Individual tx | snapshots, blobs |
| uploads | After blob upload | Same tx as snapshot_blobs | blobs, snapshots |
---
## Root Cause Analysis
After detailed analysis, I believe the issue is one of the following:
### Hypothesis 1: File ID Not Set
Looking at `checkFileInMemory()` for NEW files:
```go
if !exists {
    return file, true // file.ID is empty string!
}
```
For new files, `file.ID` is empty. Then in `processFileStreaming`:
```go
fileChunks[i] = database.FileChunk{
    FileID: fileToProcess.File.ID, // Empty for new files!
    ...
}
The `FileID` in the built `fileChunks` slice is empty.
Then in `flushPendingFiles`:
```go
s.repos.Files.Create(txCtx, tx, data.file) // This generates the ID
// But data.fileChunks still has empty FileID!
for i := range data.fileChunks {
    s.repos.FileChunks.Create(...) // Uses empty FileID
}
```
**Solution**: Generate file IDs upfront in `checkFileInMemory()`:
```go
file := &database.File{
    ID:   uuid.New().String(), // Generate ID immediately
    Path: path,
    ...
}
```
### Hypothesis 2: Transaction Isolation
With the connection pool limited to a single connection (`SetMaxOpenConns(1)`), all transactions are serialized, so committed data should be visible to every subsequent transaction.
However, there might be a subtle issue with how `context.Background()` is used in the packer vs the scanner's context.
## Recommended Fix
**Step 1: Generate file IDs upfront**
In `checkFileInMemory()`, generate the UUID for new files immediately:
```go
file := &database.File{
    ID:   uuid.New().String(), // Always generate ID
    Path: path,
    ...
}
```
This ensures `file.ID` is set when building `fileChunks` and `chunkFiles` slices.
**Step 2: Verify by reverting to per-file transactions**
If Step 1 doesn't fix it, revert to non-batched file insertion to isolate the issue:
```go
// Instead of queuing:
// return s.addPendingFile(ctx, pendingFileData{...})

// Do immediate insertion:
return s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
    // Create file
    s.repos.Files.Create(txCtx, tx, fileToProcess.File)
    // Delete old associations
    s.repos.FileChunks.DeleteByFileID(...)
    s.repos.ChunkFiles.DeleteByFileID(...)
    // Create new associations
    for _, fc := range fileChunks {
        s.repos.FileChunks.Create(...)
    }
    for _, cf := range chunkFiles {
        s.repos.ChunkFiles.Create(...)
    }
    // Add to snapshot
    s.repos.Snapshots.AddFileByID(...)
    return nil
})
```
**Step 3: If batching is still desired**
After confirming per-file transactions work, re-implement batching with the ID fix in place, and add debug logging to trace exactly which chunk_hash is failing and why.

744
README.md

@@ -1,43 +1,65 @@
# vaultik (ваултик)
WIP: pre-1.0, some functions may not be fully implemented yet
`vaultik` is an incremental backup daemon written in Go. It encrypts data
`vaultik` is an incremental backup tool written in Go. It encrypts data
using an `age` public key and uploads each encrypted blob directly to a
remote S3-compatible object store. It requires no private keys, secrets, or
credentials (other than those required to PUT to encrypted object storage,
such as S3 API keys) stored on the backed-up system.
It includes table-stakes features such as:
## quickstart
* modern encryption (the excellent `age`)
* deduplication
* incremental backups
* modern multithreaded zstd compression with configurable levels
```sh
# install
go install sneak.berlin/go/vaultik/cmd/vaultik@latest
# create a default config file (prints the path it wrote to)
vaultik config init
# generate an age keypair; keep the private key file somewhere safe and
# offline — you need it to restore, and the backed-up machine does not need it
age-keygen -o vaultik_backup_private_key.txt
grep 'public key' vaultik_backup_private_key.txt
# configure the encryption key and backup destination
vaultik config set age_recipients.0 age1YOUR_PUBLIC_KEY_HERE
vaultik config set storage_url "file:///Volumes/usbstick/mybackup"
# macOS only: grant your terminal app Full Disk Access first
# (System Settings → Privacy & Security → Full Disk Access), otherwise
# the backup will abort with a permission error on protected directories
# run your first backup (the default config backs up ~ and /Applications
# with sensible excludes)
vaultik snapshot create
# see what you have
vaultik snapshot list
```
Features:
* modern encryption ([age](https://age-encryption.org/), X25519 + XChaCha20-Poly1305)
* content-defined chunking with deduplication (FastCDC)
* incremental backups (only changed files are re-chunked)
* multithreaded zstd compression at configurable levels
* content-addressed immutable storage
* local state tracking in standard SQLite database, enables write-only
incremental backups to destination
* local state tracking in SQLite (enables write-only incremental backups)
* no mutable remote metadata
* no plaintext file paths or metadata stored in remote
* does not create huge numbers of small files (to keep S3 operation counts
down) even if the source system has many small files
* no plaintext file paths or metadata in remote storage
* packs small files into large blobs (keeps S3 operation counts down)
* backs up regular files, symlinks, empty directories, and file permissions
* pluggable storage backends: S3, local filesystem, rclone (70+ providers)
* pure Go (no CGO), cross-compiles to linux/darwin × amd64/arm64
## why
Existing backup software fails under one or more of these conditions:
* Requires secrets (passwords, private keys) on the source system, which
compromises encrypted backups in the case of host system compromise
* Depends on symmetric encryption unsuitable for zero-trust environments
* Creates one-blob-per-file, which results in excessive S3 operation counts
* Is slow
Other backup tools like `restic`, `borg`, and `duplicity` are designed for
environments where the source host can store secrets and has access to
decryption keys. I don't want to store backup decryption keys on my hosts,
only public keys for encryption.
decryption keys. `vaultik` is for environments where you don't want to
store backup decryption keys on your hosts — only public keys for
encryption.
My requirements are:
Requirements that no existing tool meets:
* open source
* no passphrases or private keys on the source host
@@ -46,99 +68,21 @@ My requirements are:
* encrypted
* s3 compatible without an intermediate step or tool
Surprisingly, no existing tool meets these requirements, so I wrote `vaultik`.
## daily use
## design goals
```sh
# verify a snapshot (shallow: checks all blobs exist)
vaultik snapshot verify <snapshot-id>
1. Backups must require only a public key on the source host.
1. No secrets or private keys may exist on the source system.
1. Restore must be possible using **only** the backup bucket and a private key.
1. Prune must be possible (requires private key, done on different hosts).
1. All encryption uses [`age`](https://age-encryption.org/) (X25519, XChaCha20-Poly1305).
1. Compression uses `zstd` at a configurable level.
1. Files are chunked, and multiple chunks are packed into encrypted blobs
to reduce object count for filesystems with many small files.
1. All metadata (snapshots) is stored remotely as encrypted SQLite DBs.
# deep verify (downloads and cryptographically verifies every blob)
VAULTIK_AGE_SECRET_KEY='AGE-SECRET-KEY-...' vaultik snapshot verify --deep <snapshot-id>
## what
# restore (requires the private key)
VAULTIK_AGE_SECRET_KEY='AGE-SECRET-KEY-...' vaultik snapshot restore <snapshot-id> /tmp/restored
`vaultik` walks a set of configured directories and builds a
content-addressable chunk map of changed files using deterministic chunking.
Each chunk is streamed into a blob packer. Blobs are compressed with `zstd`,
encrypted with `age`, and uploaded directly to remote storage under a
content-addressed S3 path. At the end, a pruned snapshot-specific sqlite
database of metadata is created, encrypted, and uploaded alongside the
blobs.
No plaintext file contents ever hit disk. No private key or secret
passphrase is needed or stored locally.
## how
1. **install**
```sh
go install git.eeqj.de/sneak/vaultik@latest
```
1. **generate keypair**
```sh
age-keygen -o agekey.txt
grep 'public key:' agekey.txt
```
1. **write config**
```yaml
# Named snapshots - each snapshot can contain multiple paths
snapshots:
  system:
    paths:
      - /etc
      - /var/lib
    exclude:
      - '*.cache' # Snapshot-specific exclusions
  home:
    paths:
      - /home/user/documents
      - /home/user/photos
# Global exclusions (apply to all snapshots)
exclude:
  - '*.log'
  - '*.tmp'
  - '.git'
  - 'node_modules'
age_recipients:
  - age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj
s3:
  endpoint: https://s3.example.com
  bucket: vaultik-data
  prefix: host1/
  access_key_id: ...
  secret_access_key: ...
  region: us-east-1
backup_interval: 1h
full_scan_interval: 24h
min_time_between_run: 15m
chunk_size: 10MB
blob_size_limit: 1GB
```
1. **run**
```sh
# Create all configured snapshots
vaultik --config /etc/vaultik.yaml snapshot create
# Create specific snapshots by name
vaultik --config /etc/vaultik.yaml snapshot create home system
# Silent mode for cron
vaultik --config /etc/vaultik.yaml snapshot create --cron
```
# daily cron job: back up, keep a 4-week rolling window of snapshots
# 0 3 * * * vaultik snapshot create --cron --prune --keep-newer-than 4w
```
---
@@ -147,254 +91,462 @@ passphrase is needed or stored locally.
### commands
```sh
vaultik [--config <path>] snapshot create [snapshot-names...] [--cron] [--daemon] [--prune]
vaultik [--config <path>] config init
vaultik [--config <path>] config edit
vaultik [--config <path>] config get <key>
vaultik [--config <path>] config set <key> <value>
vaultik [--config <path>] snapshot create [snapshot-names...] [--cron] [--prune] [--keep-newer-than <duration>]
vaultik [--config <path>] snapshot list [--json]
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep]
vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--name <name>] [--force]
vaultik [--config <path>] snapshot remove <snapshot-id> [--dry-run] [--force]
vaultik [--config <path>] snapshot prune
vaultik [--config <path>] restore <snapshot-id> <target-dir> [paths...]
vaultik [--config <path>] prune [--dry-run] [--force]
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep] [--json]
vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--snapshot <name>...] [--force]
vaultik [--config <path>] snapshot remove <snapshot-id|--all> [--dry-run] [--force] [--remote] [--json]
vaultik [--config <path>] snapshot cleanup
vaultik [--config <path>] snapshot restore <snapshot-id> <target-dir> [paths...] [--verify]
vaultik [--config <path>] prune [--force] [--json]
vaultik [--config <path>] info
vaultik [--config <path>] remote info [--json]
vaultik [--config <path>] remote nuke --force
vaultik [--config <path>] store info
vaultik [--config <path>] database purge [--force]
vaultik completion <bash|zsh|fish|powershell>
vaultik version
```
### environment
### global flags
* `VAULTIK_AGE_SECRET_KEY`: Required for `restore` and deep `verify`. Contains the age private key for decryption.
* `VAULTIK_CONFIG`: Optional path to config file.
* `--config <path>`: Path to config file (default: `$VAULTIK_CONFIG`, then platform config dir, then `/etc/vaultik/config.yml`)
* `--verbose`, `-v`: Enable verbose output
* `--debug`: Enable debug output
* `--quiet`, `-q`: Suppress non-error output (also suppresses startup banner)
* `--skip-errors`: Continue past per-file errors instead of aborting (applies to `snapshot create` and `restore`)
### environment variables
* `VAULTIK_AGE_SECRET_KEY`: Age private key for decryption (required for `snapshot restore` and `snapshot verify --deep`)
* `VAULTIK_CONFIG`: Path to config file (overridden by `--config`)
* `VAULTIK_INDEX_PATH`: Override local SQLite index path
### shell completion
```sh
# zsh: load for the current session
source <(vaultik completion zsh)
# zsh: install permanently
vaultik completion zsh > "${fpath[1]}/_vaultik"
# bash: load for the current session
source <(vaultik completion bash)
# bash: install permanently (Linux)
vaultik completion bash > /etc/bash_completion.d/vaultik
# fish
vaultik completion fish > ~/.config/fish/completions/vaultik.fish
```
### command details
**snapshot create**: Perform incremental backup of configured snapshots
* Config is located at `/etc/vaultik/config.yml` by default
* Optional snapshot names argument to create specific snapshots (default: all)
* `--cron`: Silent unless error (for crontab)
* `--daemon`: Run continuously with inotify monitoring and periodic scans
* `--prune`: Delete old snapshots and orphaned blobs after backup
**`config init`**: Write a default config file with commented explanations for
every setting. Writes to the path from `--config`, `$VAULTIK_CONFIG`, or the
platform config directory (`~/Library/Application Support/vaultik/` on macOS,
`~/.config/vaultik/` on Linux, `/etc/vaultik/` as root). Refuses to overwrite an
existing file. Created with mode `0600` since it will contain credentials.
**snapshot list**: List all snapshots with their timestamps and sizes
**`config edit`**: Open the config file in `$EDITOR` (falls back to `vi`).
**`config get`**: Print a config value addressed by dotted YAML path
(e.g. `vaultik config get storage_url`). Non-scalar values print as YAML.
**`config set`**: Set a scalar config value by dotted YAML path
(e.g. `vaultik config set compression_level 9`,
`vaultik config set storage_url "file:///mnt/backups"`). Comments and
formatting in the file are preserved; intermediate maps are created as
needed.
**`snapshot create`**: Perform incremental backup of configured snapshots.
* Optional snapshot names argument to create specific snapshots (default: all)
* On macOS, the terminal application running vaultik needs Full Disk Access
(System Settings → Privacy & Security → Full Disk Access) to read
TCC-protected directories; without it the backup aborts with a permission
error that explains how to fix it
* `--cron`: Silent unless error (for crontab)
* `--prune`: After backup, drop older snapshots of each backed-up name and
remove orphaned blobs from remote storage. By default keeps only the latest
snapshot per name; use `--keep-newer-than` for a rolling window.
* `--keep-newer-than <duration>`: With `--prune`, keep snapshots newer than
this duration instead of only the latest (e.g. `4w`, `30d`, `6mo`, `1y`)
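As a rough illustration of how duration strings such as `4w` or `6mo` might be interpreted, here is a small Go sketch. `parseRetention` is a hypothetical name, and the unit lengths (a month as 30 days, a year as 365 days) are assumptions made for the example; the README does not state how vaultik defines them.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

const day = 24 * time.Hour

// parseRetention converts strings like "4w", "30d", "6mo" or "1y" into a
// time.Duration. Hypothetical helper; unit lengths are assumptions.
func parseRetention(s string) (time.Duration, error) {
	units := []struct {
		suffix string
		length time.Duration
	}{
		{"mo", 30 * day}, // assumed: one month is 30 days
		{"y", 365 * day}, // assumed: one year is 365 days
		{"w", 7 * day},
		{"d", day},
	}
	for _, u := range units {
		if num, ok := strings.CutSuffix(s, u.suffix); ok {
			n, err := strconv.Atoi(num)
			if err != nil || n < 0 {
				return 0, fmt.Errorf("invalid duration %q", s)
			}
			return time.Duration(n) * u.length, nil
		}
	}
	return 0, fmt.Errorf("invalid duration %q: unknown unit", s)
}

func main() {
	for _, s := range []string{"4w", "30d", "6mo", "1y"} {
		d, err := parseRetention(s)
		fmt.Println(s, d, err)
	}
}
```

A snapshot would then be kept when its creation time is later than `time.Now().Add(-d)`.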
**`snapshot list`**: Show every snapshot known to the destination
store with timestamps and three sizes per snapshot (compressed
remote size; total uncompressed chunk size; size of chunks newly
referenced by that snapshot). The uncompressed and "new chunk"
columns show `<remote only>` for snapshots not in the local index.
* `--json`: Output in JSON format
**snapshot verify**: Verify snapshot integrity
* `--deep`: Download and verify blob contents (not just existence)
**`snapshot verify`**: Verify snapshot integrity.
* Default (shallow): checks that all blobs referenced in the manifest exist in storage
* `--deep`: Downloads and decrypts each blob, verifies chunk hashes against the
encrypted metadata database
* `--json`: Output results as JSON
**snapshot purge**: Remove old snapshots based on criteria
* `--keep-latest`: Keep the most recent snapshot per snapshot name
* `--older-than`: Remove snapshots older than duration (e.g., 30d, 6mo, 1y)
* `--name`: Filter purge to a specific snapshot name
**`snapshot purge`**: Remove old snapshots based on criteria. Retention is
per-snapshot-name (`--keep-latest` keeps the latest of each name, not the
latest globally).
* `--keep-latest`: Keep only the most recent snapshot of each name
* `--older-than <duration>`: Remove snapshots older than duration (e.g. `30d`, `6m`, `1y`)
* `--snapshot <name>`: Restrict to specific snapshot names (repeat for multiple)
* `--force`: Skip confirmation prompt
**snapshot remove**: Remove a specific snapshot
**`snapshot remove`**: Remove a specific snapshot from the local database.
Automatically cleans up local rows (files, chunks, blobs) that the removed
snapshot was the last referrer for — you don't need a separate prune step
after removal.
* `--remote`: Also remove snapshot metadata from remote storage
* `--all`: Remove all snapshots (requires `--force`)
* `--dry-run`: Show what would be deleted without deleting
* `--force`: Skip confirmation prompt
* `--json`: Output result as JSON
**snapshot prune**: Clean orphaned data from local database
**`snapshot cleanup`**: Remove stale local snapshot records that have no
corresponding metadata in remote storage. These are typically left behind
by incomplete or interrupted backups. Does not touch remote storage.
**restore**: Restore snapshot to target directory
* Requires `VAULTIK_AGE_SECRET_KEY` environment variable with age private key
**`snapshot restore`**: Restore files from a backup snapshot.
* Requires `VAULTIK_AGE_SECRET_KEY` environment variable
* Optional path arguments to restore specific files/directories (default: all)
* Downloads and decrypts metadata, fetches required blobs, reconstructs files
* Preserves file permissions, timestamps, and ownership (ownership requires root)
* Handles symlinks and directories
* Preserves file permissions, timestamps, ownership (ownership requires root),
symlinks, and empty directories
* `--verify`: After restoring, verify every file's chunk hashes match
**prune**: Remove unreferenced blobs from remote storage
* Scans all snapshots for referenced blobs
* Deletes orphaned blobs
**`prune`**: Tidy up everything that isn't needed. Removes orphaned local
database rows (files, chunks, blobs no longer referenced by any completed
snapshot) AND deletes unreferenced blobs from remote storage. `snapshot
create --prune`, `snapshot remove`, and `snapshot purge` run the same
cleanup automatically; this is the manual entry point for the same work.
* `--force`: Skip confirmation prompt
* `--json`: Output stats as JSON
**info**: Display system and configuration information
**`info`**: Display system configuration, storage settings, encryption
recipients, and local database statistics.
**store info**: Display S3 bucket configuration and storage statistics
**`remote info`**: Show detailed remote storage information including per-snapshot
metadata sizes, blob counts, and orphaned blob detection.
* `--json`: Output as JSON
**`remote nuke`**: Delete every snapshot's metadata and every blob from the
backup destination store, leaving the bucket prefix empty. Destructive and
irreversible.
* `--force`: Required to confirm destruction.
**`store info`**: Display storage backend type and statistics.
**`database purge`**: Delete the local SQLite state database entirely. Remote
storage is unaffected; the next backup will do a full scan and re-deduplicate
against existing remote blobs.
* `--force`: Skip confirmation prompt
---
## storage backends
vaultik supports three storage backends, selected via the `storage_url` config field:
**S3** (`s3://bucket/prefix?endpoint=host&region=us-east-1`): Any S3-compatible
object store. Credentials are read from `s3.access_key_id` and
`s3.secret_access_key` in the config file.
**Local filesystem** (`file:///path/to/backup`): Stores blobs and metadata on
a local or mounted filesystem. Useful for testing or backing up to a NAS.
**Rclone** (`rclone://remote/path`): Uses rclone's 70+ supported cloud
providers. Requires rclone to be configured separately (`rclone config`).
Legacy S3 configuration via `s3.*` fields (endpoint, bucket, prefix, etc.) is
still supported for backward compatibility. `storage_url` takes precedence if
both are set.
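To make the three URL forms concrete, the following Go sketch splits a `storage_url` into its parts using the standard `net/url` package. `Backend` and `parseStorageURL` are hypothetical names for illustration; vaultik's own parser may treat edge cases (for example unusual rclone remote names) differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// Backend is a hypothetical, simplified view of a parsed storage_url.
type Backend struct {
	Kind     string     // "s3", "file" or "rclone"
	Location string     // bucket or rclone remote name; empty for file://
	Path     string     // prefix or filesystem path
	Options  url.Values // query parameters such as endpoint and region
}

// parseStorageURL accepts the three schemes named in this section and
// rejects everything else.
func parseStorageURL(raw string) (Backend, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return Backend{}, err
	}
	switch u.Scheme {
	case "s3", "file", "rclone":
		return Backend{Kind: u.Scheme, Location: u.Host, Path: u.Path, Options: u.Query()}, nil
	default:
		return Backend{}, fmt.Errorf("unsupported storage scheme %q", u.Scheme)
	}
}

func main() {
	for _, raw := range []string{
		"s3://bucket/prefix?endpoint=host&region=us-east-1",
		"file:///path/to/backup",
		"rclone://remote/path",
	} {
		b, err := parseStorageURL(raw)
		fmt.Printf("%+v %v\n", b, err)
	}
}
```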
---
## architecture
### s3 bucket layout
### remote storage layout
```
s3://<bucket>/<prefix>/
<bucket>/<prefix>/
├── blobs/
│   └── <aa>/<bb>/<full_blob_hash>
└── metadata/
    └── <snapshot_id>/
        ├── db.zst.age
        └── manifest.json.zst
    └── <snapshot_id>/
        ├── db.zst.age            # Encrypted binary SQLite database
        └── manifest.json.zst     # Unencrypted blob list (for pruning)
```
* `blobs/<aa>/<bb>/...`: Two-level directory sharding using first 4 hex chars of blob hash
* `metadata/<snapshot_id>/db.zst.age`: Encrypted, compressed SQLite database
* `metadata/<snapshot_id>/manifest.json.zst`: Unencrypted blob list for pruning
* Blobs are two-level directory sharded using the first 4 hex chars of the blob hash
* `db.zst.age` is a binary SQLite database (zstd compressed, age encrypted)
containing all file metadata, chunk mappings, and relationships for the snapshot
* `manifest.json.zst` is an unencrypted compressed JSON blob list, enabling
pruning without the private key
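The sharding scheme can be illustrated with a short Go sketch. `blobKey` is a hypothetical helper, and it assumes `<aa>` is the first two hex characters of the hash and `<bb>` the next two, which is the natural reading of the layout above.

```go
package main

import "fmt"

// blobKey maps a hex blob hash to its sharded object key,
// blobs/<aa>/<bb>/<full_blob_hash>. Hypothetical helper for illustration.
func blobKey(hash string) (string, error) {
	if len(hash) < 4 {
		return "", fmt.Errorf("blob hash %q is too short to shard", hash)
	}
	return "blobs/" + hash[0:2] + "/" + hash[2:4] + "/" + hash, nil
}

func main() {
	key, err := blobKey("aa1234567890abcdef")
	fmt.Println(key, err)
}
```

Two levels of two hex characters give at most 65,536 leaf directories, which keeps any single directory listing small on filesystem backends.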
### blob manifest format
The `manifest.json.zst` file is unencrypted (compressed JSON) to enable pruning without decryption:
```json
{
"snapshot_id": "hostname_snapshotname_2025-01-01T12:00:00Z",
"blob_hashes": [
"aa1234567890abcdef...",
"bb2345678901bcdef0..."
]
}
```
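A manifest with this shape can be decoded in Go as sketched below. The example assumes the bytes have already been zstd-decompressed (the Go standard library has no zstd decoder, so that step is left out), and `Manifest` and `decodeManifest` are hypothetical names; only the two JSON field names come from the sample above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Manifest mirrors the two fields shown in the sample manifest.
type Manifest struct {
	SnapshotID string   `json:"snapshot_id"`
	BlobHashes []string `json:"blob_hashes"`
}

// decodeManifest parses already-decompressed manifest JSON.
func decodeManifest(data []byte) (Manifest, error) {
	var m Manifest
	err := json.Unmarshal(data, &m)
	return m, err
}

func main() {
	raw := []byte(`{"snapshot_id":"server1_home_2025-01-01T12:00:00Z","blob_hashes":["aa12","bb23"]}`)
	m, err := decodeManifest(raw)
	fmt.Println(m.SnapshotID, len(m.BlobHashes), err)
}
```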
Snapshot IDs follow the format `<hostname>_<snapshot-name>_<timestamp>` (e.g., `server1_home_2025-01-01T12:00:00Z`).
### local sqlite schema
```sql
CREATE TABLE files (
id TEXT PRIMARY KEY,
path TEXT NOT NULL UNIQUE,
mtime INTEGER NOT NULL,
size INTEGER NOT NULL,
mode INTEGER NOT NULL,
uid INTEGER NOT NULL,
gid INTEGER NOT NULL
);
CREATE TABLE file_chunks (
file_id TEXT NOT NULL,
idx INTEGER NOT NULL,
chunk_hash TEXT NOT NULL,
PRIMARY KEY (file_id, idx),
FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
);
CREATE TABLE chunks (
chunk_hash TEXT PRIMARY KEY,
size INTEGER NOT NULL
);
CREATE TABLE blobs (
id TEXT PRIMARY KEY,
blob_hash TEXT NOT NULL UNIQUE,
uncompressed INTEGER NOT NULL,
compressed INTEGER NOT NULL,
uploaded_at INTEGER
);
CREATE TABLE blob_chunks (
blob_hash TEXT NOT NULL,
chunk_hash TEXT NOT NULL,
offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (blob_hash, chunk_hash)
);
CREATE TABLE chunk_files (
chunk_hash TEXT NOT NULL,
file_id TEXT NOT NULL,
file_offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (chunk_hash, file_id)
);
CREATE TABLE snapshots (
id TEXT PRIMARY KEY,
hostname TEXT NOT NULL,
vaultik_version TEXT NOT NULL,
started_at INTEGER NOT NULL,
completed_at INTEGER,
file_count INTEGER NOT NULL,
chunk_count INTEGER NOT NULL,
blob_count INTEGER NOT NULL,
total_size INTEGER NOT NULL,
blob_size INTEGER NOT NULL,
compression_ratio REAL NOT NULL
);
CREATE TABLE snapshot_files (
snapshot_id TEXT NOT NULL,
file_id TEXT NOT NULL,
PRIMARY KEY (snapshot_id, file_id)
);
CREATE TABLE snapshot_blobs (
snapshot_id TEXT NOT NULL,
blob_id TEXT NOT NULL,
blob_hash TEXT NOT NULL,
PRIMARY KEY (snapshot_id, blob_id)
);
```
Snapshot IDs follow the format `<hostname>_<snapshot-name>_<RFC3339-timestamp>`
(e.g. `server1_home_2025-06-01T12:00:00Z`).
### data flow
#### backup
**backup:**
1. Load config, open local SQLite index
1. Walk source directories, check mtime/size against index
1. For changed/new files: chunk using content-defined chunking
1. For each chunk: hash, check if already uploaded, add to blob packer
1. When blob reaches threshold: compress, encrypt, upload to S3
1. Build snapshot metadata, compress, encrypt, upload
1. Create blob manifest (unencrypted) for pruning support
1. Open local SQLite index, load known files and chunks into memory
2. Walk source directories, compare mtime/size/mode against index
3. For changed/new files: chunk using content-defined chunking (FastCDC)
4. For symlinks and directories: record metadata (no chunking)
5. For each chunk: hash, check dedup, add to blob packer
6. When blob reaches size threshold: compress (zstd), encrypt (age), upload
7. Build snapshot metadata database, compress, encrypt, upload
8. Create unencrypted blob manifest for pruning support
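Step 2 of the backup flow, deciding whether a file needs re-chunking, can be sketched like this. The types are hypothetical and simplified; in particular, storing mtime as Unix seconds is an assumption made for the example.

```go
package main

import "fmt"

// FileMeta holds the attributes compared against the local index.
// Hypothetical type for illustration.
type FileMeta struct {
	MTime int64  // modification time, assumed to be Unix seconds
	Size  int64  // size in bytes
	Mode  uint32 // permission and type bits
}

// needsChunking reports whether a file is new or differs from its indexed
// mtime, size, or mode, in which case it must be re-chunked.
func needsChunking(index map[string]FileMeta, path string, cur FileMeta) bool {
	prev, known := index[path]
	return !known || prev != cur
}

func main() {
	index := map[string]FileMeta{
		"/etc/hosts": {MTime: 1700000000, Size: 220, Mode: 0o644},
	}
	fmt.Println(needsChunking(index, "/etc/hosts", FileMeta{MTime: 1700000000, Size: 220, Mode: 0o644}))
	fmt.Println(needsChunking(index, "/etc/hosts", FileMeta{MTime: 1700000500, Size: 230, Mode: 0o644}))
	fmt.Println(needsChunking(index, "/etc/motd", FileMeta{MTime: 1700000000, Size: 10, Mode: 0o644}))
}
```

Comparing only metadata means unchanged files are skipped without being read, which is what makes incremental runs cheap.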
#### restore
**restore:**
1. Download `metadata/<snapshot_id>/db.zst.age`
1. Decrypt and decompress SQLite database
1. Query files table (optionally filtered by paths)
1. For each file, get ordered chunk list from file_chunks
1. Download required blobs, decrypt, decompress
1. Extract chunks and reconstruct files
1. Restore permissions, mtime, uid/gid
1. Download and decrypt `metadata/<snapshot_id>/db.zst.age`
2. Open the binary SQLite database
3. Query files (optionally filtered by paths)
4. Download and decrypt required blobs
5. Extract chunks, reconstruct files
6. Restore permissions, timestamps, ownership, symlinks
#### prune
**prune:**
1. List all snapshot manifests
1. Build set of all referenced blob hashes
1. List all blobs in storage
1. Delete any blob not in referenced set
2. Build set of all referenced blob hashes
3. List all blobs in storage
4. Delete any blob not in the referenced set
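The prune steps amount to a set difference, which the following Go sketch shows. `unreferencedBlobs` is a hypothetical helper operating on plain string slices; the real implementation lists manifests and blobs from the storage backend.

```go
package main

import "fmt"

// unreferencedBlobs returns the stored blob hashes that no snapshot manifest
// refers to. Hypothetical helper for illustration.
func unreferencedBlobs(manifests [][]string, stored []string) []string {
	referenced := make(map[string]struct{})
	for _, hashes := range manifests {
		for _, h := range hashes {
			referenced[h] = struct{}{}
		}
	}
	var orphans []string
	for _, h := range stored {
		if _, ok := referenced[h]; !ok {
			orphans = append(orphans, h)
		}
	}
	return orphans
}

func main() {
	manifests := [][]string{{"aa", "bb"}, {"bb", "cc"}}
	stored := []string{"aa", "bb", "cc", "dd"}
	fmt.Println(unreferencedBlobs(manifests, stored))
}
```

Because only the unencrypted manifests are consulted, this computation needs no private key.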
### chunking
### chunking and deduplication
* Content-defined chunking using FastCDC algorithm
* Content-defined chunking using the FastCDC algorithm
* Average chunk size: configurable (default 10MB)
* Deduplication at chunk level
* Multiple chunks packed into blobs for efficiency
* Deduplication at file level (unchanged files skipped) and chunk level
(identical chunks across files stored once)
* Multiple chunks packed into blobs to reduce object count
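Chunk-level deduplication can be illustrated with a content-addressed store. This is a minimal sketch, not vaultik's implementation: `ChunkStore` is a hypothetical type, SHA-256 is assumed as the chunk hash (the README does not name the hash function), and the chunk boundaries are supplied by the caller rather than computed with FastCDC.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ChunkStore keeps each distinct chunk once, keyed by its content hash.
type ChunkStore struct {
	chunks map[string][]byte
}

func NewChunkStore() *ChunkStore {
	return &ChunkStore{chunks: make(map[string][]byte)}
}

// Add stores the chunk if it has not been seen before. It returns the hex
// hash and whether the chunk was new.
func (s *ChunkStore) Add(chunk []byte) (string, bool) {
	sum := sha256.Sum256(chunk) // assumption: SHA-256 as the chunk hash
	key := hex.EncodeToString(sum[:])
	if _, ok := s.chunks[key]; ok {
		return key, false
	}
	s.chunks[key] = append([]byte(nil), chunk...) // copy so callers can reuse buffers
	return key, true
}

// Len reports how many distinct chunks are stored.
func (s *ChunkStore) Len() int { return len(s.chunks) }

func main() {
	store := NewChunkStore()
	fileA := [][]byte{[]byte("header"), []byte("shared body")}
	fileB := [][]byte{[]byte("other header"), []byte("shared body")}
	for _, f := range [][][]byte{fileA, fileB} {
		for _, c := range f {
			hash, isNew := store.Add(c)
			fmt.Println(hash[:12], isNew)
		}
	}
	fmt.Println("distinct chunks:", store.Len())
}
```

The two files reference four chunks in total, but the shared chunk is stored only once.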
### encryption
* Asymmetric encryption using age (X25519 + XChaCha20-Poly1305)
* Only public key needed on source host
* Each blob encrypted independently
* Metadata databases also encrypted
* Only the public key is needed on the source host
* Each blob and each metadata database is encrypted independently
* Multiple recipients supported (encrypt to multiple keys)
### compression
* zstd compression at configurable level
* Applied before encryption
* Blob-level compression for efficiency
* zstd compression at configurable level (1-19, default 3)
* Applied before encryption at the blob level
---
## does not
## configuration reference
* Store any secrets on the backed-up machine
* Require mutable remote metadata
* Use tarballs, restic, rsync, or ssh
* Require a symmetric passphrase or password
* Trust the source system with anything
Run `vaultik config init` to generate a fully commented config file.
Key fields:
## does
* Incremental deduplicated backup
* Blob-packed chunk encryption
* Content-addressed immutable blobs
* Public-key encryption only
* SQLite-based local and snapshot metadata
* Fully stream-processed storage
| Field | Default | Description |
|-------|---------|-------------|
| `age_recipients` | (required) | Age public keys for encryption |
| `snapshots` | (required) | Named snapshot definitions with paths and excludes |
| `storage_url` | | Storage backend URL (`s3://`, `file://`, `rclone://`) |
| `s3.*` | | Legacy S3 configuration (endpoint, bucket, credentials) |
| `exclude` | | Global exclude patterns (applied to all snapshots) |
| `chunk_size` | `10MB` | Average chunk size for content-defined chunking |
| `blob_size_limit` | `10GB` | Maximum blob size before splitting |
| `compression_level` | `3` | zstd compression level (1-19) |
| `hostname` | system hostname | Hostname used in snapshot IDs |
| `index_path` | platform data dir | Local SQLite index path |
---
## limitations
* **No extended attributes (xattrs).** ACLs, macOS Finder metadata,
quarantine flags, SELinux labels, and other extended attributes are not
backed up or restored.
* **No hard link detection.** Two hard links to the same inode are backed
up as independent files. Content deduplication means the data is stored
once, but the hard link relationship is lost on restore.
* **No sparse file support.** Sparse files are fully materialized during
backup. A 100 GB sparse VM disk that is mostly zeros will consume the
full (compressed) size in storage.
* **No bandwidth limiting.** Uploads and downloads use whatever bandwidth
is available. There is no `--bwlimit` flag yet.
* **No parallel blob downloads during restore.** Blobs are fetched
sequentially. Restore speed is bound by single-stream throughput.
* **Device nodes, named pipes, and sockets are silently skipped.** Only
regular files, directories, and symlinks are backed up.
* **No database migrations.** If the local SQLite schema changes between
versions, delete the local database (`vaultik database purge`) and run
a full backup. Remote storage is unaffected.
* **Files that change during backup may be inconsistent.** There is no
filesystem snapshot or freeze. If a file is modified between the scan
and chunk phases, the backed-up copy may reflect a partial write.
* **Ownership restoration requires root.** File uid/gid are recorded
and restored, but `chown` requires elevated privileges. Without root,
files are restored with the current user's ownership.
---
## roadmap
Items still to do before / shortly after 1.0. Loosely ordered by
priority.
### correctness and operability
* **Security audit of the encryption implementation.** Pre-1.0
blocker if we're advertising "secure" at the top of this README.
age + zstd + content-defined chunking is mostly off-the-shelf
pieces, but the seams (key handling, recipient parsing, manifest
trust boundary, restore-time identity validation) need an outside
read.
* **Error-condition tests.** Today's coverage is the happy path
plus a few specific regressions. Need fault-injection coverage:
network failures mid-blob, disk-full during restore, corrupted /
truncated / missing blobs, partial uploads, kill -9 between
manifest and db.zst.age writes.
* **Verify restored content end-to-end in CI.** The current
integration test does this for a small synthetic snapshot but
not at scale. A nightly job against a multi-GB representative
snapshot would catch silent regressions in the chunker, packer,
or restore planner.
### performance
* **Parallel blob downloads during restore.** Single-stream right
now. With a fast S3 endpoint and a multi-core machine restore is
bound by per-blob fetch + decrypt + decompress; running N of
those in parallel against the disk cache would close most of the
remaining gap. Needs to interact correctly with the locality
planner and sweeper.
* **Bandwidth limiting (`--bwlimit`).** Both upload and download.
Useful for backing up over a shared link. Tricky to make work
correctly with the parallel-download story.
* **Restart of interrupted restore.** Today restore is restartable
in the sense that re-running it overwrites partial output; it
doesn't resume from where it stopped or skip already-present
files. A `--resume` mode that checks targets before fetching
blobs would matter for very large restores.
### usability
* **Man pages and richer `--help` examples.** Cobra generates
basic help; man pages would be a separate target.
* **`--bwlimit`-style human-readable size flags** across the
command surface where they're currently raw integers.
* **`vaultik snapshot diff <a> <b>`** — show which files changed
between two snapshots without restoring either.
* **Status reporting hook for `--cron`.** When a backup fails
silently in cron, the user has no idea. A configurable
webhook / email / `notify-send` hook on completion (success and
failure) would close the loop.
### infrastructure
* **Cross-machine restore documentation.** The "restore from
another host" workflow works but isn't documented as a
first-class operation in this README. Worth a dedicated section
once it's settled.
* **Schema migrations.** Currently nonexistent — pre-1.0 schema
changes are handled by `vaultik database purge` plus a full
re-scan. Post-1.0 we'll need a migration story to keep existing
index databases usable across upgrades.
* **Storage backend coverage tests.** S3, file://, and rclone://
all share the Storer interface but the rclone path is the least
exercised in CI.
---
## output style
All user-facing output goes through helpers in `internal/ui` and conforms
to a uniform style. Color is enabled when stdout is a TTY and the
`NO_COLOR` environment variable is unset (https://no-color.org/).
Message classes:
| Class | Marker | Alignment | Use for |
|-------|--------|-----------|---------|
| Banner | none | column 0 | The startup line printed once per invocation |
| Begin | `》` (white) | column 0 | An operation is about to start (present-continuous verb) |
| Complete | `》` (green) | column 0 | An operation just finished (past-tense verb) |
| Info | `》` (white) | column 0 | Neutral status update |
| Notice | `》` (cyan) | column 0 | Important note that is not a warning |
| Warning | `⚠️ Warning:` (orange/yellow) | column 0 | Recoverable problem |
| Error | `🛑 ERROR:` (red) | column 0 | Operation aborted |
| Progress | ` 》` (white) | column 2 | Heartbeat or per-item status during a long-running operation |
| Detail | ` 》` (white) | column 2 | Continuation/sub-line of a preceding Complete (visually identical to Progress) |
Conventions:
* Messages are complete English sentences ending with a period.
* Fully qualify terms — say "backup destination store" instead of
"storage", "snapshot source files enumeration" instead of "scan",
"local index database" instead of "database".
* Every operation that emits a Complete also emits a corresponding
Begin. Operations that print only a Begin (because completion is
obvious from a later Begin) should be rare and intentional.
* Use natural verb tense to signal state: "Uploading" for Begin,
"Uploaded" for Complete. Never write the words "begin" or "complete"
in the body — the marker color already conveys that.
* All elapsed and remaining-time fields are explicitly scoped to their
subject: write "blob upload elapsed: 30s, blob upload ETA: 03:15:00
(est remain 14s)", never just "elapsed 30s, ETA 14s".
* "ETA" means an absolute clock time (when the operation will finish),
not a remaining-duration. Use `ui.Time()` for the former and
`ui.Duration()` for the latter, and label both.
* `ui.Time` formats same-day times as `HH:MM:SS` and other-day times as
`YYYY-MM-DD HH:MM:SS`. No timezone — local time is implied.
Value colorizers in `internal/ui` colorize specific value types
consistently. Compose messages from these helpers rather than embedding
ANSI escapes inline:
| Helper | Color | Use for |
|--------|-------|---------|
| `Hex` | cyan | Blob hashes, chunk hashes (truncated to 12 chars + `...`) |
| `Snapshot` | bold cyan | Snapshot IDs (untruncated) |
| `Path` | blue | Filesystem paths |
| `Size` | magenta | Byte counts (human-readable) |
| `Speed` | magenta | Bytes-per-second rates |
| `Duration` | yellow | Elapsed or remaining time |
| `Time` | yellow | Absolute clock times |
| `Count` | magenta | Integer counts with thousands separators |
| `Percent` | magenta | Percentages |
When `NO_COLOR` is set or output is not a TTY, all helpers return plain
text and the marker prefixes (`》`, `Warning:`, `ERROR:`) emit without
ANSI escapes. The emoji prefixes on Warning and Error are always emitted
regardless of color setting (emoji are not color).
## requirements
* Go 1.24 or later
* S3-compatible object storage
* Sufficient disk space for local index (typically <1GB)
* Go 1.26 or later
* S3-compatible object storage (or local filesystem, or rclone remote)
## development workflow
All changes follow this workflow. No exceptions.
1. Create a feature branch off `main`.
2. Write tests.
3. Write the implementation.
4. Fix implementation errors until it compiles and tests pass.
5. Fix linting errors (`make lint`).
6. Update documentation and README as required by the change.
7. Format code (`make fmt`).
8. Run `make check` (lint + fmt-check + test). Fix any issues. Repeat until clean.
9. Commit on the branch.
10. Merge to `main`.
11. Push.
Do not commit directly to `main`. Do not skip steps.
Repository policies for AI agents are in [`AGENTS.md`](AGENTS.md).
## license

128
TODO.md

@@ -1,128 +0,0 @@
# Vaultik 1.0 TODO
Linear list of tasks to complete before 1.0 release.
## Rclone Storage Backend (Complete)
Add rclone as a storage backend via Go library import, allowing vaultik to use any of rclone's 70+ supported cloud storage providers.
**Configuration:**
```yaml
storage_url: "rclone://myremote/path/to/backups"
```
User must have rclone configured separately (via `rclone config`).
**Implementation Steps:**
1. [x] Add rclone dependency to go.mod
2. [x] Create `internal/storage/rclone.go` implementing `Storer` interface
- `NewRcloneStorer(remote, path)` - init with `configfile.Install()` and `fs.NewFs()`
- `Put` / `PutWithProgress` - use `operations.Rcat()`
- `Get` - use `fs.NewObject()` then `obj.Open()`
- `Stat` - use `fs.NewObject()` for size/metadata
- `Delete` - use `obj.Remove()`
- `List` / `ListStream` - use `operations.ListFn()`
- `Info` - return remote name
3. [x] Update `internal/storage/url.go` - parse `rclone://remote/path` URLs
4. [x] Update `internal/storage/module.go` - add rclone case to `storerFromURL()`
5. [x] Test with real rclone remote
**Error Mapping:**
- `fs.ErrorObjectNotFound` → `ErrNotFound`
- `fs.ErrorDirNotFound` → `ErrNotFound`
- `fs.ErrorNotFoundInConfigFile` → `ErrRemoteNotFound` (new)
---
## CLI Polish (Priority)
1. Improve error messages throughout
- Ensure all errors include actionable context
- Add suggestions for common issues (e.g., "did you set VAULTIK_AGE_SECRET_KEY?")
## Security (Priority)
1. Audit encryption implementation
- Verify age encryption is used correctly
- Ensure no plaintext leaks in logs or errors
- Verify blob hashes are computed correctly
1. Secure memory handling for secrets
- Clear S3 credentials from memory after client init
- Document that age_secret_key is env-var only (already implemented)
## Testing
1. Write integration tests for restore command
1. Write end-to-end integration test
- Create backup
- Verify backup
- Restore backup
- Compare restored files to originals
1. Add tests for edge cases
- Empty directories
- Symlinks
- Special characters in filenames
- Very large files (multi-GB)
- Many small files (100k+)
1. Add tests for error conditions
- Network failures during upload
- Disk full during restore
- Corrupted blobs
- Missing blobs
## Performance
1. Profile and optimize restore performance
- Parallel blob downloads
- Streaming decompression/decryption
- Efficient chunk reassembly
1. Add bandwidth limiting option
- `--bwlimit` flag for upload/download speed limiting
## Documentation
1. Add man page or --help improvements
- Detailed help for each command
- Examples in help output
## Final Polish
1. Ensure version is set correctly in releases
1. Create release process
- Binary releases for supported platforms
- Checksums for binaries
- Release notes template
1. Final code review
- Remove debug statements
- Ensure consistent code style
1. Tag and release v1.0.0
---
## Post-1.0 (Daemon Mode)
1. Implement inotify file watcher for Linux
- Watch source directories for changes
- Track dirty paths in memory
1. Implement FSEvents watcher for macOS
- Watch source directories for changes
- Track dirty paths in memory
1. Implement backup scheduler in daemon mode
- Respect backup_interval config
- Trigger backup when dirty paths exist and interval elapsed
- Implement full_scan_interval for periodic full scans
1. Add proper signal handling for daemon
- Graceful shutdown on SIGTERM/SIGINT
- Complete in-progress backup before exit
1. Write tests for daemon mode

View File

@@ -5,7 +5,7 @@ import (
 "runtime"
 "runtime/pprof"
-"git.eeqj.de/sneak/vaultik/internal/cli"
+"sneak.berlin/go/vaultik/internal/cli"
 )
 func main() {

View File

@@ -291,21 +291,6 @@ storage_url: "rclone://las1stor1//srv/pool.2024.04/backups/heraklion"
# # Default: 5MB
# #part_size: 5MB
# How often to run backups in daemon mode
# Format: 1h, 30m, 24h, etc
# Default: 1h
#backup_interval: 1h
# How often to do a full filesystem scan in daemon mode
# Between full scans, inotify is used to detect changes
# Default: 24h
#full_scan_interval: 24h
# Minimum time between backup runs in daemon mode
# Prevents backups from running too frequently
# Default: 15m
#min_time_between_run: 15m
# Path to local SQLite index database
# This database tracks file state for incremental backups
# Default: /var/lib/vaultik/index.sqlite

View File

@@ -5,8 +5,14 @@
Vaultik uses a local SQLite database to track file metadata, chunk mappings, and blob associations during the backup process. This database serves as an index for incremental backups and enables efficient deduplication.
**Important Notes:**
-- **No Migration Support**: Vaultik does not support database schema migrations. If the schema changes, the local database must be deleted and recreated by performing a full backup.
-- **Version Compatibility**: In rare cases, you may need to use the same version of Vaultik to restore a backup as was used to create it. This ensures compatibility with the metadata format stored in S3.
+- **No Migration Support (pre-1.0)**: Vaultik does not support database schema
+  migrations. The local index is treated as disposable; if the schema changes,
+  delete the local SQLite database (`vaultik database purge`) and run a full
+  backup. The remote storage is unaffected; the new index will re-deduplicate
+  against existing remote blobs.
+- **Version Compatibility**: In rare cases, you may need to use the same version
+  of Vaultik to restore a backup as was used to create it. This ensures
+  compatibility with the metadata format stored in S3.
## Database Tables

View File

@@ -43,18 +43,19 @@ Blobs contain the actual file data from backups and must be encrypted for securi
Each snapshot has its own subdirectory named with the snapshot ID.
### Snapshot ID Format
-- **Format**: `<hostname>-<YYYYMMDD>-<HHMMSSZ>`
-- **Example**: `laptop-20240115-143052Z`
+- **Format**: `<hostname>_<snapshot-name>_<RFC3339>` (or `<hostname>_<RFC3339>` if no
+  name was specified)
+- **Example**: `laptop_home_2024-01-15T14:30:52Z`
 - **Components**:
-  - Hostname (may contain hyphens)
-  - Date in YYYYMMDD format
-  - Time in HHMMSSZ format (Z indicates UTC)
+  - Short hostname (the FQDN truncated at the first dot)
+  - Snapshot name from the configured `snapshots:` map (optional)
+  - RFC3339 UTC timestamp
### Files in Each Snapshot Directory
-#### `db.zst.age` - Encrypted Database Dump
-- **What it contains**: Complete SQLite database dump for this snapshot
-- **Format**: SQL dump → Zstandard compressed → Age encrypted
+#### `db.zst.age` - Encrypted Database
+- **What it contains**: Pruned binary SQLite database for this snapshot
+- **Format**: Binary SQLite → Zstandard compressed → Age encrypted
- **Encryption**: Encrypted with Age
- **Purpose**: Contains full file metadata, chunk mappings, and all relationships
- **Why encrypted**: Contains sensitive metadata like file paths, permissions, and ownership
@@ -67,7 +68,7 @@ Each snapshot has its own subdirectory named with the snapshot ID.
- **Structure**:
```json
 {
-"snapshot_id": "laptop-20240115-143052Z",
+"snapshot_id": "laptop_home_2024-01-15T14:30:52Z",
"timestamp": "2024-01-15T14:30:52Z",
"blob_count": 42,
"blobs": [

6
go.mod

@@ -1,4 +1,4 @@
-module git.eeqj.de/sneak/vaultik
+module sneak.berlin/go/vaultik
 go 1.26.1
@@ -17,9 +17,7 @@ require (
 github.com/google/uuid v1.6.0
 github.com/johannesboyne/gofakes3 v0.0.0-20250603205740-ed9094be7668
 github.com/klauspost/compress v1.18.1
-github.com/mattn/go-sqlite3 v1.14.29
 github.com/rclone/rclone v1.72.1
-github.com/schollz/progressbar/v3 v3.19.0
 github.com/spf13/afero v1.15.0
 github.com/spf13/cobra v1.10.1
 github.com/stretchr/testify v1.11.1
@@ -187,7 +185,6 @@ require (
 github.com/mattn/go-colorable v0.1.14 // indirect
 github.com/mattn/go-isatty v0.0.20 // indirect
 github.com/mattn/go-runewidth v0.0.19 // indirect
-github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db // indirect
 github.com/mitchellh/go-homedir v1.1.0 // indirect
 github.com/mitchellh/mapstructure v1.5.0 // indirect
 github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
@@ -218,7 +215,6 @@ require (
 github.com/relvacode/iso8601 v1.7.0 // indirect
 github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
 github.com/rfjakob/eme v1.1.2 // indirect
-github.com/rivo/uniseg v0.4.7 // indirect
 github.com/ryanuber/go-glob v1.0.0 // indirect
 github.com/ryszard/goskiplist v0.0.0-20150312221310-2dfbae5fcf46 // indirect
 github.com/sabhiram/go-gitignore v0.0.0-20210923224102-525f6e181f06 // indirect

10
go.sum

@@ -202,8 +202,6 @@ github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UF
github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
github.com/cevatbarisyilmaz/ara v0.0.4 h1:SGH10hXpBJhhTlObuZzTuFn1rrdmjQImITXnZVPSodc=
github.com/cevatbarisyilmaz/ara v0.0.4/go.mod h1:BfFOxnUd6Mj6xmcvRxHN3Sr21Z1T3U2MYkYOmoQe4Ts=
github.com/chengxilo/virtualterm v1.0.4 h1:Z6IpERbRVlfB8WkOmtbHiDbBANU7cimRIof7mk9/PwM=
github.com/chengxilo/virtualterm v1.0.4/go.mod h1:DyxxBZz/x1iqJjFxTFcr6/x+jSpqN0iwWCOK1q10rlY=
github.com/chilts/sid v0.0.0-20190607042430-660e94789ec9 h1:z0uK8UQqjMVYzvk4tiiu3obv2B44+XBsvgEJREQfnO8=
github.com/chilts/sid v0.0.0-20190607042430-660e94789ec9/go.mod h1:Jl2neWsQaDanWORdqZ4emBl50J4/aRBBS4FyyG9/PFo=
github.com/chzyer/logex v1.1.10/go.mod h1:+Ywpsq7O8HXn0nuIou7OrIPyXbp3wmkHB+jjWRnGsAI=
@@ -593,16 +591,12 @@ github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D
github.com/mattn/go-runewidth v0.0.3/go.mod h1:LwmH8dsx7+W8Uxz3IHJYH5QSwggIsqBzpuz5H//U1FU=
github.com/mattn/go-runewidth v0.0.19 h1:v++JhqYnZuu5jSKrk9RbgF5v4CGUjqRfBm05byFGLdw=
github.com/mattn/go-runewidth v0.0.19/go.mod h1:XBkDxAl56ILZc9knddidhrOlY5R/pDhgLpndooCuJAs=
github.com/mattn/go-sqlite3 v1.14.29 h1:1O6nRLJKvsi1H2Sj0Hzdfojwt8GiGKm+LOfLaBFaouQ=
github.com/mattn/go-sqlite3 v1.14.29/go.mod h1:Uh1q+B4BYcTPb+yiD3kU8Ct7aC0hY9fxUwlHK0RXw+Y=
github.com/matttproud/golang_protobuf_extensions v1.0.1/go.mod h1:D8He9yQNgCq6Z5Ld7szi9bcBfOoFv/3dc6xSMkL2PC0=
github.com/miekg/dns v1.1.26/go.mod h1:bPDLeHnStXmXAq1m/Ch/hvfNHr14JKNPMBo3VZKjuso=
github.com/miekg/dns v1.1.41 h1:WMszZWJG0XmzbK9FEmzH2TVcqYzFesusSIB41b8KHxY=
github.com/miekg/dns v1.1.41/go.mod h1:p6aan82bvRIyn+zDIv9xYNUpwa73JcSh9BKwknJysuI=
github.com/mitchellh/cli v1.0.0/go.mod h1:hNIlj7HEI86fIcpObd7a0FcrxTWetlwJDGcceTlRvqc=
github.com/mitchellh/cli v1.1.0/go.mod h1:xcISNoH86gajksDmfB23e/pu+B+GeFRMYmoHXxx3xhI=
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db h1:62I3jR2EmQ4l5rM/4FEfDWcRD+abF5XlKShorW5LRoQ=
github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db/go.mod h1:l0dey0ia/Uv7NcFFVbCLtqEBQbrT4OCwCSKTEv6enCw=
github.com/mitchellh/go-homedir v1.1.0 h1:lukF9ziXFxDFPkA1vsr5zpc1XuPDn/wFntq5mG+4E0Y=
github.com/mitchellh/go-homedir v1.1.0/go.mod h1:SfyaCUpYCn1Vlf4IUYiD9fPX4A5wJrkLzIz1N1q0pr0=
github.com/mitchellh/go-wordwrap v1.0.0/go.mod h1:ZXFpozHsX6DPmq2I0TCekCxypsnAUbP2oI0UX1GXzOo=
@@ -707,8 +701,6 @@ github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec h1:W09IVJc94
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec/go.mod h1:qqbHyh8v60DhA7CoWK5oRCqLrMHRGoxYCSS9EjAz6Eo=
github.com/rfjakob/eme v1.1.2 h1:SxziR8msSOElPayZNFfQw4Tjx/Sbaeeh3eRvrHVMUs4=
github.com/rfjakob/eme v1.1.2/go.mod h1:cVvpasglm/G3ngEfcfT/Wt0GwhkuO32pf/poW6Nyk1k=
github.com/rivo/uniseg v0.4.7 h1:WUdvkW8uEhrYfLC4ZzdpI2ztxP1I582+49Oc5Mq64VQ=
github.com/rivo/uniseg v0.4.7/go.mod h1:FN3SvrM+Zdj16jyLfmOkMNblXMcoc8DfTHruCPUcx88=
github.com/rogpeppe/go-internal v1.3.0/go.mod h1:M8bDsm7K2OlrFYOpmOWEs/qY81heoFRclV5y23lUDJ4=
github.com/rogpeppe/go-internal v1.14.1 h1:UQB4HGPB6osV0SQTLymcB4TgvyWu6ZyliaW0tI/otEQ=
github.com/rogpeppe/go-internal v1.14.1/go.mod h1:MaRKkUm5W0goXpeCfT7UZI6fk/L7L7so1lCWt35ZSgc=
@@ -723,8 +715,6 @@ github.com/sabhiram/go-gitignore v0.0.0-20210923224102-525f6e181f06 h1:OkMGxebDj
github.com/sabhiram/go-gitignore v0.0.0-20210923224102-525f6e181f06/go.mod h1:+ePHsJ1keEjQtpvf9HHw0f4ZeJ0TLRsxhunSI2hYJSs=
github.com/samber/lo v1.52.0 h1:Rvi+3BFHES3A8meP33VPAxiBZX/Aws5RxrschYGjomw=
github.com/samber/lo v1.52.0/go.mod h1:4+MXEGsJzbKGaUEQFKBq2xtfuznW9oz/WrgyzMzRoM0=
github.com/schollz/progressbar/v3 v3.19.0 h1:Ea18xuIRQXLAUidVDox3AbwfUhD0/1IvohyTutOIFoc=
github.com/schollz/progressbar/v3 v3.19.0/go.mod h1:IsO3lpbaGuzh8zIMzgY3+J8l4C8GjO0Y9S69eFvNsec=
github.com/sean-/seed v0.0.0-20170313163322-e2103e2c3529 h1:nn5Wsu0esKSJiIVhscUtVbo7ada43DJhG55ua/hjS5I=
github.com/sean-/seed v0.0.0-20170313163322-e2103e2c3529/go.mod h1:DxrIzT+xaE7yg65j358z/aeFdxmN0P9QXhEzd20vsDc=
github.com/sergi/go-diff v1.0.0/go.mod h1:0CfEIISq7TuYL3j771MWULgwwjU+GofnZX9QAmXWZgo=

View File

@@ -23,12 +23,12 @@ import (
 "sync"
 "time"
-"git.eeqj.de/sneak/vaultik/internal/blobgen"
-"git.eeqj.de/sneak/vaultik/internal/database"
-"git.eeqj.de/sneak/vaultik/internal/log"
-"git.eeqj.de/sneak/vaultik/internal/types"
 "github.com/google/uuid"
 "github.com/spf13/afero"
+"sneak.berlin/go/vaultik/internal/blobgen"
+"sneak.berlin/go/vaultik/internal/database"
+"sneak.berlin/go/vaultik/internal/log"
+"sneak.berlin/go/vaultik/internal/types"
 )
// BlobHandler is a callback function invoked when a blob is finalized and ready for upload.

View File

@@ -10,11 +10,11 @@ import (
 "testing"
 "filippo.io/age"
-"git.eeqj.de/sneak/vaultik/internal/database"
-"git.eeqj.de/sneak/vaultik/internal/log"
-"git.eeqj.de/sneak/vaultik/internal/types"
 "github.com/klauspost/compress/zstd"
 "github.com/spf13/afero"
+"sneak.berlin/go/vaultik/internal/database"
+"sneak.berlin/go/vaultik/internal/log"
+"sneak.berlin/go/vaultik/internal/types"
 )
const (

View File

@@ -7,19 +7,21 @@ import (
 "os"
 "os/signal"
 "path/filepath"
+"strings"
 "syscall"
 "time"
-"git.eeqj.de/sneak/vaultik/internal/config"
-"git.eeqj.de/sneak/vaultik/internal/database"
-"git.eeqj.de/sneak/vaultik/internal/globals"
-"git.eeqj.de/sneak/vaultik/internal/log"
-"git.eeqj.de/sneak/vaultik/internal/pidlock"
-"git.eeqj.de/sneak/vaultik/internal/snapshot"
-"git.eeqj.de/sneak/vaultik/internal/storage"
-"git.eeqj.de/sneak/vaultik/internal/vaultik"
 "github.com/adrg/xdg"
 "go.uber.org/fx"
+"sneak.berlin/go/vaultik/internal/config"
+"sneak.berlin/go/vaultik/internal/database"
+"sneak.berlin/go/vaultik/internal/globals"
+"sneak.berlin/go/vaultik/internal/log"
+"sneak.berlin/go/vaultik/internal/pidlock"
+"sneak.berlin/go/vaultik/internal/snapshot"
+"sneak.berlin/go/vaultik/internal/storage"
+"sneak.berlin/go/vaultik/internal/ui"
+"sneak.berlin/go/vaultik/internal/vaultik"
 )
// AppOptions contains common options for creating the fx application.
@@ -32,16 +34,36 @@ type AppOptions struct {
 Invokes []fx.Option
 }
-// setupGlobals sets up the globals with application startup time
-func setupGlobals(lc fx.Lifecycle, g *globals.Globals) {
+// setupGlobals records the startup time and, when an output-suppression
+// flag is active, marks the UI writer quiet so that Begin/Complete/
+// Info/Notice/Detail/Progress are silenced. Warning and Error are NOT
+// silenced, per the documented convention that --quiet suppresses
+// non-error output only. The startup banner is printed by CLIEntry
+// before cobra parses arguments, gated by the same arg-level check.
+func setupGlobals(lc fx.Lifecycle, g *globals.Globals, v *vaultik.Vaultik, opts log.LogOptions) {
 lc.Append(fx.Hook{
 OnStart: func(ctx context.Context) error {
 g.StartTime = time.Now().UTC()
+if opts.Cron || opts.Quiet {
+v.UI.SetQuiet(true)
+}
 return nil
 },
 })
 }
+// writeStartupBanner prints the two-line application banner followed by a
+// blank line. Used both from the fx hook (for subcommand invocations) and
+// from the root cobra Run handler (for `vaultik` with no subcommand).
+func writeStartupBanner(w *ui.Writer, startTime time.Time, shortCommit string) {
+w.Banner("%s %s by %s (commit %s, built on %s) starting up at %s.",
+globals.Appname, globals.Version, globals.Author,
+shortCommit, globals.CommitDate,
+startTime.Format(time.RFC3339))
+w.Banner("%s", globals.Homepage)
+w.Banner("")
+}
 // NewApp creates a new fx application with common modules.
 // It sets up the base modules (config, database, logging, globals) and
 // combines them with any additional modules specified in the options.
@@ -68,6 +90,24 @@ func NewApp(opts AppOptions) *fx.App {
 return fx.New(allOptions...)
 }
+// cleanStartupError strips fx's dependency-injection call-chain noise from
+// startup errors. fx wraps the underlying error with messages like
+//
+// could not build arguments for function "X" (file:line): failed to build T:
+// could not build arguments for function "Y" (file:line): failed to build U:
+// received non-nil error from function "Z" (file:line): <real error>
+//
+// Users care about the real error, not the DI plumbing. We strip everything
+// up through the last "): " (which is always the close-paren of an fx
+// function-location annotation followed by the wrapped error).
+func cleanStartupError(err error) error {
+msg := err.Error()
+if idx := strings.LastIndex(msg, "): "); idx >= 0 {
+msg = msg[idx+3:]
+}
+return errors.New(msg)
+}
 // RunApp starts and stops the fx application within the given context.
 // It handles graceful shutdown on interrupt signals (SIGINT, SIGTERM) and
 // ensures the application stops cleanly. The function blocks until the
@@ -83,7 +123,7 @@ func RunApp(ctx context.Context, app *fx.App) error {
 // Start the app
 if err := app.Start(ctx); err != nil {
-return fmt.Errorf("failed to start app: %w", err)
+return cleanStartupError(err)
 }
 // Handle shutdown
@@ -125,7 +165,7 @@ func RunApp(ctx context.Context, app *fx.App) error {
 // It acquires a PID lock before starting to prevent concurrent instances.
 func RunWithApp(ctx context.Context, opts AppOptions) error {
 // Acquire PID lock to prevent concurrent instances
-lockDir := filepath.Join(xdg.DataHome, "berlin.sneak.app.vaultik")
+lockDir := filepath.Join(xdg.DataHome, "vaultik")
 lock, err := pidlock.Acquire(lockDir)
 if err != nil {
 if errors.Is(err, pidlock.ErrAlreadyRunning) {

39
internal/cli/app_test.go Normal file

@@ -0,0 +1,39 @@
package cli
import (
"errors"
"testing"
)
func TestCleanStartupError(t *testing.T) {
tests := []struct {
name string
in string
want string
}{
{
name: "real fx error chain",
in: `could not build arguments for function "sneak.berlin/go/vaultik/internal/cli".newSnapshotCreateCommand.func1.1 (/Users/user/dev/vaultik/internal/cli/snapshot.go:71): failed to build *vaultik.Vaultik: could not build arguments for function "sneak.berlin/go/vaultik/internal/vaultik".New (/Users/user/dev/vaultik/internal/vaultik/vaultik.go:59): failed to build storage.Storer: received non-nil error from function "sneak.berlin/go/vaultik/internal/storage".NewStorer (/Users/user/dev/vaultik/internal/storage/module.go:23): creating base path: mkdir /Volumes/BACKUPS: permission denied`,
want: `creating base path: mkdir /Volumes/BACKUPS: permission denied`,
},
{
name: "no fx wrapping",
in: "plain error",
want: "plain error",
},
{
name: "single fx wrapping",
in: `received non-nil error from function "foo" (file.go:1): underlying problem`,
want: "underlying problem",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := cleanStartupError(errors.New(tt.in)).Error()
if got != tt.want {
t.Errorf("got %q, want %q", got, tt.want)
}
})
}
}

523
internal/cli/config.go Normal file

@@ -0,0 +1,523 @@
package cli
import (
"fmt"
"os"
"os/exec"
"path/filepath"
"strconv"
"strings"
"github.com/spf13/cobra"
"gopkg.in/yaml.v3"
)
const defaultConfigTemplate = `# vaultik configuration
# Documentation: https://sneak.berlin/go/vaultik
# ─── REQUIRED ────────────────────────────────────────────────────────────────
# Age recipient public keys for encryption.
# Backups are encrypted to ALL listed recipients. Any one of the corresponding
# private keys can decrypt. Generate a keypair with:
# age-keygen -o vaultik_backup_private_key.txt
# grep 'public key' vaultik_backup_private_key.txt
age_recipients:
- age1REPLACE_WITH_YOUR_PUBLIC_KEY
# Named snapshots. Each snapshot backs up one or more paths and can have its
# own exclude patterns in addition to the global excludes below.
#
# Exclude pattern semantics:
# - Patterns starting with / are anchored to the snapshot path root
# (e.g. "/Library/Caches" matches only ~/Library/Caches in a ~ snapshot)
# - Patterns without a leading / match anywhere in the tree
# (e.g. ".cache" matches any directory named .cache at any depth)
# - Globs are supported: *, **, ?
snapshots:
home:
paths:
- "~"
exclude:
# Trash, temp, and filesystem metadata
- "/.Trash"
- "/.Trashes"
- "/.fseventsd"
- "/.Spotlight-V100"
- "/.TemporaryItems"
- "/tmp"
- "/.rnd"
- ".DS_Store"
# Caches and package manager state (rebuildable)
- ".cache"
- ".bundle"
- "/.cpan/build"
- "/.cpan/sources"
- "/.gradle/caches"
- "/.dropbox"
- "/.minikube/cache"
- "/.local/share/containers/podman/machine"
- "/.persepolis"
- "/Library/Caches"
- "/Library/Logs"
- "/Library/Cookies"
- "/Library/Metadata"
- "/Library/Suggestions"
- "/Library/PubSub"
- "/Library/Homebrew"
- "/Library/Developer"
- "/Library/Google/GoogleSoftwareUpdate"
- "/Library/Preferences/Macromedia/Flash Player"
- "/Library/Preferences/SDMHelpData"
- "/Library/VoiceTrigger/SAT"
# Language/toolchain package caches (rebuildable from registries)
- "/.npm"
- "/.cargo/registry"
- "/.cargo/git"
- "/.rustup/toolchains"
- "/go/pkg/mod"
- "/.m2/repository"
- "/.vagrant.d/boxes"
- "node_modules"
- "__pycache__"
- ".venv"
# Virtual machine disk images (huge; remove these lines to back them up)
- "/Parallels"
- "/Virtual Machines.localized"
- "/VirtualBox VMs"
- "/.orbstack"
- "/Library/Containers/com.utmapp.UTM"
# Downloaded LLM models (huge, re-downloadable)
- "/.ollama/models"
- "/.lmstudio/models"
# Cloud-synced storage. These are synced to a provider already, and on
# modern macOS may contain dataless placeholder files that the backup
# would force-download in full.
- "/Library/CloudStorage"
- "/Library/Mobile Documents"
# Android SDK and emulator images (re-downloadable)
- "/Library/Android/sdk"
- "/.android/avd"
# Cloud-synced or restorable-from-server data
- "/Library/Mail"
- "/Library/Mail Downloads"
- "/Library/Safari"
- "/Library/Application Support/Evernote"
- "/Library/Application Support/MobileSync"
- "/Library/Application Support/SyncServices"
- "/Library/Application Support/protonmail/bridge/cache"
- "/Library/Application Support/Syncthing/index-*"
- "/Library/Syncthing/folders"
- "/Documents/Dropbox/.dropbox.cache"
# Large rebuildable app data (games, media caches, device backups)
- "/Applications/Fortnite"
- "/Documents/Steam Content"
- "/Library/Application Support/Ableton"
- "/Library/Application Support/CrossOver Games"
- "/Library/Application Support/SecondLife/cache"
- "/Library/Application Support/Steam/SteamApps"
- "/Library/Containers/com.docker.docker"
- "/Library/Group Containers/group.com.apple.secure-control-center-preferences"
- "/Library/iTunes/iPad Software Updates"
- "/Library/iTunes/iPhone Software Updates"
- "/Movies/CacheClip"
- "/Movies/ProxyMedia"
- "/Music/iTunes/Album Artwork"
- "/Pictures/iPod Photo Cache"
# Third-party applications. OS-provided apps live in /System/Applications
# on modern macOS and are never in /Applications, but Apple-installed
# App Store apps (Safari, GarageBand, iWork, iMovie) are excluded since
# they are re-downloadable.
apps:
paths:
- /Applications
exclude:
- ".DS_Store"
- "/Safari.app"
- "/GarageBand.app"
- "/iMovie.app"
- "/Keynote.app"
- "/Numbers.app"
- "/Pages.app"
- "/Xcode.app"
- "/Spotify.app"
- "/Steam.app"
- "/VirtualBox.app"
- "/Utilities/Adobe Installers"
# Storage backend (pick ONE of the three forms below).
#
# S3-compatible:
# storage_url: "s3://mybucket/backups?endpoint=s3.example.com&region=us-east-1"
# (also set s3.access_key_id and s3.secret_access_key below)
#
# Local filesystem:
# storage_url: "file:///mnt/backups/vaultik"
#
# Rclone (requires rclone configured separately):
# storage_url: "rclone://myremote/path/to/backups"
storage_url: ""
# ─── S3 CREDENTIALS (required for s3:// storage_url) ────────────────────────
# s3:
# access_key_id: YOUR_ACCESS_KEY
# secret_access_key: YOUR_SECRET_KEY
# # region: us-east-1 # Default: us-east-1
# # use_ssl: true # Default: true
# # part_size: 5MB # Multipart upload part size. Default: 5MB
# ─── OPTIONAL ────────────────────────────────────────────────────────────────
# Global exclude patterns applied to ALL snapshots.
# Snapshot-specific excludes are additive.
# exclude:
# - "*.log"
# - "*.tmp"
# - ".git"
# - "node_modules"
# Average chunk size for content-defined chunking (FastCDC).
# Smaller = better deduplication but more metadata overhead.
# Accepts: 1MB, 10M, 64KB, etc.
# Default: 10MB
# chunk_size: 10MB
# Maximum blob size before splitting into a new blob.
# Accepts: 1GB, 10G, 500MB, etc.
# Default: 10GB
# blob_size_limit: 10GB
# Zstd compression level (1-19). Higher = better ratio but slower.
# Default: 3
# compression_level: 3
# Hostname used in snapshot IDs. Default: system hostname.
# hostname: myserver
# Path to the local SQLite index database.
# Default: the platform data directory, e.g.
# macOS: ~/Library/Application Support/vaultik/index.sqlite
# Linux: ~/.local/share/vaultik/index.sqlite
# index_path: /path/to/index.sqlite
`
// NewConfigCommand creates the config command group.
func NewConfigCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "config",
Short: "Manage the configuration file",
Long: "Commands for creating, editing, and querying the vaultik config file.",
}
cmd.AddCommand(newConfigInitCommand())
cmd.AddCommand(newConfigEditCommand())
cmd.AddCommand(newConfigGetCommand())
cmd.AddCommand(newConfigSetCommand())
return cmd
}
// newConfigInitCommand creates the 'config init' subcommand.
func newConfigInitCommand() *cobra.Command {
return &cobra.Command{
Use: "init",
Short: "Write a default config file",
Long: `Creates a default configuration file with commented explanations
for every setting. If a config file already exists at the target path,
the command refuses to overwrite it.
The config is written to the path from --config, $VAULTIK_CONFIG, or
the platform default config directory (e.g. ~/Library/Application Support/
on macOS, ~/.config/ on Linux, /etc/vaultik/ as root).`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
path := configPathForInit()
if _, err := os.Stat(path); err == nil {
return fmt.Errorf("config file already exists: %s", path)
}
dir := filepath.Dir(path)
if err := os.MkdirAll(dir, 0o755); err != nil {
return fmt.Errorf("creating config directory %s: %w", dir, err)
}
if err := os.WriteFile(path, []byte(defaultConfigTemplate), 0o600); err != nil {
return fmt.Errorf("writing config file: %w", err)
}
fmt.Printf("Config written to %s\n", path)
fmt.Println("Edit it to set your age_recipients, snapshots, and storage_url.")
return nil
},
}
}
// newConfigEditCommand creates the 'config edit' subcommand.
func newConfigEditCommand() *cobra.Command {
return &cobra.Command{
Use: "edit",
Short: "Open the config file in $EDITOR",
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
path, err := ResolveConfigPath()
if err != nil {
return err
}
editor := os.Getenv("EDITOR")
if editor == "" {
editor = "vi"
}
ed := exec.Command(editor, path)
ed.Stdin = os.Stdin
ed.Stdout = os.Stdout
ed.Stderr = os.Stderr
return ed.Run()
},
}
}
// newConfigGetCommand creates the 'config get' subcommand.
func newConfigGetCommand() *cobra.Command {
return &cobra.Command{
Use: "get <key>",
Short: "Print a config value by dotted path (e.g. storage_url, compression_level)",
Args: cobra.ExactArgs(1),
RunE: func(cmd *cobra.Command, args []string) error {
path, err := ResolveConfigPath()
if err != nil {
return err
}
root, err := loadYAMLFile(path)
if err != nil {
return err
}
node, err := yamlPathGet(root, strings.Split(args[0], "."))
if err != nil {
return err
}
if node.Kind == yaml.ScalarNode {
fmt.Println(node.Value)
return nil
}
out, err := yaml.Marshal(node)
if err != nil {
return fmt.Errorf("marshaling value: %w", err)
}
fmt.Print(string(out))
return nil
},
}
}
// newConfigSetCommand creates the 'config set' subcommand.
func newConfigSetCommand() *cobra.Command {
return &cobra.Command{
Use: "set <key> <value>",
Short: "Set a config value by dotted path (e.g. compression_level 5)",
Long: `Sets a scalar config value addressed by dotted YAML path and writes
the file back, preserving comments and formatting. Intermediate maps
are created as needed.
Examples:
vaultik config set storage_url "file:///mnt/backups"
vaultik config set storage_url "s3://bucket/prefix?endpoint=host&region=us-east-1"
vaultik config set compression_level 9
vaultik config set s3.bucket mybucket # legacy S3 fields still supported`,
Args: cobra.ExactArgs(2),
RunE: func(cmd *cobra.Command, args []string) error {
path, err := ResolveConfigPath()
if err != nil {
return err
}
root, err := loadYAMLFile(path)
if err != nil {
return err
}
if err := yamlPathSet(root, strings.Split(args[0], "."), args[1]); err != nil {
return err
}
out, err := yaml.Marshal(root)
if err != nil {
return fmt.Errorf("marshaling config: %w", err)
}
mode := os.FileMode(0o600)
if info, err := os.Stat(path); err == nil {
mode = info.Mode().Perm()
}
if err := os.WriteFile(path, out, mode); err != nil {
return fmt.Errorf("writing config file: %w", err)
}
fmt.Printf("%s = %s\n", args[0], args[1])
return nil
},
}
}
// loadYAMLFile parses a YAML file into a yaml.Node document tree,
// which preserves comments and ordering for round-tripping.
func loadYAMLFile(path string) (*yaml.Node, error) {
data, err := os.ReadFile(path)
if err != nil {
return nil, fmt.Errorf("reading config file: %w", err)
}
var root yaml.Node
if err := yaml.Unmarshal(data, &root); err != nil {
return nil, fmt.Errorf("parsing config file: %w", err)
}
// An empty file yields a zero node; normalize to an empty mapping document.
if root.Kind == 0 {
root = yaml.Node{
Kind: yaml.DocumentNode,
Content: []*yaml.Node{{Kind: yaml.MappingNode}},
}
}
return &root, nil
}
// yamlPathGet navigates a dotted key path through mapping and sequence
// nodes and returns the value node. Numeric path components index into
// sequences (e.g. "age_recipients.0").
func yamlPathGet(root *yaml.Node, keys []string) (*yaml.Node, error) {
node := root
if node.Kind == yaml.DocumentNode {
if len(node.Content) == 0 {
return nil, fmt.Errorf("empty config file")
}
node = node.Content[0]
}
for i, key := range keys {
switch node.Kind {
case yaml.MappingNode:
found := false
for j := 0; j+1 < len(node.Content); j += 2 {
if node.Content[j].Value == key {
node = node.Content[j+1]
found = true
break
}
}
if !found {
return nil, fmt.Errorf("key not found: %s", strings.Join(keys[:i+1], "."))
}
case yaml.SequenceNode:
idx, err := strconv.Atoi(key)
if err != nil {
return nil, fmt.Errorf("key %q is a list; use a numeric index", strings.Join(keys[:i], "."))
}
if idx < 0 || idx >= len(node.Content) {
return nil, fmt.Errorf("index %d out of range for %s (len %d)", idx, strings.Join(keys[:i], "."), len(node.Content))
}
node = node.Content[idx]
default:
return nil, fmt.Errorf("key %q is not a map or list", strings.Join(keys[:i], "."))
}
}
return node, nil
}
// yamlPathSet navigates a dotted key path, creating intermediate maps as
// needed, and sets the final key to the given scalar value. Numeric path
// components index into sequences; an index equal to the sequence length
// appends a new element (e.g. "age_recipients.1" on a 1-element list).
func yamlPathSet(root *yaml.Node, keys []string, value string) error {
node := root
if node.Kind == yaml.DocumentNode {
if len(node.Content) == 0 {
node.Content = []*yaml.Node{{Kind: yaml.MappingNode}}
}
node = node.Content[0]
}
for i, key := range keys {
last := i == len(keys)-1
switch node.Kind {
case yaml.MappingNode:
var valueNode *yaml.Node
for j := 0; j+1 < len(node.Content); j += 2 {
if node.Content[j].Value == key {
valueNode = node.Content[j+1]
break
}
}
if valueNode == nil {
keyNode := &yaml.Node{Kind: yaml.ScalarNode, Value: key}
valueNode = &yaml.Node{Kind: yaml.MappingNode}
if last {
valueNode = &yaml.Node{Kind: yaml.ScalarNode, Value: value}
}
node.Content = append(node.Content, keyNode, valueNode)
} else if last {
setScalar(valueNode, value)
}
node = valueNode
case yaml.SequenceNode:
idx, err := strconv.Atoi(key)
if err != nil {
return fmt.Errorf("key %q is a list; use a numeric index", strings.Join(keys[:i], "."))
}
if idx < 0 || idx > len(node.Content) {
return fmt.Errorf("index %d out of range for %s (len %d)", idx, strings.Join(keys[:i], "."), len(node.Content))
}
if idx == len(node.Content) {
newNode := &yaml.Node{Kind: yaml.MappingNode}
if last {
newNode = &yaml.Node{Kind: yaml.ScalarNode, Value: value}
}
node.Content = append(node.Content, newNode)
} else if last {
setScalar(node.Content[idx], value)
}
node = node.Content[idx]
default:
return fmt.Errorf("key %q is not a map or list", strings.Join(keys[:i], "."))
}
}
return nil
}
// setScalar overwrites a node in place with a plain scalar value.
func setScalar(n *yaml.Node, value string) {
n.Kind = yaml.ScalarNode
n.Tag = ""
n.Value = value
n.Content = nil
n.Style = 0
}
// configPathForInit returns the config path to write, checking --config flag,
// VAULTIK_CONFIG env, and the platform default.
func configPathForInit() string {
if rootFlags.ConfigPath != "" {
return rootFlags.ConfigPath
}
if envPath := os.Getenv("VAULTIK_CONFIG"); envPath != "" {
return envPath
}
return DefaultConfigPath()
}
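
The exclude-pattern semantics documented in the default config template (a leading `/` anchors a pattern to the snapshot root, anything else matches at any depth) might be sketched as follows. This is a simplified illustration, not vaultik's actual matcher: `excluded` is a hypothetical name, it relies on the standard library's `path.Match` per path segment, it does not implement `**`, and it assumes an anchored match also excludes everything beneath the matched directory.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// excluded (hypothetical helper) reports whether rel, a slash-separated
// path relative to the snapshot root, is covered by pattern.
func excluded(pattern, rel string) bool {
	parts := strings.Split(rel, "/")
	if strings.HasPrefix(pattern, "/") {
		// Anchored: pattern segments must match the leading segments of rel.
		pat := strings.Split(strings.TrimPrefix(pattern, "/"), "/")
		if len(pat) > len(parts) {
			return false
		}
		for i, p := range pat {
			// A malformed glob is treated as a non-match.
			if ok, err := path.Match(p, parts[i]); err != nil || !ok {
				return false
			}
		}
		return true
	}
	// Unanchored: the pattern may match any single path component.
	for _, part := range parts {
		if ok, err := path.Match(pattern, part); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(excluded("/Library/Caches", "Library/Caches/com.example")) // true
	fmt.Println(excluded("/Library/Caches", "Documents/Library/Caches"))   // false
	fmt.Println(excluded(".cache", "src/project/.cache/objects"))          // true
	fmt.Println(excluded("*.log", "var/app/server.log"))                   // true
}
```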

161
internal/cli/config_test.go Normal file

@@ -0,0 +1,161 @@
package cli
import (
"strings"
"testing"
"gopkg.in/yaml.v3"
"sneak.berlin/go/vaultik/internal/config"
)
// TestDefaultConfigTemplateParses ensures the init template is valid YAML
// that unmarshals into the Config struct with the expected snapshots.
func TestDefaultConfigTemplateParses(t *testing.T) {
var cfg config.Config
if err := yaml.Unmarshal([]byte(defaultConfigTemplate), &cfg); err != nil {
t.Fatalf("default config template is not valid YAML: %v", err)
}
if len(cfg.AgeRecipients) != 1 {
t.Errorf("expected 1 placeholder age recipient, got %d", len(cfg.AgeRecipients))
}
home, ok := cfg.Snapshots["home"]
if !ok {
t.Fatal("expected 'home' snapshot in default config")
}
if len(home.Paths) == 0 {
t.Error("home snapshot should have at least one path")
}
if len(home.Exclude) == 0 {
t.Error("home snapshot should have exclude patterns")
}
apps, ok := cfg.Snapshots["apps"]
if !ok {
t.Fatal("expected 'apps' snapshot in default config")
}
if len(apps.Paths) != 1 || apps.Paths[0] != "/Applications" {
t.Errorf("apps snapshot should back up /Applications, got %v", apps.Paths)
}
if len(apps.Exclude) == 0 {
t.Error("apps snapshot should have exclude patterns")
}
}
const testYAML = `# top comment
compression_level: 3
age_recipients:
  - age1aaa
s3:
  bucket: oldbucket # inline comment
  region: us-east-1
snapshots:
  home:
    paths:
      - "~"
`
func parseTestYAML(t *testing.T) *yaml.Node {
t.Helper()
var root yaml.Node
if err := yaml.Unmarshal([]byte(testYAML), &root); err != nil {
t.Fatalf("parsing test yaml: %v", err)
}
return &root
}
func TestYAMLPathGet(t *testing.T) {
root := parseTestYAML(t)
tests := []struct {
path string
want string
err bool
}{
{"compression_level", "3", false},
{"s3.bucket", "oldbucket", false},
{"s3.region", "us-east-1", false},
{"age_recipients.0", "age1aaa", false},
{"age_recipients.5", "", true},
{"age_recipients.notanumber", "", true},
{"s3.nonexistent", "", true},
{"nonexistent", "", true},
{"compression_level.sub", "", true},
}
for _, tt := range tests {
t.Run(tt.path, func(t *testing.T) {
node, err := yamlPathGet(root, splitPath(tt.path))
if tt.err {
if err == nil {
t.Fatalf("expected error for %q", tt.path)
}
return
}
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if node.Value != tt.want {
t.Errorf("get %q = %q, want %q", tt.path, node.Value, tt.want)
}
})
}
}
func TestYAMLPathSet(t *testing.T) {
root := parseTestYAML(t)
// Overwrite existing nested value
if err := yamlPathSet(root, splitPath("s3.bucket"), "newbucket"); err != nil {
t.Fatalf("set s3.bucket: %v", err)
}
// Create a new key in an existing map, then a new key under a new intermediate map
if err := yamlPathSet(root, splitPath("s3.endpoint"), "s3.example.com"); err != nil {
t.Fatalf("set s3.endpoint: %v", err)
}
if err := yamlPathSet(root, splitPath("newmap.newkey"), "val"); err != nil {
t.Fatalf("set newmap.newkey: %v", err)
}
// Overwrite a sequence element and append a new one
if err := yamlPathSet(root, splitPath("age_recipients.0"), "age1bbb"); err != nil {
t.Fatalf("set age_recipients.0: %v", err)
}
if err := yamlPathSet(root, splitPath("age_recipients.1"), "age1ccc"); err != nil {
t.Fatalf("append age_recipients.1: %v", err)
}
if err := yamlPathSet(root, splitPath("age_recipients.5"), "age1ddd"); err == nil {
t.Error("expected out-of-range append to fail")
}
// Round-trip and verify values + comment preservation
out, err := yaml.Marshal(root)
if err != nil {
t.Fatalf("marshal: %v", err)
}
text := string(out)
for _, want := range []string{"newbucket", "s3.example.com", "newkey: val", "# top comment", "# inline comment", "age1bbb", "age1ccc"} {
if !contains(text, want) {
t.Errorf("round-tripped YAML missing %q:\n%s", want, text)
}
}
got, err := yamlPathGet(root, splitPath("s3.bucket"))
if err != nil {
t.Fatalf("get after set: %v", err)
}
if got.Value != "newbucket" {
t.Errorf("s3.bucket = %q after set, want newbucket", got.Value)
}
}
func splitPath(s string) []string {
return strings.Split(s, ".")
}
func contains(haystack, needle string) bool {
return strings.Contains(haystack, needle)
}


@@ -4,9 +4,9 @@ import (
"fmt"
"os"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/log"
"github.com/spf13/cobra"
"sneak.berlin/go/vaultik/internal/config"
"sneak.berlin/go/vaultik/internal/log"
)
// NewDatabaseCommand creates the database command group


@@ -2,14 +2,67 @@ package cli
import (
"os"
"strings"
"time"
"sneak.berlin/go/vaultik/internal/globals"
"sneak.berlin/go/vaultik/internal/ui"
)
// CLIEntry is the main entry point for the CLI application.
// It creates the root command, executes it, and exits with status 1
// if an error occurs. This function should be called from main().
// It prints the startup banner (unless a quiet flag is present in os.Args),
// executes the root cobra command, and routes any returned error through
// the ui.Writer so the user sees a properly formatted "🛑 ERROR:" line.
func CLIEntry() {
if !bannerSuppressedInArgs(os.Args[1:]) {
short := globals.Commit
if len(short) > 12 {
short = short[:12]
}
writeStartupBanner(ui.New(os.Stdout), time.Now().UTC(), short)
}
rootCmd := NewRootCommand()
rootCmd.SilenceErrors = true
if err := rootCmd.Execute(); err != nil {
ReportError("%s", err.Error())
os.Exit(1)
}
}
// ReportError emits a user-facing error to stderr in the standard
// 🛑 ERROR: format. Use it from goroutine error paths (where returning
// an error to cobra isn't an option) and anywhere else a CLI command
// must surface a failure outside the normal RunE return path.
func ReportError(format string, args ...any) {
ui.New(os.Stderr).Error(format, args...)
}
// bannerSuppressedInArgs reports whether any of args is a flag that
// should suppress the startup banner (--quiet/-q/--cron). Stops at the
// "--" argument terminator. Recognizes both long forms and short -q,
// including combined short flags like "-qv".
func bannerSuppressedInArgs(args []string) bool {
for _, a := range args {
if a == "--" {
return false
}
switch a {
case "--quiet", "-q", "--cron":
return true
}
if strings.HasPrefix(a, "--quiet=") || strings.HasPrefix(a, "--cron=") {
return true
}
// Combined short flags like -qv or -vq.
if len(a) > 1 && a[0] == '-' && a[1] != '-' {
for _, c := range a[1:] {
if c == 'q' {
return true
}
}
}
}
return false
}
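The scan rules above can be checked in isolation. A minimal sketch that copies the function as listed and runs it over a few argument lists; the sample command lines are illustrative assumptions, not taken from the test suite:

```go
package main

import (
	"fmt"
	"strings"
)

// bannerSuppressedInArgs is copied from the listing above so the example
// compiles on its own.
func bannerSuppressedInArgs(args []string) bool {
	for _, a := range args {
		if a == "--" {
			return false
		}
		switch a {
		case "--quiet", "-q", "--cron":
			return true
		}
		if strings.HasPrefix(a, "--quiet=") || strings.HasPrefix(a, "--cron=") {
			return true
		}
		// Combined short flags like -qv or -vq.
		if len(a) > 1 && a[0] == '-' && a[1] != '-' {
			for _, c := range a[1:] {
				if c == 'q' {
					return true
				}
			}
		}
	}
	return false
}

func main() {
	samples := [][]string{
		{"snapshot", "list"},             // no quiet flag
		{"-vq", "snapshot", "list"},      // combined short flags
		{"snapshot", "create", "--cron"}, // flag after the subcommand
		{"--", "-q"},                     // after the terminator: ignored
	}
	for _, args := range samples {
		fmt.Printf("%q -> %v\n", args, bannerSuppressedInArgs(args))
	}
}
```

Because the check runs on raw os.Args before cobra parses anything, it is a heuristic: any single-dash argument containing the letter q counts, including a short flag with an attached value that happens to contain q.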


@@ -18,7 +18,7 @@ func TestCLIEntry(t *testing.T) {
}
// Verify all subcommands are registered
expectedCommands := []string{"snapshot", "store", "restore", "prune", "verify", "info", "version"}
expectedCommands := []string{"config", "snapshot", "store", "prune", "info", "version", "remote", "database"}
for _, expected := range expectedCommands {
found := false
for _, cmd := range cmd.Commands() {
@@ -38,7 +38,7 @@ func TestCLIEntry(t *testing.T) {
t.Errorf("Failed to find snapshot command: %v", err)
} else {
// Check snapshot subcommands
expectedSubCommands := []string{"create", "list", "purge", "verify"}
expectedSubCommands := []string{"create", "list", "purge", "verify", "cleanup", "restore"}
for _, expected := range expectedSubCommands {
found := false
for _, subcmd := range snapshotCmd.Commands() {


@@ -4,10 +4,10 @@ import (
"context"
"os"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/vaultik"
)
// NewInfoCommand creates the info command
@@ -47,6 +47,7 @@ func NewInfoCommand() *cobra.Command {
if err := v.ShowInfo(); err != nil {
if err != context.Canceled {
log.Error("Failed to show info", "error", err)
ReportError("Failed to show info: %v", err)
os.Exit(1)
}
}


@@ -4,10 +4,10 @@ import (
"context"
"os"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/vaultik"
)
// NewPruneCommand creates the prune command
@@ -16,14 +16,19 @@ func NewPruneCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "prune",
Short: "Remove unreferenced blobs",
Long: `Removes blobs that are not referenced by any snapshot.
Short: "Tidy local database and remote storage",
Long: `Removes orphaned data from the local index database and
unreferenced blobs from the backup destination store.
This command scans all snapshots and their manifests to build a list of
referenced blobs, then removes any blobs in storage that are not in this list.
Local cleanup drops incomplete snapshots and any files, chunks, or
blobs no longer referenced by a completed snapshot. Remote cleanup
scans every snapshot manifest in the destination store, builds the
set of still-referenced blob hashes, and deletes any blob not in that
set.
Use this command after deleting snapshots with 'vaultik purge' to reclaim
storage space.`,
'snapshot create --prune' and 'snapshot remove' run the same cleanup
automatically; this command is the manual entry point for that work
(e.g. after a crashed backup or to reclaim storage).`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Use unified config resolution
@@ -49,10 +54,11 @@ storage space.`,
// Start the prune operation in a goroutine
go func() {
// Run the prune operation
if err := v.PruneBlobs(opts); err != nil {
if err := v.Prune(opts); err != nil {
if err != context.Canceled {
if !opts.JSON {
log.Error("Prune operation failed", "error", err)
ReportError("Prune failed: %v", err)
}
os.Exit(1)
}
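The remote half of the prune described in the help text is a set difference: blobs present in storage minus blobs named by any manifest. A minimal sketch of that step, assuming a hypothetical function `unreferencedBlobs` and in-memory inputs; the real PruneBlobs reads manifests from the destination store and its signature is not shown in this diff:

```go
package main

import (
	"fmt"
	"sort"
)

// unreferencedBlobs is a hypothetical illustration: collect every blob hash
// named by any snapshot manifest, then report stored blobs outside that set.
func unreferencedBlobs(manifests map[string][]string, stored []string) []string {
	referenced := make(map[string]struct{})
	for _, blobs := range manifests {
		for _, h := range blobs {
			referenced[h] = struct{}{}
		}
	}
	var orphans []string
	for _, h := range stored {
		if _, ok := referenced[h]; !ok {
			orphans = append(orphans, h)
		}
	}
	sort.Strings(orphans) // deterministic order for display
	return orphans
}

func main() {
	manifests := map[string][]string{
		"home-1": {"aaa", "bbb"},
		"home-2": {"bbb", "ccc"},
	}
	stored := []string{"aaa", "bbb", "ccc", "ddd", "eee"}
	fmt.Println(unreferencedBlobs(manifests, stored))
}
```

Building the referenced set from every manifest before deleting anything is what makes the operation safe for blobs shared between snapshots.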


@@ -1,101 +0,0 @@
package cli
import (
"context"
"fmt"
"os"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// NewPurgeCommand creates the purge command
func NewPurgeCommand() *cobra.Command {
opts := &vaultik.SnapshotPurgeOptions{}
cmd := &cobra.Command{
Use: "purge",
Short: "Purge old snapshots",
Long: `Removes snapshots based on age or count criteria.
This command allows you to:
- Keep only the latest snapshot per name (--keep-latest)
- Remove snapshots older than a specific duration (--older-than)
- Filter to a specific snapshot name (--name)
When --keep-latest is used, retention is applied per snapshot name. For example,
if you have snapshots named "home" and "system", --keep-latest keeps the most
recent of each.
Use --name to restrict the purge to a single snapshot name.
Config is located at /etc/vaultik/config.yml by default, but can be overridden by
specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Validate flags
if !opts.KeepLatest && opts.OlderThan == "" {
return fmt.Errorf("must specify either --keep-latest or --older-than")
}
if opts.KeepLatest && opts.OlderThan != "" {
return fmt.Errorf("cannot specify both --keep-latest and --older-than")
}
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
// Use the app framework like other commands
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
// Start the purge operation in a goroutine
go func() {
// Run the purge operation
if err := v.PurgeSnapshotsWithOptions(opts); err != nil {
if err != context.Canceled {
log.Error("Purge operation failed", "error", err)
os.Exit(1)
}
}
// Shutdown the app when purge completes
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
log.Debug("Stopping purge operation")
v.Cancel()
return nil
},
})
}),
},
})
},
}
cmd.Flags().BoolVar(&opts.KeepLatest, "keep-latest", false, "Keep only the latest snapshot per name")
cmd.Flags().StringVar(&opts.OlderThan, "older-than", "", "Remove snapshots older than duration (e.g. 30d, 6m, 1y)")
cmd.Flags().BoolVar(&opts.Force, "force", false, "Skip confirmation prompts")
cmd.Flags().StringVar(&opts.Name, "name", "", "Filter purge to a specific snapshot name")
return cmd
}


@@ -2,12 +2,13 @@ package cli
import (
"context"
"fmt"
"os"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/vaultik"
)
// NewRemoteCommand creates the remote command and subcommands
@@ -20,6 +21,73 @@ func NewRemoteCommand() *cobra.Command {
// Add subcommands
cmd.AddCommand(newRemoteInfoCommand())
cmd.AddCommand(newRemoteNukeCommand())
return cmd
}
// newRemoteNukeCommand creates the 'remote nuke' subcommand.
func newRemoteNukeCommand() *cobra.Command {
var force bool
cmd := &cobra.Command{
Use: "nuke",
Short: "Delete ALL snapshot metadata and blobs from the backup destination store",
Long: `Removes every snapshot's metadata and every blob from remote
storage. After this command completes successfully the bucket prefix is
empty and the next backup starts from scratch.
This is destructive and irreversible. Requires --force.`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
if !force {
return fmt.Errorf("remote nuke requires --force (this deletes ALL remote snapshots and blobs)")
}
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet,
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
go func() {
if err := v.NukeRemote(true); err != nil {
if err != context.Canceled {
log.Error("Remote nuke failed", "error", err)
ReportError("Remote nuke failed: %v", err)
os.Exit(1)
}
}
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
v.Cancel()
return nil
},
})
}),
},
})
},
}
cmd.Flags().BoolVar(&force, "force", false, "Required: confirm destruction of ALL remote data")
return cmd
}
@@ -62,6 +130,7 @@ func newRemoteInfoCommand() *cobra.Command {
if err != context.Canceled {
if !jsonOutput {
log.Error("Failed to get remote info", "error", err)
ReportError("Failed to get remote info: %v", err)
}
os.Exit(1)
}


@@ -3,7 +3,10 @@ package cli
import (
"fmt"
"os"
"path/filepath"
"strings"
"github.com/adrg/xdg"
"github.com/spf13/cobra"
)
@@ -14,6 +17,7 @@ type RootFlags struct {
Verbose bool
Debug bool
Quiet bool
SkipErrors bool
}
var rootFlags RootFlags
@@ -25,23 +29,30 @@ func NewRootCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "vaultik",
Short: "Secure incremental backup tool with asymmetric encryption",
Long: `vaultik is a secure incremental backup daemon that encrypts data using age
Long: `vaultik is a secure incremental backup tool that encrypts data using age
public keys and uploads to S3-compatible storage. No private keys are needed
on the source system.`,
SilenceUsage: true,
// Bare 'vaultik' (no subcommand): print help. The banner is
// printed once at process startup by CLIEntry, before cobra
// parses arguments, so it appears even when cobra rejects
// args (e.g. "requires at least 2 arg(s)") and on --help.
Run: func(cmd *cobra.Command, args []string) {
_ = cmd.Help()
},
}
// Add global flags
cmd.PersistentFlags().StringVar(&rootFlags.ConfigPath, "config", "", "Path to config file (default: $VAULTIK_CONFIG or /etc/vaultik/config.yml)")
cmd.PersistentFlags().StringVar(&rootFlags.ConfigPath, "config", "", "Path to config file (default: $VAULTIK_CONFIG or platform config dir)")
cmd.PersistentFlags().BoolVarP(&rootFlags.Verbose, "verbose", "v", false, "Enable verbose output")
cmd.PersistentFlags().BoolVar(&rootFlags.Debug, "debug", false, "Enable debug output")
cmd.PersistentFlags().BoolVarP(&rootFlags.Quiet, "quiet", "q", false, "Suppress non-error output")
cmd.PersistentFlags().BoolVar(&rootFlags.SkipErrors, "skip-errors", false, "Continue past per-file errors instead of aborting (applies to snapshot create and restore)")
// Add subcommands
cmd.AddCommand(
NewRestoreCommand(),
NewConfigCommand(),
NewPruneCommand(),
NewVerifyCommand(),
NewStoreCommand(),
NewSnapshotCommand(),
NewInfoCommand(),
@@ -60,25 +71,49 @@ func GetRootFlags() RootFlags {
}
// ResolveConfigPath resolves the config file path from flags, environment, or default.
// It checks in order: 1) --config flag, 2) VAULTIK_CONFIG environment variable,
// 3) default location /etc/vaultik/config.yml. Returns an error if no valid
// config file can be found through any of these methods.
// Search order: --config flag, VAULTIK_CONFIG env, XDG config dir, /etc/vaultik/config.yml.
// Explicit paths from --config and $VAULTIK_CONFIG are checked for existence
// so the user gets a clear error instead of a downstream YAML parser failure.
func ResolveConfigPath() (string, error) {
// First check global flag
if rootFlags.ConfigPath != "" {
return rootFlags.ConfigPath, nil
if path := rootFlags.ConfigPath; path != "" {
if _, err := os.Stat(path); err != nil {
return "", fmt.Errorf("config file from --config not found: %s (run 'vaultik config init --config %s' to create it)", path, path)
}
return path, nil
}
// Then check environment variable
if envPath := os.Getenv("VAULTIK_CONFIG"); envPath != "" {
return envPath, nil
if path := os.Getenv("VAULTIK_CONFIG"); path != "" {
if _, err := os.Stat(path); err != nil {
return "", fmt.Errorf("config file from $VAULTIK_CONFIG not found: %s (unset VAULTIK_CONFIG, point it at an existing file, or run 'vaultik config init')", path)
}
return path, nil
}
// Finally check default location
defaultPath := "/etc/vaultik/config.yml"
if _, err := os.Stat(defaultPath); err == nil {
return defaultPath, nil
for _, path := range defaultConfigPaths() {
if _, err := os.Stat(path); err == nil {
return path, nil
}
}
return "", fmt.Errorf("no config file specified, VAULTIK_CONFIG not set, and %s not found", defaultPath)
return "", fmt.Errorf("no config file found at %s (run 'vaultik config init' to create the default config, or pass --config <path>)", strings.Join(defaultConfigPaths(), " or "))
}
// defaultConfigPaths returns the ordered list of config paths to search.
// On macOS: ~/Library/Application Support/vaultik/config.yml
// On Linux: ~/.config/vaultik/config.yml
// Fallback: /etc/vaultik/config.yml
func defaultConfigPaths() []string {
return []string{
filepath.Join(xdg.ConfigHome, "vaultik", "config.yml"),
"/etc/vaultik/config.yml",
}
}
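// DefaultConfigPath returns the default config path for the current user:
// /etc/vaultik/config.yml when running as root (uid 0), otherwise
// config.yml under the XDG config dir. Used by the init command and in help text.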
func DefaultConfigPath() string {
if os.Getuid() == 0 {
return "/etc/vaultik/config.yml"
}
return filepath.Join(xdg.ConfigHome, "vaultik", "config.yml")
}
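The search order in ResolveConfigPath can be restated without the filesystem or the xdg package. A minimal sketch, assuming a hypothetical function `resolveConfig` whose `exists` callback stands in for the os.Stat checks; the paths in main are made up for illustration:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// resolveConfig is a hypothetical, dependency-free restatement of the search
// order: explicit flag, then environment variable, then the default
// locations in order. Explicit paths must exist; defaults are probed.
func resolveConfig(flagPath, envPath string, defaults []string, exists func(string) bool) (string, error) {
	if flagPath != "" {
		if !exists(flagPath) {
			return "", fmt.Errorf("config file from --config not found: %s", flagPath)
		}
		return flagPath, nil
	}
	if envPath != "" {
		if !exists(envPath) {
			return "", fmt.Errorf("config file from $VAULTIK_CONFIG not found: %s", envPath)
		}
		return envPath, nil
	}
	for _, p := range defaults {
		if exists(p) {
			return p, nil
		}
	}
	return "", errors.New("no config file found at " + strings.Join(defaults, " or "))
}

func main() {
	onDisk := map[string]bool{"/etc/vaultik/config.yml": true}
	exists := func(p string) bool { return onDisk[p] }
	defaults := []string{"/home/user/.config/vaultik/config.yml", "/etc/vaultik/config.yml"}

	path, err := resolveConfig("", "", defaults, exists)
	fmt.Println(path, err) // falls through to the /etc fallback

	path, err = resolveConfig("/tmp/missing.yml", "", defaults, exists)
	fmt.Println(path, err) // an explicit but missing path is an error, not a fallthrough
}
```

The design point is that an explicit path never silently falls back to a default: a mistyped --config or VAULTIK_CONFIG value fails immediately instead of loading a different file.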


@@ -5,10 +5,10 @@ import (
"fmt"
"os"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/vaultik"
)
// NewSnapshotCommand creates the snapshot command and subcommands
@@ -25,7 +25,8 @@ func NewSnapshotCommand() *cobra.Command {
cmd.AddCommand(newSnapshotPurgeCommand())
cmd.AddCommand(newSnapshotVerifyCommand())
cmd.AddCommand(newSnapshotRemoveCommand())
cmd.AddCommand(newSnapshotPruneCommand())
cmd.AddCommand(newSnapshotCleanupCommand())
cmd.AddCommand(newSnapshotRestoreCommand())
return cmd
}
@@ -48,6 +49,8 @@ specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
RunE: func(cmd *cobra.Command, args []string) error {
// Pass snapshot names from args
opts.Snapshots = args
// --skip-errors is a global flag on the root command.
opts.SkipErrors = rootFlags.SkipErrors
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
@@ -71,10 +74,12 @@ specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
OnStart: func(ctx context.Context) error {
// Start the snapshot creation in a goroutine
go func() {
// Run the snapshot creation
// --cron suppression is wired through v.UI by setupGlobals.
if err := v.CreateSnapshot(opts); err != nil {
if err != context.Canceled {
log.Error("Snapshot creation failed", "error", err)
ReportError("Snapshot creation failed: %v", err)
os.Exit(1)
}
}
@@ -98,10 +103,9 @@ specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
},
}
cmd.Flags().BoolVar(&opts.Daemon, "daemon", false, "Run in daemon mode with inotify monitoring")
cmd.Flags().BoolVar(&opts.Cron, "cron", false, "Run in cron mode (silent unless error)")
cmd.Flags().BoolVar(&opts.Prune, "prune", false, "Delete all previous snapshots and unreferenced blobs after backup")
cmd.Flags().BoolVar(&opts.SkipErrors, "skip-errors", false, "Skip file read errors (log them loudly but continue)")
cmd.Flags().BoolVar(&opts.Prune, "prune", false, "After backup, drop older snapshots of the same name and remove orphaned blobs")
cmd.Flags().StringVar(&opts.KeepNewerThan, "keep-newer-than", "", "With --prune: keep snapshots newer than this duration (e.g. 4w, 30d, 6mo) instead of only the latest")
return cmd
}
@@ -140,6 +144,7 @@ func newSnapshotListCommand() *cobra.Command {
if err := v.ListSnapshots(jsonOutput); err != nil {
if err != context.Canceled {
log.Error("Failed to list snapshots", "error", err)
ReportError("Failed to list snapshots: %v", err)
os.Exit(1)
}
}
@@ -174,11 +179,9 @@ func newSnapshotPurgeCommand() *cobra.Command {
Short: "Purge old snapshots",
Long: `Removes snapshots based on age or count criteria.
When --keep-latest is used, retention is applied per snapshot name. For example,
if you have snapshots named "home" and "system", --keep-latest keeps the most
recent of each.
Use --name to restrict the purge to a single snapshot name.`,
Retention is per-snapshot-name: --keep-latest keeps the latest of each
configured snapshot name, not the latest globally. Use --snapshot to
restrict the operation to specific snapshot names.`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Validate flags
@@ -212,6 +215,7 @@ Use --name to restrict the purge to a single snapshot name.`,
if err := v.PurgeSnapshotsWithOptions(opts); err != nil {
if err != context.Canceled {
log.Error("Failed to purge snapshots", "error", err)
ReportError("Failed to purge snapshots: %v", err)
os.Exit(1)
}
}
@@ -232,10 +236,10 @@ Use --name to restrict the purge to a single snapshot name.`,
},
}
cmd.Flags().BoolVar(&opts.KeepLatest, "keep-latest", false, "Keep only the latest snapshot per name")
cmd.Flags().BoolVar(&opts.KeepLatest, "keep-latest", false, "Keep only the latest snapshot of each name")
cmd.Flags().StringVar(&opts.OlderThan, "older-than", "", "Remove snapshots older than duration (e.g., 30d, 6m, 1y)")
cmd.Flags().BoolVar(&opts.Force, "force", false, "Skip confirmation prompt")
cmd.Flags().StringVar(&opts.Name, "name", "", "Filter purge to a specific snapshot name")
cmd.Flags().StringArrayVar(&opts.Names, "snapshot", nil, "Restrict to snapshots with these names (repeat for multiple)")
return cmd
}
@@ -281,16 +285,11 @@ func newSnapshotVerifyCommand() *cobra.Command {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
go func() {
var err error
if opts.Deep {
err = v.RunDeepVerify(snapshotID, opts)
} else {
err = v.VerifySnapshotWithOptions(snapshotID, opts)
}
if err != nil {
if err := v.VerifySnapshotWithOptions(snapshotID, opts); err != nil {
if err != context.Canceled {
if !opts.JSON {
log.Error("Verification failed", "error", err)
ReportError("Verification failed: %v", err)
}
os.Exit(1)
}
@@ -384,6 +383,7 @@ Use --all --force to remove all snapshots.`,
if err != context.Canceled {
if !opts.JSON {
log.Error("Failed to remove snapshot", "error", err)
ReportError("Failed to remove snapshot: %v", err)
}
os.Exit(1)
}
@@ -414,18 +414,18 @@ Use --all --force to remove all snapshots.`,
return cmd
}
// newSnapshotPruneCommand creates the 'snapshot prune' subcommand
func newSnapshotPruneCommand() *cobra.Command {
// newSnapshotCleanupCommand creates the 'snapshot cleanup' subcommand
func newSnapshotCleanupCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "prune",
Short: "Remove orphaned data from local database",
Long: `Removes orphaned files, chunks, and blobs from the local database.
Use: "cleanup",
Short: "Remove stale local snapshot records not found in remote storage",
Long: `Removes local database records for snapshots whose metadata no longer
exists in remote storage. These are typically left behind by incomplete
or interrupted backups.
This cleans up data that is no longer referenced by any snapshot, which can
accumulate from incomplete backups or deleted snapshots.`,
This command does not delete anything from remote storage.`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
@@ -445,9 +445,10 @@ accumulate from incomplete backups or deleted snapshots.`,
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
go func() {
if _, err := v.PruneDatabase(); err != nil {
if err := v.CleanupLocalSnapshots(); err != nil {
if err != context.Canceled {
log.Error("Failed to prune database", "error", err)
log.Error("Cleanup failed", "error", err)
ReportError("Cleanup failed: %v", err)
os.Exit(1)
}
}


@@ -2,14 +2,15 @@ package cli
import (
"context"
"os"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/globals"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/storage"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
"sneak.berlin/go/vaultik/internal/config"
"sneak.berlin/go/vaultik/internal/globals"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/storage"
"sneak.berlin/go/vaultik/internal/vaultik"
)
// RestoreOptions contains options for the restore command
@@ -28,13 +29,13 @@ type RestoreApp struct {
Shutdowner fx.Shutdowner
}
// NewRestoreCommand creates the restore command
func NewRestoreCommand() *cobra.Command {
// newSnapshotRestoreCommand creates the 'snapshot restore' subcommand
func newSnapshotRestoreCommand() *cobra.Command {
opts := &RestoreOptions{}
cmd := &cobra.Command{
Use: "restore <snapshot-id> <target-dir> [paths...]",
Short: "Restore files from backup",
Short: "Restore files from a snapshot",
Long: `Download and decrypt files from a backup snapshot.
This command will restore files from the specified snapshot to the target directory.
@@ -45,16 +46,16 @@ Requires the VAULTIK_AGE_SECRET_KEY environment variable to be set with the age
Examples:
# Restore entire snapshot
vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore
vaultik snapshot restore myhost_docs_2025-01-01T12:00:00Z /restore
# Restore specific file
vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore /home/user/important.txt
vaultik snapshot restore myhost_docs_2025-01-01T12:00:00Z /restore /home/user/important.txt
# Restore specific directory
vaultik restore myhost_docs_2025-01-01T12:00:00Z /restore /home/user/documents/
vaultik snapshot restore myhost_docs_2025-01-01T12:00:00Z /restore /home/user/documents/
# Restore and verify all files
vaultik restore --verify myhost_docs_2025-01-01T12:00:00Z /restore`,
vaultik snapshot restore --verify myhost_docs_2025-01-01T12:00:00Z /restore`,
Args: cobra.MinimumNArgs(2),
RunE: func(cmd *cobra.Command, args []string) error {
return runRestore(cmd, args, opts)
@@ -126,10 +127,13 @@ func buildRestoreInvokes(snapshotID string, opts *RestoreOptions) []fx.Option {
TargetDir: opts.TargetDir,
Paths: opts.Paths,
Verify: opts.Verify,
SkipErrors: GetRootFlags().SkipErrors,
}
if err := app.Vaultik.Restore(restoreOpts); err != nil {
if err != context.Canceled {
log.Error("Restore operation failed", "error", err)
ReportError("Restore failed: %v", err)
os.Exit(1)
}
}


@@ -6,10 +6,10 @@ import (
"strings"
"time"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/storage"
"github.com/spf13/cobra"
"go.uber.org/fx"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/storage"
)
// StoreApp contains dependencies for store commands


@@ -1,98 +0,0 @@
package cli
import (
"context"
"os"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// NewVerifyCommand creates the verify command
func NewVerifyCommand() *cobra.Command {
opts := &vaultik.VerifyOptions{}
cmd := &cobra.Command{
Use: "verify <snapshot-id>",
Short: "Verify snapshot integrity",
Long: `Verifies that all blobs referenced in a snapshot exist and optionally verifies their contents.
Shallow verification (default):
- Downloads and decompresses manifest
- Checks existence of all blobs in S3
- Reports missing blobs
Deep verification (--deep):
- Downloads and decrypts database
- Verifies blob lists match between manifest and database
- Downloads, decrypts, and decompresses each blob
- Verifies SHA256 hash of each chunk matches database
- Ensures chunks are ordered correctly
The command will fail immediately on any verification error and exit with non-zero status.`,
Args: cobra.ExactArgs(1),
RunE: func(cmd *cobra.Command, args []string) error {
snapshotID := args[0]
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
// Use the app framework for all verification
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Quiet: rootFlags.Quiet || opts.JSON, // Suppress log output in JSON mode
},
Modules: []fx.Option{},
Invokes: []fx.Option{
fx.Invoke(func(v *vaultik.Vaultik, lc fx.Lifecycle) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
// Run the verify operation directly
go func() {
var err error
if opts.Deep {
err = v.RunDeepVerify(snapshotID, opts)
} else {
err = v.VerifySnapshotWithOptions(snapshotID, opts)
}
if err != nil {
if err != context.Canceled {
if !opts.JSON {
log.Error("Verification failed", "error", err)
}
os.Exit(1)
}
}
if err := v.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
log.Debug("Stopping verify operation")
v.Cancel()
return nil
},
})
}),
},
})
},
}
cmd.Flags().BoolVar(&opts.Deep, "deep", false, "Perform deep verification by downloading and verifying all blob contents")
cmd.Flags().BoolVar(&opts.JSON, "json", false, "Output verification results as JSON")
return cmd
}


@@ -4,8 +4,8 @@ import (
"fmt"
"runtime"
"git.eeqj.de/sneak/vaultik/internal/globals"
"github.com/spf13/cobra"
"sneak.berlin/go/vaultik/internal/globals"
)
// NewVersionCommand creates the version command
@@ -17,9 +17,19 @@ func NewVersionCommand() *cobra.Command {
Args: cobra.NoArgs,
Run: func(cmd *cobra.Command, args []string) {
fmt.Printf("vaultik %s\n", globals.Version)
fmt.Printf(" commit: %s\n", globals.Commit)
fmt.Printf(" go: %s\n", runtime.Version())
fmt.Printf(" os/arch: %s/%s\n", runtime.GOOS, runtime.GOARCH)
fmt.Printf(" commit: %s\n", globals.Commit)
fmt.Printf(" build date: %s\n", globals.CommitDate)
fmt.Printf(" go: %s\n", runtime.Version())
fmt.Printf(" os/arch: %s/%s\n", runtime.GOOS, runtime.GOARCH)
fmt.Printf(" author: %s\n", globals.Author)
fmt.Printf(" homepage: %s\n", globals.Homepage)
fmt.Printf(" license: %s\n", globals.License)
if globals.Version == "dev" {
fmt.Println()
fmt.Println("This is a development build (no version information embedded).")
fmt.Println("Build a release binary with 'make vaultik' or download from")
fmt.Println("https://sneak.berlin/go/vaultik for embedded version metadata.")
}
},
}


@@ -6,17 +6,16 @@ import (
"path/filepath"
"sort"
"strings"
"time"
"filippo.io/age"
"git.eeqj.de/sneak/smartconfig"
"git.eeqj.de/sneak/vaultik/internal/log"
"github.com/adrg/xdg"
"go.uber.org/fx"
"gopkg.in/yaml.v3"
"sneak.berlin/go/vaultik/internal/log"
)
const appName = "berlin.sneak.app.vaultik"
const appName = "vaultik"
// expandTilde expands ~ at the start of a path to the user's home directory.
func expandTilde(path string) string {
@@ -83,19 +82,16 @@ func (c *Config) SnapshotNames() []string {
// encryption recipients, storage configuration, and performance tuning parameters.
// Configuration is typically loaded from a YAML file.
type Config struct {
AgeRecipients []string `yaml:"age_recipients"`
AgeSecretKey string `yaml:"age_secret_key"`
BackupInterval time.Duration `yaml:"backup_interval"`
BlobSizeLimit Size `yaml:"blob_size_limit"`
ChunkSize Size `yaml:"chunk_size"`
Exclude []string `yaml:"exclude"` // Global excludes applied to all snapshots
FullScanInterval time.Duration `yaml:"full_scan_interval"`
Hostname string `yaml:"hostname"`
IndexPath string `yaml:"index_path"`
MinTimeBetweenRun time.Duration `yaml:"min_time_between_run"`
S3 S3Config `yaml:"s3"`
Snapshots map[string]SnapshotConfig `yaml:"snapshots"`
CompressionLevel int `yaml:"compression_level"`
AgeRecipients []string `yaml:"age_recipients"`
AgeSecretKey string `yaml:"age_secret_key"`
BlobSizeLimit Size `yaml:"blob_size_limit"`
ChunkSize Size `yaml:"chunk_size"`
Exclude []string `yaml:"exclude"` // Global excludes applied to all snapshots
Hostname string `yaml:"hostname"`
IndexPath string `yaml:"index_path"`
S3 S3Config `yaml:"s3"`
Snapshots map[string]SnapshotConfig `yaml:"snapshots"`
CompressionLevel int `yaml:"compression_level"`
// StorageURL specifies the storage backend using a URL format.
// Takes precedence over S3Config if set.
@@ -155,13 +151,10 @@ func Load(path string) (*Config, error) {
cfg := &Config{
// Set defaults
BlobSizeLimit: Size(10 * 1024 * 1024 * 1024), // 10GB
ChunkSize: Size(10 * 1024 * 1024), // 10MB
BackupInterval: 1 * time.Hour,
FullScanInterval: 24 * time.Hour,
MinTimeBetweenRun: 15 * time.Minute,
IndexPath: filepath.Join(xdg.DataHome, appName, "index.sqlite"),
CompressionLevel: 3,
BlobSizeLimit: Size(10 * 1024 * 1024 * 1024), // 10GB
ChunkSize: Size(10 * 1024 * 1024), // 10MB
IndexPath: filepath.Join(xdg.DataHome, appName, "index.sqlite"),
CompressionLevel: 3,
}
// Convert smartconfig data to YAML then unmarshal
@@ -243,11 +236,11 @@ func Load(path string) (*Config, error) {
// Returns an error describing the first validation failure encountered.
func (c *Config) Validate() error {
if len(c.AgeRecipients) == 0 {
return fmt.Errorf("at least one age_recipient is required")
return fmt.Errorf("at least one age_recipient is required (generate with: age-keygen)")
}
if len(c.Snapshots) == 0 {
return fmt.Errorf("at least one snapshot must be configured")
return fmt.Errorf("at least one snapshot must be configured (see config.example.yml)")
}
for name, snap := range c.Snapshots {
@@ -306,7 +299,7 @@ func (c *Config) validateStorage() error {
// Legacy S3 configuration
if c.S3.Endpoint == "" {
return fmt.Errorf("s3.endpoint is required (or set storage_url)")
return fmt.Errorf("storage not configured; set storage_url or provide s3.endpoint + s3.bucket + credentials")
}
if c.S3.Bucket == "" {
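The precedence described in this hunk (storage_url wins, legacy S3 fields are the fallback) can be sketched in isolation. This is a minimal illustration, not the project's code: `storageConfig`, `s3Config`, and this standalone `validateStorage` are hypothetical stand-ins, and the wording of the bucket error is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// s3Config and storageConfig are hypothetical, trimmed-down stand-ins
// for the real Config type shown in the diff.
type s3Config struct {
	Endpoint string
	Bucket   string
}

type storageConfig struct {
	StorageURL string
	S3         s3Config
}

// validateStorage accepts a config when storage_url is set; otherwise it
// falls back to checking the legacy S3 fields.
func validateStorage(c storageConfig) error {
	if c.StorageURL != "" {
		return nil // storage_url takes precedence over the S3 block
	}
	if c.S3.Endpoint == "" {
		return errors.New("storage not configured; set storage_url or provide s3.endpoint + s3.bucket + credentials")
	}
	if c.S3.Bucket == "" {
		// Assumed message; the real text is not visible in this hunk.
		return errors.New("s3.bucket is required")
	}
	return nil
}

func main() {
	fmt.Println(validateStorage(storageConfig{StorageURL: "file:///var/backups"}))
	fmt.Println(validateStorage(storageConfig{}))
}
```

Checking storage_url first means a user who sets only the URL never sees an S3-specific error, which appears to be the intent of the reworded message.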


@@ -6,7 +6,7 @@ import (
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/types"
)
func TestBlobChunkRepository(t *testing.T) {


@@ -6,7 +6,7 @@ import (
"fmt"
"time"
"git.eeqj.de/sneak/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/log"
)
type BlobRepository struct {
@@ -130,6 +130,51 @@ func (r *BlobRepository) GetByID(ctx context.Context, id string) (*Blob, error)
return &blob, nil
}
// GetAll returns every blob row keyed by blob ID. Useful at restore
// start to translate the per-chunk blob_id references in chunkToBlobMap
// into blob hashes without doing one GetByID query per chunk.
func (r *BlobRepository) GetAll(ctx context.Context) (map[string]*Blob, error) {
query := `
SELECT id, blob_hash, created_ts, finished_ts, uncompressed_size, compressed_size, uploaded_ts
FROM blobs
`
rows, err := r.db.conn.QueryContext(ctx, query)
if err != nil {
return nil, fmt.Errorf("querying blobs: %w", err)
}
defer CloseRows(rows)
out := make(map[string]*Blob)
for rows.Next() {
var blob Blob
var createdTSUnix int64
var finishedTSUnix, uploadedTSUnix sql.NullInt64
if err := rows.Scan(
&blob.ID,
&blob.Hash,
&createdTSUnix,
&finishedTSUnix,
&blob.UncompressedSize,
&blob.CompressedSize,
&uploadedTSUnix,
); err != nil {
return nil, fmt.Errorf("scanning blob: %w", err)
}
blob.CreatedTS = time.Unix(createdTSUnix, 0).UTC()
if finishedTSUnix.Valid {
ts := time.Unix(finishedTSUnix.Int64, 0).UTC()
blob.FinishedTS = &ts
}
if uploadedTSUnix.Valid {
ts := time.Unix(uploadedTSUnix.Int64, 0).UTC()
blob.UploadedTS = &ts
}
out[blob.ID.String()] = &blob
}
return out, rows.Err()
}
// UpdateFinished updates a blob when it's finalized
func (r *BlobRepository) UpdateFinished(ctx context.Context, tx *sql.Tx, id string, hash string, uncompressedSize, compressedSize int64) error {
query := `
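GetAll converts nullable Unix-second columns into optional timestamps twice (finished_ts and uploaded_ts). A minimal sketch of that conversion as a helper, assuming one wanted to factor it out; `nullableUnix` and the two sample values are hypothetical and not part of the commit.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// Sample inputs for the sketch: a SQL NULL and a valid value at the epoch.
var (
	nullTS  = sql.NullInt64{}
	epochTS = sql.NullInt64{Int64: 0, Valid: true}
)

// nullableUnix maps a nullable Unix-seconds column to *time.Time in UTC.
// A SQL NULL becomes a nil pointer, mirroring the FinishedTS/UploadedTS
// handling in GetAll.
func nullableUnix(v sql.NullInt64) *time.Time {
	if !v.Valid {
		return nil
	}
	ts := time.Unix(v.Int64, 0).UTC()
	return &ts
}

func main() {
	fmt.Println(nullableUnix(nullTS) == nil)
	fmt.Println(nullableUnix(epochTS).Year())
}
```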


@@ -5,7 +5,7 @@ import (
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/types"
)
func TestBlobRepository(t *testing.T) {


@@ -6,7 +6,7 @@ import (
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/types"
)
// TestCascadeDeleteDebug tests cascade delete with debug output


@@ -5,7 +5,7 @@ import (
"database/sql"
"fmt"
"git.eeqj.de/sneak/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/types"
)
type ChunkFileRepository struct {


@@ -5,7 +5,7 @@ import (
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/types"
)
func TestChunkFileRepository(t *testing.T) {


@@ -5,7 +5,7 @@ import (
"database/sql"
"fmt"
"git.eeqj.de/sneak/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/log"
)
type ChunkRepository struct {


@@ -4,7 +4,7 @@ import (
"context"
"testing"
"git.eeqj.de/sneak/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/types"
)
func TestChunkRepository(t *testing.T) {


@@ -6,24 +6,32 @@
// multiple source files. Blobs are content-addressed, meaning their filename
// is derived from their SHA256 hash after compression and encryption.
//
// The database does not support migrations. If the schema changes, delete
// the local database and perform a full backup to recreate it.
// Schema is managed via numbered SQL migrations embedded in the schema/
// directory. Migration 000.sql bootstraps the schema_migrations tracking
// table; subsequent migrations (001, 002, …) are applied in order.
package database
import (
"context"
"database/sql"
_ "embed"
"embed"
"fmt"
"os"
"path/filepath"
"sort"
"strconv"
"strings"
"git.eeqj.de/sneak/vaultik/internal/log"
_ "modernc.org/sqlite"
"sneak.berlin/go/vaultik/internal/log"
)
//go:embed schema.sql
var schemaSQL string
//go:embed schema/*.sql
var schemaFS embed.FS
// bootstrapVersion is the migration that creates the schema_migrations
// table itself. It is applied before the normal migration loop.
const bootstrapVersion = 0
// DB represents the Vaultik local index database connection.
// It uses SQLite to track file metadata, content-defined chunks, and blob associations.
@@ -35,6 +43,46 @@ type DB struct {
path string
}
// ParseMigrationVersion extracts the numeric version prefix from a migration
// filename. Filenames must follow the pattern "<version>.sql" or
// "<version>_<description>.sql", where version is a zero-padded numeric
// string (e.g. "001", "002"). Returns the version as an integer and an
// error if the filename does not match the expected pattern.
func ParseMigrationVersion(filename string) (int, error) {
name := strings.TrimSuffix(filename, filepath.Ext(filename))
if name == "" {
return 0, fmt.Errorf("invalid migration filename %q: empty name", filename)
}
// Split on underscore to separate version from description.
// If there's no underscore, the entire stem is the version.
versionStr := name
if idx := strings.IndexByte(name, '_'); idx >= 0 {
versionStr = name[:idx]
}
if versionStr == "" {
return 0, fmt.Errorf("invalid migration filename %q: empty version prefix", filename)
}
// Validate the version is purely numeric.
for _, ch := range versionStr {
if ch < '0' || ch > '9' {
return 0, fmt.Errorf(
"invalid migration filename %q: version %q contains non-numeric character %q",
filename, versionStr, string(ch),
)
}
}
version, err := strconv.Atoi(versionStr)
if err != nil {
return 0, fmt.Errorf("invalid migration filename %q: %w", filename, err)
}
return version, nil
}
// New creates a new database connection at the specified path.
// It creates the schema if needed and configures SQLite with WAL mode for
// better concurrency. SQLite handles crash recovery automatically when
@@ -72,9 +120,9 @@ func New(ctx context.Context, path string) (*DB, error) {
}
db := &DB{conn: conn, path: path}
if err := db.createSchema(ctx); err != nil {
if err := applyMigrations(ctx, conn); err != nil {
_ = conn.Close()
return nil, fmt.Errorf("creating schema: %w", err)
return nil, fmt.Errorf("applying migrations: %w", err)
}
return db, nil
}
@@ -125,9 +173,9 @@ func New(ctx context.Context, path string) (*DB, error) {
}
db := &DB{conn: conn, path: path}
if err := db.createSchema(ctx); err != nil {
if err := applyMigrations(ctx, conn); err != nil {
_ = conn.Close()
return nil, fmt.Errorf("creating schema: %w", err)
return nil, fmt.Errorf("applying migrations: %w", err)
}
log.Debug("Database connection established successfully", "path", path)
@@ -198,9 +246,120 @@ func (db *DB) QueryRowWithLog(
return db.conn.QueryRowContext(ctx, query, args...)
}
func (db *DB) createSchema(ctx context.Context) error {
_, err := db.conn.ExecContext(ctx, schemaSQL)
return err
// collectMigrations reads the embedded schema directory and returns
// migration filenames sorted lexicographically.
func collectMigrations() ([]string, error) {
entries, err := schemaFS.ReadDir("schema")
if err != nil {
return nil, fmt.Errorf("failed to read schema directory: %w", err)
}
var migrations []string
for _, entry := range entries {
if !entry.IsDir() && strings.HasSuffix(entry.Name(), ".sql") {
migrations = append(migrations, entry.Name())
}
}
sort.Strings(migrations)
return migrations, nil
}
// bootstrapMigrationsTable ensures the schema_migrations table exists
// by applying 000.sql if the table is missing.
func bootstrapMigrationsTable(ctx context.Context, db *sql.DB) error {
var tableExists int
err := db.QueryRowContext(ctx,
"SELECT COUNT(*) FROM sqlite_master WHERE type='table' AND name='schema_migrations'",
).Scan(&tableExists)
if err != nil {
return fmt.Errorf("failed to check for migrations table: %w", err)
}
if tableExists > 0 {
return nil
}
content, err := schemaFS.ReadFile("schema/000.sql")
if err != nil {
return fmt.Errorf("failed to read bootstrap migration 000.sql: %w", err)
}
log.Info("applying bootstrap migration", "version", bootstrapVersion)
_, err = db.ExecContext(ctx, string(content))
if err != nil {
return fmt.Errorf("failed to apply bootstrap migration: %w", err)
}
return nil
}
// applyMigrations applies all pending migrations to db. It first bootstraps
// the schema_migrations table via 000.sql, then iterates through remaining
// migration files in order.
func applyMigrations(ctx context.Context, db *sql.DB) error {
if err := bootstrapMigrationsTable(ctx, db); err != nil {
return err
}
migrations, err := collectMigrations()
if err != nil {
return err
}
for _, migration := range migrations {
version, parseErr := ParseMigrationVersion(migration)
if parseErr != nil {
return parseErr
}
// Check if already applied.
var count int
err := db.QueryRowContext(ctx,
"SELECT COUNT(*) FROM schema_migrations WHERE version = ?",
version,
).Scan(&count)
if err != nil {
return fmt.Errorf("failed to check migration status: %w", err)
}
if count > 0 {
log.Debug("migration already applied", "version", version)
continue
}
// Read and apply migration.
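// embed.FS paths always use forward slashes, so build the path directly
// rather than with filepath.Join (which uses backslashes on Windows).
content, readErr := schemaFS.ReadFile("schema/" + migration)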
if readErr != nil {
return fmt.Errorf("failed to read migration %s: %w", migration, readErr)
}
log.Info("applying migration", "version", version)
_, execErr := db.ExecContext(ctx, string(content))
if execErr != nil {
return fmt.Errorf("failed to apply migration %s: %w", migration, execErr)
}
// Record migration as applied.
_, recErr := db.ExecContext(ctx,
"INSERT INTO schema_migrations (version) VALUES (?)",
version,
)
if recErr != nil {
return fmt.Errorf("failed to record migration %s: %w", migration, recErr)
}
log.Info("migration applied successfully", "version", version)
}
return nil
}
// NewTestDB creates an in-memory SQLite database for testing purposes.
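collectMigrations sorts filenames lexicographically, which agrees with numeric order only while every version prefix is zero-padded to the same width. The sketch below illustrates the difference and one way to order by parsed version instead; `parseVersion` and `sortByVersion` are hypothetical helpers (the former a simplified stand-in for ParseMigrationVersion), and the filenames are made up for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseVersion is a hypothetical, simplified stand-in for
// ParseMigrationVersion: it parses the digits before the first '_'
// (or before ".sql") as the migration version.
func parseVersion(filename string) (int, error) {
	stem := strings.TrimSuffix(filename, ".sql")
	if i := strings.IndexByte(stem, '_'); i >= 0 {
		stem = stem[:i]
	}
	if stem == "" {
		return 0, fmt.Errorf("invalid migration filename %q: empty version", filename)
	}
	for _, ch := range stem {
		if ch < '0' || ch > '9' {
			return 0, fmt.Errorf("invalid migration filename %q: non-numeric version %q", filename, stem)
		}
	}
	return strconv.Atoi(stem)
}

// sortByVersion returns a copy of names ordered by parsed numeric version.
func sortByVersion(names []string) ([]string, error) {
	versions := make(map[string]int, len(names))
	for _, n := range names {
		v, err := parseVersion(n)
		if err != nil {
			return nil, err
		}
		versions[n] = v
	}
	out := append([]string(nil), names...)
	sort.SliceStable(out, func(i, j int) bool {
		return versions[out[i]] < versions[out[j]]
	})
	return out, nil
}

func main() {
	names := []string{"10_add_index.sql", "2_fix.sql", "001.sql", "000.sql"}

	lexical := append([]string(nil), names...)
	sort.Strings(lexical)
	fmt.Println("lexicographic:", lexical)

	numeric, err := sortByVersion(names)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("by version:   ", numeric)
}
```

With consistently zero-padded names such as 000.sql and 001.sql the two orderings coincide, so the lexicographic sort in the commit is sufficient as long as that naming convention holds.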


@@ -2,6 +2,7 @@ package database
import (
"context"
"database/sql"
"fmt"
"path/filepath"
"testing"
@@ -26,9 +27,10 @@ func TestDatabase(t *testing.T) {
t.Fatal("database connection is nil")
}
// Test schema creation (already done in New)
// Test schema creation (already done in New via migrations)
// Verify tables exist
tables := []string{
"schema_migrations",
"files", "file_chunks", "chunks", "blobs",
"blob_chunks", "chunk_files", "snapshots",
}
@@ -99,3 +101,139 @@ func TestDatabaseConcurrentAccess(t *testing.T) {
t.Errorf("expected 10 chunks, got %d", count)
}
}
func TestParseMigrationVersion(t *testing.T) {
tests := []struct {
name string
filename string
wantVer int
wantError bool
}{
{name: "valid 000.sql", filename: "000.sql", wantVer: 0, wantError: false},
{name: "valid 001.sql", filename: "001.sql", wantVer: 1, wantError: false},
{name: "valid 099.sql", filename: "099.sql", wantVer: 99, wantError: false},
{name: "valid with description", filename: "001_initial_schema.sql", wantVer: 1, wantError: false},
{name: "valid large version", filename: "123_big_migration.sql", wantVer: 123, wantError: false},
{name: "invalid alpha version", filename: "abc.sql", wantVer: 0, wantError: true},
{name: "invalid mixed chars", filename: "12a.sql", wantVer: 0, wantError: true},
{name: "invalid non-numeric name", filename: "schema.sql", wantVer: 0, wantError: true},
{name: "empty string", filename: "", wantVer: 0, wantError: true},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
got, err := ParseMigrationVersion(tc.filename)
if tc.wantError {
if err == nil {
t.Errorf("ParseMigrationVersion(%q) = %d, nil; want error", tc.filename, got)
}
return
}
if err != nil {
t.Errorf("ParseMigrationVersion(%q) unexpected error: %v", tc.filename, err)
return
}
if got != tc.wantVer {
t.Errorf("ParseMigrationVersion(%q) = %d; want %d", tc.filename, got, tc.wantVer)
}
})
}
}
func TestApplyMigrations_Idempotent(t *testing.T) {
ctx := context.Background()
conn, err := sql.Open("sqlite", ":memory:?_foreign_keys=ON")
if err != nil {
t.Fatalf("failed to open database: %v", err)
}
defer func() {
if err := conn.Close(); err != nil {
t.Errorf("failed to close database: %v", err)
}
}()
conn.SetMaxOpenConns(1)
conn.SetMaxIdleConns(1)
// First run: apply all migrations.
if err := applyMigrations(ctx, conn); err != nil {
t.Fatalf("first applyMigrations failed: %v", err)
}
// Count rows in schema_migrations after first run.
var countBefore int
if err := conn.QueryRowContext(ctx, "SELECT COUNT(*) FROM schema_migrations").Scan(&countBefore); err != nil {
t.Fatalf("failed to count schema_migrations after first run: %v", err)
}
// Second run: must be a no-op.
if err := applyMigrations(ctx, conn); err != nil {
t.Fatalf("second applyMigrations failed: %v", err)
}
// Count rows in schema_migrations after second run — must be unchanged.
var countAfter int
if err := conn.QueryRowContext(ctx, "SELECT COUNT(*) FROM schema_migrations").Scan(&countAfter); err != nil {
t.Fatalf("failed to count schema_migrations after second run: %v", err)
}
if countBefore != countAfter {
t.Errorf("schema_migrations row count changed: before=%d, after=%d", countBefore, countAfter)
}
}
func TestBootstrapMigrationsTable_FreshDatabase(t *testing.T) {
ctx := context.Background()
conn, err := sql.Open("sqlite", ":memory:?_foreign_keys=ON")
if err != nil {
t.Fatalf("failed to open database: %v", err)
}
defer func() {
if err := conn.Close(); err != nil {
t.Errorf("failed to close database: %v", err)
}
}()
conn.SetMaxOpenConns(1)
conn.SetMaxIdleConns(1)
// Verify schema_migrations does NOT exist yet.
var tableBefore int
if err := conn.QueryRowContext(ctx,
"SELECT COUNT(*) FROM sqlite_master WHERE type='table' AND name='schema_migrations'",
).Scan(&tableBefore); err != nil {
t.Fatalf("failed to check for table before bootstrap: %v", err)
}
if tableBefore != 0 {
t.Fatal("schema_migrations table should not exist before bootstrap")
}
// Run bootstrap.
if err := bootstrapMigrationsTable(ctx, conn); err != nil {
t.Fatalf("bootstrapMigrationsTable failed: %v", err)
}
// Verify schema_migrations now exists.
var tableAfter int
if err := conn.QueryRowContext(ctx,
"SELECT COUNT(*) FROM sqlite_master WHERE type='table' AND name='schema_migrations'",
).Scan(&tableAfter); err != nil {
t.Fatalf("failed to check for table after bootstrap: %v", err)
}
if tableAfter != 1 {
t.Fatalf("schema_migrations table should exist after bootstrap, got count=%d", tableAfter)
}
// Verify version 0 row exists.
var version int
if err := conn.QueryRowContext(ctx,
"SELECT version FROM schema_migrations WHERE version = 0",
).Scan(&version); err != nil {
t.Fatalf("version 0 row not found in schema_migrations: %v", err)
}
if version != 0 {
t.Errorf("expected version 0, got %d", version)
}
}
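The idempotency property that TestApplyMigrations_Idempotent checks can be modelled without a database. This is a minimal in-memory sketch, assuming the applied set stands in for the schema_migrations table; `applyPending` is a hypothetical function, not the project's applyMigrations.

```go
package main

import "fmt"

// applyPending is a hypothetical in-memory model of the migration loop:
// it "applies" each version not yet recorded in applied, records it, and
// returns how many versions were newly applied.
func applyPending(versions []int, applied map[int]bool) int {
	n := 0
	for _, v := range versions {
		if applied[v] {
			continue // already recorded, so skip it
		}
		applied[v] = true
		n++
	}
	return n
}

func main() {
	applied := map[int]bool{}
	versions := []int{0, 1}
	fmt.Println("first run applied: ", applyPending(versions, applied))
	fmt.Println("second run applied:", applyPending(versions, applied))
}
```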


@@ -5,7 +5,7 @@ import (
"database/sql"
"fmt"
"git.eeqj.de/sneak/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/types"
)
type FileChunkRepository struct {


@@ -6,7 +6,7 @@ import (
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/types"
)
func TestFileChunkRepository(t *testing.T) {


@@ -6,8 +6,8 @@ import (
"fmt"
"time"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/types"
)
type FileRepository struct {


@@ -5,7 +5,7 @@ package database
import (
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/types"
)
// File represents a file or directory in the backup system.


@@ -6,9 +6,9 @@ import (
"os"
"path/filepath"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/log"
"go.uber.org/fx"
"sneak.berlin/go/vaultik/internal/config"
"sneak.berlin/go/vaultik/internal/log"
)
// Module provides database dependencies


@@ -7,7 +7,7 @@ import (
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/types"
)
func TestRepositoriesTransaction(t *testing.T) {


@@ -7,7 +7,7 @@ import (
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/types"
)
// TestFileRepositoryUUIDGeneration tests that files get unique UUIDs


@@ -7,7 +7,7 @@ import (
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/types"
)
// TestFileRepositoryEdgeCases tests edge cases for file repository


@@ -0,0 +1,9 @@
-- Migration 000: Schema migrations tracking table
-- Applied as a bootstrap step before the normal migration loop.
CREATE TABLE IF NOT EXISTS schema_migrations (
version INTEGER PRIMARY KEY,
applied_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
INSERT OR IGNORE INTO schema_migrations (version) VALUES (0);


@@ -1,6 +1,5 @@
-- Vaultik Database Schema
-- Note: This database does not support migrations. If the schema changes,
-- delete the local database and perform a full backup to recreate it.
-- Migration 001: Initial Vaultik schema
-- All core tables for tracking files, chunks, blobs, snapshots, and uploads.
-- Files table: stores metadata about files in the filesystem
CREATE TABLE IF NOT EXISTS files (
@@ -133,4 +132,4 @@ CREATE TABLE IF NOT EXISTS uploads (
);
-- Index for efficient snapshot lookups
CREATE INDEX IF NOT EXISTS idx_uploads_snapshot_id ON uploads(snapshot_id);
CREATE INDEX IF NOT EXISTS idx_uploads_snapshot_id ON uploads(snapshot_id);


@@ -1,11 +0,0 @@
-- Track blob upload metrics
CREATE TABLE IF NOT EXISTS uploads (
blob_hash TEXT PRIMARY KEY,
uploaded_at TIMESTAMP NOT NULL,
size INTEGER NOT NULL,
duration_ms INTEGER NOT NULL,
FOREIGN KEY (blob_hash) REFERENCES blobs(blob_hash)
);
CREATE INDEX idx_uploads_uploaded_at ON uploads(uploaded_at);
CREATE INDEX idx_uploads_duration ON uploads(duration_ms);


@@ -6,7 +6,7 @@ import (
"fmt"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/types"
)
type SnapshotRepository struct {
@@ -331,6 +331,43 @@ func (r *SnapshotRepository) AddFilesByIDBatch(ctx context.Context, tx *sql.Tx,
return nil
}
// PopulateReferencedBlobs ensures snapshot_blobs contains an entry for
// every blob that holds a chunk referenced by any file in the snapshot.
// This is necessary because the AddBlob hook only runs when a blob is
// newly uploaded during a snapshot — fully-deduplicated snapshots (where
// every chunk already exists in storage from a prior run) would otherwise
// have an empty snapshot_blobs set and be impossible to restore.
//
// Returns the number of rows inserted (i.e. blobs that were previously
// referenced indirectly via file_chunks but not yet recorded in
// snapshot_blobs for this snapshot).
func (r *SnapshotRepository) PopulateReferencedBlobs(ctx context.Context, tx *sql.Tx, snapshotID string) (int64, error) {
query := `
INSERT OR IGNORE INTO snapshot_blobs (snapshot_id, blob_id, blob_hash)
SELECT DISTINCT ?, blobs.id, blobs.blob_hash
FROM blobs
JOIN blob_chunks ON blob_chunks.blob_id = blobs.id
JOIN file_chunks ON file_chunks.chunk_hash = blob_chunks.chunk_hash
JOIN snapshot_files ON snapshot_files.file_id = file_chunks.file_id
WHERE snapshot_files.snapshot_id = ?
AND blobs.blob_hash IS NOT NULL
`
var result sql.Result
var err error
if tx != nil {
result, err = tx.ExecContext(ctx, query, snapshotID, snapshotID)
} else {
result, err = r.db.ExecWithLog(ctx, query, snapshotID, snapshotID)
}
if err != nil {
return 0, fmt.Errorf("populating referenced blobs: %w", err)
}
n, _ := result.RowsAffected()
return n, nil
}
// AddBlob adds a blob to a snapshot
func (r *SnapshotRepository) AddBlob(ctx context.Context, tx *sql.Tx, snapshotID string, blobID types.BlobID, blobHash types.BlobHash) error {
query := `
@@ -397,6 +434,65 @@ func (r *SnapshotRepository) GetSnapshotTotalCompressedSize(ctx context.Context,
return totalSize, nil
}
// GetSnapshotUncompressedChunkSize returns the sum of plaintext sizes of all unique
// chunks referenced by a snapshot (via snapshot_files → file_chunks → chunks).
func (r *SnapshotRepository) GetSnapshotUncompressedChunkSize(ctx context.Context, snapshotID string) (int64, error) {
query := `
SELECT COALESCE(SUM(c.size), 0)
FROM (
SELECT DISTINCT fc.chunk_hash
FROM snapshot_files sf
JOIN file_chunks fc ON sf.file_id = fc.file_id
WHERE sf.snapshot_id = ?
) sc
JOIN chunks c ON sc.chunk_hash = c.chunk_hash
`
var totalSize int64
err := r.db.conn.QueryRowContext(ctx, query, snapshotID).Scan(&totalSize)
if err != nil {
return 0, fmt.Errorf("querying uncompressed chunk size: %w", err)
}
return totalSize, nil
}
// GetSnapshotNewChunkSize returns the sum of plaintext sizes of chunks that are
// referenced by this snapshot but not by any earlier completed snapshot known to
// the local database. The result is the marginal uncompressed data this snapshot
// added to the dedup pool — i.e., the delta from prior snapshots.
func (r *SnapshotRepository) GetSnapshotNewChunkSize(ctx context.Context, snapshotID string) (int64, error) {
query := `
WITH this_snap_chunks AS (
SELECT DISTINCT fc.chunk_hash
FROM snapshot_files sf
JOIN file_chunks fc ON sf.file_id = fc.file_id
WHERE sf.snapshot_id = ?
),
prior_chunks AS (
SELECT DISTINCT fc.chunk_hash
FROM snapshots s
JOIN snapshot_files sf ON sf.snapshot_id = s.id
JOIN file_chunks fc ON fc.file_id = sf.file_id
WHERE s.completed_at IS NOT NULL
AND s.id != ?
AND s.started_at < (SELECT started_at FROM snapshots WHERE id = ?)
)
SELECT COALESCE(SUM(c.size), 0)
FROM chunks c
JOIN this_snap_chunks t ON c.chunk_hash = t.chunk_hash
WHERE c.chunk_hash NOT IN (SELECT chunk_hash FROM prior_chunks)
`
var totalSize int64
err := r.db.conn.QueryRowContext(ctx, query, snapshotID, snapshotID, snapshotID).Scan(&totalSize)
if err != nil {
return 0, fmt.Errorf("querying new chunk size: %w", err)
}
return totalSize, nil
}
// GetIncompleteSnapshots returns all snapshots that haven't been completed
func (r *SnapshotRepository) GetIncompleteSnapshots(ctx context.Context) ([]*Snapshot, error) {
query := `
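The quantity GetSnapshotNewChunkSize computes is a set difference: unique chunks in this snapshot minus chunks seen in earlier completed snapshots, summed by size. A minimal in-memory sketch of that semantics follows; the `chunk` type and `newChunkSize` function are hypothetical and only illustrate what the SQL is meant to return.

```go
package main

import "fmt"

// chunk is a hypothetical stand-in for a row of the chunks table.
type chunk struct {
	hash string
	size int64
}

// newChunkSize sums the sizes of the unique chunks in current whose hashes
// do not appear in prior (the chunks of earlier completed snapshots).
func newChunkSize(current []chunk, prior map[string]struct{}) int64 {
	seen := make(map[string]struct{})
	var total int64
	for _, c := range current {
		if _, dup := seen[c.hash]; dup {
			continue // count each chunk once, like SELECT DISTINCT
		}
		seen[c.hash] = struct{}{}
		if _, old := prior[c.hash]; old {
			continue // already present in an earlier snapshot
		}
		total += c.size
	}
	return total
}

func main() {
	current := []chunk{{"a", 10}, {"b", 20}, {"a", 10}, {"c", 5}}
	prior := map[string]struct{}{"b": {}}
	fmt.Println(newChunkSize(current, prior))
}
```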


@@ -7,7 +7,7 @@ import (
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/types"
)
const (


@@ -5,7 +5,7 @@ import (
"database/sql"
"time"
"git.eeqj.de/sneak/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/log"
)
// Upload represents a blob upload record


@@ -13,19 +13,42 @@ var Version string = "dev"
// Commit is the git commit hash, populated from main().
var Commit string = "unknown"
// CommitDate is the ISO-8601 date of the commit, populated from main().
var CommitDate string = "unknown"
// Author identifies the upstream author of vaultik.
const Author = "Jeffrey Paul <sneak@sneak.berlin>"
// Homepage is the canonical URL for vaultik.
const Homepage = "https://sneak.berlin/go/vaultik"
// License is the SPDX identifier for the project license.
const License = "MIT"
// Globals contains application-wide configuration and metadata.
type Globals struct {
Appname string
Version string
Commit string
StartTime time.Time
Appname string
Version string
Commit string
CommitDate string
StartTime time.Time
}
// New creates and returns a new Globals instance initialized with the package-level variables.
func New() (*Globals, error) {
return &Globals{
Appname: Appname,
Version: Version,
Commit: Commit,
Appname: Appname,
Version: Version,
Commit: Commit,
CommitDate: CommitDate,
}, nil
}
// ShortCommit returns the first 12 chars of the commit hash, or the
// whole string if it's shorter (e.g. "unknown").
func (g *Globals) ShortCommit() string {
if len(g.Commit) > 12 {
return g.Commit[:12]
}
return g.Commit
}


@@ -63,10 +63,3 @@ type Chunk struct {
Offset int64
Length int64
}
// DirtyPath represents a path marked for backup by inotify
type DirtyPath struct {
Path string
MarkedAt time.Time
EventType string // "create", "modify", "delete"
}


@@ -2,6 +2,7 @@ package s3
import (
"context"
"errors"
"io"
"sync/atomic"
@@ -10,6 +11,7 @@ import (
"github.com/aws/aws-sdk-go-v2/credentials"
"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
"github.com/aws/aws-sdk-go-v2/service/s3"
s3types "github.com/aws/aws-sdk-go-v2/service/s3/types"
"github.com/aws/smithy-go/logging"
)
@@ -203,9 +205,12 @@ func (c *Client) HeadObject(ctx context.Context, key string) (bool, error) {
Key: aws.String(fullKey),
})
if err != nil {
// Check if it's a not found error
// TODO: Add proper error type checking
return false, nil
var notFound *s3types.NotFound
var noSuchKey *s3types.NoSuchKey
if errors.As(err, &notFound) || errors.As(err, &noSuchKey) {
return false, nil
}
return false, err
}
return true, nil
}
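The HeadObject change replaces "treat every error as not found" with typed matching via errors.As. The sketch below shows the same pattern without the AWS SDK, so it runs on the standard library alone; `notFoundError`, `errTimeout`, and `exists` are hypothetical names standing in for the SDK's NotFound/NoSuchKey types and the client method.

```go
package main

import (
	"errors"
	"fmt"
)

// notFoundError is a hypothetical stand-in for a typed "object missing"
// error such as the SDK's NotFound or NoSuchKey.
type notFoundError struct{ key string }

func (e *notFoundError) Error() string { return "not found: " + e.key }

// errTimeout represents any other failure (network, auth, and so on).
var errTimeout = errors.New("request timed out")

// exists interprets the error from a hypothetical head request: a typed
// not-found means "absent", anything else is reported to the caller.
func exists(err error) (bool, error) {
	if err == nil {
		return true, nil
	}
	var nf *notFoundError
	if errors.As(err, &nf) {
		return false, nil
	}
	return false, err
}

func main() {
	wrapped := fmt.Errorf("head object: %w", &notFoundError{key: "blobs/ab/cd"})

	ok, err := exists(wrapped)
	fmt.Println(ok, err)

	ok, err = exists(errTimeout)
	fmt.Println(ok, err)
}
```

Propagating the non-matching error matters here because a caller that sees `false, nil` for a transient failure could wrongly conclude a blob is missing.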


@@ -6,7 +6,7 @@ import (
"io"
"testing"
"git.eeqj.de/sneak/vaultik/internal/s3"
"sneak.berlin/go/vaultik/internal/s3"
)
func TestClient(t *testing.T) {


@@ -3,8 +3,8 @@ package s3
import (
"context"
"git.eeqj.de/sneak/vaultik/internal/config"
"go.uber.org/fx"
"sneak.berlin/go/vaultik/internal/config"
)
// Module exports S3 functionality as an fx module.


@@ -13,8 +13,8 @@ import (
"testing/fstest"
"time"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/database"
"sneak.berlin/go/vaultik/internal/types"
)
// MockS3Client is a mock implementation of S3 operations for testing


@@ -7,12 +7,12 @@ import (
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/spf13/afero"
"github.com/stretchr/testify/require"
"sneak.berlin/go/vaultik/internal/database"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/snapshot"
"sneak.berlin/go/vaultik/internal/types"
)
func setupExcludeTestFS(t *testing.T) afero.Fs {


@@ -6,13 +6,13 @@ import (
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/vaultik/internal/database"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/snapshot"
"sneak.berlin/go/vaultik/internal/types"
)
// TestFileContentChange verifies that when a file's content changes,


@@ -1,16 +1,18 @@
package snapshot
import (
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/storage"
"github.com/spf13/afero"
"go.uber.org/fx"
"sneak.berlin/go/vaultik/internal/config"
"sneak.berlin/go/vaultik/internal/database"
"sneak.berlin/go/vaultik/internal/storage"
"sneak.berlin/go/vaultik/internal/ui"
)
// ScannerParams holds parameters for scanner creation
type ScannerParams struct {
EnableProgress bool
UI *ui.Writer // Where user-facing scanner messages go; nil = discard
Fs afero.Fs
Exclude []string // Exclude patterns (combined global + snapshot-specific)
SkipErrors bool // Skip file read errors (log loudly but continue)
@@ -46,6 +48,7 @@ func provideScannerFactory(cfg *config.Config, repos *database.Repositories, sto
CompressionLevel: cfg.CompressionLevel,
AgeRecipients: cfg.AgeRecipients,
EnableProgress: params.EnableProgress,
UI: params.UI,
Exclude: excludes,
SkipErrors: params.SkipErrors,
})


@@ -0,0 +1,42 @@
package snapshot
import (
"errors"
"fmt"
"os"
"runtime"
"strings"
"testing"
)
func TestWrapPermissionError(t *testing.T) {
// Non-permission errors pass through unchanged.
plain := errors.New("disk on fire")
if got := wrapPermissionError("/some/path", plain); got != plain {
t.Errorf("non-permission error should pass through, got %v", got)
}
// Permission errors get remediation instructions.
permErr := fmt.Errorf("open /x: %w", os.ErrPermission)
wrapped := wrapPermissionError("/Users/u/Library/Calendars", permErr)
if !errors.Is(wrapped, os.ErrPermission) {
t.Error("wrapped error should still match os.ErrPermission")
}
if !strings.Contains(wrapped.Error(), "/Users/u/Library/Calendars") {
t.Error("wrapped error should name the offending path")
}
if runtime.GOOS == "darwin" {
if !strings.Contains(wrapped.Error(), "Full Disk Access") {
t.Errorf("macOS permission error should mention Full Disk Access:\n%s", wrapped.Error())
}
if !strings.Contains(wrapped.Error(), "System Settings") {
t.Errorf("macOS permission error should point at System Settings:\n%s", wrapped.Error())
}
} else {
if !strings.Contains(wrapped.Error(), "--skip-errors") {
t.Errorf("non-macOS permission error should mention --skip-errors:\n%s", wrapped.Error())
}
}
}
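The test above pins down two properties of wrapPermissionError: non-permission errors pass through unchanged, and wrapped permission errors still match os.ErrPermission. The function itself is not shown in this part of the diff, so the following is only a sketch of how those properties can be satisfied with %w wrapping; `wrapPerm`, `isPermission`, and the hint text are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Sample errors for the sketch.
var (
	errPlain  = errors.New("disk on fire")
	errDenied = fmt.Errorf("open /x: %w", os.ErrPermission)
)

// wrapPerm returns err untouched unless it is a permission error, in which
// case it adds the path and a hint while keeping the chain intact via %w.
func wrapPerm(path string, err error) error {
	if !errors.Is(err, os.ErrPermission) {
		return err
	}
	// The hint wording is illustrative, not the project's actual message.
	return fmt.Errorf("cannot read %s (check access rights for this path): %w", path, err)
}

// isPermission reports whether err wraps os.ErrPermission.
func isPermission(err error) bool { return errors.Is(err, os.ErrPermission) }

func main() {
	fmt.Println(wrapPerm("/some/path", errPlain) == errPlain)
	fmt.Println(isPermission(wrapPerm("/some/path", errDenied)))
}
```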


@@ -10,8 +10,8 @@ import (
"syscall"
"time"
"git.eeqj.de/sneak/vaultik/internal/log"
"github.com/dustin/go-humanize"
"sneak.berlin/go/vaultik/internal/log"
)
const (


@@ -5,21 +5,24 @@ import (
"database/sql"
"errors"
"fmt"
"io"
"os"
"path/filepath"
"runtime"
"strings"
"sync"
"time"
"git.eeqj.de/sneak/vaultik/internal/blob"
"git.eeqj.de/sneak/vaultik/internal/chunker"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/storage"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/dustin/go-humanize"
"github.com/gobwas/glob"
"github.com/spf13/afero"
"sneak.berlin/go/vaultik/internal/blob"
"sneak.berlin/go/vaultik/internal/chunker"
"sneak.berlin/go/vaultik/internal/database"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/storage"
"sneak.berlin/go/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/ui"
)
// FileToProcess holds information about a file that needs processing
@@ -58,7 +61,8 @@ type Scanner struct {
exclude []string // Glob patterns for files/directories to exclude
compiledExclude []compiledPattern // Compiled glob patterns
progress *ProgressReporter
skipErrors bool // Skip file read errors (log loudly but continue)
skipErrors bool // Skip file read errors (log loudly but continue)
ui *ui.Writer // User-facing output; never nil (defaults to a discarding writer)
// In-memory cache of known chunk hashes for fast existence checks
knownChunks map[string]struct{}
@@ -89,10 +93,11 @@ type ScannerConfig struct {
Storage storage.Storer
MaxBlobSize int64
CompressionLevel int
AgeRecipients []string // Optional, empty means no encryption
EnableProgress bool // Enable progress reporting
Exclude []string // Glob patterns for files/directories to exclude
SkipErrors bool // Skip file read errors (log loudly but continue)
AgeRecipients []string // Optional, empty means no encryption
EnableProgress bool // Enable the live progress reporter (ETAs, throughput)
UI *ui.Writer // Where user-facing scanner messages go; nil = discard
Exclude []string // Glob patterns for files/directories to exclude
SkipErrors bool // Skip file read errors (log loudly but continue)
}
// ScanResult contains the results of a scan operation
@@ -139,6 +144,11 @@ func NewScanner(cfg ScannerConfig) *Scanner {
// Compile exclude patterns
compiledExclude := compileExcludePatterns(cfg.Exclude)
uiw := cfg.UI
if uiw == nil {
uiw = ui.NewWithColor(io.Discard, false)
}
return &Scanner{
fs: cfg.FS,
chunker: chunker.NewChunker(cfg.ChunkSize),
@@ -152,6 +162,7 @@ func NewScanner(cfg ScannerConfig) *Scanner {
compiledExclude: compiledExclude,
progress: progress,
skipErrors: cfg.SkipErrors,
ui: uiw,
pendingChunkHashes: make(map[string]struct{}),
}
}
@@ -202,7 +213,7 @@ func (s *Scanner) Scan(ctx context.Context, path string, snapshotID string) (*Sc
// Phase 1c: Associate unchanged files with this snapshot (no new records needed)
if len(scanResult.UnchangedFileIDs) > 0 {
fmt.Printf("Associating %s unchanged files with snapshot...\n", formatNumber(len(scanResult.UnchangedFileIDs)))
s.ui.Begin("Associating %s unchanged files with the snapshot.", s.ui.Count(len(scanResult.UnchangedFileIDs)))
if err := s.batchAddFilesToSnapshot(ctx, scanResult.UnchangedFileIDs); err != nil {
return nil, fmt.Errorf("associating unchanged files: %w", err)
}
@@ -213,13 +224,13 @@ func (s *Scanner) Scan(ctx context.Context, path string, snapshotID string) (*Sc
// Phase 2: Process files and create chunks
if len(filesToProcess) > 0 {
fmt.Printf("Processing %s files...\n", formatNumber(len(filesToProcess)))
s.ui.Begin("Backing up %s snapshot source files (chunking, compressing, encrypting, uploading).", s.ui.Count(len(filesToProcess)))
log.Info("Phase 2/3: Creating snapshot (chunking, compressing, encrypting, and uploading blobs)")
if err := s.processPhase(ctx, filesToProcess, result); err != nil {
return nil, fmt.Errorf("process phase failed: %w", err)
}
} else {
fmt.Printf("No files need processing. Creating metadata-only snapshot.\n")
s.ui.Info("Snapshot file backup skipped: no changed files (creating metadata-only snapshot).")
log.Info("Phase 2/3: Skipping (no files need processing, metadata-only snapshot)")
}
@@ -232,18 +243,18 @@ func (s *Scanner) Scan(ctx context.Context, path string, snapshotID string) (*Sc
// loadDatabaseState loads known files and chunks from the database into memory for fast lookup
// This avoids per-file and per-chunk database queries during the scan and process phases
func (s *Scanner) loadDatabaseState(ctx context.Context, path string) (map[string]*database.File, error) {
fmt.Println("Loading known files from database...")
s.ui.Begin("Loading known files from local index database.")
knownFiles, err := s.loadKnownFiles(ctx, path)
if err != nil {
return nil, fmt.Errorf("loading known files: %w", err)
}
fmt.Printf("Loaded %s known files from database\n", formatNumber(len(knownFiles)))
s.ui.Complete("Loaded %s known files from local index database.", s.ui.Count(len(knownFiles)))
fmt.Println("Loading known chunks from database...")
s.ui.Begin("Loading known chunks from local index database.")
if err := s.loadKnownChunks(ctx); err != nil {
return nil, fmt.Errorf("loading known chunks: %w", err)
}
fmt.Printf("Loaded %s known chunks from database\n", formatNumber(len(s.knownChunks)))
s.ui.Complete("Loaded %s known chunks from local index database.", s.ui.Count(len(s.knownChunks)))
return knownFiles, nil
}
@@ -267,17 +278,17 @@ func (s *Scanner) summarizeScanPhase(result *ScanResult, filesToProcess []*FileT
"files_skipped", result.FilesSkipped,
"bytes_skipped", humanize.Bytes(uint64(result.BytesSkipped)))
fmt.Printf("Scan complete: %s examined (%s), %s to process (%s)",
formatNumber(result.FilesScanned),
humanize.Bytes(uint64(totalSizeToProcess+result.BytesSkipped)),
formatNumber(len(filesToProcess)),
humanize.Bytes(uint64(totalSizeToProcess)))
msg := fmt.Sprintf("Enumerated %s snapshot source files (%s total), %s to back up (%s)",
s.ui.Count(result.FilesScanned),
s.ui.Size(totalSizeToProcess+result.BytesSkipped),
s.ui.Count(len(filesToProcess)),
s.ui.Size(totalSizeToProcess))
if result.FilesDeleted > 0 {
fmt.Printf(", %s deleted (%s)",
formatNumber(result.FilesDeleted),
humanize.Bytes(uint64(result.BytesDeleted)))
msg += fmt.Sprintf(", %s deleted (%s)",
s.ui.Count(result.FilesDeleted),
s.ui.Size(result.BytesDeleted))
}
fmt.Println()
s.ui.Complete("%s.", msg)
}
// finalizeScanResult populates final blob statistics in the scan result
@@ -618,12 +629,12 @@ func (s *Scanner) scanPhase(ctx context.Context, path string, result *ScanResult
err := afero.Walk(s.fs, path, func(filePath string, info os.FileInfo, err error) error {
if err != nil {
if s.skipErrors {
log.Error("ERROR: Failed to access file (skipping due to --skip-errors)", "path", filePath, "error", err)
fmt.Printf("ERROR: Failed to access %s: %v (skipping)\n", filePath, err)
log.Error("Failed to access file (skipping due to --skip-errors)", "path", filePath, "error", err)
s.ui.Error("Failed to access %s: %v. Skipping (--skip-errors).", s.ui.Path(filePath), err)
return nil // Continue scanning
}
log.Debug("Error accessing filesystem entry", "path", filePath, "error", err)
return err
return wrapPermissionError(filePath, err)
}
// Check context cancellation
@@ -641,7 +652,40 @@ func (s *Scanner) scanPhase(ctx context.Context, path string, result *ScanResult
return nil
}
// Skip non-regular files for processing (but still count them)
// Handle symlinks
if info.Mode()&os.ModeSymlink != 0 {
file := s.buildSymlinkEntry(filePath, info)
if file != nil {
existingFiles[filePath] = struct{}{}
mu.Lock()
filesToProcess = append(filesToProcess, &FileToProcess{
Path: filePath,
FileInfo: info,
File: file,
})
filesScanned++
mu.Unlock()
s.updateScanEntryStats(result, true, info)
}
return nil
}
// Handle directories (record for permission/ownership preservation and empty-dir support)
if info.IsDir() {
file := s.buildDirectoryEntry(filePath, info)
existingFiles[filePath] = struct{}{}
mu.Lock()
filesToProcess = append(filesToProcess, &FileToProcess{
Path: filePath,
FileInfo: info,
File: file,
})
filesScanned++
mu.Unlock()
return nil
}
// Skip other non-regular files (devices, sockets, etc.)
if !info.Mode().IsRegular() {
return nil
}
@@ -673,7 +717,7 @@ func (s *Scanner) scanPhase(ctx context.Context, path string, result *ScanResult
// Output periodic status
if time.Since(lastStatusTime) >= statusInterval {
printScanProgressLine(filesScanned, changedCount, estimatedTotal, startTime)
s.printScanProgressLine(filesScanned, changedCount, estimatedTotal, startTime)
lastStatusTime = time.Now()
}
@@ -714,7 +758,7 @@ func (s *Scanner) updateScanEntryStats(result *ScanResult, needsProcessing bool,
// printScanProgressLine prints a periodic progress line during the scan phase,
// showing files scanned, percentage complete (if estimate available), and ETA
func printScanProgressLine(filesScanned int64, changedCount int, estimatedTotal int64, startTime time.Time) {
func (s *Scanner) printScanProgressLine(filesScanned int64, changedCount int, estimatedTotal int64, startTime time.Time) {
elapsed := time.Since(startTime)
rate := float64(filesScanned) / elapsed.Seconds()
@@ -732,26 +776,97 @@ func printScanProgressLine(filesScanned int64, changedCount int, estimatedTotal
if rate > 0 && remaining > 0 {
eta = time.Duration(float64(remaining)/rate) * time.Second
}
fmt.Printf("Scan: %s files (~%.0f%%), %s changed/new, %.0f files/sec, %s elapsed",
formatNumber(int(filesScanned)),
pct,
formatNumber(changedCount),
rate,
elapsed.Round(time.Second))
if eta > 0 {
fmt.Printf(", ETA %s", eta.Round(time.Second))
s.ui.Progress("Snapshot source files enumeration: %s files (~%s), %s changed or new, %.0f files/sec, enumeration elapsed: %s, enumeration ETA: %s (est remain %s).",
s.ui.Count(int(filesScanned)),
s.ui.Percent(pct),
s.ui.Count(changedCount),
rate,
s.ui.Duration(elapsed),
s.ui.Time(time.Now().Add(eta)),
s.ui.Duration(eta))
} else {
s.ui.Progress("Snapshot source files enumeration: %s files (~%s), %s changed or new, %.0f files/sec, enumeration elapsed: %s.",
s.ui.Count(int(filesScanned)),
s.ui.Percent(pct),
s.ui.Count(changedCount),
rate,
s.ui.Duration(elapsed))
}
fmt.Println()
} else {
// First backup - no estimate available
fmt.Printf("Scan: %s files, %s changed/new, %.0f files/sec, %s elapsed\n",
formatNumber(int(filesScanned)),
formatNumber(changedCount),
s.ui.Progress("Snapshot source files enumeration: %s files seen, %s changed or new, %.0f files/sec, enumeration elapsed: %s.",
s.ui.Count(int(filesScanned)),
s.ui.Count(changedCount),
rate,
elapsed.Round(time.Second))
s.ui.Duration(elapsed))
}
}
// buildSymlinkEntry creates a File record for a symlink.
// Returns nil if the link target cannot be read.
func (s *Scanner) buildSymlinkEntry(path string, info os.FileInfo) *database.File {
target, err := os.Readlink(path)
if err != nil {
log.Debug("Cannot read symlink target", "path", path, "error", err)
return nil
}
var uid, gid uint32
if stat, ok := info.Sys().(interface {
Uid() uint32
Gid() uint32
}); ok {
uid = stat.Uid()
gid = stat.Gid()
}
return &database.File{
ID: types.NewFileID(),
Path: types.FilePath(path),
SourcePath: types.SourcePath(s.currentSourcePath),
MTime: info.ModTime(),
Size: 0,
Mode: uint32(info.Mode()),
UID: uid,
GID: gid,
LinkTarget: types.FilePath(target),
}
}
// buildDirectoryEntry creates a File record for a directory.
func (s *Scanner) buildDirectoryEntry(path string, info os.FileInfo) *database.File {
var uid, gid uint32
if stat, ok := info.Sys().(interface {
Uid() uint32
Gid() uint32
}); ok {
uid = stat.Uid()
gid = stat.Gid()
}
return &database.File{
ID: types.NewFileID(),
Path: types.FilePath(path),
SourcePath: types.SourcePath(s.currentSourcePath),
MTime: info.ModTime(),
Size: 0,
Mode: uint32(info.Mode()),
UID: uid,
GID: gid,
}
}
// recordNonRegularFile writes a symlink or directory entry to the database
// and associates it with the current snapshot. No chunking is performed.
func (s *Scanner) recordNonRegularFile(ctx context.Context, ftp *FileToProcess) error {
return s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
if err := s.repos.Files.Create(txCtx, tx, ftp.File); err != nil {
return fmt.Errorf("creating non-regular file record: %w", err)
}
return s.repos.Snapshots.AddFileByID(txCtx, tx, s.snapshotID, ftp.File.ID)
})
}
// checkFileInMemory checks if a file needs processing using the in-memory map
// No database access is performed - this is purely CPU/memory work
func (s *Scanner) checkFileInMemory(path string, info os.FileInfo, knownFiles map[string]*database.File) (*database.File, bool) {
@@ -849,16 +964,16 @@ func (s *Scanner) batchAddFilesToSnapshot(ctx context.Context, fileIDs []types.F
elapsed := time.Since(startTime)
rate := float64(end) / elapsed.Seconds()
pct := float64(end) / float64(len(fileIDs)) * 100
fmt.Printf("Associating files: %s/%s (%.1f%%), %.0f files/sec\n",
formatNumber(end), formatNumber(len(fileIDs)), pct, rate)
s.ui.Progress("Snapshot unchanged-file association: %s/%s (%s), %.0f files/sec.",
s.ui.Count(end), s.ui.Count(len(fileIDs)), s.ui.Percent(pct), rate)
lastStatusTime = time.Now()
}
}
elapsed := time.Since(startTime)
rate := float64(len(fileIDs)) / elapsed.Seconds()
fmt.Printf("Associated %s unchanged files in %s (%.0f files/sec)\n",
formatNumber(len(fileIDs)), elapsed.Round(time.Second), rate)
s.ui.Complete("Associated %s unchanged files with the snapshot in %s (%.0f files/sec).",
s.ui.Count(len(fileIDs)), s.ui.Duration(elapsed), rate)
return nil
}
@@ -905,7 +1020,7 @@ func (s *Scanner) processPhase(ctx context.Context, filesToProcess []*FileToProc
// Output periodic status
if time.Since(lastStatusTime) >= statusInterval {
printProcessingProgress(filesProcessed, totalFiles, bytesProcessed, totalBytes, startTime)
s.printProcessingProgress(filesProcessed, totalFiles, bytesProcessed, totalBytes, startTime)
lastStatusTime = time.Now()
}
}
@@ -926,8 +1041,8 @@ func (s *Scanner) processFileWithErrorHandling(ctx context.Context, fileToProces
}
// Skip file read errors if --skip-errors is enabled
if s.skipErrors {
log.Error("ERROR: Failed to process file (skipping due to --skip-errors)", "path", fileToProcess.Path, "error", err)
fmt.Printf("ERROR: Failed to process %s: %v (skipping)\n", fileToProcess.Path, err)
log.Error("Failed to process file (skipping due to --skip-errors)", "path", fileToProcess.Path, "error", err)
s.ui.Error("Failed to process %s: %v. Skipping (--skip-errors).", s.ui.Path(fileToProcess.Path), err)
result.FilesSkipped++
return true, nil
}
@@ -938,7 +1053,7 @@ func (s *Scanner) processFileWithErrorHandling(ctx context.Context, fileToProces
// printProcessingProgress prints a periodic progress line during the process phase,
// showing files processed, bytes transferred, throughput, and ETA
func printProcessingProgress(filesProcessed, totalFiles int, bytesProcessed, totalBytes int64, startTime time.Time) {
func (s *Scanner) printProcessingProgress(filesProcessed, totalFiles int, bytesProcessed, totalBytes int64, startTime time.Time) {
elapsed := time.Since(startTime)
pct := float64(bytesProcessed) / float64(totalBytes) * 100
byteRate := float64(bytesProcessed) / elapsed.Seconds()
@@ -951,20 +1066,29 @@ func printProcessingProgress(filesProcessed, totalFiles int, bytesProcessed, tot
eta = time.Duration(float64(remainingBytes)/byteRate) * time.Second
}
// Format: Progress [5.7k/610k] 6.7 GB/44 GB (15.4%), 106MB/sec, 500 files/sec, running for 1m30s, ETA: 5m49s
fmt.Printf("Progress [%s/%s] %s/%s (%.1f%%), %s/sec, %.0f files/sec, running for %s",
formatCompact(filesProcessed),
formatCompact(totalFiles),
humanize.Bytes(uint64(bytesProcessed)),
humanize.Bytes(uint64(totalBytes)),
pct,
humanize.Bytes(uint64(byteRate)),
fileRate,
elapsed.Round(time.Second))
if eta > 0 {
fmt.Printf(", ETA: %s", eta.Round(time.Second))
s.ui.Progress("Snapshot backup: %s/%s files (%s), %s/%s, %s, %.0f files/sec, backup elapsed: %s, backup ETA: %s (est remain %s).",
s.ui.Count(filesProcessed),
s.ui.Count(totalFiles),
s.ui.Percent(pct),
s.ui.Size(bytesProcessed),
s.ui.Size(totalBytes),
s.ui.Speed(byteRate),
fileRate,
s.ui.Duration(elapsed),
s.ui.Time(time.Now().Add(eta)),
s.ui.Duration(eta))
} else {
s.ui.Progress("Snapshot backup: %s/%s files (%s), %s/%s, %s, %.0f files/sec, backup elapsed: %s.",
s.ui.Count(filesProcessed),
s.ui.Count(totalFiles),
s.ui.Percent(pct),
s.ui.Size(bytesProcessed),
s.ui.Size(totalBytes),
s.ui.Speed(byteRate),
fileRate,
s.ui.Duration(elapsed))
}
fmt.Println()
}
// finalizeProcessPhase flushes the packer, writes remaining pending files to the database,
@@ -1056,12 +1180,15 @@ func (s *Scanner) uploadBlobIfNeeded(ctx context.Context, blobPath string, blobW
if _, err := s.storage.Stat(ctx, blobPath); err == nil {
log.Info("Blob already exists in storage, skipping upload",
"hash", finishedBlob.Hash, "size", humanize.Bytes(uint64(finishedBlob.Compressed)))
fmt.Printf("Blob exists: %s (%s, skipped upload)\n",
finishedBlob.Hash[:12]+"...", humanize.Bytes(uint64(finishedBlob.Compressed)))
s.ui.Info("Blob %s (%s) already exists in backup destination store. Skipping upload.",
s.ui.Hex(finishedBlob.Hash), s.ui.Size(finishedBlob.Compressed))
return true, nil
}
progressCallback := s.makeUploadProgressCallback(ctx, finishedBlob)
s.ui.Begin("Uploading blob %s (%s) to backup destination store.",
s.ui.Hex(finishedBlob.Hash), s.ui.Size(finishedBlob.Compressed))
progressCallback := s.makeUploadProgressCallback(ctx, finishedBlob, startTime)
if err := s.storage.PutWithProgress(ctx, blobPath, blobWithReader.Reader, finishedBlob.Compressed, progressCallback); err != nil {
log.Error("Failed to upload blob", "hash", finishedBlob.Hash, "error", err)
@@ -1071,11 +1198,11 @@ func (s *Scanner) uploadBlobIfNeeded(ctx context.Context, blobPath string, blobW
uploadDuration := time.Since(startTime)
uploadSpeedBps := float64(finishedBlob.Compressed) / uploadDuration.Seconds()
fmt.Printf("Blob stored: %s (%s, %s/sec, %s)\n",
finishedBlob.Hash[:12]+"...",
humanize.Bytes(uint64(finishedBlob.Compressed)),
humanize.Bytes(uint64(uploadSpeedBps)),
uploadDuration.Round(time.Millisecond))
s.ui.Complete("Uploaded blob %s (%s) in %s at %s.",
s.ui.Hex(finishedBlob.Hash),
s.ui.Size(finishedBlob.Compressed),
s.ui.Duration(uploadDuration),
s.ui.Speed(uploadSpeedBps))
log.Info("Successfully uploaded blob to storage",
"path", blobPath,
@@ -1093,10 +1220,14 @@ func (s *Scanner) uploadBlobIfNeeded(ctx context.Context, blobPath string, blobW
return false, nil
}
// makeUploadProgressCallback creates a progress callback for blob uploads
func (s *Scanner) makeUploadProgressCallback(ctx context.Context, finishedBlob *blob.FinishedBlob) func(int64) error {
// makeUploadProgressCallback creates a progress callback for blob uploads.
// It updates the live progress reporter ~twice/sec for ETAs and prints a
// human-readable status line via s.ui at most every 15 seconds.
func (s *Scanner) makeUploadProgressCallback(ctx context.Context, finishedBlob *blob.FinishedBlob, uploadStart time.Time) func(int64) error {
lastProgressTime := time.Now()
lastProgressBytes := int64(0)
lastStdoutTime := time.Now()
const stdoutInterval = 15 * time.Second
return func(uploaded int64) error {
now := time.Now()
@@ -1110,6 +1241,28 @@ func (s *Scanner) makeUploadProgressCallback(ctx context.Context, finishedBlob *
lastProgressTime = now
lastProgressBytes = uploaded
}
// Periodic stdout status line so the user knows the upload is alive.
if now.Sub(lastStdoutTime) >= stdoutInterval {
totalElapsed := now.Sub(uploadStart)
pct := float64(uploaded) / float64(finishedBlob.Compressed) * 100
avgSpeed := float64(uploaded) / totalElapsed.Seconds()
var eta time.Duration
if avgSpeed > 0 {
eta = time.Duration(float64(finishedBlob.Compressed-uploaded)/avgSpeed) * time.Second
}
s.ui.Progress("Blob upload %s: %s / %s (%s) at %s, blob upload elapsed: %s, blob upload ETA: %s (est remain %s).",
s.ui.Hex(finishedBlob.Hash),
s.ui.Size(uploaded),
s.ui.Size(finishedBlob.Compressed),
s.ui.Percent(pct),
s.ui.Speed(avgSpeed),
s.ui.Duration(totalElapsed),
s.ui.Time(now.Add(eta)),
s.ui.Duration(eta))
lastStdoutTime = now
}
select {
case <-ctx.Done():
return ctx.Err()
@@ -1176,9 +1329,15 @@ type streamingChunkInfo struct {
// processFileStreaming processes a file by streaming chunks directly to the packer
func (s *Scanner) processFileStreaming(ctx context.Context, fileToProcess *FileToProcess, result *ScanResult) error {
// Symlinks and directories have no data to chunk — just record them in the DB.
mode := os.FileMode(fileToProcess.File.Mode)
if mode&os.ModeSymlink != 0 || mode.IsDir() {
return s.recordNonRegularFile(ctx, fileToProcess)
}
file, err := s.fs.Open(fileToProcess.Path)
if err != nil {
return fmt.Errorf("opening file: %w", err)
return fmt.Errorf("opening file: %w", wrapPermissionError(fileToProcess.Path, err))
}
defer func() { _ = file.Close() }()
@@ -1329,12 +1488,30 @@ func (s *Scanner) detectDeletedFilesFromMap(ctx context.Context, knownFiles map[
}
if result.FilesDeleted > 0 {
fmt.Printf("Found %s deleted files\n", formatNumber(result.FilesDeleted))
s.ui.Info("Snapshot source files enumeration detected %s deleted files.", s.ui.Count(result.FilesDeleted))
}
return nil
}
// wrapPermissionError augments permission errors with platform-specific
// remediation instructions. On macOS, TCC-protected directories (Calendars,
// Reminders, Photos, etc.) return EPERM unless the running application has
// been granted Full Disk Access.
func wrapPermissionError(path string, err error) error {
if !errors.Is(err, os.ErrPermission) {
return err
}
if runtime.GOOS == "darwin" {
return fmt.Errorf("cannot read %s: %w\n\n"+
"macOS is blocking access to this path. Grant Full Disk Access to your\n"+
"terminal application (or the app running vaultik):\n\n"+
" System Settings → Privacy & Security → Full Disk Access\n\n"+
"then quit and reopen the terminal and re-run the backup", path, err)
}
return fmt.Errorf("cannot read %s: %w (check file permissions, or run with --skip-errors to continue past unreadable files)", path, err)
}
// compileExcludePatterns compiles the exclude patterns into glob matchers
func compileExcludePatterns(patterns []string) []compiledPattern {
var compiled []compiledPattern
@@ -1433,25 +1610,3 @@ func (s *Scanner) shouldExclude(filePath, rootPath string) bool {
return false
}
// formatNumber formats a number with comma separators
func formatNumber(n int) string {
if n < 1000 {
return fmt.Sprintf("%d", n)
}
return humanize.Comma(int64(n))
}
// formatCompact formats a number compactly with k/M suffixes (e.g., 5.7k, 1.2M)
func formatCompact(n int) string {
if n < 1000 {
return fmt.Sprintf("%d", n)
}
if n < 10000 {
return fmt.Sprintf("%.1fk", float64(n)/1000)
}
if n < 1000000 {
return fmt.Sprintf("%.0fk", float64(n)/1000)
}
return fmt.Sprintf("%.1fM", float64(n)/1000000)
}
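
The progress lines in this file share one idea: extrapolate the remaining time from the average rate so far, and emit a status line only once a fixed interval has passed. A rough standalone sketch of that idea follows; `estimateRemaining` and `throttle` are hypothetical names, not part of the scanner, and the 5-second callback cadence is invented for the demo:

```go
package main

import (
	"fmt"
	"time"
)

// estimateRemaining extrapolates the time left from the average rate so far.
// It returns 0 when no estimate is possible (nothing done yet, no time
// elapsed, or the work is already complete).
func estimateRemaining(done, total int64, elapsed time.Duration) time.Duration {
	if done <= 0 || elapsed <= 0 || done >= total {
		return 0
	}
	rate := float64(done) / elapsed.Seconds() // units per second
	return time.Duration(float64(total-done) / rate * float64(time.Second))
}

// throttle lets a caller emit at most one status line per interval.
type throttle struct {
	last     time.Time
	interval time.Duration
}

// ready reports whether interval has passed since the last emitted line,
// and if so records now as the new reference point.
func (t *throttle) ready(now time.Time) bool {
	if now.Sub(t.last) < t.interval {
		return false
	}
	t.last = now
	return true
}

func main() {
	start := time.Now()
	th := &throttle{last: start, interval: 15 * time.Second}
	const total = int64(100 << 20) // pretend blob size: 100 MiB

	// Simulate callbacks arriving every 5 seconds at 1 MiB/sec.
	for i := 1; i <= 6; i++ {
		now := start.Add(time.Duration(i) * 5 * time.Second)
		uploaded := int64(i) * 5 << 20
		if th.ready(now) {
			eta := estimateRemaining(uploaded, total, now.Sub(start))
			fmt.Printf("uploaded %d MiB, est remain %s\n", uploaded>>20, eta.Round(time.Second))
		}
	}
}
```

This sketch multiplies by `float64(time.Second)` before converting, so sub-second remainders survive. Converting first and multiplying afterwards truncates the estimate to whole seconds.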

@@ -7,11 +7,11 @@ import (
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/spf13/afero"
"sneak.berlin/go/vaultik/internal/database"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/snapshot"
"sneak.berlin/go/vaultik/internal/types"
)
func TestScannerSimpleDirectory(t *testing.T) {
@@ -110,15 +110,15 @@ func TestScannerSimpleDirectory(t *testing.T) {
t.Errorf("expected at least 97 bytes scanned, got %d", result.BytesScanned)
}
// Verify files in database - only regular files are stored
// Verify files in database - includes regular files and directories
files, err := repos.Files.ListByPrefix(ctx, "/source")
if err != nil {
t.Fatalf("failed to list files: %v", err)
}
// We should have 6 files (directories are not stored)
if len(files) != 6 {
t.Errorf("expected 6 files in database, got %d", len(files))
// 6 regular files + 3 directories (/source, /source/subdir, /source/subdir2)
if len(files) != 9 {
t.Errorf("expected 9 entries in database (6 files + 3 dirs), got %d", len(files))
}
// Verify specific file

@@ -44,15 +44,15 @@ import (
"strings"
"time"
"git.eeqj.de/sneak/vaultik/internal/blobgen"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/storage"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/dustin/go-humanize"
"github.com/spf13/afero"
"go.uber.org/fx"
"sneak.berlin/go/vaultik/internal/blobgen"
"sneak.berlin/go/vaultik/internal/config"
"sneak.berlin/go/vaultik/internal/database"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/storage"
"sneak.berlin/go/vaultik/internal/types"
)
// SnapshotManager handles snapshot creation and metadata export
@@ -180,10 +180,20 @@ func (sm *SnapshotManager) UpdateSnapshotStatsExtended(ctx context.Context, snap
})
}
// CompleteSnapshot marks a snapshot as completed and exports its metadata
// CompleteSnapshot marks a snapshot as completed and ensures snapshot_blobs
// is populated with every blob holding any chunk referenced by the
// snapshot's files (including deduplicated blobs uploaded by prior
// snapshots). Without this, fully-deduplicated snapshots are unrestorable.
func (sm *SnapshotManager) CompleteSnapshot(ctx context.Context, snapshotID string) error {
// Mark the snapshot as completed
err := sm.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
added, err := sm.repos.Snapshots.PopulateReferencedBlobs(ctx, tx, snapshotID)
if err != nil {
return err
}
if added > 0 {
log.Info("Populated snapshot_blobs with dedup-referenced blobs",
"snapshot_id", snapshotID, "added", added)
}
return sm.repos.Snapshots.MarkComplete(ctx, tx, snapshotID)
})
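
To illustrate why a fully deduplicated snapshot still needs its blob list populated, here is an in-memory sketch of the set being computed: every blob that holds at least one chunk referenced by the snapshot's files, regardless of which snapshot uploaded it. The function name and map shapes are assumptions made for illustration; the project's `PopulateReferencedBlobs` works against the database inside a transaction, and its implementation is not shown in this diff.

```go
package main

import (
	"fmt"
	"sort"
)

// referencedBlobs returns the sorted, de-duplicated IDs of all blobs that
// contain at least one chunk used by the given files.
//
// fileChunks maps a file path to its ordered chunk hashes; chunkBlob maps a
// chunk hash to the blob that stores it. Both shapes are hypothetical.
func referencedBlobs(fileChunks map[string][]string, chunkBlob map[string]string) []string {
	seen := make(map[string]struct{})
	for _, chunks := range fileChunks {
		for _, c := range chunks {
			if b, ok := chunkBlob[c]; ok {
				seen[b] = struct{}{}
			}
		}
	}
	out := make([]string, 0, len(seen))
	for b := range seen {
		out = append(out, b)
	}
	sort.Strings(out) // stable order for display and comparison
	return out
}

func main() {
	// Every chunk here was uploaded by an earlier snapshot, so the new
	// snapshot uploads nothing, yet a restore still needs both blobs.
	files := map[string][]string{
		"/source/a.txt": {"c1", "c2"},
		"/source/b.txt": {"c2", "c3"},
	}
	chunkBlob := map[string]string{"c1": "blobA", "c2": "blobA", "c3": "blobB"}
	fmt.Println(referencedBlobs(files, chunkBlob))
}
```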

@@ -7,10 +7,10 @@ import (
"path/filepath"
"testing"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"github.com/spf13/afero"
"sneak.berlin/go/vaultik/internal/config"
"sneak.berlin/go/vaultik/internal/database"
"sneak.berlin/go/vaultik/internal/log"
)
const (

@@ -23,9 +23,8 @@ type FileStorer struct {
// Uses the real OS filesystem by default; call SetFilesystem to override for testing.
func NewFileStorer(basePath string) (*FileStorer, error) {
fs := afero.NewOsFs()
// Ensure base path exists
if err := fs.MkdirAll(basePath, 0755); err != nil {
return nil, fmt.Errorf("creating base path: %w", err)
return nil, fmt.Errorf("file:// storage: cannot create or access %s: %w (check that the volume is mounted and writable)", basePath, err)
}
return &FileStorer{
fs: fs,

@@ -5,9 +5,9 @@ import (
"fmt"
"strings"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/s3"
"go.uber.org/fx"
"sneak.berlin/go/vaultik/internal/config"
"sneak.berlin/go/vaultik/internal/s3"
)
// Module exports storage functionality as an fx module.

@@ -5,7 +5,7 @@ import (
"fmt"
"io"
"git.eeqj.de/sneak/vaultik/internal/s3"
"sneak.berlin/go/vaultik/internal/s3"
)
// S3Storer wraps the existing s3.Client to implement Storer.

internal/ui/ui.go Normal file

@@ -0,0 +1,288 @@
// Package ui provides consistent user-facing output formatting for vaultik.
// All status updates, banners, errors, and warnings printed to the user
// should go through a *Writer from this package.
//
// Message classes (see Writer methods):
//
// - Begin — operation start, left-aligned, marker "》" (white)
// - Complete— operation completion, left-aligned, marker "》" (green)
// - Info — left-aligned neutral status, marker "》" (white)
// - Notice — left-aligned important note, marker "》" (cyan)
// - Warning — left-aligned warning, full word "Warning: " (orange/yellow)
// - Error — left-aligned error, full word "ERROR: " (red)
// - Progress— indented heartbeat / per-item update, marker " 》" (white)
// - Banner — application banner line, left-aligned, no marker
//
// Value formatters (Hex, Size, Duration, Time, Path, Snapshot, Speed,
// Count, Percent) return ANSI-colored strings the caller composes into
// the message body. When color is disabled (non-TTY output or NO_COLOR
// set) all formatters return plain text.
package ui
import (
"fmt"
"io"
"os"
"time"
"github.com/dustin/go-humanize"
"golang.org/x/term"
)
// ANSI SGR escape sequences.
const (
ansiReset = "\033[0m"
ansiBold = "\033[1m"
ansiRed = "\033[31m"
ansiGreen = "\033[32m"
ansiYellow = "\033[33m" // used for orange "Warning:" and for durations
ansiBlue = "\033[34m"
ansiMagenta = "\033[35m"
ansiCyan = "\033[36m"
ansiWhite = "\033[37m"
)
// Marker is the chevron prefix used for all non-error/warning lines.
const Marker = "》"
// Writer formats and emits user-facing messages with optional ANSI color.
// It also counts warnings and errors emitted so the caller can summarize at
// the end of an operation ("Finished successfully." vs "Finished with
// warnings.").
//
// When Quiet is set, Begin/Complete/Info/Notice/Detail/Progress/Banner
// are silently dropped, but Warning and Error always emit. This honors
// the convention that --quiet "Suppresses non-error output" — warnings
// and errors are by definition not suppressible.
type Writer struct {
out io.Writer
color bool
quiet bool
warnings int
errors int
}
// New returns a Writer that emits to out. Color is enabled when out is a
// TTY and the NO_COLOR environment variable is unset.
// https://no-color.org/
func New(out io.Writer) *Writer {
return &Writer{out: out, color: shouldColor(out)}
}
// NewWithColor returns a Writer with an explicit color setting, ignoring
// TTY detection. Useful for tests and for piped output that the caller
// wants to colorize anyway.
func NewWithColor(out io.Writer, color bool) *Writer {
return &Writer{out: out, color: color}
}
// SetQuiet toggles the writer's quiet mode. In quiet mode all message
// classes are silenced except Warning and Error.
func (w *Writer) SetQuiet(quiet bool) { w.quiet = quiet }
// Quiet reports whether the writer is in quiet mode.
func (w *Writer) Quiet() bool { return w.quiet }
// Out returns the underlying writer.
func (w *Writer) Out() io.Writer { return w.out }
// Color reports whether color is enabled on this writer.
func (w *Writer) Color() bool { return w.color }
// shouldColor returns true when w is a real TTY and NO_COLOR is unset.
func shouldColor(w io.Writer) bool {
if os.Getenv("NO_COLOR") != "" {
return false
}
f, ok := w.(*os.File)
if !ok {
return false
}
return term.IsTerminal(int(f.Fd()))
}
// paint wraps s in the given ANSI color when color is enabled.
func (w *Writer) paint(color, s string) string {
if !w.color {
return s
}
return color + s + ansiReset
}
// ───────────────────────── message methods ─────────────────────────
// Begin prints an operation-start line, left-aligned with a white marker.
func (w *Writer) Begin(format string, args ...any) {
if w.quiet {
return
}
w.emit(ansiWhite, Marker, "", format, args)
}
// Complete prints an operation-completion line in green, left-aligned.
func (w *Writer) Complete(format string, args ...any) {
if w.quiet {
return
}
w.emit(ansiGreen, Marker, ansiGreen, format, args)
}
// Info prints a neutral status line, left-aligned with a white marker.
func (w *Writer) Info(format string, args ...any) {
if w.quiet {
return
}
w.emit(ansiWhite, Marker, "", format, args)
}
// Notice prints an attention-worthy informational line, marker in cyan.
func (w *Writer) Notice(format string, args ...any) {
if w.quiet {
return
}
w.emit(ansiCyan, Marker, "", format, args)
}
// Warning prints "⚠️ Warning: " in orange/yellow followed by the message.
func (w *Writer) Warning(format string, args ...any) {
w.warnings++
prefix := "⚠️ " + w.paint(ansiYellow+ansiBold, "Warning: ")
_, _ = fmt.Fprintln(w.out, prefix+fmt.Sprintf(format, args...))
}
// Error prints "🛑 ERROR: " in red followed by the message. Goes to the
// same writer as everything else; callers that want stderr should
// construct a separate Writer for it.
func (w *Writer) Error(format string, args ...any) {
w.errors++
prefix := "🛑 " + w.paint(ansiRed+ansiBold, "ERROR: ")
_, _ = fmt.Fprintln(w.out, prefix+fmt.Sprintf(format, args...))
}
// Detail prints an indented continuation line under a preceding Complete
// (or other top-level message). Marker " 》" (white) at column 2.
// Distinct from Progress (semantically a "heartbeat") in usage but
// visually identical.
func (w *Writer) Detail(format string, args ...any) {
if w.quiet {
return
}
w.emit(ansiWhite, " "+Marker, "", format, args)
}
// WarningCount returns the number of Warning() calls this writer has emitted.
func (w *Writer) WarningCount() int { return w.warnings }
// ErrorCount returns the number of Error() calls this writer has emitted.
func (w *Writer) ErrorCount() int { return w.errors }
// Progress prints an indented heartbeat / per-item update, marker in white.
func (w *Writer) Progress(format string, args ...any) {
if w.quiet {
return
}
w.emit(ansiWhite, " "+Marker, "", format, args)
}
// Banner prints a line with no marker, left-aligned. Bold when color
// is enabled. Used for the application startup banner only.
func (w *Writer) Banner(format string, args ...any) {
if w.quiet {
return
}
body := fmt.Sprintf(format, args...)
if w.color {
body = ansiBold + body + ansiReset
}
_, _ = fmt.Fprintln(w.out, body)
}
// emit writes "<prefix> <body>\n" with the prefix painted in prefixColor
// and the body optionally painted in bodyColor (empty = no body color).
func (w *Writer) emit(prefixColor, prefix, bodyColor, format string, args []any) {
body := fmt.Sprintf(format, args...)
if bodyColor != "" {
body = w.paint(bodyColor, body)
}
_, _ = fmt.Fprintln(w.out, w.paint(prefixColor, prefix)+" "+body)
}
// ───────────────────────── value formatters ─────────────────────────
//
// These return ANSI-colored strings the caller composes into a message
// body. When color is disabled they return plain text.
// Hex colorizes a hex identifier (blob hash, chunk hash, snapshot id).
// Long hashes are abbreviated to first 12 chars with "...".
func (w *Writer) Hex(s string) string {
short := s
if len(s) > 12 {
short = s[:12] + "..."
}
return w.paint(ansiCyan, short)
}
// Snapshot colorizes a snapshot ID (full, no abbreviation).
func (w *Writer) Snapshot(id string) string {
return w.paint(ansiCyan+ansiBold, id)
}
// Path colorizes a filesystem path.
func (w *Writer) Path(p string) string {
return w.paint(ansiBlue, p)
}
// Size colorizes a byte count using humanize.Bytes.
func (w *Writer) Size(bytes int64) string {
return w.paint(ansiMagenta, humanize.Bytes(uint64(bytes)))
}
// Speed colorizes a network transfer rate. Input is bytes/sec; output is
// bits/sec with a decimal (SI) unit (bit/sec, Kbit/sec, Mbit/sec, Gbit/sec),
// since network transfer rates are conventionally expressed in bits.
// Non-positive input renders as "N/A".
func (w *Writer) Speed(bytesPerSec float64) string {
if bytesPerSec <= 0 {
return w.paint(ansiMagenta, "N/A")
}
bitsPerSec := bytesPerSec * 8
var s string
switch {
case bitsPerSec >= 1e9:
s = fmt.Sprintf("%.1f Gbit/sec", bitsPerSec/1e9)
case bitsPerSec >= 1e6:
s = fmt.Sprintf("%.0f Mbit/sec", bitsPerSec/1e6)
case bitsPerSec >= 1e3:
s = fmt.Sprintf("%.0f Kbit/sec", bitsPerSec/1e3)
default:
s = fmt.Sprintf("%.0f bit/sec", bitsPerSec)
}
return w.paint(ansiMagenta, s)
}
// Duration colorizes a time.Duration rounded to the nearest second.
func (w *Writer) Duration(d time.Duration) string {
return w.paint(ansiYellow, d.Round(time.Second).String())
}
// Time colorizes an absolute clock time. If t falls on today's local
// calendar date the output is "HH:MM:SS"; otherwise it is
// "YYYY-MM-DD HH:MM:SS". No timezone is included — values are
// displayed in the process's local zone.
func (w *Writer) Time(t time.Time) string {
t = t.Local()
now := time.Now()
if t.Year() == now.Year() && t.YearDay() == now.YearDay() {
return w.paint(ansiYellow, t.Format("15:04:05"))
}
return w.paint(ansiYellow, t.Format("2006-01-02 15:04:05"))
}
// Count colorizes an integer count with thousands separators.
func (w *Writer) Count(n int) string {
return w.paint(ansiMagenta, humanize.Comma(int64(n)))
}
// Percent colorizes a 0..100 percentage.
func (w *Writer) Percent(p float64) string {
return w.paint(ansiMagenta, fmt.Sprintf("%.1f%%", p))
}
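The bytes-to-bits conversion in Speed can be tried on its own. The sketch below reproduces the thresholds without color; `formatBitRate` is a hypothetical stand-in, not a function in this package.

```go
package main

import "fmt"

// formatBitRate is a hypothetical, uncolored stand-in for Writer.Speed:
// it takes bytes/sec and renders bits/sec using decimal (SI) thresholds.
func formatBitRate(bytesPerSec float64) string {
	if bytesPerSec <= 0 {
		return "N/A"
	}
	bits := bytesPerSec * 8
	switch {
	case bits >= 1e9:
		return fmt.Sprintf("%.1f Gbit/sec", bits/1e9)
	case bits >= 1e6:
		return fmt.Sprintf("%.0f Mbit/sec", bits/1e6)
	case bits >= 1e3:
		return fmt.Sprintf("%.0f Kbit/sec", bits/1e3)
	default:
		return fmt.Sprintf("%.0f bit/sec", bits)
	}
}

func main() {
	for _, rate := range []float64{0, 100, 124_950, 125_000, 125_000_000} {
		fmt.Printf("%11.0f B/s -> %s\n", rate, formatBitRate(rate))
	}
}
```

One consequence of rounding with `%.0f`: a rate just under a threshold (124,950 B/s is 999.6 Kbit/s) is displayed as "1000 Kbit/sec" rather than being promoted to the next unit.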

internal/ui/ui_test.go

@@ -0,0 +1,134 @@
package ui
import (
"bytes"
"strings"
"testing"
"time"
)
func newTestWriter(color bool) (*Writer, *bytes.Buffer) {
buf := &bytes.Buffer{}
return NewWithColor(buf, color), buf
}
func TestMessageMethodsPlain(t *testing.T) {
tests := []struct {
method string
fn func(*Writer)
want string
}{
{"Begin", func(w *Writer) { w.Begin("starting %s", "thing") }, "》 starting thing\n"},
{"Complete", func(w *Writer) { w.Complete("done %s", "thing") }, "》 done thing\n"},
{"Info", func(w *Writer) { w.Info("status") }, "》 status\n"},
{"Notice", func(w *Writer) { w.Notice("note") }, "》 note\n"},
{"Warning", func(w *Writer) { w.Warning("oops") }, "⚠️ Warning: oops\n"},
{"Error", func(w *Writer) { w.Error("boom") }, "🛑 ERROR: boom\n"},
{"Progress", func(w *Writer) { w.Progress("p") }, " 》 p\n"},
{"Detail", func(w *Writer) { w.Detail("d") }, " 》 d\n"},
{"Banner", func(w *Writer) { w.Banner("hello") }, "hello\n"}, // plain mode, no bold
}
for _, tt := range tests {
t.Run(tt.method, func(t *testing.T) {
w, buf := newTestWriter(false)
tt.fn(w)
if got := buf.String(); got != tt.want {
t.Errorf("got %q, want %q", got, tt.want)
}
})
}
}
func TestWarningErrorCounters(t *testing.T) {
w, _ := newTestWriter(false)
if w.WarningCount() != 0 || w.ErrorCount() != 0 {
t.Fatalf("expected fresh writer to have zero counts")
}
w.Info("normal")
w.Warning("first warn")
w.Warning("second warn")
w.Error("only error")
if got, want := w.WarningCount(), 2; got != want {
t.Errorf("WarningCount: got %d, want %d", got, want)
}
if got, want := w.ErrorCount(), 1; got != want {
t.Errorf("ErrorCount: got %d, want %d", got, want)
}
}
func TestColorOutputContainsANSI(t *testing.T) {
w, buf := newTestWriter(true)
w.Error("boom")
out := buf.String()
if !strings.Contains(out, "\033[") {
t.Errorf("expected ANSI escapes in color output, got %q", out)
}
if !strings.Contains(out, "ERROR: ") {
t.Errorf("expected 'ERROR: ' text in output, got %q", out)
}
}
func TestBannerBoldWhenColor(t *testing.T) {
w, buf := newTestWriter(true)
w.Banner("hello")
out := buf.String()
if !strings.Contains(out, "\033[1m") {
t.Errorf("expected bold ANSI escape in colored Banner output, got %q", out)
}
}
func TestValueFormattersPlain(t *testing.T) {
w, _ := newTestWriter(false)
if got := w.Hex("0123456789abcdef0123"); got != "0123456789ab..." {
t.Errorf("Hex long: got %q", got)
}
if got := w.Hex("short"); got != "short" {
t.Errorf("Hex short: got %q", got)
}
if got := w.Size(1024); got != "1.0 kB" {
t.Errorf("Size: got %q", got)
}
if got := w.Duration(90 * time.Second); got != "1m30s" {
t.Errorf("Duration: got %q", got)
}
if got := w.Count(12345); got != "12,345" {
t.Errorf("Count: got %q", got)
}
if got := w.Percent(12.34); got != "12.3%" {
t.Errorf("Percent: got %q", got)
}
// Speed: input is bytes/sec, output is bits/sec.
if got := w.Speed(0); got != "N/A" {
t.Errorf("Speed(0): got %q, want N/A", got)
}
if got := w.Speed(125_000_000); got != "1.0 Gbit/sec" { // 1 Gbit/s = 125 MB/s
t.Errorf("Speed(125e6): got %q", got)
}
if got := w.Speed(125_000); got != "1 Mbit/sec" {
t.Errorf("Speed(125e3): got %q", got)
}
// Time format: today → HH:MM:SS, other day → YYYY-MM-DD HH:MM:SS.
now := time.Now() // read the clock once so year/month/day come from the same instant
today := time.Date(now.Year(), now.Month(), now.Day(), 14, 30, 45, 0, time.Local)
if got := w.Time(today); got != "14:30:45" {
t.Errorf("Time today: got %q, want 14:30:45", got)
}
other := time.Date(2030, 1, 2, 3, 4, 5, 0, time.Local)
if got := w.Time(other); got != "2030-01-02 03:04:05" {
t.Errorf("Time other day: got %q", got)
}
}
func TestValueFormattersColored(t *testing.T) {
w, _ := newTestWriter(true)
hex := w.Hex("0123456789abcdef0123")
if !strings.Contains(hex, "\033[") {
t.Errorf("expected ANSI in colored Hex output, got %q", hex)
}
if !strings.Contains(hex, "0123456789ab") {
t.Errorf("expected hex content in output, got %q", hex)
}
}


@@ -6,9 +6,11 @@ import (
"encoding/hex"
"fmt"
"io"
"time"
"filippo.io/age"
"git.eeqj.de/sneak/vaultik/internal/blobgen"
"sneak.berlin/go/vaultik/internal/blobgen"
"sneak.berlin/go/vaultik/internal/log"
)
// hashVerifyReader wraps a blobgen.Reader and verifies the double-SHA-256 hash
@@ -75,19 +77,34 @@ func (v *Vaultik) FetchAndDecryptBlob(ctx context.Context, blobHash string, expe
}
// FetchBlob downloads a blob and returns a reader for the encrypted data.
// Times the Storage.Get and Storage.Stat round-trips separately at
// debug level so we can see whether the size-only Stat (which is an
// extra request on every fetch) is hurting throughput.
func (v *Vaultik) FetchBlob(ctx context.Context, blobHash string, expectedSize int64) (io.ReadCloser, int64, error) {
blobPath := fmt.Sprintf("blobs/%s/%s/%s", blobHash[:2], blobHash[2:4], blobHash)
t0 := time.Now()
rc, err := v.Storage.Get(ctx, blobPath)
getDur := time.Since(t0)
if err != nil {
return nil, 0, fmt.Errorf("downloading blob %s: %w", blobHash[:16], err)
}
t0 = time.Now()
info, err := v.Storage.Stat(ctx, blobPath)
statDur := time.Since(t0)
if err != nil {
_ = rc.Close()
return nil, 0, fmt.Errorf("stat blob %s: %w", blobHash[:16], err)
}
log.Debug("FetchBlob round-trips",
"hash", blobHash[:16],
"ms_storage_get", getDur.Milliseconds(),
"ms_storage_stat", statDur.Milliseconds(),
"expected_size", expectedSize,
"stat_size", info.Size,
)
return rc, info.Size, nil
}
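The storage key built in FetchBlob shards blobs into two directory levels taken from the first four characters of the hash. The sketch below shows that layout in isolation. `blobPath` is a hypothetical helper, and the length check is an addition for illustration: the slicing expressions above would panic on a hash shorter than the slice bounds.

```go
package main

import (
	"errors"
	"fmt"
)

// blobPath is a hypothetical helper that builds the same
// "blobs/<h[0:2]>/<h[2:4]>/<h>" key as FetchBlob, but returns an error
// instead of panicking when the hash is too short to shard.
func blobPath(hash string) (string, error) {
	if len(hash) < 4 {
		return "", errors.New("blob hash too short to shard")
	}
	return fmt.Sprintf("blobs/%s/%s/%s", hash[:2], hash[2:4], hash), nil
}

func main() {
	p, err := blobPath("0123456789abcdef")
	fmt.Println(p, err)

	_, err = blobPath("ab")
	fmt.Println(err)
}
```

FetchBlob also slices `blobHash[:16]` when formatting errors and debug logs, so a guard in the real code path would need to require at least 16 characters.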


@@ -10,8 +10,8 @@ import (
"testing"
"filippo.io/age"
"git.eeqj.de/sneak/vaultik/internal/blobgen"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"sneak.berlin/go/vaultik/internal/blobgen"
"sneak.berlin/go/vaultik/internal/vaultik"
)
// TestFetchAndDecryptBlobVerifiesHash verifies that FetchAndDecryptBlob checks


@@ -2,6 +2,7 @@ package vaultik
import (
"fmt"
"io"
"os"
"path/filepath"
"sync"
@@ -15,9 +16,22 @@ type blobDiskCacheEntry struct {
next *blobDiskCacheEntry
}
// blobDiskCache is an LRU cache that stores blobs on disk instead of in memory.
// Blobs are written to a temp directory keyed by their hash. When total size
// exceeds maxBytes, the least-recently-used entries are evicted (deleted from disk).
// blobDiskCache stores blobs on disk keyed by hash. It exposes ReadAt
// for slice reads (the restore path uses this so chunk extraction
// never reads a whole blob into memory) plus Get/Put for whole-blob
// access.
//
// Eviction policy is caller-controlled. The cache keeps an LRU list
// internally and will fall back to LRU eviction if curBytes exceeds
// maxBytes. Restore passes math.MaxInt64 as maxBytes and drives
// eviction itself via Delete() through restoreSweeper, which deletes
// each blob the moment every file that references its chunks has been
// written. LRU never fires under that configuration; it is kept as a
// safety net for callers that don't manage eviction themselves.
//
// Get/ReadAt/peak-Len counters are debugging instrumentation used by
// tests to assert that the restore code path uses ReadAt rather than
// Get and to bound peak disk-cache occupancy.
type blobDiskCache struct {
mu sync.Mutex
dir string
@@ -26,6 +40,11 @@ type blobDiskCache struct {
items map[string]*blobDiskCacheEntry
head *blobDiskCacheEntry // most recent
tail *blobDiskCacheEntry // least recent
// Instrumentation. Mutated under mu; readable via the methods below.
getCalls int
readAtCalls int
peakLen int
}
// newBlobDiskCache creates a new disk-based blob cache with the given max size.
@@ -115,12 +134,77 @@ func (c *blobDiskCache) Put(key string, data []byte) error {
c.evictLRU()
}
if n := len(c.items); n > c.peakLen {
c.peakLen = n
}
return nil
}
// PutFromReader streams r into the cache file for key, returning the
// total number of bytes written. Unlike Put, the data never has to
// reside fully in memory at any point — io.Copy uses an internal
// 32 KiB buffer. Used by restore to land a freshly decrypted blob on
// disk without buffering its entire plaintext (which may be tens of GB)
// in RAM.
func (c *blobDiskCache) PutFromReader(key string, r io.Reader) (int64, error) {
c.mu.Lock()
// Remove any prior entry first; we'll re-link after the file is
// written successfully.
if e, ok := c.items[key]; ok {
c.unlink(e)
c.curBytes -= e.size
_ = os.Remove(c.path(key))
delete(c.items, key)
}
c.mu.Unlock()
f, err := os.OpenFile(c.path(key), os.O_CREATE|os.O_TRUNC|os.O_WRONLY, 0o600)
if err != nil {
return 0, fmt.Errorf("creating cache file: %w", err)
}
written, copyErr := io.Copy(f, r)
closeErr := f.Close()
if copyErr != nil {
_ = os.Remove(c.path(key))
return written, fmt.Errorf("streaming to cache file: %w", copyErr)
}
if closeErr != nil {
_ = os.Remove(c.path(key))
return written, fmt.Errorf("closing cache file: %w", closeErr)
}
c.mu.Lock()
defer c.mu.Unlock()
// If the entry would exceed maxBytes outright, drop it on the
// floor — but the restore path passes math.MaxInt64 as maxBytes
// so this branch is effectively unreachable there.
if written > c.maxBytes {
_ = os.Remove(c.path(key))
return written, nil
}
e := &blobDiskCacheEntry{key: key, size: written}
c.pushFront(e)
c.items[key] = e
c.curBytes += written
for c.curBytes > c.maxBytes && c.tail != nil {
c.evictLRU()
}
if n := len(c.items); n > c.peakLen {
c.peakLen = n
}
return written, nil
}
// Get reads a cached blob from disk. Returns data and true on hit.
func (c *blobDiskCache) Get(key string) ([]byte, bool) {
c.mu.Lock()
c.getCalls++
e, ok := c.items[key]
if !ok {
c.mu.Unlock()
@@ -147,6 +231,7 @@ func (c *blobDiskCache) Get(key string) ([]byte, bool) {
// ReadAt reads a slice of a cached blob without loading the entire blob into memory.
func (c *blobDiskCache) ReadAt(key string, offset, length int64) ([]byte, error) {
c.mu.Lock()
c.readAtCalls++
e, ok := c.items[key]
if !ok {
c.mu.Unlock()
@@ -181,6 +266,34 @@ func (c *blobDiskCache) Has(key string) bool {
return ok
}
// Delete removes a blob from the cache and its disk file. No-op if absent.
// Used by restore's sweep logic to free blobs whose chunks have all been
// restored (so they will never be needed again during this restore).
func (c *blobDiskCache) Delete(key string) {
c.mu.Lock()
defer c.mu.Unlock()
e, ok := c.items[key]
if !ok {
return
}
c.unlink(e)
delete(c.items, key)
c.curBytes -= e.size
_ = os.Remove(c.path(key))
}
// Keys returns a snapshot of all cached keys. Safe for iteration without
// holding the cache lock; the cache may change concurrently.
func (c *blobDiskCache) Keys() []string {
c.mu.Lock()
defer c.mu.Unlock()
keys := make([]string, 0, len(c.items))
for k := range c.items {
keys = append(keys, k)
}
return keys
}
// Size returns current total cached bytes.
func (c *blobDiskCache) Size() int64 {
c.mu.Lock()
@@ -195,6 +308,28 @@ func (c *blobDiskCache) Len() int {
return len(c.items)
}
// GetCalls returns the number of times Get has been called.
func (c *blobDiskCache) GetCalls() int {
c.mu.Lock()
defer c.mu.Unlock()
return c.getCalls
}
// ReadAtCalls returns the number of times ReadAt has been called.
func (c *blobDiskCache) ReadAtCalls() int {
c.mu.Lock()
defer c.mu.Unlock()
return c.readAtCalls
}
// PeakLen returns the maximum number of cached entries ever held at
// once during this cache's lifetime.
func (c *blobDiskCache) PeakLen() int {
c.mu.Lock()
defer c.mu.Unlock()
return c.peakLen
}
// Close removes the cache directory and all cached blobs.
func (c *blobDiskCache) Close() error {
c.mu.Lock()


@@ -2,49 +2,25 @@ package vaultik
import (
"fmt"
"regexp"
"strconv"
"strings"
"time"
"git.eeqj.de/sneak/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/types"
)
// SnapshotInfo contains information about a snapshot
// SnapshotInfo contains information about a snapshot.
// UncompressedSize and NewChunkSize are populated only when the snapshot
// is present in the local database; LocallyTracked indicates whether
// those values are meaningful.
type SnapshotInfo struct {
ID types.SnapshotID `json:"id"`
Timestamp time.Time `json:"timestamp"`
CompressedSize int64 `json:"compressed_size"`
}
// formatNumber formats a number with commas
func formatNumber(n int) string {
str := fmt.Sprintf("%d", n)
var result []string
for i, digit := range str {
if i > 0 && (len(str)-i)%3 == 0 {
result = append(result, ",")
}
result = append(result, string(digit))
}
return strings.Join(result, "")
}
// formatDuration formats a duration in a human-readable way
func formatDuration(d time.Duration) string {
if d < time.Second {
return fmt.Sprintf("%dms", d.Milliseconds())
}
if d < time.Minute {
return fmt.Sprintf("%.1fs", d.Seconds())
}
if d < time.Hour {
mins := int(d.Minutes())
secs := int(d.Seconds()) % 60
return fmt.Sprintf("%dm %ds", mins, secs)
}
hours := int(d.Hours())
mins := int(d.Minutes()) % 60
return fmt.Sprintf("%dh %dm", hours, mins)
ID types.SnapshotID `json:"id"`
Timestamp time.Time `json:"timestamp"`
CompressedSize int64 `json:"compressed_size"`
UncompressedSize int64 `json:"uncompressed_size,omitempty"`
NewChunkSize int64 `json:"new_chunk_size,omitempty"`
LocallyTracked bool `json:"locally_tracked"`
}
// formatBytes formats bytes in a human-readable format
@@ -95,18 +71,39 @@ func parseSnapshotName(snapshotID string) string {
return strings.Join(parts[1:len(parts)-1], "_")
}
// parseDuration parses a duration string with support for days
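// parseDuration parses a duration string with support for human-friendly units:
// d/day/days, w/week/weeks, mo/month/months, y/year/years. Months count as 30
// days and years as 365 days. A string that time.ParseDuration accepts (Go
// units such as h, m, s) is returned directly; Go units cannot be combined
// with the extended units in one string.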
func parseDuration(s string) (time.Duration, error) {
// Check for days suffix
if strings.HasSuffix(s, "d") {
daysStr := strings.TrimSuffix(s, "d")
days, err := strconv.Atoi(daysStr)
if err != nil {
return 0, fmt.Errorf("invalid days value: %w", err)
}
return time.Duration(days) * 24 * time.Hour, nil
if d, err := time.ParseDuration(s); err == nil {
return d, nil
}
// Otherwise use standard Go duration parsing
return time.ParseDuration(s)
re := regexp.MustCompile(`(\d+)\s*([a-zA-Z]+)`)
matches := re.FindAllStringSubmatch(s, -1)
if len(matches) == 0 {
return 0, fmt.Errorf("invalid duration: %q", s)
}
var total time.Duration
for _, match := range matches {
n, err := strconv.Atoi(match[1])
if err != nil {
return 0, fmt.Errorf("invalid number %q: %w", match[1], err)
}
unit := strings.ToLower(match[2])
switch unit {
case "d", "day", "days":
total += time.Duration(n) * 24 * time.Hour
case "w", "week", "weeks":
total += time.Duration(n) * 7 * 24 * time.Hour
case "mo", "month", "months":
total += time.Duration(n) * 30 * 24 * time.Hour
case "y", "year", "years":
total += time.Duration(n) * 365 * 24 * time.Hour
default:
return 0, fmt.Errorf("unknown time unit %q", unit)
}
}
return total, nil
}


@@ -2,6 +2,7 @@ package vaultik
import (
"testing"
"time"
)
func TestParseSnapshotName(t *testing.T) {
@@ -37,6 +38,41 @@ func TestParseSnapshotName(t *testing.T) {
}
}
func TestParseDuration(t *testing.T) {
tests := []struct {
input string
want time.Duration
err bool
}{
{"30d", 30 * 24 * time.Hour, false},
{"4w", 4 * 7 * 24 * time.Hour, false},
{"6mo", 6 * 30 * 24 * time.Hour, false},
{"1y", 365 * 24 * time.Hour, false},
{"2w3d", 2*7*24*time.Hour + 3*24*time.Hour, false},
{"1h", time.Hour, false},
{"30s", 30 * time.Second, false},
{"garbage", 0, true},
}
for _, tt := range tests {
t.Run(tt.input, func(t *testing.T) {
got, err := parseDuration(tt.input)
if tt.err {
if err == nil {
t.Fatalf("expected error for %q, got %v", tt.input, got)
}
return
}
if err != nil {
t.Fatalf("unexpected error for %q: %v", tt.input, err)
}
if got != tt.want {
t.Errorf("parseDuration(%q) = %v, want %v", tt.input, got, tt.want)
}
})
}
}
func TestParseSnapshotTimestamp(t *testing.T) {
tests := []struct {
name string


@@ -7,9 +7,9 @@ import (
"sort"
"strings"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"github.com/dustin/go-humanize"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/snapshot"
)
// ShowInfo displays system and configuration information
@@ -22,14 +22,29 @@ func (v *Vaultik) ShowInfo() error {
v.printfStdout("Go Version: %s\n", runtime.Version())
v.printlnStdout()
// Storage Configuration
// Storage Configuration. The backend is selected by storage_url
// (s3://, file://, rclone://); the legacy s3.* fields are only
// printed when they're actually populated, since the URL scheme
// is the primary configuration.
v.printfStdout("=== Storage Configuration ===\n")
v.printfStdout("S3 Bucket: %s\n", v.Config.S3.Bucket)
storageInfo := v.Storage.Info()
v.printfStdout("Type: %s\n", storageInfo.Type)
v.printfStdout("Location: %s\n", storageInfo.Location)
if v.Config.StorageURL != "" {
v.printfStdout("Storage URL: %s\n", v.Config.StorageURL)
}
if v.Config.S3.Bucket != "" {
v.printfStdout("S3 Bucket: %s\n", v.Config.S3.Bucket)
}
if v.Config.S3.Prefix != "" {
v.printfStdout("S3 Prefix: %s\n", v.Config.S3.Prefix)
}
v.printfStdout("S3 Endpoint: %s\n", v.Config.S3.Endpoint)
v.printfStdout("S3 Region: %s\n", v.Config.S3.Region)
if v.Config.S3.Endpoint != "" {
v.printfStdout("S3 Endpoint: %s\n", v.Config.S3.Endpoint)
}
if v.Config.S3.Region != "" {
v.printfStdout("S3 Region: %s\n", v.Config.S3.Region)
}
v.printlnStdout()
// Backup Settings
@@ -66,18 +81,6 @@ func (v *Vaultik) ShowInfo() error {
}
v.printlnStdout()
// Daemon Settings (if applicable)
if v.Config.BackupInterval > 0 || v.Config.MinTimeBetweenRun > 0 {
v.printfStdout("=== Daemon Settings ===\n")
if v.Config.BackupInterval > 0 {
v.printfStdout("Backup Interval: %s\n", v.Config.BackupInterval)
}
if v.Config.MinTimeBetweenRun > 0 {
v.printfStdout("Minimum Time: %s\n", v.Config.MinTimeBetweenRun)
}
v.printlnStdout()
}
// Local Database
v.printfStdout("=== Local Database ===\n")
v.printfStdout("Index Path: %s\n", v.Config.IndexPath)
@@ -349,7 +352,7 @@ func (v *Vaultik) printRemoteInfoTable(result *RemoteInfoResult) {
humanize.Comma(int64(result.OrphanedBlobCount)), humanize.Bytes(uint64(result.OrphanedBlobSize)))
if result.OrphanedBlobCount > 0 {
v.printfStdout("\nRun 'vaultik prune --remote' to remove orphaned blobs.\n")
v.printfStdout("\nRun 'vaultik prune' to remove orphaned blobs.\n")
}
}


@@ -11,16 +11,17 @@ import (
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/storage"
"git.eeqj.de/sneak/vaultik/internal/types"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/vaultik/internal/config"
"sneak.berlin/go/vaultik/internal/database"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/snapshot"
"sneak.berlin/go/vaultik/internal/storage"
"sneak.berlin/go/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/ui"
"sneak.berlin/go/vaultik/internal/vaultik"
)
// MockStorer implements storage.Storer for testing
@@ -520,6 +521,7 @@ func TestBackupAndRestore(t *testing.T) {
Fs: fs,
Stdout: io.Discard,
Stderr: io.Discard,
UI: ui.NewWithColor(io.Discard, false),
}
vaultikApp.SetContext(ctx)
@@ -541,3 +543,293 @@ func TestBackupAndRestore(t *testing.T) {
t.Log("Backup and restore test completed successfully")
}
// TestEndToEndFileStorage exercises the full backup → restore loop against the
// real `file://` storage backend (FileStorer) on a real OS filesystem. This is
// the closest local approximation of a production backup: encrypted blobs get
// written to disk, the metadata SQLite database is exported through the same
// blobgen pipeline as a real backup, and restoration reads them back through
// the public Vaultik.Restore entrypoint. It is the canonical end-to-end smoke
// test for 1.0.
func TestEndToEndFileStorage(t *testing.T) {
log.Initialize(log.Config{})
// Real OS filesystem (SQLite + FileStorer both need it).
fs := afero.NewOsFs()
tempDir, err := os.MkdirTemp("", "vaultik-e2e-")
require.NoError(t, err)
defer func() { _ = os.RemoveAll(tempDir) }()
dataDir := filepath.Join(tempDir, "source")
storeDir := filepath.Join(tempDir, "remote")
restoreDir := filepath.Join(tempDir, "restored")
dbPath := filepath.Join(tempDir, "index.sqlite")
// Write a representative mix of file sizes:
// - empty file
// - tiny text file
// - file smaller than one chunk (half the chunk size)
// - file forcing multiple chunks
// - nested subdirectories
chunkSize := int64(64 * 1024)
maxBlobSize := int64(512 * 1024)
testFiles := map[string][]byte{
filepath.Join(dataDir, "empty.txt"): {},
filepath.Join(dataDir, "small.txt"): []byte("hello vaultik"),
filepath.Join(dataDir, "subdir", "medium.bin"): bytesPattern("medium-", int(chunkSize/2)),
filepath.Join(dataDir, "subdir", "large.bin"): bytesPattern("large-", int(chunkSize*4)),
filepath.Join(dataDir, "deep", "nest", "leaf.txt"): []byte("leaf"),
}
for path, content := range testFiles {
require.NoError(t, fs.MkdirAll(filepath.Dir(path), 0o755))
require.NoError(t, afero.WriteFile(fs, path, content, 0o644))
}
// Create a file with non-default permissions.
restrictedPath := filepath.Join(dataDir, "restricted.txt")
require.NoError(t, afero.WriteFile(fs, restrictedPath, []byte("secret"), 0o600))
testFiles[restrictedPath] = []byte("secret")
// Create an empty directory (should survive round-trip).
emptyDir := filepath.Join(dataDir, "emptydir")
require.NoError(t, fs.MkdirAll(emptyDir, 0o755))
// Create a symlink.
symlinkPath := filepath.Join(dataDir, "link-to-small")
require.NoError(t, os.Symlink("small.txt", symlinkPath))
// FileStorer is the real-world local-disk backend.
storer, err := storage.NewFileStorer(storeDir)
require.NoError(t, err)
agePublicKey := "age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"
ageSecretKey := "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5"
cfg := &config.Config{
AgeRecipients: []string{agePublicKey},
AgeSecretKey: ageSecretKey,
CompressionLevel: 3,
Hostname: "test-host",
}
ctx := context.Background()
db, err := database.New(ctx, dbPath)
require.NoError(t, err)
defer func() { _ = db.Close() }()
repos := database.NewRepositories(db)
sm := snapshot.NewSnapshotManager(snapshot.SnapshotManagerParams{
Repos: repos,
Storage: storer,
Config: cfg,
})
sm.SetFilesystem(fs)
scanner := snapshot.NewScanner(snapshot.ScannerConfig{
FS: fs,
Storage: storer,
ChunkSize: chunkSize,
MaxBlobSize: maxBlobSize,
CompressionLevel: cfg.CompressionLevel,
AgeRecipients: cfg.AgeRecipients,
Repositories: repos,
})
snapshotID, err := sm.CreateSnapshotWithName(ctx, cfg.Hostname, "e2e", "test-version", "test-git")
require.NoError(t, err)
scanResult, err := scanner.Scan(ctx, dataDir, snapshotID)
require.NoError(t, err)
require.Greater(t, scanResult.FilesScanned, 0)
require.Greater(t, scanResult.BlobsCreated, 0)
require.NoError(t, sm.CompleteSnapshot(ctx, snapshotID))
require.NoError(t, sm.ExportSnapshotMetadata(ctx, dbPath, snapshotID))
// Verify the backup actually landed on disk under blobs/ and metadata/.
blobInfo, err := os.Stat(filepath.Join(storeDir, "blobs"))
require.NoError(t, err)
require.True(t, blobInfo.IsDir())
metaInfo, err := os.Stat(filepath.Join(storeDir, "metadata", snapshotID))
require.NoError(t, err)
require.True(t, metaInfo.IsDir())
// Tear down the source DB before restore — restore must work using only
// the remote bytes plus the secret key, with no help from the local index.
require.NoError(t, db.Close())
restoreVaultik := &vaultik.Vaultik{
Config: cfg,
Storage: storer,
Fs: fs,
Stdout: io.Discard,
Stderr: io.Discard,
UI: ui.NewWithColor(io.Discard, false),
}
restoreVaultik.SetContext(ctx)
require.NoError(t, restoreVaultik.Restore(&vaultik.RestoreOptions{
SnapshotID: snapshotID,
TargetDir: restoreDir,
Verify: true,
}))
// Byte-equality compare every original against its restored copy.
for origPath, expected := range testFiles {
restoredPath := filepath.Join(restoreDir, origPath)
got, err := afero.ReadFile(fs, restoredPath)
require.NoError(t, err, "restored file missing: %s", restoredPath)
require.Equalf(t, expected, got, "byte-equality failed for %s", origPath)
}
// Verify the restricted file kept its permissions.
restoredRestricted := filepath.Join(restoreDir, restrictedPath)
rInfo, err := os.Stat(restoredRestricted)
require.NoError(t, err)
assert.Equal(t, os.FileMode(0o600), rInfo.Mode().Perm(),
"restricted file should preserve 0600 permissions")
// Verify the empty directory was restored.
restoredEmptyDir := filepath.Join(restoreDir, emptyDir)
dInfo, err := os.Stat(restoredEmptyDir)
require.NoError(t, err, "empty directory should be restored")
assert.True(t, dInfo.IsDir(), "emptydir should be a directory")
// Verify the symlink was restored with the correct target.
restoredSymlink := filepath.Join(restoreDir, symlinkPath)
target, err := os.Readlink(restoredSymlink)
require.NoError(t, err, "symlink should be restored")
assert.Equal(t, "small.txt", target, "symlink target should be preserved")
}
// TestDedupOnlySnapshotRestores backs up the same directory twice without
// touching it between runs, then restores the SECOND (fully-deduplicated)
// snapshot. The second snapshot uploads no new blobs — every chunk is
// already in storage from the first run. This test guards against the
// regression where snapshot_blobs was populated only for blobs uploaded
// during the snapshot, leaving fully-deduplicated snapshots unrestorable
// with "chunk X not found in any blob" errors.
func TestDedupOnlySnapshotRestores(t *testing.T) {
log.Initialize(log.Config{})
fs := afero.NewOsFs()
tempDir, err := os.MkdirTemp("", "vaultik-dedup-")
require.NoError(t, err)
defer func() { _ = os.RemoveAll(tempDir) }()
dataDir := filepath.Join(tempDir, "source")
storeDir := filepath.Join(tempDir, "remote")
restoreDir := filepath.Join(tempDir, "restored")
dbPath := filepath.Join(tempDir, "index.sqlite")
chunkSize := int64(64 * 1024)
maxBlobSize := int64(512 * 1024)
testFiles := map[string][]byte{
filepath.Join(dataDir, "a.bin"): bytesPattern("a-", int(chunkSize*3)),
filepath.Join(dataDir, "b.bin"): bytesPattern("b-", int(chunkSize*2)),
}
for path, content := range testFiles {
require.NoError(t, fs.MkdirAll(filepath.Dir(path), 0o755))
require.NoError(t, afero.WriteFile(fs, path, content, 0o644))
}
storer, err := storage.NewFileStorer(storeDir)
require.NoError(t, err)
agePublicKey := "age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"
ageSecretKey := "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5"
cfg := &config.Config{
AgeRecipients: []string{agePublicKey},
AgeSecretKey: ageSecretKey,
CompressionLevel: 3,
Hostname: "test-host",
}
ctx := context.Background()
db, err := database.New(ctx, dbPath)
require.NoError(t, err)
defer func() { _ = db.Close() }()
repos := database.NewRepositories(db)
makeScanner := func() *snapshot.Scanner {
return snapshot.NewScanner(snapshot.ScannerConfig{
FS: fs,
Storage: storer,
ChunkSize: chunkSize,
MaxBlobSize: maxBlobSize,
CompressionLevel: cfg.CompressionLevel,
AgeRecipients: cfg.AgeRecipients,
Repositories: repos,
})
}
sm := snapshot.NewSnapshotManager(snapshot.SnapshotManagerParams{
Repos: repos, Storage: storer, Config: cfg,
})
sm.SetFilesystem(fs)
// First snapshot — uploads all blobs.
id1, err := sm.CreateSnapshotWithName(ctx, cfg.Hostname, "dedup", "v", "g")
require.NoError(t, err)
r1, err := makeScanner().Scan(ctx, dataDir, id1)
require.NoError(t, err)
require.Greater(t, r1.BlobsCreated, 0, "first snapshot should upload at least one blob")
require.NoError(t, sm.CompleteSnapshot(ctx, id1))
require.NoError(t, sm.ExportSnapshotMetadata(ctx, dbPath, id1))
// Second snapshot — same data, every chunk dedups. Sleep past the
// second-precision timestamp so the snapshot IDs differ.
time.Sleep(1100 * time.Millisecond)
id2, err := sm.CreateSnapshotWithName(ctx, cfg.Hostname, "dedup", "v", "g")
require.NoError(t, err)
r2, err := makeScanner().Scan(ctx, dataDir, id2)
require.NoError(t, err)
require.Equal(t, 0, r2.BlobsCreated, "second snapshot should upload zero new blobs (fully dedup'd)")
require.NoError(t, sm.CompleteSnapshot(ctx, id2))
require.NoError(t, sm.ExportSnapshotMetadata(ctx, dbPath, id2))
// snapshot_blobs for id2 must be populated despite no uploads.
blobHashes, err := repos.Snapshots.GetBlobHashes(ctx, id2)
require.NoError(t, err)
require.NotEmpty(t, blobHashes, "snapshot_blobs for fully-dedup'd snapshot must reference blobs uploaded by prior snapshot")
require.NoError(t, db.Close())
restoreVaultik := &vaultik.Vaultik{
Config: cfg,
Storage: storer,
Fs: fs,
Stdout: io.Discard,
Stderr: io.Discard,
UI: ui.NewWithColor(io.Discard, false),
}
restoreVaultik.SetContext(ctx)
require.NoError(t, restoreVaultik.Restore(&vaultik.RestoreOptions{
SnapshotID: id2,
TargetDir: restoreDir,
Verify: true,
}))
for origPath, expected := range testFiles {
restoredPath := filepath.Join(restoreDir, origPath)
got, err := afero.ReadFile(fs, restoredPath)
require.NoError(t, err, "restored file missing: %s", restoredPath)
require.Equalf(t, expected, got, "byte-equality failed for %s", origPath)
}
}
// bytesPattern returns a deterministic byte slice of length n derived from
// tag: byte i is tag[i%len(tag)] XOR the low byte of i. Useful for forcing
// chunker behavior with reproducible content. tag must be non-empty.
func bytesPattern(tag string, n int) []byte {
out := make([]byte, n)
for i := range out {
out[i] = byte(tag[i%len(tag)] ^ byte(i&0xff))
}
return out
}
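A minimal sketch of how a helper like `bytesPattern` behaves. The function is adapted from the test helper above; the `main` wrapper and the sample tags are illustrative assumptions. It shows that equal inputs give equal output, while a different tag gives different content.

```go
package main

import "fmt"

// bytesPattern is adapted from the test helper above: byte i is
// tag[i%len(tag)] XOR the low byte of i. An empty tag would panic on the
// modulo, so callers must pass a non-empty tag.
func bytesPattern(tag string, n int) []byte {
	out := make([]byte, n)
	for i := range out {
		out[i] = tag[i%len(tag)] ^ byte(i&0xff)
	}
	return out
}

func main() {
	a := bytesPattern("ab", 4)
	b := bytesPattern("ab", 4)
	c := bytesPattern("cd", 4)
	fmt.Println(string(a) == string(b)) // same tag and length: identical bytes
	fmt.Println(string(a) == string(c)) // different tag: different bytes
}
```

Because the content depends only on the tag and the length, two test files built from the same tag should chunk identically, which is what a dedup test relies on.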


@@ -5,8 +5,8 @@ import (
"fmt"
"strings"
"git.eeqj.de/sneak/vaultik/internal/log"
"github.com/dustin/go-humanize"
"sneak.berlin/go/vaultik/internal/log"
)
// PruneOptions contains options for the prune command
@@ -15,6 +15,31 @@ type PruneOptions struct {
JSON bool
}
// NukeRemote deletes every snapshot's metadata and every blob from remote
// storage. After this returns successfully the bucket prefix is empty and
// the next backup starts from scratch.
//
// Refuses to run unless force is true. The caller is responsible for
// confirming with the user.
func (v *Vaultik) NukeRemote(force bool) error {
if !force {
return fmt.Errorf("nuke requires --force (this deletes ALL remote snapshots and blobs)")
}
v.UI.Begin("Removing all snapshot metadata from backup destination store.")
if _, err := v.RemoveAllSnapshots(&RemoveOptions{Force: true, Remote: true}); err != nil {
return fmt.Errorf("removing all snapshots: %w", err)
}
v.UI.Begin("Removing all blobs from backup destination store.")
if err := v.PruneBlobs(&PruneOptions{Force: true}); err != nil {
return fmt.Errorf("pruning blobs: %w", err)
}
v.UI.Complete("Backup destination store is now empty.")
return nil
}
// PruneBlobsResult contains the result of a blob prune operation
type PruneBlobsResult struct {
BlobsFound int `json:"blobs_found"`
@@ -23,6 +48,19 @@ type PruneBlobsResult struct {
BytesFreed int64 `json:"bytes_freed"`
}
// Prune removes orphaned data from the local index database AND
// unreferenced blobs from the backup destination store. This is the
// single user-facing prune entry point — the split between local and
// remote cleanup is an implementation detail. Calling code should
// prefer this method over PruneDatabase or PruneBlobs individually
// unless it specifically wants one half.
func (v *Vaultik) Prune(opts *PruneOptions) error {
if _, err := v.PruneDatabase(); err != nil {
return fmt.Errorf("pruning local database: %w", err)
}
return v.PruneBlobs(opts)
}
// PruneBlobs removes unreferenced blobs from storage
func (v *Vaultik) PruneBlobs(opts *PruneOptions) error {
log.Info("Starting prune operation")


@@ -8,12 +8,12 @@ import (
"testing"
"time"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/types"
"git.eeqj.de/sneak/vaultik/internal/vaultik"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/vaultik/internal/database"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/types"
"sneak.berlin/go/vaultik/internal/vaultik"
)
// setupPurgeTest creates a Vaultik instance with an in-memory database and mock
@@ -156,7 +156,7 @@ func TestPurgeKeepLatest_WithNameFilter(t *testing.T) {
err := v.PurgeSnapshotsWithOptions(&vaultik.SnapshotPurgeOptions{
KeepLatest: true,
Force: true,
Name: "home",
Names: []string{"home"},
})
require.NoError(t, err)
@@ -190,7 +190,7 @@ func TestPurgeKeepLatest_NameFilterNoMatch(t *testing.T) {
err := v.PurgeSnapshotsWithOptions(&vaultik.SnapshotPurgeOptions{
KeepLatest: true,
Force: true,
Name: "nonexistent",
Names: []string{"nonexistent"},
})
require.NoError(t, err)
@@ -215,7 +215,7 @@ func TestPurgeOlderThan_WithNameFilter(t *testing.T) {
err := v.PurgeSnapshotsWithOptions(&vaultik.SnapshotPurgeOptions{
OlderThan: "365d",
Force: true,
Name: "home",
Names: []string{"home"},
})
require.NoError(t, err)


@@ -0,0 +1,351 @@
package vaultik_test
import (
"bytes"
"context"
"io"
"strings"
"sync"
"testing"
"github.com/klauspost/compress/zstd"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/snapshot"
"sneak.berlin/go/vaultik/internal/storage"
"sneak.berlin/go/vaultik/internal/vaultik"
)
// testStorer implements storage.Storer for testing
type testStorer struct {
mu sync.Mutex
data map[string][]byte
}
func newTestStorer() *testStorer {
return &testStorer{
data: make(map[string][]byte),
}
}
func (s *testStorer) Put(ctx context.Context, key string, reader io.Reader) error {
s.mu.Lock()
defer s.mu.Unlock()
data, err := io.ReadAll(reader)
if err != nil {
return err
}
s.data[key] = data
return nil
}
func (s *testStorer) PutWithProgress(ctx context.Context, key string, reader io.Reader, size int64, progress storage.ProgressCallback) error {
return s.Put(ctx, key, reader)
}
func (s *testStorer) Get(ctx context.Context, key string) (io.ReadCloser, error) {
s.mu.Lock()
defer s.mu.Unlock()
data, exists := s.data[key]
if !exists {
return nil, storage.ErrNotFound
}
return io.NopCloser(bytes.NewReader(data)), nil
}
func (s *testStorer) Stat(ctx context.Context, key string) (*storage.ObjectInfo, error) {
s.mu.Lock()
defer s.mu.Unlock()
data, exists := s.data[key]
if !exists {
return nil, storage.ErrNotFound
}
return &storage.ObjectInfo{
Key: key,
Size: int64(len(data)),
}, nil
}
func (s *testStorer) Delete(ctx context.Context, key string) error {
s.mu.Lock()
defer s.mu.Unlock()
delete(s.data, key)
return nil
}
func (s *testStorer) List(ctx context.Context, prefix string) ([]string, error) {
s.mu.Lock()
defer s.mu.Unlock()
var keys []string
for key := range s.data {
if prefix == "" || strings.HasPrefix(key, prefix) {
keys = append(keys, key)
}
}
return keys, nil
}
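func (s *testStorer) ListStream(ctx context.Context, prefix string) <-chan storage.ObjectInfo {
// Snapshot the matching entries under the lock, then stream them without
// holding it, so a consumer that calls back into the store (for example
// Delete while ranging over the channel) cannot deadlock.
s.mu.Lock()
var infos []storage.ObjectInfo
for key, data := range s.data {
if prefix == "" || strings.HasPrefix(key, prefix) {
infos = append(infos, storage.ObjectInfo{Key: key, Size: int64(len(data))})
}
}
s.mu.Unlock()
ch := make(chan storage.ObjectInfo)
go func() {
defer close(ch)
for _, info := range infos {
select {
case ch <- info:
case <-ctx.Done():
return
}
}
}()
return ch
}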
func (s *testStorer) hasKey(key string) bool {
s.mu.Lock()
defer s.mu.Unlock()
_, exists := s.data[key]
return exists
}
func (s *testStorer) keyCount() int {
s.mu.Lock()
defer s.mu.Unlock()
return len(s.data)
}
func (s *testStorer) Info() storage.StorageInfo {
return storage.StorageInfo{
Type: "test",
Location: "memory",
}
}
// addManifest creates a compressed manifest in storage
func addManifest(t *testing.T, store *testStorer, snapshotID string, blobHashes []string) {
t.Helper()
blobs := make([]snapshot.BlobInfo, len(blobHashes))
for i, hash := range blobHashes {
blobs[i] = snapshot.BlobInfo{
Hash: hash,
CompressedSize: 1000,
}
}
manifest := &snapshot.Manifest{
SnapshotID: snapshotID,
BlobCount: len(blobs),
Blobs: blobs,
}
data, err := snapshot.EncodeManifest(manifest, 3)
require.NoError(t, err)
key := "metadata/" + snapshotID + "/manifest.json.zst"
err = store.Put(context.Background(), key, bytes.NewReader(data))
require.NoError(t, err)
}
// addBlob adds a fake blob to storage
func addBlob(t *testing.T, store *testStorer, hash string) {
t.Helper()
// Create zstd-compressed data; fail the test if compression fails.
var buf bytes.Buffer
writer, err := zstd.NewWriter(&buf)
require.NoError(t, err)
_, err = writer.Write([]byte("blob data"))
require.NoError(t, err)
require.NoError(t, writer.Close())
key := "blobs/" + hash[:2] + "/" + hash[2:4] + "/" + hash
err = store.Put(context.Background(), key, bytes.NewReader(buf.Bytes()))
require.NoError(t, err)
}
// ============================================================================
// Unit Tests for RemoveSnapshot
// ============================================================================
func TestRemoveSnapshot_LocalOnly(t *testing.T) {
log.Initialize(log.Config{})
store := newTestStorer()
blobA := "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
addManifest(t, store, "snapshot-001", []string{blobA})
addBlob(t, store, blobA)
tv := vaultik.NewForTesting(store)
opts := &vaultik.RemoveOptions{Force: true}
result, err := tv.RemoveSnapshot("snapshot-001", opts)
require.NoError(t, err)
assert.Equal(t, "snapshot-001", result.SnapshotID)
assert.False(t, result.RemoteRemoved)
// Blobs should NOT be deleted (that's what prune is for)
assert.True(t, store.hasKey("blobs/aa/aa/"+blobA))
// Remote metadata should NOT be deleted (no --remote flag)
assert.True(t, store.hasKey("metadata/snapshot-001/manifest.json.zst"))
// Verify output
assert.Contains(t, tv.Stdout.String(), "Removed snapshot 'snapshot-001' from local database")
}
func TestRemoveSnapshot_WithRemote(t *testing.T) {
log.Initialize(log.Config{})
store := newTestStorer()
blobA := "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
addManifest(t, store, "snapshot-001", []string{blobA})
addBlob(t, store, blobA)
tv := vaultik.NewForTesting(store)
opts := &vaultik.RemoveOptions{Force: true, Remote: true}
result, err := tv.RemoveSnapshot("snapshot-001", opts)
require.NoError(t, err)
assert.Equal(t, "snapshot-001", result.SnapshotID)
assert.True(t, result.RemoteRemoved)
// Blobs should NOT be deleted
assert.True(t, store.hasKey("blobs/aa/aa/"+blobA))
// Remote metadata SHOULD be deleted
assert.False(t, store.hasKey("metadata/snapshot-001/manifest.json.zst"))
// Verify output mentions prune
assert.Contains(t, tv.Stdout.String(), "Removed snapshot 'snapshot-001' from local database")
assert.Contains(t, tv.Stdout.String(), "Removed snapshot metadata from remote storage")
assert.Contains(t, tv.Stdout.String(), "Run 'vaultik prune' to remove orphaned blobs")
}
func TestRemoveSnapshot_DryRun(t *testing.T) {
log.Initialize(log.Config{})
store := newTestStorer()
blobA := "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
addManifest(t, store, "snapshot-001", []string{blobA})
addBlob(t, store, blobA)
initialCount := store.keyCount()
tv := vaultik.NewForTesting(store)
opts := &vaultik.RemoveOptions{Force: true, DryRun: true, Remote: true}
result, err := tv.RemoveSnapshot("snapshot-001", opts)
require.NoError(t, err)
assert.True(t, result.DryRun)
// Nothing should be deleted
assert.Equal(t, initialCount, store.keyCount())
assert.True(t, store.hasKey("blobs/aa/aa/"+blobA))
assert.True(t, store.hasKey("metadata/snapshot-001/manifest.json.zst"))
// Verify dry run message
assert.Contains(t, tv.Stdout.String(), "[Dry run - no changes made]")
}
func TestRemoveAllSnapshots_RequiresForce(t *testing.T) {
log.Initialize(log.Config{})
store := newTestStorer()
addManifest(t, store, "snapshot-001", []string{})
addManifest(t, store, "snapshot-002", []string{})
tv := vaultik.NewForTesting(store)
opts := &vaultik.RemoveOptions{All: true} // No Force
_, err := tv.RemoveAllSnapshots(opts)
assert.Error(t, err)
assert.Contains(t, err.Error(), "--all requires --force")
}
func TestRemoveAllSnapshots_WithForce(t *testing.T) {
log.Initialize(log.Config{})
store := newTestStorer()
blobA := "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
addManifest(t, store, "snapshot-001", []string{blobA})
addManifest(t, store, "snapshot-002", []string{blobA})
addBlob(t, store, blobA)
tv := vaultik.NewForTesting(store)
opts := &vaultik.RemoveOptions{All: true, Force: true, Remote: true}
result, err := tv.RemoveAllSnapshots(opts)
require.NoError(t, err)
assert.Len(t, result.SnapshotsRemoved, 2)
assert.True(t, result.RemoteRemoved)
// Blobs should NOT be deleted
assert.True(t, store.hasKey("blobs/aa/aa/"+blobA))
// Remote metadata SHOULD be deleted
assert.False(t, store.hasKey("metadata/snapshot-001/manifest.json.zst"))
assert.False(t, store.hasKey("metadata/snapshot-002/manifest.json.zst"))
// Verify output
assert.Contains(t, tv.Stdout.String(), "Removed 2 snapshot(s)")
assert.Contains(t, tv.Stdout.String(), "Run 'vaultik prune' to remove orphaned blobs")
}
func TestRemoveAllSnapshots_DryRun(t *testing.T) {
log.Initialize(log.Config{})
store := newTestStorer()
addManifest(t, store, "snapshot-001", []string{})
addManifest(t, store, "snapshot-002", []string{})
initialCount := store.keyCount()
tv := vaultik.NewForTesting(store)
opts := &vaultik.RemoveOptions{All: true, Force: true, DryRun: true}
result, err := tv.RemoveAllSnapshots(opts)
require.NoError(t, err)
assert.True(t, result.DryRun)
assert.Len(t, result.SnapshotsRemoved, 2)
// Nothing should be deleted
assert.Equal(t, initialCount, store.keyCount())
// Verify dry run message
assert.Contains(t, tv.Stdout.String(), "[Dry run - no changes made]")
}
func TestRemoveAllSnapshots_NoSnapshots(t *testing.T) {
log.Initialize(log.Config{})
store := newTestStorer()
// No snapshots added
tv := vaultik.NewForTesting(store)
opts := &vaultik.RemoveOptions{All: true, Force: true}
result, err := tv.RemoveAllSnapshots(opts)
require.NoError(t, err)
assert.Len(t, result.SnapshotsRemoved, 0)
// Verify output
assert.Contains(t, tv.Stdout.String(), "No snapshots found")
}


@@ -7,26 +7,18 @@ import (
"encoding/hex"
"fmt"
"io"
"math"
"os"
"path/filepath"
"time"
"filippo.io/age"
"git.eeqj.de/sneak/vaultik/internal/blobgen"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/dustin/go-humanize"
"github.com/schollz/progressbar/v3"
"github.com/spf13/afero"
"golang.org/x/term"
)
const (
// progressBarWidth is the character width of the progress bar display.
progressBarWidth = 40
// progressBarThrottle is the minimum interval between progress bar redraws.
progressBarThrottle = 100 * time.Millisecond
"sneak.berlin/go/vaultik/internal/blobgen"
"sneak.berlin/go/vaultik/internal/database"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/types"
)
// RestoreOptions contains options for the restore operation
@@ -35,6 +27,7 @@ type RestoreOptions struct {
TargetDir string
Paths []string // Optional paths to restore (empty = all)
Verify bool // Verify restored files by checking chunk hashes
SkipErrors bool // Continue past file-restore errors instead of aborting
}
// RestoreResult contains statistics from a restore operation
@@ -92,10 +85,12 @@ func (v *Vaultik) Restore(opts *RestoreOptions) error {
if len(files) == 0 {
log.Warn("No files found to restore")
v.UI.Warning("No files found to restore.")
return nil
}
log.Info("Found files to restore", "count", len(files))
v.UI.Info("Found %s files to restore.", v.UI.Count(len(files)))
// Step 3: Create target directory
if err := v.Fs.MkdirAll(opts.TargetDir, 0755); err != nil {
@@ -124,16 +119,16 @@ func (v *Vaultik) Restore(opts *RestoreOptions) error {
"duration", result.Duration,
)
v.printfStdout("Restored %d files (%s) in %s\n",
result.FilesRestored,
humanize.Bytes(uint64(result.BytesRestored)),
result.Duration.Round(time.Second),
v.UI.Complete("Restored %s files (%s) in %s.",
v.UI.Count(result.FilesRestored),
v.UI.Size(result.BytesRestored),
v.UI.Duration(result.Duration),
)
if result.FilesFailed > 0 {
_, _ = fmt.Fprintf(v.Stdout, "\nWARNING: %d file(s) failed to restore:\n", result.FilesFailed)
v.UI.Warning("%d file(s) failed to restore:", result.FilesFailed)
for _, path := range result.FailedFiles {
_, _ = fmt.Fprintf(v.Stdout, " - %s\n", path)
v.UI.Detail("%s", v.UI.Path(path))
}
}
@@ -164,7 +159,12 @@ func (v *Vaultik) prepareRestoreIdentity() (age.Identity, error) {
return identity, nil
}
// restoreAllFiles iterates over files and restores each one, tracking progress and failures
// restoreAllFiles processes files in blob-locality order: restore every
// file whose blobs are all in the disk cache, then download the missing
// blobs for the pending file with the fewest uncached blobs, and repeat.
// This keeps peak cache occupancy near one blob even on snapshots whose
// path order interleaves blobs, and lets the sweeper free each blob as
// soon as every file that references it has been restored.
func (v *Vaultik) restoreAllFiles(
files []*database.File,
repos *database.Repositories,
@@ -173,56 +173,199 @@ func (v *Vaultik) restoreAllFiles(
chunkToBlobMap map[string]*database.BlobChunk,
) (*RestoreResult, error) {
result := &RestoreResult{}
blobCache, err := newBlobDiskCache(4 * v.Config.BlobSizeLimit.Int64())
// The restore-side blob cache is unbounded — restores may read any
// blob many times across deduplicated files and we want to avoid
// re-downloading until we can prove a blob is no longer needed.
// Cleanup is driven by the sweeper below, not by LRU.
blobCache, err := newBlobDiskCache(math.MaxInt64)
if err != nil {
return nil, fmt.Errorf("creating blob cache: %w", err)
}
defer func() { _ = blobCache.Close() }()
if v.restoreCacheObserver != nil {
v.restoreCacheObserver(blobCache)
}
defer func() {
if v.restoreCacheObserver != nil {
v.restoreCacheObserver(blobCache)
}
_ = blobCache.Close()
}()
// Calculate total bytes for progress bar
// Per-restore sweep state: after every blob_size_limit/100 bytes written,
// scan the cache and delete any blob whose referencing files have all
// been restored.
sweeper := newRestoreSweeper(v.ctx, repos, blobCache, v.Config.BlobSizeLimit.Int64()/100)
// Pre-fetch every blob row once so chunk extraction can map a
// blob_id to its hash without a DB round-trip per chunk.
blobsByID, err := repos.Blobs.GetAll(v.ctx)
if err != nil {
return nil, fmt.Errorf("fetching blob index: %w", err)
}
blobIDToHash := make(map[string]string, len(blobsByID))
blobByHash := make(map[string]*database.Blob, len(blobsByID))
for id, blob := range blobsByID {
hash := blob.Hash.String()
blobIDToHash[id] = hash
blobByHash[hash] = blob
}
plan, err := newRestorePlan(v.ctx, repos, files, chunkToBlobMap, blobIDToHash)
if err != nil {
return nil, fmt.Errorf("building restore plan: %w", err)
}
// Index files by ID so the loop can look them up by the IDs the
// plan hands back.
filesByID := make(map[types.FileID]*database.File, len(files))
for _, f := range files {
filesByID[f.ID] = f
}
// Calculate total bytes expected for percentage / ETA arithmetic.
var totalBytesExpected int64
for _, file := range files {
totalBytesExpected += file.Size
}
// Create progress bar if output is a terminal
bar := v.newProgressBar("Restoring", totalBytesExpected)
v.UI.Begin("Restoring %s files (%s) to %s.",
v.UI.Count(len(files)),
v.UI.Size(totalBytesExpected),
v.UI.Path(opts.TargetDir))
for i, file := range files {
session := &restoreSession{
v: v,
ctx: v.ctx,
repos: repos,
opts: opts,
identity: identity,
chunkToBlobMap: chunkToBlobMap,
blobByHash: blobByHash,
blobIDToHash: blobIDToHash,
blobCache: blobCache,
sweeper: sweeper,
result: result,
}
// Periodic progress output, matching the snapshot create cadence.
startTime := time.Now()
lastStatusTime := startTime
const statusInterval = 15 * time.Second
processed := 0
for plan.hasPending() {
if v.ctx.Err() != nil {
return nil, v.ctx.Err()
}
if err := v.restoreFile(v.ctx, repos, file, opts.TargetDir, identity, chunkToBlobMap, blobCache, result); err != nil {
log.Error("Failed to restore file", "path", file.Path, "error", err)
result.FilesFailed++
result.FailedFiles = append(result.FailedFiles, file.Path.String())
// Update progress bar even on failure
if bar != nil {
_ = bar.Add64(file.Size)
fileID, ready := plan.popReady()
if !ready {
// No file is fully cache-served. First free any blobs
// whose file sets are exhausted — without this, the
// blob whose last file we just finished would still be
// cached when we Put the next one, briefly pushing
// peak occupancy from 1 to 2.
sweeper.sweep()
// Pick the pending file with the smallest uncached
// blob set and download its blobs. After each blob
// lands, the plan moves any pending file whose set
// just emptied onto the ready queue.
next := plan.pickNextDownload()
if next.IsZero() {
break
}
for _, hash := range plan.blobsNeeded(next) {
blob, ok := blobByHash[hash]
if !ok {
return nil, fmt.Errorf("blob hash %s missing from blob index", hash[:16])
}
if err := session.downloadBlobToCache(hash, blob.CompressedSize); err != nil {
return nil, fmt.Errorf("downloading blob %s: %w", hash[:16], err)
}
result.BlobsDownloaded++
result.BytesDownloaded += blob.CompressedSize
plan.markBlobCached(hash)
}
continue
}
// Update progress bar
if bar != nil {
_ = bar.Add64(file.Size)
file := filesByID[fileID]
if err := session.restoreFile(file); err != nil {
log.Error("Failed to restore file", "path", file.Path, "error", err)
if !opts.SkipErrors {
return nil, fmt.Errorf("restoring %s: %w (pass --skip-errors to continue past restore failures)", file.Path, err)
}
v.UI.Error("Failed to restore %s: %v. Skipping (--skip-errors).", v.UI.Path(file.Path.String()), err)
result.FilesFailed++
result.FailedFiles = append(result.FailedFiles, file.Path.String())
plan.finishFile(fileID)
continue
}
// Progress logging (for non-terminal or structured logs)
if (i+1)%100 == 0 || i+1 == len(files) {
// Record the file as restored so the sweeper can free blobs
// once all referencing files are done, and drop it from the
// plan's indexes so future picks ignore it.
sweeper.fileRestored(fileID.String())
plan.finishFile(fileID)
processed++
if time.Since(lastStatusTime) >= statusInterval {
v.printRestoreProgress(processed, len(files), result.BytesRestored, totalBytesExpected, startTime)
lastStatusTime = time.Now()
}
// Structured progress log for --verbose / JSON consumers.
if processed%100 == 0 || processed == len(files) {
log.Info("Restore progress",
"files", fmt.Sprintf("%d/%d", i+1, len(files)),
"files", fmt.Sprintf("%d/%d", processed, len(files)),
"bytes", humanize.Bytes(uint64(result.BytesRestored)),
)
}
}
if bar != nil {
_ = bar.Finish()
return result, nil
}
// printRestoreProgress emits a periodic restore-phase status line via
// the UI writer, mirroring scanner.printProcessingProgress so the two
// long-running commands have the same on-screen rhythm.
func (v *Vaultik) printRestoreProgress(filesDone, totalFiles int, bytesDone, totalBytes int64, startTime time.Time) {
elapsed := time.Since(startTime)
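// Guard the percentage against a zero total (for example a snapshot of
// only directories, symlinks, or empty files), which would yield NaN.
var pct float64
if totalBytes > 0 {
pct = float64(bytesDone) / float64(totalBytes) * 100
}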
byteRate := float64(bytesDone) / elapsed.Seconds()
fileRate := float64(filesDone) / elapsed.Seconds()
remainingBytes := totalBytes - bytesDone
var eta time.Duration
if byteRate > 0 && remainingBytes > 0 {
eta = time.Duration(float64(remainingBytes)/byteRate) * time.Second
}
return result, nil
if eta > 0 {
v.UI.Progress("Restore: %s/%s files (%s), %s/%s, %s, %.0f files/sec, restore elapsed: %s, restore ETA: %s (est remain %s).",
v.UI.Count(filesDone),
v.UI.Count(totalFiles),
v.UI.Percent(pct),
v.UI.Size(bytesDone),
v.UI.Size(totalBytes),
v.UI.Speed(byteRate),
fileRate,
v.UI.Duration(elapsed),
v.UI.Time(time.Now().Add(eta)),
v.UI.Duration(eta))
return
}
v.UI.Progress("Restore: %s/%s files (%s), %s/%s, %s, %.0f files/sec, restore elapsed: %s.",
v.UI.Count(filesDone),
v.UI.Count(totalFiles),
v.UI.Percent(pct),
v.UI.Size(bytesDone),
v.UI.Size(totalBytes),
v.UI.Speed(byteRate),
fileRate,
v.UI.Duration(elapsed))
}
// handleRestoreVerification runs post-restore verification if requested
@@ -237,17 +380,17 @@ func (v *Vaultik) handleRestoreVerification(
}
if result.FilesFailed > 0 {
v.printfStdout("\nVerification FAILED: %d files did not match expected checksums\n", result.FilesFailed)
v.UI.Error("Verification failed: %s files did not match expected checksums.",
v.UI.Count(result.FilesFailed))
for _, path := range result.FailedFiles {
v.printfStdout(" - %s\n", path)
v.UI.Detail("%s", v.UI.Path(path))
}
return fmt.Errorf("%d files failed verification", result.FilesFailed)
}
v.printfStdout("Verified %d files (%s)\n",
result.FilesVerified,
humanize.Bytes(uint64(result.BytesVerified)),
)
v.UI.Complete("Verified %s files (%s).",
v.UI.Count(result.FilesVerified),
v.UI.Size(result.BytesVerified))
return nil
}
@@ -372,209 +515,211 @@ func (v *Vaultik) buildChunkToBlobMap(ctx context.Context, repos *database.Repos
return result, rows.Err()
}
// restoreFile restores a single file
func (v *Vaultik) restoreFile(
ctx context.Context,
repos *database.Repositories,
file *database.File,
targetDir string,
identity age.Identity,
chunkToBlobMap map[string]*database.BlobChunk,
blobCache *blobDiskCache,
result *RestoreResult,
) error {
// Calculate target path - use full original path under target directory
targetPath := filepath.Join(targetDir, file.Path.String())
// Create parent directories
parentDir := filepath.Dir(targetPath)
if err := v.Fs.MkdirAll(parentDir, 0755); err != nil {
return fmt.Errorf("creating parent directory: %w", err)
}
// Handle symlinks
if file.IsSymlink() {
return v.restoreSymlink(file, targetPath, result)
}
// Handle directories
if file.Mode&uint32(os.ModeDir) != 0 {
return v.restoreDirectory(file, targetPath, result)
}
// Handle regular files
return v.restoreRegularFile(ctx, repos, file, targetPath, identity, chunkToBlobMap, blobCache, result)
// restoreSession holds every piece of per-restore state shared by the
// restore-time methods. Each restore builds one of these from the
// snapshot's metadata and then drives the file loop through methods on
// it. Keeping this state on the struct rather than threading it
// through every function signature keeps the inner-loop call sites
// readable: restoreFile(file) instead of a ten-argument helper.
type restoreSession struct {
v *Vaultik
ctx context.Context
repos *database.Repositories
opts *RestoreOptions
identity age.Identity
chunkToBlobMap map[string]*database.BlobChunk
blobByHash map[string]*database.Blob
blobIDToHash map[string]string
blobCache *blobDiskCache
sweeper *restoreSweeper
result *RestoreResult
}
// restoreSymlink restores a symbolic link
func (v *Vaultik) restoreSymlink(file *database.File, targetPath string, result *RestoreResult) error {
// Remove existing file if it exists
_ = v.Fs.Remove(targetPath)
// restoreFile dispatches to the right per-kind restorer.
func (s *restoreSession) restoreFile(file *database.File) error {
targetPath := filepath.Join(s.opts.TargetDir, file.Path.String())
parentDir := filepath.Dir(targetPath)
if err := s.v.Fs.MkdirAll(parentDir, 0755); err != nil {
return fmt.Errorf("creating parent directory: %w", err)
}
if file.IsSymlink() {
return s.restoreSymlink(file, targetPath)
}
if file.Mode&uint32(os.ModeDir) != 0 {
return s.restoreDirectory(file, targetPath)
}
return s.restoreRegularFile(file, targetPath)
}
// Create symlink
// Note: afero.MemMapFs doesn't support symlinks, so we use os for real filesystems
if osFs, ok := v.Fs.(*afero.OsFs); ok {
_ = osFs // silence unused variable warning
// restoreSymlink restores a symbolic link.
func (s *restoreSession) restoreSymlink(file *database.File, targetPath string) error {
_ = s.v.Fs.Remove(targetPath)
// afero.MemMapFs doesn't support symlinks, so route real-FS
// symlinks through os.
if _, ok := s.v.Fs.(*afero.OsFs); ok {
if err := os.Symlink(file.LinkTarget.String(), targetPath); err != nil {
return fmt.Errorf("creating symlink: %w", err)
}
} else {
log.Debug("Symlink creation not supported on this filesystem", "path", file.Path, "target", file.LinkTarget)
}
result.FilesRestored++
s.result.FilesRestored++
log.Debug("Restored symlink", "path", file.Path, "target", file.LinkTarget)
return nil
}
// restoreDirectory restores a directory with proper permissions
func (v *Vaultik) restoreDirectory(file *database.File, targetPath string, result *RestoreResult) error {
// Create directory
if err := v.Fs.MkdirAll(targetPath, os.FileMode(file.Mode)); err != nil {
// restoreDirectory restores a directory with its permissions, mtime,
// and (on real filesystems, with sufficient privileges) ownership.
func (s *restoreSession) restoreDirectory(file *database.File, targetPath string) error {
if err := s.v.Fs.MkdirAll(targetPath, os.FileMode(file.Mode)); err != nil {
return fmt.Errorf("creating directory: %w", err)
}
// Set permissions
if err := v.Fs.Chmod(targetPath, os.FileMode(file.Mode)); err != nil {
if err := s.v.Fs.Chmod(targetPath, os.FileMode(file.Mode)); err != nil {
log.Debug("Failed to set directory permissions", "path", targetPath, "error", err)
}
// Set ownership (requires root)
if osFs, ok := v.Fs.(*afero.OsFs); ok {
_ = osFs
if _, ok := s.v.Fs.(*afero.OsFs); ok {
if err := os.Chown(targetPath, int(file.UID), int(file.GID)); err != nil {
log.Debug("Failed to set directory ownership", "path", targetPath, "error", err)
}
}
// Set mtime
if err := v.Fs.Chtimes(targetPath, file.MTime, file.MTime); err != nil {
if err := s.v.Fs.Chtimes(targetPath, file.MTime, file.MTime); err != nil {
log.Debug("Failed to set directory mtime", "path", targetPath, "error", err)
}
result.FilesRestored++
s.result.FilesRestored++
return nil
}
// restoreRegularFile restores a regular file by reconstructing it from chunks
func (v *Vaultik) restoreRegularFile(
ctx context.Context,
repos *database.Repositories,
file *database.File,
targetPath string,
identity age.Identity,
chunkToBlobMap map[string]*database.BlobChunk,
blobCache *blobDiskCache,
result *RestoreResult,
) error {
// Get file chunks in order
fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
// restoreRegularFile reconstructs a regular file by reading chunks
// directly out of cached blobs via ReadAt. The expectation when this
// method runs is that every blob this file needs is already in the
// disk cache — the planner guarantees that by only marking files
// "ready" once their full blob set is on disk.
func (s *restoreSession) restoreRegularFile(file *database.File, targetPath string) error {
fileStart := time.Now()
t0 := time.Now()
fileChunks, err := s.repos.FileChunks.GetByFileID(s.ctx, file.ID)
fileChunksQueryDur := time.Since(t0)
if err != nil {
return fmt.Errorf("getting file chunks: %w", err)
}
// Create output file
outFile, err := v.Fs.Create(targetPath)
t0 = time.Now()
outFile, err := s.v.Fs.Create(targetPath)
createDur := time.Since(t0)
if err != nil {
return fmt.Errorf("creating output file: %w", err)
}
defer func() { _ = outFile.Close() }()
// Write chunks in order
var bytesWritten int64
var (
readAtDur time.Duration
writeDur time.Duration
sweeperDur time.Duration
bytesWritten int64
)
for _, fc := range fileChunks {
// Find which blob contains this chunk
chunkHashStr := fc.ChunkHash.String()
blobChunk, ok := chunkToBlobMap[chunkHashStr]
blobChunk, ok := s.chunkToBlobMap[chunkHashStr]
if !ok {
return fmt.Errorf("chunk %s not found in any blob", chunkHashStr[:16])
}
// Get the blob's hash from the database
blob, err := repos.Blobs.GetByID(ctx, blobChunk.BlobID.String())
if err != nil {
return fmt.Errorf("getting blob %s: %w", blobChunk.BlobID, err)
}
// Download and decrypt blob if not cached
blobHashStr := blob.Hash.String()
blobData, ok := blobCache.Get(blobHashStr)
blobHash, ok := s.blobIDToHash[blobChunk.BlobID.String()]
if !ok {
blobData, err = v.downloadBlob(ctx, blobHashStr, blob.CompressedSize, identity)
if err != nil {
return fmt.Errorf("downloading blob %s: %w", blobHashStr[:16], err)
}
if putErr := blobCache.Put(blobHashStr, blobData); putErr != nil {
log.Debug("Failed to cache blob on disk", "hash", blobHashStr[:16], "error", putErr)
}
result.BlobsDownloaded++
result.BytesDownloaded += blob.CompressedSize
return fmt.Errorf("blob id %s missing from hash index", blobChunk.BlobID)
}
// Extract chunk from blob
if blobChunk.Offset+blobChunk.Length > int64(len(blobData)) {
return fmt.Errorf("chunk %s extends beyond blob data (offset=%d, length=%d, blob_size=%d)",
fc.ChunkHash[:16], blobChunk.Offset, blobChunk.Length, len(blobData))
t0 = time.Now()
chunkData, err := s.blobCache.ReadAt(blobHash, blobChunk.Offset, blobChunk.Length)
readAtDur += time.Since(t0)
if err != nil {
return fmt.Errorf("reading chunk %s from cached blob %s: %w", fc.ChunkHash[:16], blobHash[:16], err)
}
chunkData := blobData[blobChunk.Offset : blobChunk.Offset+blobChunk.Length]
// Write chunk to output file
t0 = time.Now()
n, err := outFile.Write(chunkData)
writeDur += time.Since(t0)
if err != nil {
return fmt.Errorf("writing chunk: %w", err)
}
bytesWritten += int64(n)
t0 = time.Now()
s.sweeper.chunkRestored(int64(n))
sweeperDur += time.Since(t0)
}
// Close file before setting metadata
log.Debug("Restored regular file (timings)",
"path", file.Path,
"chunks", len(fileChunks),
"bytes_written", bytesWritten,
"ms_total", time.Since(fileStart).Milliseconds(),
"ms_file_chunks_query", fileChunksQueryDur.Milliseconds(),
"ms_create", createDur.Milliseconds(),
"ms_readat", readAtDur.Milliseconds(),
"ms_writes", writeDur.Milliseconds(),
"ms_sweeper", sweeperDur.Milliseconds(),
)
if err := outFile.Close(); err != nil {
return fmt.Errorf("closing output file: %w", err)
}
// Set permissions
if err := v.Fs.Chmod(targetPath, os.FileMode(file.Mode)); err != nil {
if err := s.v.Fs.Chmod(targetPath, os.FileMode(file.Mode)); err != nil {
log.Debug("Failed to set file permissions", "path", targetPath, "error", err)
}
// Set ownership (requires root)
if osFs, ok := v.Fs.(*afero.OsFs); ok {
_ = osFs
if _, ok := s.v.Fs.(*afero.OsFs); ok {
if err := os.Chown(targetPath, int(file.UID), int(file.GID)); err != nil {
log.Debug("Failed to set file ownership", "path", targetPath, "error", err)
}
}
// Set mtime
if err := v.Fs.Chtimes(targetPath, file.MTime, file.MTime); err != nil {
if err := s.v.Fs.Chtimes(targetPath, file.MTime, file.MTime); err != nil {
log.Debug("Failed to set file mtime", "path", targetPath, "error", err)
}
result.FilesRestored++
result.BytesRestored += bytesWritten
s.result.FilesRestored++
s.result.BytesRestored += bytesWritten
log.Debug("Restored file", "path", file.Path, "size", humanize.Bytes(uint64(bytesWritten)))
return nil
}
// downloadBlob downloads and decrypts a blob
func (v *Vaultik) downloadBlob(ctx context.Context, blobHash string, expectedSize int64, identity age.Identity) ([]byte, error) {
rc, err := v.FetchAndDecryptBlob(ctx, blobHash, expectedSize, identity)
// downloadBlobToCache streams a blob from remote storage straight into
// the disk cache, decrypting and decompressing on the fly. The
// plaintext never lives fully in memory — io.Copy through
// blobDiskCache.PutFromReader uses a 32 KiB buffer regardless of blob
// size, which is what makes multi-GB blobs tractable on machines with
// less RAM than the blob.
func (s *restoreSession) downloadBlobToCache(blobHash string, expectedSize int64) error {
start := time.Now()
t0 := time.Now()
rc, err := s.v.FetchAndDecryptBlob(s.ctx, blobHash, expectedSize, s.identity)
fetchSetupDur := time.Since(t0)
if err != nil {
return nil, err
return err
}
data, err := io.ReadAll(rc)
if err != nil {
_ = rc.Close()
return nil, fmt.Errorf("reading blob data: %w", err)
t0 = time.Now()
written, copyErr := s.blobCache.PutFromReader(blobHash, rc)
streamDur := time.Since(t0)
closeErr := rc.Close()
if copyErr != nil {
return copyErr
}
if closeErr != nil {
return closeErr
}
// Close triggers hash verification
if err := rc.Close(); err != nil {
return nil, err
}
return data, nil
log.Debug("Streamed blob into disk cache",
"hash", blobHash[:16],
"compressed_bytes", expectedSize,
"plaintext_bytes", written,
"ms_total", time.Since(start).Milliseconds(),
"ms_fetch_setup", fetchSetupDur.Milliseconds(),
"ms_stream_decrypt_decompress", streamDur.Milliseconds(),
)
return nil
}
// verifyRestoredFiles verifies that all restored files match their expected chunk hashes
@@ -606,16 +751,16 @@ func (v *Vaultik) verifyRestoredFiles(
"files", len(regularFiles),
"bytes", humanize.Bytes(uint64(totalBytes)),
)
v.printfStdout("\nVerifying %d files (%s)...\n",
len(regularFiles),
humanize.Bytes(uint64(totalBytes)),
)
v.UI.Begin("Verifying %s files (%s).",
v.UI.Count(len(regularFiles)),
v.UI.Size(totalBytes))
// Create progress bar if output is a terminal
bar := v.newProgressBar("Verifying", totalBytes)
startTime := time.Now()
lastStatusTime := startTime
const statusInterval = 15 * time.Second
// Verify each file
for _, file := range regularFiles {
var bytesProcessed int64
for i, file := range regularFiles {
if ctx.Err() != nil {
return ctx.Err()
}
@@ -630,17 +775,14 @@ func (v *Vaultik) verifyRestoredFiles(
result.FilesVerified++
result.BytesVerified += bytesVerified
}
bytesProcessed += file.Size
// Update progress bar
if bar != nil {
_ = bar.Add64(file.Size)
if time.Since(lastStatusTime) >= statusInterval {
v.printVerifyProgress(i+1, len(regularFiles), bytesProcessed, totalBytes, startTime)
lastStatusTime = time.Now()
}
}
if bar != nil {
_ = bar.Finish()
}
log.Info("Verification complete",
"files_verified", result.FilesVerified,
"bytes_verified", humanize.Bytes(uint64(result.BytesVerified)),
@@ -650,6 +792,46 @@ func (v *Vaultik) verifyRestoredFiles(
return nil
}
// printVerifyProgress emits a periodic verify-phase status line. Same
// shape as the restore progress line so user-facing pacing is uniform
// across the two phases.
func (v *Vaultik) printVerifyProgress(filesDone, totalFiles int, bytesDone, totalBytes int64, startTime time.Time) {
elapsed := time.Since(startTime)
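// totalBytes is zero when every file is empty; avoid a NaN percentage.
var pct float64
if totalBytes > 0 {
pct = float64(bytesDone) / float64(totalBytes) * 100
}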
byteRate := float64(bytesDone) / elapsed.Seconds()
fileRate := float64(filesDone) / elapsed.Seconds()
remainingBytes := totalBytes - bytesDone
var eta time.Duration
if byteRate > 0 && remainingBytes > 0 {
eta = time.Duration(float64(remainingBytes)/byteRate) * time.Second
}
if eta > 0 {
v.UI.Progress("Verify: %s/%s files (%s), %s/%s, %s, %.0f files/sec, verify elapsed: %s, verify ETA: %s (est remain %s).",
v.UI.Count(filesDone),
v.UI.Count(totalFiles),
v.UI.Percent(pct),
v.UI.Size(bytesDone),
v.UI.Size(totalBytes),
v.UI.Speed(byteRate),
fileRate,
v.UI.Duration(elapsed),
v.UI.Time(time.Now().Add(eta)),
v.UI.Duration(eta))
return
}
v.UI.Progress("Verify: %s/%s files (%s), %s/%s, %s, %.0f files/sec, verify elapsed: %s.",
v.UI.Count(filesDone),
v.UI.Count(totalFiles),
v.UI.Percent(pct),
v.UI.Size(bytesDone),
v.UI.Size(totalBytes),
v.UI.Speed(byteRate),
fileRate,
v.UI.Duration(elapsed))
}
// verifyFile verifies a single restored file by checking its chunk hashes
func (v *Vaultik) verifyFile(
ctx context.Context,
@@ -705,38 +887,3 @@ func (v *Vaultik) verifyFile(
log.Debug("File verified", "path", file.Path, "bytes", bytesVerified, "chunks", len(fileChunks))
return bytesVerified, nil
}
// newProgressBar creates a terminal-aware progress bar with standard options.
// It returns nil if stdout is not a terminal.
func (v *Vaultik) newProgressBar(description string, total int64) *progressbar.ProgressBar {
if !v.isTerminal() {
return nil
}
return progressbar.NewOptions64(
total,
progressbar.OptionSetDescription(description),
progressbar.OptionSetWriter(v.Stderr),
progressbar.OptionShowBytes(true),
progressbar.OptionShowCount(),
progressbar.OptionSetWidth(progressBarWidth),
progressbar.OptionThrottle(progressBarThrottle),
progressbar.OptionOnCompletion(func() {
v.printfStderr("\n")
}),
progressbar.OptionSetRenderBlankState(true),
)
}
// isTerminal returns true if stdout is a terminal.
// It checks whether v.Stdout implements Fd() (i.e. is an *os.File),
// and falls back to false for non-file writers (e.g. in tests).
func (v *Vaultik) isTerminal() bool {
type fder interface {
Fd() uintptr
}
f, ok := v.Stdout.(fder)
if !ok {
return false
}
return term.IsTerminal(int(f.Fd()))
}


@@ -0,0 +1,315 @@
package vaultik
import (
"bytes"
"context"
"crypto/rand"
"fmt"
"io"
"os"
"path/filepath"
"sort"
"sync"
"testing"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/vaultik/internal/config"
"sneak.berlin/go/vaultik/internal/database"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/snapshot"
"sneak.berlin/go/vaultik/internal/storage"
"sneak.berlin/go/vaultik/internal/ui"
)
// TestRestoreLocalityAndReadAt asserts three properties of the restore
// hot path that together produce acceptable throughput on real-world
// snapshots. All three currently fail on main:
//
// 1. Peak blob cache occupancy ≤ 1.
// Restore order must respect blob locality: every file fully
// contained within the currently cached blob should be restored
// before any other blob is downloaded. The sweeper then frees
// each blob as soon as its file set is exhausted. Without smart
// ordering, path-order interleaves blobs and the cache holds
// every touched blob until the last file referencing it lands.
//
// 2. Each remote blob is fetched exactly once.
// Counted via wrapping the Storer.
//
// 3. blobDiskCache.Get is never called during restore.
// Chunk extraction from a cached blob must go through ReadAt,
// which reads only the chunk's bytes from disk. Get reads the
// entire blob (up to 50 GB in production) into memory just to
// slice out a few KB — currently the dominant cost in restore.
//
// The test deliberately constructs an adversarial scenario: fifteen
// 1 MiB source files packed into three blobs of ~5 MiB each, plus nine
// copies of three of those sources (one per blob) whose path-ordered
// names interleave the blobs (cp-001-A, cp-002-B, cp-003-C, cp-004-A,
// …), so naive path-order processing would touch every blob before
// finishing any of them.
func TestRestoreLocalityAndReadAt(t *testing.T) {
log.Initialize(log.Config{})
fs := afero.NewOsFs()
tempDir, err := os.MkdirTemp("", "vaultik-locality-")
require.NoError(t, err)
defer func() { _ = os.RemoveAll(tempDir) }()
dataDir := filepath.Join(tempDir, "source")
storeDir := filepath.Join(tempDir, "remote")
restoreDir := filepath.Join(tempDir, "restored")
dbPath := filepath.Join(tempDir, "index.sqlite")
require.NoError(t, fs.MkdirAll(dataDir, 0o755))
// Layout: 15 source files of exactly 1 MiB each. With
// chunkSize (avg) = 4 MiB the chunker's minSize is 1 MiB, so any
// file of 1 MiB becomes a single chunk. With a 5 MiB blob limit
// the packer fits exactly 5 chunks per blob, producing 3 blobs
// containing src-001..005, src-006..010, src-011..015.
//
// Then add 9 "copy" files — byte-for-byte clones of three of the
// sources (one from each blob group) — with interleaved names
// (cp-001-A, cp-002-B, cp-003-C, cp-004-A, …) so a naive
// path-ordered restore would touch all three blobs before
// finishing any of them.
const (
srcBytes = 1024 * 1024
srcCount = 15
blobsCount = 3
perBlob = srcCount / blobsCount
)
type source struct {
path string
data []byte
}
sources := make([]*source, srcCount)
for i := 0; i < srcCount; i++ {
s := &source{
path: fmt.Sprintf("src-%03d.bin", i+1),
data: randomBytes(t, srcBytes),
}
sources[i] = s
require.NoError(t, afero.WriteFile(fs, filepath.Join(dataDir, s.path), s.data, 0o644))
}
// Pick one representative source per blob group (src-001 → blob
// 1, src-006 → blob 2, src-011 → blob 3) and create 3 copies of
// each with interleaved alphabetical names.
type copyFile struct {
path string
data []byte
sourceBlob int // 0, 1, or 2
sourceIndex int // index into sources slice
}
groupReps := []int{0, perBlob, 2 * perBlob} // 0, 5, 10
letters := []byte{'A', 'B', 'C'}
var copies []copyFile
for i := 0; i < 3; i++ {
for j := 0; j < blobsCount; j++ {
seq := i*blobsCount + j + 1
name := fmt.Sprintf("cp-%03d-%c.bin", seq, letters[j])
path := filepath.Join(dataDir, name)
src := sources[groupReps[j]]
require.NoError(t, afero.WriteFile(fs, path, src.data, 0o644))
copies = append(copies, copyFile{path: path, data: src.data, sourceBlob: j, sourceIndex: groupReps[j]})
}
}
// chunkSize avg = 4 MiB makes minSize = 1 MiB, so a 1 MiB file
// becomes one chunk. maxBlobSize = 5 MiB packs exactly 5 chunks
// per blob, yielding 3 blobs from 15 source files.
chunkSize := int64(4 * 1024 * 1024)
maxBlobSize := int64(5 * 1024 * 1024)
storer, err := storage.NewFileStorer(storeDir)
require.NoError(t, err)
agePublicKey := "age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"
ageSecretKey := "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5"
cfg := &config.Config{
AgeRecipients: []string{agePublicKey},
AgeSecretKey: ageSecretKey,
CompressionLevel: 3,
Hostname: "test-host",
BlobSizeLimit: config.Size(maxBlobSize),
}
ctx := context.Background()
db, err := database.New(ctx, dbPath)
require.NoError(t, err)
defer func() { _ = db.Close() }()
repos := database.NewRepositories(db)
sm := snapshot.NewSnapshotManager(snapshot.SnapshotManagerParams{
Repos: repos,
Storage: storer,
Config: cfg,
})
sm.SetFilesystem(fs)
scanner := snapshot.NewScanner(snapshot.ScannerConfig{
FS: fs,
Storage: storer,
ChunkSize: chunkSize,
MaxBlobSize: maxBlobSize,
CompressionLevel: cfg.CompressionLevel,
AgeRecipients: cfg.AgeRecipients,
Repositories: repos,
})
snapshotID, err := sm.CreateSnapshotWithName(ctx, cfg.Hostname, "locality", "test-version", "test-git")
require.NoError(t, err)
_, err = scanner.Scan(ctx, dataDir, snapshotID)
require.NoError(t, err)
require.NoError(t, sm.CompleteSnapshot(ctx, snapshotID))
require.NoError(t, sm.ExportSnapshotMetadata(ctx, dbPath, snapshotID))
blobsOnDisk := listBlobKeys(t, storeDir)
t.Logf("backup produced %d blobs", len(blobsOnDisk))
require.GreaterOrEqual(t, len(blobsOnDisk), 3, "expected at least 3 blobs from 3 filler groups")
require.NoError(t, db.Close())
// Wrap the storer so we can count downloads per blob key.
counter := newCountingStorer(storer)
// Capture the restore-side cache for instrumentation inspection.
// The observer fires twice (immediately after creation and
// immediately before close) so we read PeakLen and call counters
// from the same instance the production code used.
var cacheRef *blobDiskCache
v := &Vaultik{
Config: cfg,
Storage: counter,
Fs: fs,
Stdout: io.Discard,
Stderr: io.Discard,
UI: ui.NewWithColor(io.Discard, false),
restoreCacheObserver: func(c *blobDiskCache) {
cacheRef = c
},
}
v.SetContext(ctx)
require.NoError(t, v.Restore(&RestoreOptions{
SnapshotID: snapshotID,
TargetDir: restoreDir,
}))
require.NotNil(t, cacheRef, "restoreCacheObserver must fire during restore")
// Verify restored content matches.
for _, s := range sources {
restored := filepath.Join(restoreDir, dataDir, s.path)
got, err := afero.ReadFile(fs, restored)
require.NoErrorf(t, err, "source missing after restore: %s", s.path)
require.Truef(t, bytes.Equal(got, s.data), "byte mismatch for source %s", s.path)
}
for _, c := range copies {
restored := filepath.Join(restoreDir, c.path)
got, err := afero.ReadFile(fs, restored)
require.NoErrorf(t, err, "copy missing after restore: %s", c.path)
require.Truef(t, bytes.Equal(got, c.data), "byte mismatch for copy %s", c.path)
}
// (2) Each blob fetched exactly once.
for key, n := range counter.snapshot() {
if !filterBlobKey(key) {
continue
}
assert.Equalf(t, 1, n, "blob %s fetched %d times, want exactly 1", key, n)
}
// (1) Peak cache size ≤ 1. The sweeper plus locality-aware
// ordering should free each blob before the next one downloads.
assert.LessOrEqualf(t, cacheRef.PeakLen(), 1,
"peak cached blobs was %d; expected ≤ 1 with locality-ordered restore", cacheRef.PeakLen())
// (3) Cache.Get must never be called during restore — chunk
// extraction has to go through ReadAt so we never read the whole
// blob from disk to grab a few KB slice.
assert.Equalf(t, 0, cacheRef.GetCalls(),
"blobDiskCache.Get was called %d times during restore; restore must use ReadAt exclusively", cacheRef.GetCalls())
t.Logf("blob cache stats: peak_len=%d get_calls=%d readat_calls=%d",
cacheRef.PeakLen(), cacheRef.GetCalls(), cacheRef.ReadAtCalls())
}
// randomBytes returns n bytes of random data. Used to make sure the
// chunker picks non-degenerate FastCDC boundaries.
func randomBytes(t *testing.T, n int) []byte {
t.Helper()
b := make([]byte, n)
_, err := rand.Read(b)
require.NoError(t, err)
return b
}
// listBlobKeys walks the FileStorer blobs/ tree and returns the
// relative keys for every blob file present.
func listBlobKeys(t *testing.T, storeDir string) []string {
t.Helper()
var keys []string
root := filepath.Join(storeDir, "blobs")
err := filepath.Walk(root, func(p string, info os.FileInfo, err error) error {
if err != nil {
return err
}
if info.IsDir() {
return nil
}
rel, _ := filepath.Rel(storeDir, p)
keys = append(keys, rel)
return nil
})
require.NoError(t, err)
sort.Strings(keys)
return keys
}
// filterBlobKey returns true when key looks like a blob storage path
// (rather than a snapshot metadata path).
func filterBlobKey(key string) bool {
return len(key) > 6 && key[:6] == "blobs/"
}
// countingStorerInternal wraps a storage.Storer and records the number
// of Get calls per key, so the locality test can assert each blob is
// fetched exactly once. Defined here (rather than reusing the one in
// the integration_test package) because this test lives in package
// vaultik for access to unexported cache internals.
type countingStorerInternal struct {
storage.Storer
mu sync.Mutex
counts map[string]int
}
func newCountingStorer(inner storage.Storer) *countingStorerInternal {
return &countingStorerInternal{Storer: inner, counts: make(map[string]int)}
}
func (c *countingStorerInternal) Get(ctx context.Context, key string) (io.ReadCloser, error) {
c.mu.Lock()
c.counts[key]++
c.mu.Unlock()
return c.Storer.Get(ctx, key)
}
func (c *countingStorerInternal) snapshot() map[string]int {
c.mu.Lock()
defer c.mu.Unlock()
out := make(map[string]int, len(c.counts))
for k, v := range c.counts {
out[k] = v
}
return out
}


@@ -0,0 +1,185 @@
package vaultik
import (
"context"
"fmt"
"math"
"os"
"sneak.berlin/go/vaultik/internal/database"
"sneak.berlin/go/vaultik/internal/types"
)
// restorePlan orders restore-time file processing by blob locality. The
// goal is to keep the blob disk cache occupancy as small as possible:
// download one blob, drain every file referencing only that blob, let
// the sweeper free the blob, then move on. Files that span multiple
// blobs are processed when their full blob set is on disk.
//
// The plan keeps two indexes:
//
// - fileBlobs: for each pending file, the set of blob hashes it
// still needs that are NOT yet in the cache. Files with an empty
// set are "ready" — they can be restored from the current cache
// with no further downloads.
// - blobFiles: for each blob, the set of pending files referencing
// it. Used to short-circuit "when this blob lands, which files
// become ready" without a global scan.
type restorePlan struct {
fileBlobs map[types.FileID]map[string]struct{}
blobFiles map[string]map[types.FileID]struct{}
ready []types.FileID
cached map[string]struct{}
}
// newRestorePlan builds the file→blob index for the given files. Files
// whose chunks reference no blobs (symlinks, directories) start in the
// ready queue immediately.
func newRestorePlan(
ctx context.Context,
repos *database.Repositories,
files []*database.File,
chunkToBlobMap map[string]*database.BlobChunk,
blobIDToHash map[string]string,
) (*restorePlan, error) {
p := &restorePlan{
fileBlobs: make(map[types.FileID]map[string]struct{}, len(files)),
blobFiles: make(map[string]map[types.FileID]struct{}),
ready: make([]types.FileID, 0, len(files)),
cached: make(map[string]struct{}),
}
for _, f := range files {
if f.IsSymlink() || f.Mode&uint32(os.ModeDir) != 0 {
// No chunks to fetch — restore can run immediately.
p.fileBlobs[f.ID] = nil
p.ready = append(p.ready, f.ID)
continue
}
fileChunks, err := repos.FileChunks.GetByFileID(ctx, f.ID)
if err != nil {
return nil, fmt.Errorf("planning %s: %w", f.Path, err)
}
blobs := make(map[string]struct{})
for _, fc := range fileChunks {
bc, ok := chunkToBlobMap[fc.ChunkHash.String()]
if !ok {
return nil, fmt.Errorf("planning %s: chunk %s missing from blob map",
f.Path, fc.ChunkHash.String()[:16])
}
hash, ok := blobIDToHash[bc.BlobID.String()]
if !ok {
return nil, fmt.Errorf("planning %s: blob id %s missing from id-to-hash map",
f.Path, bc.BlobID)
}
blobs[hash] = struct{}{}
}
p.fileBlobs[f.ID] = blobs
for hash := range blobs {
set, ok := p.blobFiles[hash]
if !ok {
set = make(map[types.FileID]struct{})
p.blobFiles[hash] = set
}
set[f.ID] = struct{}{}
}
if len(blobs) == 0 {
p.ready = append(p.ready, f.ID)
}
}
return p, nil
}
// markBlobCached records that the named blob is now resident in the
// disk cache and moves any pending file whose remaining-uncached-blobs
// set just dropped to empty onto the ready queue.
func (p *restorePlan) markBlobCached(blobHash string) {
if _, already := p.cached[blobHash]; already {
return
}
p.cached[blobHash] = struct{}{}
for fileID := range p.blobFiles[blobHash] {
blobs := p.fileBlobs[fileID]
delete(blobs, blobHash)
if len(blobs) == 0 {
p.ready = append(p.ready, fileID)
}
}
}
// popReady returns the next ready file, removing it from the queue. If
// no file is ready, the second return value is false.
func (p *restorePlan) popReady() (types.FileID, bool) {
if len(p.ready) == 0 {
return types.FileID{}, false
}
id := p.ready[0]
p.ready = p.ready[1:]
return id, true
}
// finishFile drops a restored file from both indexes so subsequent
// planning calls don't reconsider it.
func (p *restorePlan) finishFile(fileID types.FileID) {
for hash := range p.fileBlobs[fileID] {
if set, ok := p.blobFiles[hash]; ok {
delete(set, fileID)
if len(set) == 0 {
delete(p.blobFiles, hash)
}
}
}
delete(p.fileBlobs, fileID)
// Also scrub the file from any blobFiles entries where it might
// still appear even after its uncached-blob set was emptied.
for hash, set := range p.blobFiles {
if _, ok := set[fileID]; ok {
delete(set, fileID)
if len(set) == 0 {
delete(p.blobFiles, hash)
}
}
}
}
// pickNextDownload returns the pending file whose remaining-uncached
// blob set is smallest (with ties broken by FileID string compare so
// the choice is deterministic across runs). This file's blobs are
// downloaded next, after which it — together with any other pending
// files whose blob sets become empty — moves to the ready queue.
//
// The zero FileID return means nothing is pending.
func (p *restorePlan) pickNextDownload() types.FileID {
var best types.FileID
bestCount := math.MaxInt
var bestID string
for id, blobs := range p.fileBlobs {
n := len(blobs)
if n == 0 {
// Already-ready files should have been popped via
// popReady; ignore here just in case.
continue
}
idStr := id.String()
if n < bestCount || (n == bestCount && (best.IsZero() || idStr < bestID)) {
best = id
bestCount = n
bestID = idStr
}
}
return best
}
// blobsNeeded returns the uncached blob hashes for fileID in any order.
func (p *restorePlan) blobsNeeded(fileID types.FileID) []string {
blobs := p.fileBlobs[fileID]
out := make([]string, 0, len(blobs))
for h := range blobs {
out = append(out, h)
}
return out
}
// hasPending reports whether any unfinished files remain.
func (p *restorePlan) hasPending() bool {
return len(p.fileBlobs) > 0
}


@@ -0,0 +1,118 @@
package vaultik
import (
"context"
"fmt"
"sneak.berlin/go/vaultik/internal/database"
"sneak.berlin/go/vaultik/internal/log"
)
// restoreSweeper frees cached blobs once all files that reference any of
// their chunks have been restored. It works as follows:
//
// 1. Callers add a file's ID to an in-memory restored set via
// fileRestored once the file is fully written to disk.
// 2. After each chunk is restored, chunkRestored accumulates a running
// byte count.
// 3. When the accumulator crosses a threshold (one hundredth of the
// configured blob size — so a sweep runs about a hundred times per
// blob's worth of restored bytes), the sweeper iterates every key in
// the cache. For each cached blob it asks the DB which files
// reference any chunk in that blob, then compares that list against
// the in-memory restored set. If any referencing file is missing
// from the set the blob is kept; otherwise the cache entry is
// deleted.
//
// All DB reads happen against the snapshot's temporary metadata DB,
// which is local, indexed, and not under contention — the queries are
// cheap and run at most once per blob per sweep interval.
type restoreSweeper struct {
ctx context.Context
repos *database.Repositories
cache *blobDiskCache
threshold int64
bytesAccum int64
restored map[string]struct{}
}
// newRestoreSweeper returns a sweeper that triggers eviction every
// `threshold` bytes restored. Callers should pass blob_size_limit/100.
func newRestoreSweeper(ctx context.Context, repos *database.Repositories, cache *blobDiskCache, threshold int64) *restoreSweeper {
if threshold <= 0 {
threshold = 1
}
return &restoreSweeper{
ctx: ctx,
repos: repos,
cache: cache,
threshold: threshold,
restored: make(map[string]struct{}),
}
}
// fileRestored records a file as fully restored. After this call, any
// blob whose only remaining references come from files in the restored
// set may be evicted on the next sweep.
func (s *restoreSweeper) fileRestored(fileID string) {
s.restored[fileID] = struct{}{}
}
// chunkRestored accounts n bytes against the sweep threshold and runs a
// sweep if the threshold has been crossed since the last sweep.
func (s *restoreSweeper) chunkRestored(n int64) {
s.bytesAccum += n
if s.bytesAccum < s.threshold {
return
}
s.bytesAccum = 0
s.sweep()
}
// sweep deletes any cached blob whose chunks are no longer referenced
// by an unrestored file. Per-blob DB failures are logged and the blob
// is kept — we'd rather hold a blob longer than risk a re-download.
func (s *restoreSweeper) sweep() {
for _, blobHash := range s.cache.Keys() {
needed, err := s.blobStillNeeded(blobHash)
if err != nil {
log.Debug("sweeper referencing-files query failed", "blob_hash", blobHash[:16], "error", err)
continue
}
if !needed {
s.cache.Delete(blobHash)
}
}
}
// blobStillNeeded returns true if any file that references a chunk in
// this blob has not yet been restored. On any error the function
// returns true — keeping the blob is always the safe answer because we
// can't prove we're done with it.
func (s *restoreSweeper) blobStillNeeded(blobHash string) (bool, error) {
rows, err := s.repos.DB().Conn().QueryContext(s.ctx, `
SELECT DISTINCT fc.file_id
FROM file_chunks fc
JOIN blob_chunks bc ON bc.chunk_hash = fc.chunk_hash
JOIN blobs b ON b.id = bc.blob_id
WHERE b.blob_hash = ?
`, blobHash)
if err != nil {
return true, fmt.Errorf("querying referencing files: %w", err)
}
defer func() { _ = rows.Close() }()
for rows.Next() {
var fileID string
if err := rows.Scan(&fileID); err != nil {
return true, fmt.Errorf("scanning file_id: %w", err)
}
if _, ok := s.restored[fileID]; !ok {
return true, nil
}
}
if err := rows.Err(); err != nil {
return true, err
}
return false, nil
}
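
The threshold-triggered sweep can be illustrated without a database. In the sketch below the refs map stands in for the SQL query in blobStillNeeded and the cache map stands in for blobDiskCache. Both substitutions, and the names sweeper, newSweeper, and cached, are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"sort"
)

// sweeper is a simplified, in-memory stand-in for restoreSweeper.
type sweeper struct {
	threshold  int64
	bytesAccum int64
	restored   map[string]struct{} // files fully written to disk
	refs       map[string][]string // blob -> files referencing it (replaces the SQL query)
	cache      map[string]struct{} // cached blobs (replaces blobDiskCache)
}

func newSweeper(threshold int64, refs map[string][]string, cachedBlobs ...string) *sweeper {
	s := &sweeper{
		threshold: threshold,
		restored:  map[string]struct{}{},
		refs:      refs,
		cache:     map[string]struct{}{},
	}
	for _, b := range cachedBlobs {
		s.cache[b] = struct{}{}
	}
	return s
}

func (s *sweeper) fileRestored(fileID string) { s.restored[fileID] = struct{}{} }

// chunkRestored accumulates bytes and sweeps once the threshold is crossed.
func (s *sweeper) chunkRestored(n int64) {
	s.bytesAccum += n
	if s.bytesAccum < s.threshold {
		return
	}
	s.bytesAccum = 0
	s.sweep()
}

// sweep evicts every cached blob with no unrestored referencing file.
func (s *sweeper) sweep() {
	for blob := range s.cache {
		if !s.stillNeeded(blob) {
			delete(s.cache, blob) // deleting during range is safe in Go
		}
	}
}

func (s *sweeper) stillNeeded(blob string) bool {
	for _, fileID := range s.refs[blob] {
		if _, done := s.restored[fileID]; !done {
			return true
		}
	}
	return false
}

// cached returns the cached blob names in sorted order.
func (s *sweeper) cached() []string {
	keys := make([]string, 0, len(s.cache))
	for k := range s.cache {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	s := newSweeper(100, map[string][]string{"A": {"f1", "f2"}, "B": {"f3"}}, "A", "B")
	s.fileRestored("f1")
	s.chunkRestored(100)    // sweep runs; A still needs f2, B still needs f3
	fmt.Println(s.cached()) // prints "[A B]"
	s.fileRestored("f2")
	s.chunkRestored(60)     // below threshold, no sweep
	fmt.Println(s.cached()) // prints "[A B]"
	s.chunkRestored(60)     // accumulated 120 >= 100, sweep evicts A
	fmt.Println(s.cached()) // prints "[B]"
}
```

Note that eviction lags completion: in this sketch a blob is freed only at the first threshold crossing after its last referencing file is marked restored, so a smaller threshold trades more sweeps for lower peak cache occupancy.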


@@ -0,0 +1,248 @@
package vaultik_test
import (
"context"
"fmt"
"io"
"math/rand"
"os"
"path/filepath"
"strings"
"sync"
"testing"
"github.com/spf13/afero"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/vaultik/internal/config"
"sneak.berlin/go/vaultik/internal/database"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/snapshot"
"sneak.berlin/go/vaultik/internal/storage"
"sneak.berlin/go/vaultik/internal/ui"
"sneak.berlin/go/vaultik/internal/vaultik"
)
// TestRestoreSweeperEvictsBlobs exercises the reference-counted blob
// disk cache eviction during restore.
//
// The scenario: 30 unique 1 MB random files plus 10 duplicates of those
// (40 files total, 30 MB of unique content) get backed up with a 10 MB
// blob_size_limit. After backup the snapshot's encrypted blobs are
// restored through Vaultik.Restore, and per-key Get counts on the
// storage layer are recorded. Each blob in the snapshot MUST be
// downloaded exactly once: a re-download would mean either that the
// sweeper evicted a blob that was still needed (LRU regression) or
// that the cache held nothing at all (broken cache).
//
// The duplicates ensure deduplicated files share blobs with their
// originals; the sweeper must keep each blob alive until BOTH the
// original AND every duplicate referencing its chunks have been
// restored.
func TestRestoreSweeperEvictsBlobs(t *testing.T) {
log.Initialize(log.Config{})
fs := afero.NewOsFs()
tempDir, err := os.MkdirTemp("", "vaultik-sweeper-")
require.NoError(t, err)
defer func() { _ = os.RemoveAll(tempDir) }()
dataDir := filepath.Join(tempDir, "source")
storeDir := filepath.Join(tempDir, "remote")
restoreDir := filepath.Join(tempDir, "restored")
dbPath := filepath.Join(tempDir, "index.sqlite")
require.NoError(t, fs.MkdirAll(dataDir, 0o755))
// Generate 30 unique 1 MB random files. The PRNG seed is fixed so
// failures are reproducible; the entropy is what matters here — the
// FastCDC chunker needs realistic-looking data to pick chunk
// boundaries naturally.
const (
uniqueFiles = 30
duplicateFiles = 10
fileSize = 1 * 1024 * 1024
)
rng := rand.New(rand.NewSource(42))
type sourceFile struct {
path string
data []byte
}
uniques := make([]sourceFile, 0, uniqueFiles)
expected := make(map[string][]byte, uniqueFiles+duplicateFiles)
for i := 0; i < uniqueFiles; i++ {
data := make([]byte, fileSize)
_, err := rng.Read(data)
require.NoError(t, err)
path := filepath.Join(dataDir, fmt.Sprintf("unique-%02d.bin", i))
require.NoError(t, afero.WriteFile(fs, path, data, 0o644))
uniques = append(uniques, sourceFile{path: path, data: data})
expected[path] = data
}
// Pick 10 of the originals and copy each to a fresh path so the
// chunker dedups them against the originals' blobs.
for i, idx := range rng.Perm(uniqueFiles)[:duplicateFiles] {
src := uniques[idx]
dstPath := filepath.Join(dataDir, fmt.Sprintf("dup-%02d.bin", i))
require.NoError(t, afero.WriteFile(fs, dstPath, src.data, 0o644))
expected[dstPath] = src.data
}
chunkSize := int64(64 * 1024)
maxBlobSize := int64(10 * 1024 * 1024)
storer, err := storage.NewFileStorer(storeDir)
require.NoError(t, err)
agePublicKey := "age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"
ageSecretKey := "AGE-SECRET-KEY-19CR5YSFW59HM4TLD6GXVEDMZFTVVF7PPHKUT68TXSFPK7APHXA2QS2NJA5"
cfg := &config.Config{
AgeRecipients: []string{agePublicKey},
AgeSecretKey: ageSecretKey,
CompressionLevel: 3,
Hostname: "test-host",
BlobSizeLimit: config.Size(maxBlobSize),
}
ctx := context.Background()
db, err := database.New(ctx, dbPath)
require.NoError(t, err)
defer func() { _ = db.Close() }()
repos := database.NewRepositories(db)
sm := snapshot.NewSnapshotManager(snapshot.SnapshotManagerParams{
Repos: repos,
Storage: storer,
Config: cfg,
})
sm.SetFilesystem(fs)
scanner := snapshot.NewScanner(snapshot.ScannerConfig{
FS: fs,
Storage: storer,
ChunkSize: chunkSize,
MaxBlobSize: maxBlobSize,
CompressionLevel: cfg.CompressionLevel,
AgeRecipients: cfg.AgeRecipients,
Repositories: repos,
})
snapshotID, err := sm.CreateSnapshotWithName(ctx, cfg.Hostname, "sweeper", "test-version", "test-git")
require.NoError(t, err)
scanResult, err := scanner.Scan(ctx, dataDir, snapshotID)
require.NoError(t, err)
require.Equal(t, uniqueFiles+duplicateFiles, scanResult.FilesScanned)
require.Greater(t, scanResult.BlobsCreated, 1, "30 MB of unique data at 10 MB blob size should yield multiple blobs")
require.NoError(t, sm.CompleteSnapshot(ctx, snapshotID))
require.NoError(t, sm.ExportSnapshotMetadata(ctx, dbPath, snapshotID))
// Count blobs actually present on disk; this is the ground-truth
// number of blobs, each of which must be fetched exactly once.
blobCount := countBlobsOnDisk(t, storeDir)
require.Greater(t, blobCount, 1, "expected more than one blob")
t.Logf("backup produced %d blobs from %d files (%d unique + %d duplicates)",
blobCount, uniqueFiles+duplicateFiles, uniqueFiles, duplicateFiles)
// Force restore to operate without the source-side index, exactly
// as a real restore on a fresh machine would.
require.NoError(t, db.Close())
counter := newCountingStorer(storer)
restoreVaultik := &vaultik.Vaultik{
Config: cfg,
Storage: counter,
Fs: fs,
Stdout: io.Discard,
Stderr: io.Discard,
UI: ui.NewWithColor(io.Discard, false),
}
restoreVaultik.SetContext(ctx)
require.NoError(t, restoreVaultik.Restore(&vaultik.RestoreOptions{
SnapshotID: snapshotID,
TargetDir: restoreDir,
}))
// Verify every restored file byte-matches its source.
for origPath, want := range expected {
restoredPath := filepath.Join(restoreDir, origPath)
got, err := afero.ReadFile(fs, restoredPath)
require.NoErrorf(t, err, "restored file missing: %s", restoredPath)
require.Equalf(t, want, got, "byte mismatch for %s", origPath)
}
// Each blob must have been downloaded exactly once. >1 means the
// sweeper evicted a still-needed blob; 0 means the cache silently
// stopped being consulted.
blobDownloads := 0
for key, count := range counter.snapshot() {
if !strings.HasPrefix(key, "blobs/") {
continue
}
assert.Equalf(t, 1, count,
"blob %s should have been downloaded exactly once during restore, got %d", key, count)
blobDownloads++
}
assert.Equal(t, blobCount, blobDownloads,
"every blob on disk should have been fetched exactly once during restore")
t.Logf("restore downloaded %d blobs, each exactly once", blobDownloads)
}
// countingStorer wraps a Storer and records the number of Get calls per
// key. Used to verify that the restore-side blob cache and sweeper never
// evict a blob that is still needed, which would force a second download.
type countingStorer struct {
storage.Storer
mu sync.Mutex
counts map[string]int
}
func newCountingStorer(inner storage.Storer) *countingStorer {
return &countingStorer{Storer: inner, counts: make(map[string]int)}
}
func (c *countingStorer) Get(ctx context.Context, key string) (io.ReadCloser, error) {
c.mu.Lock()
c.counts[key]++
c.mu.Unlock()
return c.Storer.Get(ctx, key)
}
func (c *countingStorer) snapshot() map[string]int {
c.mu.Lock()
defer c.mu.Unlock()
out := make(map[string]int, len(c.counts))
for k, v := range c.counts {
out[k] = v
}
return out
}
// countBlobsOnDisk walks the blobs/ tree of a FileStorer-backed store
// and returns the total number of blob files. Used to ground-truth the
// expected number of restore-time downloads.
func countBlobsOnDisk(t *testing.T, storeDir string) int {
t.Helper()
count := 0
root := filepath.Join(storeDir, "blobs")
err := filepath.Walk(root, func(_ string, info os.FileInfo, err error) error {
if err != nil {
return err
}
if !info.IsDir() {
count++
}
return nil
})
require.NoError(t, err)
return count
}
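The counting wrapper above relies on Go interface embedding: only `Get` is overridden, and every other method is forwarded to the wrapped store. The following is a minimal standalone sketch of that pattern, not the project's code. `Getter`, `memStore`, `countingGetter`, and `blobFetches` are hypothetical names introduced for illustration; the real `storage.Storer` interface has more methods and takes a `context.Context`.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync"
)

// Getter is a hypothetical stand-in for the Get half of storage.Storer.
type Getter interface {
	Get(key string) (io.ReadCloser, error)
}

// memStore is an illustrative in-memory Getter (assumption, not vaultik code).
type memStore map[string]string

func (m memStore) Get(key string) (io.ReadCloser, error) {
	v, ok := m[key]
	if !ok {
		return nil, fmt.Errorf("no such key: %s", key)
	}
	return io.NopCloser(strings.NewReader(v)), nil
}

// countingGetter embeds a Getter and counts Get calls per key.
type countingGetter struct {
	Getter
	mu     sync.Mutex
	counts map[string]int
}

func newCountingGetter(inner Getter) *countingGetter {
	return &countingGetter{Getter: inner, counts: make(map[string]int)}
}

// Get records the call, then delegates to the embedded Getter.
func (c *countingGetter) Get(key string) (io.ReadCloser, error) {
	c.mu.Lock()
	c.counts[key]++
	c.mu.Unlock()
	return c.Getter.Get(key)
}

// blobFetches returns a copy of the counts for keys under "blobs/" only,
// mirroring the prefix filter used in the test above.
func (c *countingGetter) blobFetches() map[string]int {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make(map[string]int)
	for k, v := range c.counts {
		if strings.HasPrefix(k, "blobs/") {
			out[k] = v
		}
	}
	return out
}

func main() {
	c := newCountingGetter(memStore{
		"blobs/aa/bb/aabb":   "blob bytes",
		"metadata/snap/meta": "manifest bytes",
	})
	for _, k := range []string{"blobs/aa/bb/aabb", "metadata/snap/meta", "blobs/aa/bb/aabb"} {
		rc, err := c.Get(k)
		if err != nil {
			panic(err)
		}
		_ = rc.Close()
	}
	// The blob key was fetched twice; a test asserting "exactly once" would flag it.
	fmt.Println(c.blobFetches())
}
```

Returning a copy from `blobFetches` keeps the caller from racing with concurrent `Get` calls, which is the same reason `snapshot()` in the test copies the map under the mutex.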


@@ -12,21 +12,21 @@ import (
"text/tabwriter"
"time"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/types"
"github.com/dustin/go-humanize"
"golang.org/x/sync/errgroup"
"sneak.berlin/go/vaultik/internal/database"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/snapshot"
"sneak.berlin/go/vaultik/internal/types"
)
// SnapshotCreateOptions contains options for the snapshot create command
type SnapshotCreateOptions struct {
Daemon bool
Cron bool
Prune bool
SkipErrors bool // Skip file read errors (log them loudly but continue)
Snapshots []string // Optional list of snapshot names to process (empty = all)
Cron bool
Prune bool
KeepNewerThan string // With --prune: keep snapshots newer than this duration (e.g. "4w"); default: keep only latest
SkipErrors bool // Skip file read errors (log them loudly but continue)
Snapshots []string // Optional list of snapshot names to process (empty = all)
}
// CreateSnapshot executes the snapshot creation operation
@@ -57,12 +57,6 @@ func (v *Vaultik) CreateSnapshot(opts *SnapshotCreateOptions) error {
return fmt.Errorf("prune database: %w", err)
}
if opts.Daemon {
log.Info("Running in daemon mode")
// TODO: Implement daemon mode with inotify
return fmt.Errorf("daemon mode not yet implemented")
}
// Determine which snapshots to process
snapshotNames := opts.Snapshots
if len(snapshotNames) == 0 {
@@ -89,28 +83,50 @@ func (v *Vaultik) CreateSnapshot(opts *SnapshotCreateOptions) error {
// Print overall summary if multiple snapshots
if len(snapshotNames) > 1 {
v.printfStdout("\nAll %d snapshots completed in %s\n", len(snapshotNames), time.Since(overallStartTime).Round(time.Second))
v.UI.Complete("All %d snapshots completed in %s.", len(snapshotNames), v.UI.Duration(time.Since(overallStartTime)))
}
// Prune old snapshots and unreferenced blobs if --prune was specified
if opts.Prune {
log.Info("Pruning enabled - deleting old snapshots and unreferenced blobs")
v.printlnStdout("\nPruning old snapshots (keeping latest)...")
if err := v.PurgeSnapshotsWithOptions(&SnapshotPurgeOptions{
KeepLatest: true,
Force: true,
}); err != nil {
return fmt.Errorf("prune: purging old snapshots: %w", err)
if err := v.runPostBackupPrune(snapshotNames, opts.KeepNewerThan); err != nil {
return fmt.Errorf("post-backup prune: %w", err)
}
}
v.printlnStdout("Pruning unreferenced blobs...")
if v.UI.WarningCount() > 0 {
v.UI.Complete("Finished (with %d warnings).", v.UI.WarningCount())
} else {
v.UI.Complete("Finished successfully.")
}
if err := v.PruneBlobs(&PruneOptions{Force: true}); err != nil {
return fmt.Errorf("prune: removing unreferenced blobs: %w", err)
}
return nil
}
log.Info("Pruning complete")
// runPostBackupPrune drops older snapshots of the given names and removes
// orphan blobs from remote storage. If keepNewerThan is set (e.g. "4w"),
// snapshots newer than that duration are kept. Otherwise only the latest
// snapshot of each name is kept.
func (v *Vaultik) runPostBackupPrune(snapshotNames []string, keepNewerThan string) error {
log.Info("Running post-backup prune", "snapshots", snapshotNames, "keep_newer_than", keepNewerThan)
v.UI.Begin("Running post-backup prune.")
purgeOpts := &SnapshotPurgeOptions{
Force: true,
Names: snapshotNames,
Quiet: true,
}
if keepNewerThan != "" {
purgeOpts.OlderThan = keepNewerThan
} else {
purgeOpts.KeepLatest = true
}
if err := v.PurgeSnapshotsWithOptions(purgeOpts); err != nil {
return fmt.Errorf("purging old snapshots: %w", err)
}
if err := v.PruneBlobs(&PruneOptions{Force: true}); err != nil {
return fmt.Errorf("pruning orphaned blobs: %w", err)
}
return nil
@@ -136,7 +152,7 @@ func (v *Vaultik) createNamedSnapshot(opts *SnapshotCreateOptions, hostname, sna
snapshotStartTime := time.Now()
if total > 1 {
v.printfStdout("\n=== Snapshot %d/%d: %s ===\n", idx, total, snapName)
v.UI.Info("Snapshot %d/%d: %s.", idx, total, snapName)
}
resolvedDirs, err := v.resolveSnapshotPaths(snapName)
@@ -146,6 +162,7 @@ func (v *Vaultik) createNamedSnapshot(opts *SnapshotCreateOptions, hostname, sna
scanner := v.ScannerFactory(snapshot.ScannerParams{
EnableProgress: !opts.Cron,
UI: v.UI,
Fs: v.Fs,
Exclude: v.Config.GetExcludes(snapName),
SkipErrors: opts.SkipErrors,
@@ -156,7 +173,7 @@ func (v *Vaultik) createNamedSnapshot(opts *SnapshotCreateOptions, hostname, sna
return fmt.Errorf("creating snapshot: %w", err)
}
log.Info("Beginning snapshot", "snapshot_id", snapshotID, "name", snapName)
v.printfStdout("Beginning snapshot: %s\n", snapshotID)
v.UI.Begin("Creating snapshot %s.", v.UI.Snapshot(snapshotID))
stats, err := v.scanAllDirectories(scanner, resolvedDirs, snapshotID)
if err != nil {
@@ -220,7 +237,7 @@ func (v *Vaultik) scanAllDirectories(scanner *snapshot.Scanner, resolvedDirs []s
}
log.Info("Scanning directory", "path", dir)
v.printfStdout("Beginning directory scan (%d/%d): %s\n", i+1, len(resolvedDirs), dir)
v.UI.Begin("Enumerating snapshot source files in %s (%d of %d).", v.UI.Path(dir), i+1, len(resolvedDirs))
result, err := scanner.Scan(v.ctx, dir, snapshotID)
if err != nil {
return nil, fmt.Errorf("failed to scan %s: %w", dir, err)
@@ -289,23 +306,13 @@ func (v *Vaultik) finalizeSnapshotMetadata(snapshotID string, stats *snapshotSta
return nil
}
// formatUploadSpeed formats bytes uploaded and duration into a human-readable speed string
func formatUploadSpeed(bytesUploaded int64, duration time.Duration) string {
// uploadSpeed returns the average network upload rate as a colorized
// bits/sec string, or "N/A" when there's no usable data.
func (v *Vaultik) uploadSpeed(bytesUploaded int64, duration time.Duration) string {
if bytesUploaded <= 0 || duration <= 0 {
return "N/A"
}
bytesPerSec := float64(bytesUploaded) / duration.Seconds()
bitsPerSec := bytesPerSec * 8
switch {
case bitsPerSec >= 1e9:
return fmt.Sprintf("%.1f Gbit/s", bitsPerSec/1e9)
case bitsPerSec >= 1e6:
return fmt.Sprintf("%.0f Mbit/s", bitsPerSec/1e6)
case bitsPerSec >= 1e3:
return fmt.Sprintf("%.0f Kbit/s", bitsPerSec/1e3)
default:
return fmt.Sprintf("%.0f bit/s", bitsPerSec)
return v.UI.Speed(0)
}
return v.UI.Speed(float64(bytesUploaded) / duration.Seconds())
}
// printSnapshotSummary prints the comprehensive snapshot completion summary
@@ -324,35 +331,36 @@ func (v *Vaultik) printSnapshotSummary(snapshotID string, startTime time.Time, s
compressionRatio = 1.0
}
v.printfStdout("=== Snapshot Complete ===\n")
v.printfStdout("ID: %s\n", snapshotID)
v.printfStdout("Files: %s examined, %s to process, %s unchanged",
formatNumber(stats.totalFiles),
formatNumber(totalFilesChanged),
formatNumber(stats.totalFilesSkipped))
v.UI.Complete("Created snapshot %s.", v.UI.Snapshot(snapshotID))
filesMsg := fmt.Sprintf("Files: %s examined, %s backed up, %s unchanged",
v.UI.Count(stats.totalFiles),
v.UI.Count(totalFilesChanged),
v.UI.Count(stats.totalFilesSkipped))
if stats.totalFilesDeleted > 0 {
v.printfStdout(", %s deleted", formatNumber(stats.totalFilesDeleted))
filesMsg += fmt.Sprintf(", %s deleted", v.UI.Count(stats.totalFilesDeleted))
}
v.printlnStdout()
v.printfStdout("Data: %s total (%s to process)",
humanize.Bytes(uint64(totalBytesAll)),
humanize.Bytes(uint64(stats.totalBytes)))
v.UI.Detail("%s.", filesMsg)
dataMsg := fmt.Sprintf("Data: %s total (%s backed up)",
v.UI.Size(totalBytesAll),
v.UI.Size(stats.totalBytes))
if stats.totalBytesDeleted > 0 {
v.printfStdout(", %s deleted", humanize.Bytes(uint64(stats.totalBytesDeleted)))
dataMsg += fmt.Sprintf(", %s deleted", v.UI.Size(stats.totalBytesDeleted))
}
v.printlnStdout()
v.UI.Detail("%s.", dataMsg)
if stats.totalBlobsUploaded > 0 {
v.printfStdout("Storage: %s compressed from %s (%.2fx)\n",
humanize.Bytes(uint64(totalBlobSizeCompressed)),
humanize.Bytes(uint64(totalBlobSizeUncompressed)),
v.UI.Detail("Storage: %s compressed from %s (%.2fx ratio).",
v.UI.Size(totalBlobSizeCompressed),
v.UI.Size(totalBlobSizeUncompressed),
compressionRatio)
v.printfStdout("Upload: %d blobs, %s in %s (%s)\n",
v.UI.Detail("Upload: %d blobs, %s in %s (%s).",
stats.totalBlobsUploaded,
humanize.Bytes(uint64(stats.totalBytesUploaded)),
formatDuration(stats.uploadDuration),
formatUploadSpeed(stats.totalBytesUploaded, stats.uploadDuration))
v.UI.Size(stats.totalBytesUploaded),
v.UI.Duration(stats.uploadDuration),
v.uploadSpeed(stats.totalBytesUploaded, stats.uploadDuration))
}
v.printfStdout("Duration: %s\n", formatDuration(snapshotDuration))
v.UI.Detail("Snapshot create duration: %s.", v.UI.Duration(snapshotDuration))
}
// getSnapshotBlobSizes returns total compressed and uncompressed blob sizes for a snapshot
@@ -399,7 +407,26 @@ func (v *Vaultik) ListSnapshots(jsonOutput bool) error {
return encoder.Encode(snapshots)
}
return v.printSnapshotTable(snapshots)
if err := v.printSnapshotTable(snapshots); err != nil {
return err
}
// Warn about local snapshots that don't exist in remote storage.
var stale []string
for id := range localSnapshotMap {
if !remoteSnapshots[id] {
stale = append(stale, id)
}
}
if len(stale) > 0 {
v.UI.Warning("%d local snapshot record(s) not found in backup destination store:", len(stale))
for _, id := range stale {
v.UI.Info("%s", v.UI.Snapshot(id))
}
v.UI.Info("Run 'vaultik snapshot cleanup' to remove stale local records.")
}
return nil
}
// listRemoteSnapshotIDs returns a set of snapshot IDs found in remote storage
@@ -454,10 +481,23 @@ func (v *Vaultik) buildSnapshotInfoList(remoteSnapshots map[string]bool, localSn
totalSize = localSnap.BlobSize
}
uncompressedSize, err := v.Repositories.Snapshots.GetSnapshotUncompressedChunkSize(v.ctx, snapshotID)
if err != nil {
log.Warn("Failed to get uncompressed chunk size", "id", snapshotID, "error", err)
}
newChunkSize, err := v.Repositories.Snapshots.GetSnapshotNewChunkSize(v.ctx, snapshotID)
if err != nil {
log.Warn("Failed to get new chunk size", "id", snapshotID, "error", err)
}
snapshots = append(snapshots, SnapshotInfo{
ID: localSnap.ID,
Timestamp: localSnap.StartedAt,
CompressedSize: totalSize,
ID: localSnap.ID,
Timestamp: localSnap.StartedAt,
CompressedSize: totalSize,
UncompressedSize: uncompressedSize,
NewChunkSize: newChunkSize,
LocallyTracked: true,
})
} else {
timestamp, err := parseSnapshotTimestamp(snapshotID)
@@ -471,6 +511,7 @@ func (v *Vaultik) buildSnapshotInfoList(remoteSnapshots map[string]bool, localSn
ID: types.SnapshotID(snapshotID),
Timestamp: timestamp,
CompressedSize: 0,
LocallyTracked: false,
})
remoteOnly = append(remoteOnly, snapshotID)
}
@@ -566,18 +607,27 @@ func (v *Vaultik) printSnapshotTable(snapshots []SnapshotInfo) error {
if _, err := fmt.Fprintln(w, "REMOTE SNAPSHOTS:"); err != nil {
return err
}
if _, err := fmt.Fprintln(w, "SNAPSHOT ID\tTIMESTAMP\tCOMPRESSED SIZE"); err != nil {
if _, err := fmt.Fprintln(w, "SNAPSHOT ID\tTIMESTAMP\tCOMPRESSED SIZE\tUNCOMPRESSED SIZE\tNEW CHUNK SIZE"); err != nil {
return err
}
if _, err := fmt.Fprintln(w, "───────────\t─────────\t───────────────"); err != nil {
if _, err := fmt.Fprintln(w, "───────────\t─────────\t───────────────\t─────────────────\t──────────────"); err != nil {
return err
}
const remoteOnlyCell = "<remote only>"
for _, snap := range snapshots {
if _, err := fmt.Fprintf(w, "%s\t%s\t%s\n",
uncompressed := remoteOnlyCell
newChunks := remoteOnlyCell
if snap.LocallyTracked {
uncompressed = formatBytes(snap.UncompressedSize)
newChunks = formatBytes(snap.NewChunkSize)
}
if _, err := fmt.Fprintf(w, "%s\t%s\t%s\t%s\t%s\n",
snap.ID,
snap.Timestamp.Format("2006-01-02 15:04:05"),
formatBytes(snap.CompressedSize)); err != nil {
formatBytes(snap.CompressedSize),
uncompressed,
newChunks); err != nil {
return err
}
}
@@ -585,18 +635,19 @@ func (v *Vaultik) printSnapshotTable(snapshots []SnapshotInfo) error {
return w.Flush()
}
// SnapshotPurgeOptions contains options for the snapshot purge command
// SnapshotPurgeOptions contains options for the snapshot purge command.
type SnapshotPurgeOptions struct {
KeepLatest bool
OlderThan string
Force bool
Name string // Filter purge to a specific snapshot name
KeepLatest bool // Keep only the most recent snapshot per name
OlderThan string // Drop snapshots older than this duration (e.g. "30d", "6m", "1y")
Force bool // Skip confirmation prompt
Names []string // If non-empty, only operate on snapshots with one of these names
Quiet bool // Suppress informational output (used by --prune flag)
}
// PurgeSnapshotsWithOptions removes old snapshots based on criteria.
// When KeepLatest is true, retention is applied per snapshot name — the latest
// snapshot for each distinct name is kept. If Name is non-empty, only snapshots
// matching that name are considered for purge.
// Retention is per-snapshot-name: KeepLatest keeps the latest of EACH configured
// snapshot name, not the latest globally. This prevents `home` and `system`
// snapshots from cannibalizing each other.
func (v *Vaultik) PurgeSnapshotsWithOptions(opts *SnapshotPurgeOptions) error {
// Sync with remote first
if err := v.syncWithRemote(); err != nil {
@@ -609,27 +660,28 @@ func (v *Vaultik) PurgeSnapshotsWithOptions(opts *SnapshotPurgeOptions) error {
return fmt.Errorf("listing snapshots: %w", err)
}
// Convert to SnapshotInfo format, only including completed snapshots
snapshots := make([]SnapshotInfo, 0, len(dbSnapshots))
for _, s := range dbSnapshots {
if s.CompletedAt != nil {
snapshots = append(snapshots, SnapshotInfo{
ID: s.ID,
Timestamp: s.StartedAt,
CompressedSize: s.BlobSize,
})
}
// Build name filter set if --snapshot was specified.
nameFilter := make(map[string]struct{}, len(opts.Names))
for _, n := range opts.Names {
nameFilter[n] = struct{}{}
}
// If --name is specified, filter to only snapshots matching that name
if opts.Name != "" {
filtered := make([]SnapshotInfo, 0, len(snapshots))
for _, snap := range snapshots {
if parseSnapshotName(snap.ID.String()) == opts.Name {
filtered = append(filtered, snap)
// Collect completed snapshots, applying the name filter.
snapshots := make([]SnapshotInfo, 0, len(dbSnapshots))
for _, s := range dbSnapshots {
if s.CompletedAt == nil {
continue
}
if len(nameFilter) > 0 {
if _, ok := nameFilter[parseSnapshotName(s.ID.String())]; !ok {
continue
}
}
snapshots = filtered
snapshots = append(snapshots, SnapshotInfo{
ID: s.ID,
Timestamp: s.StartedAt,
CompressedSize: s.BlobSize,
})
}
// Sort by timestamp (newest first)
@@ -640,21 +692,18 @@ func (v *Vaultik) PurgeSnapshotsWithOptions(opts *SnapshotPurgeOptions) error {
var toDelete []SnapshotInfo
if opts.KeepLatest {
// Keep the latest snapshot per snapshot name
// Group snapshots by name, then mark all but the newest in each group
latestByName := make(map[string]bool) // tracks whether we've seen the latest for each name
// Keep the latest snapshot per snapshot name. Snapshots are sorted
// newest-first, so the first occurrence of each name is kept.
seen := make(map[string]bool)
for _, snap := range snapshots {
name := parseSnapshotName(snap.ID.String())
if latestByName[name] {
// Already kept the latest for this name — delete this one
if seen[name] {
toDelete = append(toDelete, snap)
} else {
// This is the latest (sorted newest-first) — keep it
latestByName[name] = true
continue
}
seen[name] = true
}
} else if opts.OlderThan != "" {
// Parse duration
duration, err := parseDuration(opts.OlderThan)
if err != nil {
return fmt.Errorf("invalid duration: %w", err)
@@ -669,22 +718,25 @@ func (v *Vaultik) PurgeSnapshotsWithOptions(opts *SnapshotPurgeOptions) error {
}
if len(toDelete) == 0 {
v.printlnStdout("No snapshots to delete")
if !opts.Quiet {
v.printlnStdout("No snapshots to delete")
}
return nil
}
return v.confirmAndExecutePurge(toDelete, opts.Force)
return v.confirmAndExecutePurge(toDelete, opts.Force, opts.Quiet)
}
// confirmAndExecutePurge shows deletion candidates, confirms with user, and deletes snapshots
func (v *Vaultik) confirmAndExecutePurge(toDelete []SnapshotInfo, force bool) error {
// Show what will be deleted
v.printfStdout("The following snapshots will be deleted:\n\n")
for _, snap := range toDelete {
v.printfStdout(" %s (%s, %s)\n",
snap.ID,
snap.Timestamp.Format("2006-01-02 15:04:05"),
formatBytes(snap.CompressedSize))
func (v *Vaultik) confirmAndExecutePurge(toDelete []SnapshotInfo, force, quiet bool) error {
if !quiet {
v.printfStdout("The following snapshots will be deleted:\n\n")
for _, snap := range toDelete {
v.printfStdout(" %s (%s, %s)\n",
snap.ID,
snap.Timestamp.Format("2006-01-02 15:04:05"),
formatBytes(snap.CompressedSize))
}
}
// Confirm unless --force is used
@@ -700,7 +752,7 @@ func (v *Vaultik) confirmAndExecutePurge(toDelete []SnapshotInfo, force bool) er
v.printlnStdout("Cancelled")
return nil
}
} else {
} else if !quiet {
v.printfStdout("\nDeleting %d snapshot(s) (--force specified)\n", len(toDelete))
}
@@ -716,10 +768,19 @@ func (v *Vaultik) confirmAndExecutePurge(toDelete []SnapshotInfo, force bool) er
}
}
v.printfStdout("Deleted %d snapshot(s)\n", len(toDelete))
// Tidy up local DB orphans now so users don't have to run a
// separate command after a purge. Guarded against nil for tests
// that don't wire up a SnapshotManager.
if v.SnapshotManager != nil {
if err := v.SnapshotManager.CleanupOrphanedData(v.ctx); err != nil {
log.Warn("Failed to clean up orphaned local data after purge", "error", err)
}
}
// Note: Run 'vaultik prune' separately to clean up unreferenced blobs
v.printlnStdout("\nNote: Run 'vaultik prune' to clean up unreferenced blobs.")
if !quiet {
v.printfStdout("Deleted %d snapshot(s)\n", len(toDelete))
v.printlnStdout("\nNote: Run 'vaultik prune' to clean up unreferenced remote blobs.")
}
return nil
}
@@ -733,15 +794,17 @@ func (v *Vaultik) VerifySnapshot(snapshotID string, deep bool) error {
return v.VerifySnapshotWithOptions(snapshotID, opts)
}
// VerifySnapshotWithOptions checks snapshot integrity with full options
// VerifySnapshotWithOptions checks snapshot integrity with full options.
// Deep verification is delegated to RunDeepVerify so this function only
// implements the shallow (existence-only) path.
func (v *Vaultik) VerifySnapshotWithOptions(snapshotID string, opts *VerifyOptions) error {
if opts.Deep {
return v.RunDeepVerify(snapshotID, opts)
}
result := &VerifyResult{
SnapshotID: snapshotID,
Mode: "shallow",
}
if opts.Deep {
result.Mode = "deep"
}
v.printVerifyHeader(snapshotID, opts)
@@ -779,22 +842,12 @@ func (v *Vaultik) VerifySnapshotWithOptions(snapshotID string, opts *VerifyOptio
return v.formatVerifyResult(result, manifest, opts)
}
// printVerifyHeader prints the snapshot ID and parsed timestamp for verification output
// printVerifyHeader prints the snapshot ID and parsed timestamp for verification output.
// Snapshot ID format: hostname[_name]_<RFC3339>
func (v *Vaultik) printVerifyHeader(snapshotID string, opts *VerifyOptions) {
// Parse snapshot ID to extract timestamp
parts := strings.Split(snapshotID, "-")
var snapshotTime time.Time
if len(parts) >= 3 {
// Format: hostname-YYYYMMDD-HHMMSSZ
dateStr := parts[len(parts)-2]
timeStr := parts[len(parts)-1]
if len(dateStr) == 8 && len(timeStr) == 7 && strings.HasSuffix(timeStr, "Z") {
timeStr = timeStr[:6] // Remove Z
timestamp, err := time.Parse("20060102150405", dateStr+timeStr)
if err == nil {
snapshotTime = timestamp
}
}
if t, err := parseSnapshotTimestamp(snapshotID); err == nil {
snapshotTime = t
}
if !opts.JSON {
@@ -811,7 +864,7 @@ func (v *Vaultik) verifyManifestBlobsExist(manifest *snapshot.Manifest, opts *Ve
for _, blob := range manifest.Blobs {
blobPath := fmt.Sprintf("blobs/%s/%s/%s", blob.Hash[:2], blob.Hash[2:4], blob.Hash)
// Just check existence (deep verification is handled by RunDeepVerify)
// Shallow: just check existence (deep verification is handled by RunDeepVerify)
_, err := v.Storage.Stat(v.ctx, blobPath)
if err != nil {
if !opts.JSON {
@@ -869,6 +922,41 @@ func (v *Vaultik) outputVerifyJSON(result *VerifyResult) error {
return nil
}
// CleanupLocalSnapshots removes local snapshot records that have no
// corresponding metadata in remote storage. These are typically left
// behind by incomplete or interrupted backups.
func (v *Vaultik) CleanupLocalSnapshots() error {
remoteSnapshots, err := v.listRemoteSnapshotIDs()
if err != nil {
return err
}
localSnapshots, err := v.Repositories.Snapshots.ListRecent(v.ctx, 10000)
if err != nil {
return fmt.Errorf("listing local snapshots: %w", err)
}
var removed int
for _, snap := range localSnapshots {
id := snap.ID.String()
if !remoteSnapshots[id] {
v.printfStdout("Removing stale local record: %s\n", id)
if err := v.deleteSnapshotFromLocalDB(id); err != nil {
log.Error("Failed to delete local snapshot", "snapshot_id", id, "error", err)
continue
}
removed++
}
}
if removed == 0 {
v.printlnStdout("No stale local snapshots found.")
} else {
v.printfStdout("Removed %d stale local snapshot record(s).\n", removed)
}
return nil
}
// Helper methods that were previously on SnapshotApp
func (v *Vaultik) downloadManifest(snapshotID string) (*snapshot.Manifest, error) {
@@ -1013,6 +1101,16 @@ func (v *Vaultik) RemoveSnapshot(snapshotID string, opts *RemoveOptions) (*Remov
result.RemoteRemoved = true
}
// Clean up the local rows that just became orphaned (files, chunks,
// blob_chunks, blobs no longer referenced by any snapshot). This
// used to be a separate `vaultik snapshot prune` step; running it
// inline means `snapshot remove` leaves no ghost rows behind.
if v.SnapshotManager != nil {
if err := v.SnapshotManager.CleanupOrphanedData(v.ctx); err != nil {
log.Warn("Failed to clean up orphaned local data after removal", "error", err)
}
}
// Output result
if opts.JSON {
return result, v.outputRemoveJSON(result)
@@ -1022,7 +1120,7 @@ func (v *Vaultik) RemoveSnapshot(snapshotID string, opts *RemoveOptions) (*Remov
v.printfStdout("Removed snapshot '%s' from local database\n", snapshotID)
if opts.Remote {
v.printlnStdout("Removed snapshot metadata from remote storage")
v.printlnStdout("\nNote: Blobs were not removed. Run 'vaultik prune' to remove orphaned blobs.")
v.printlnStdout("\nNote: Remote blobs were not removed. Run 'vaultik prune' to remove orphaned blobs.")
}
return result, nil
@@ -1134,6 +1232,14 @@ func (v *Vaultik) executeRemoveAll(snapshotIDs []string, opts *RemoveOptions) (*
result.RemoteRemoved = true
}
// Clean up everything that just became orphaned locally so the
// index database doesn't carry ghost rows (about 39k in one observed
// case) after a wipe.
if v.SnapshotManager != nil {
if err := v.SnapshotManager.CleanupOrphanedData(v.ctx); err != nil {
log.Warn("Failed to clean up orphaned local data after bulk removal", "error", err)
}
}
if opts.JSON {
return result, v.outputRemoveJSON(result)
}
@@ -1141,7 +1247,7 @@ func (v *Vaultik) executeRemoveAll(snapshotIDs []string, opts *RemoveOptions) (*
v.printfStdout("Removed %d snapshot(s)\n", len(result.SnapshotsRemoved))
if opts.Remote {
v.printlnStdout("Removed snapshot metadata from remote storage")
v.printlnStdout("\nNote: Blobs were not removed. Run 'vaultik prune' to remove orphaned blobs.")
v.printlnStdout("\nNote: Remote blobs were not removed. Run 'vaultik prune' to remove orphaned blobs.")
}
return result, nil
@@ -1213,9 +1319,13 @@ type PruneResult struct {
// before starting a new backup or on-demand via the prune command.
func (v *Vaultik) PruneDatabase() (*PruneResult, error) {
log.Info("Pruning local database: removing incomplete snapshots and orphaned data")
v.UI.Begin("Pruning local index database (removing incomplete snapshots and orphaned data).")
result := &PruneResult{}
// Snapshot counts before deletion of incompletes.
snapshotCountBefore, _ := v.getTableCount("snapshots")
// First, delete any incomplete snapshots
incompleteSnapshots, err := v.Repositories.Snapshots.GetIncompleteSnapshots(v.ctx)
if err != nil {
@@ -1268,12 +1378,12 @@ func (v *Vaultik) PruneDatabase() (*PruneResult, error) {
"orphaned_blobs", result.BlobsDeleted,
)
// Print summary
v.printfStdout("Local database prune complete:\n")
v.printfStdout(" Incomplete snapshots removed: %d\n", result.SnapshotsDeleted)
v.printfStdout(" Orphaned files removed: %d\n", result.FilesDeleted)
v.printfStdout(" Orphaned chunks removed: %d\n", result.ChunksDeleted)
v.printfStdout(" Orphaned blobs removed: %d\n", result.BlobsDeleted)
snapshotCountAfter := snapshotCountBefore - result.SnapshotsDeleted
v.UI.Complete("Pruned local index database.")
v.UI.Detail("Incomplete snapshots: %d removed (%d remain).", result.SnapshotsDeleted, snapshotCountAfter)
v.UI.Detail("Orphaned files: %d removed (%d remain).", result.FilesDeleted, fileCountAfter)
v.UI.Detail("Orphaned chunks: %d removed (%d remain).", result.ChunksDeleted, chunkCountAfter)
v.UI.Detail("Orphaned blobs: %d removed (%d remain).", result.BlobsDeleted, blobCountAfter)
return result, nil
}


@@ -7,14 +7,15 @@ import (
"io"
"os"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/crypto"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/globals"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"git.eeqj.de/sneak/vaultik/internal/storage"
"github.com/spf13/afero"
"go.uber.org/fx"
"sneak.berlin/go/vaultik/internal/config"
"sneak.berlin/go/vaultik/internal/crypto"
"sneak.berlin/go/vaultik/internal/database"
"sneak.berlin/go/vaultik/internal/globals"
"sneak.berlin/go/vaultik/internal/snapshot"
"sneak.berlin/go/vaultik/internal/storage"
"sneak.berlin/go/vaultik/internal/ui"
)
// Vaultik contains all dependencies needed for vaultik operations
@@ -37,6 +38,19 @@ type Vaultik struct {
Stdout io.Writer
Stderr io.Writer
Stdin io.Reader
// UI is the writer for user-facing status, progress, warnings, errors.
// See package internal/ui for formatting conventions. Defaults to a
// writer wrapping Stdout; the cli layer replaces it with a discarding
// writer in --cron mode.
UI *ui.Writer
// restoreCacheObserver, if non-nil, is invoked once with the
// restore-side blob disk cache immediately after the cache is
// created and again immediately before it is closed. Only
// internal-package tests set this; the type is unexported so
// callers outside this package can't reach it.
restoreCacheObserver func(*blobDiskCache)
}
// VaultikParams contains all parameters for New that can be provided by fx
@@ -83,6 +97,7 @@ func New(params VaultikParams) *Vaultik {
Stdout: os.Stdout,
Stderr: os.Stderr,
Stdin: os.Stdin,
UI: ui.New(os.Stdout),
}
}
@@ -139,11 +154,6 @@ func (v *Vaultik) printlnStdout(args ...any) {
_, _ = fmt.Fprintln(v.Stdout, args...)
}
// printfStderr writes formatted output to stderr.
func (v *Vaultik) printfStderr(format string, args ...any) {
_, _ = fmt.Fprintf(v.Stderr, format, args...)
}
// scanStdin reads a line of input from stdin.
func (v *Vaultik) scanStdin(a ...any) (int, error) {
return fmt.Fscanln(v.Stdin, a...)


@@ -10,11 +10,11 @@ import (
"os"
"time"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/snapshot"
"github.com/dustin/go-humanize"
"github.com/klauspost/compress/zstd"
_ "github.com/mattn/go-sqlite3"
_ "modernc.org/sqlite"
"sneak.berlin/go/vaultik/internal/log"
"sneak.berlin/go/vaultik/internal/snapshot"
)
// VerifyOptions contains options for the verify command
@@ -57,9 +57,8 @@ func (v *Vaultik) RunDeepVerify(snapshotID string, opts *VerifyOptions) error {
}
if !v.CanDecrypt() {
return v.deepVerifyFailure(result, opts,
"VAULTIK_AGE_SECRET_KEY environment variable not set - required for deep verification",
fmt.Errorf("VAULTIK_AGE_SECRET_KEY environment variable not set - required for deep verification"))
msg := "VAULTIK_AGE_SECRET_KEY not set; required for deep verification"
return v.deepVerifyFailure(result, opts, msg, fmt.Errorf("%s", msg))
}
log.Info("Starting snapshot verification", "snapshot_id", snapshotID, "mode", "deep")
@@ -258,7 +257,7 @@ func (v *Vaultik) decryptAndLoadDatabase(reader io.ReadCloser, secretKey string)
log.Info("Database decompressed", "size", humanize.Bytes(uint64(written)))
// Open the database
db, err := sql.Open("sqlite3", tempPath)
db, err := sql.Open("sqlite", tempPath)
if err != nil {
_ = os.Remove(tempPath)
return nil, fmt.Errorf("failed to open database: %w", err)


@@ -0,0 +1,92 @@
package vaultik_test
import (
"bytes"
"crypto/rand"
"crypto/sha256"
"encoding/hex"
"io"
"testing"
"github.com/klauspost/compress/zstd"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/vaultik/internal/crypto"
)
// TestTeeReaderWithDecryption tests that TeeReader correctly hashes all encrypted
// bytes when streaming through age decryption and zstd decompression.
// This validates the verification path: hash encrypted blob -> decrypt -> decompress.
func TestTeeReaderWithDecryption(t *testing.T) {
// Test data - use random data that doesn't compress well (5MB)
testData := make([]byte, 5*1024*1024)
_, err := rand.Read(testData)
require.NoError(t, err)
// Compress the data
var compressedBuf bytes.Buffer
compressor, err := zstd.NewWriter(&compressedBuf, zstd.WithEncoderLevel(zstd.SpeedDefault))
require.NoError(t, err)
_, err = compressor.Write(testData)
require.NoError(t, err)
err = compressor.Close()
require.NoError(t, err)
// Encrypt the compressed data
testRecipient := "age1cplgrwj77ta54dnmydvvmzn64ltk83ankxl5sww04mrtmu62kv3s89gmvv"
testSecretKey := "AGE-SECRET-KEY-1C77PYNTHXSHNNC6EYR2W52UWYXACXA5JT00J9CCW9986M3XY87PSGP89AQ"
encryptor, err := crypto.NewEncryptor([]string{testRecipient})
require.NoError(t, err)
var encryptedBuf bytes.Buffer
err = encryptor.EncryptStream(&encryptedBuf, bytes.NewReader(compressedBuf.Bytes()))
require.NoError(t, err)
encryptedData := encryptedBuf.Bytes()
// Calculate the expected hash of the encrypted data directly
expectedHash := sha256.Sum256(encryptedData)
expectedHashStr := hex.EncodeToString(expectedHash[:])
t.Logf("Encrypted data size: %d bytes", len(encryptedData))
t.Logf("Expected hash: %s", expectedHashStr)
// Now simulate what verifyBlob does: use TeeReader to hash while decrypting
decryptor, err := crypto.NewDecryptor(testSecretKey)
require.NoError(t, err)
// Create hasher and tee reader
hasher := sha256.New()
reader := bytes.NewReader(encryptedData)
teeReader := io.TeeReader(reader, hasher)
// Decrypt through the tee reader
decryptedReader, err := decryptor.DecryptStream(teeReader)
require.NoError(t, err)
// Decompress
decompressor, err := zstd.NewReader(decryptedReader)
require.NoError(t, err)
defer decompressor.Close()
// Read all decompressed data (simulating chunk verification)
decompressedData, err := io.ReadAll(decompressor)
require.NoError(t, err)
// Verify we got the original data back
assert.Equal(t, testData, decompressedData, "Decompressed data should match original")
// Drain remaining decompressed data (should be 0)
remaining, err := io.Copy(io.Discard, decompressor)
require.NoError(t, err)
assert.Equal(t, int64(0), remaining, "No remaining decompressed data")
// Calculate hash from tee reader
calculatedHashStr := hex.EncodeToString(hasher.Sum(nil))
t.Logf("Calculated hash (after drain): %s", calculatedHashStr)
// Verify the hash matches the direct hash of encrypted data
assert.Equal(t, expectedHashStr, calculatedHashStr,
"Hash calculated via TeeReader should match direct hash of encrypted data")
}
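The hash-while-decoding pattern exercised by this test can be reproduced with the standard library alone. The sketch below swaps in `compress/gzip` for the age and zstd layers, so it illustrates the `io.TeeReader` technique only and is not the project's verification code. `gzipBytes`, `directHash`, and `hashWhileDecoding` are hypothetical helper names.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// gzipBytes compresses p and returns the encoded bytes.
func gzipBytes(p []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(p); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// directHash returns the hex SHA-256 of b, computed in one shot.
func directHash(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// hashWhileDecoding decodes the stream and, in the same pass, hashes the
// encoded bytes as they are pulled through the TeeReader.
func hashWhileDecoding(encoded []byte) (string, []byte, error) {
	hasher := sha256.New()
	tee := io.TeeReader(bytes.NewReader(encoded), hasher)

	zr, err := gzip.NewReader(tee)
	if err != nil {
		return "", nil, err
	}
	defer zr.Close()

	decoded, err := io.ReadAll(zr)
	if err != nil {
		return "", nil, err
	}
	// Drain anything the decoder left unread so the hash covers every
	// encoded byte, not just the ones the decoder happened to consume.
	if _, err := io.Copy(io.Discard, tee); err != nil {
		return "", nil, err
	}
	return hex.EncodeToString(hasher.Sum(nil)), decoded, nil
}

func main() {
	payload := bytes.Repeat([]byte("vaultik "), 1024)
	encoded, err := gzipBytes(payload)
	if err != nil {
		panic(err)
	}
	sum, decoded, err := hashWhileDecoding(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println("hash matches:", sum == directHash(encoded))
	fmt.Println("payload matches:", bytes.Equal(decoded, payload))
}
```

The explicit drain matters because a TeeReader only hashes bytes that the downstream reader actually pulls. If a decoder stops early, the streamed hash would silently cover a prefix of the encoded data rather than the whole blob.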


@@ -20,9 +20,6 @@ s3:
region: us-east-1
use_ssl: true
part_size: 5242880 # 5MB
backup_interval: 1h
full_scan_interval: 24h
min_time_between_run: 15m
index_path: /tmp/vaultik-test.sqlite
chunk_size: 10MB
blob_size_limit: 10GB


@@ -17,9 +17,6 @@ s3:
region: us-east-1
use_ssl: false
part_size: 5242880 # 5MB
backup_interval: 1h
full_scan_interval: 24h
min_time_between_run: 15m
index_path: /tmp/vaultik-integration-test.sqlite
chunk_size: 10MB
blob_size_limit: 10GB