# vaultik (ваултик)

*WIP: pre-1.0, some functions may not be fully implemented yet.*
vaultik is an incremental backup daemon written in Go. It encrypts data
using an age public key and uploads each encrypted blob directly to a
remote S3-compatible object store. No private keys, secrets, or credentials
are stored on the backed-up system, other than those required to PUT to the
object store (such as S3 API keys).
It includes table-stakes features such as:
- modern encryption (the excellent age)
- deduplication
- incremental backups
- modern multithreaded zstd compression with configurable levels
- content-addressed immutable storage
- local state tracking in a standard SQLite database, enabling write-only incremental backups to the destination
- no mutable remote metadata
- no plaintext file paths or metadata stored remotely
- does not create huge numbers of small files (to keep S3 operation counts down) even if the source system has many small files
## why
Existing backup software has one or more of these problems:
- Requires secrets (passwords, private keys) on the source system, which compromises encrypted backups in the case of host system compromise
- Depends on symmetric encryption unsuitable for zero-trust environments
- Creates one-blob-per-file, which results in excessive S3 operation counts
- Is slow
Other backup tools like restic, borg, and duplicity are designed for
environments where the source host can store secrets and has access to
decryption keys. I don't want to store backup decryption keys on my hosts,
only public keys for encryption.
My requirements are:
- open source
- no passphrases or private keys on the source host
- incremental
- compressed
- encrypted
- S3-compatible, without an intermediate step or tool
Surprisingly, no existing tool meets these requirements, so I wrote vaultik.
## design goals
- Backups must require only a public key on the source host.
- No secrets or private keys may exist on the source system.
- Restore must be possible using only the backup bucket and a private key.
- Prune must be possible (requires the private key; performed on a different host).
- All encryption uses age (X25519, XChaCha20-Poly1305).
- Compression uses zstd at a configurable level.
- Files are chunked, and multiple chunks are packed into encrypted blobs to reduce object count for filesystems with many small files.
- All metadata (snapshots) is stored remotely as encrypted SQLite DBs.
## what
vaultik walks a set of configured directories and builds a
content-addressable chunk map of changed files using deterministic chunking.
Each chunk is streamed into a blob packer. Blobs are compressed with zstd,
encrypted with age, and uploaded directly to remote storage under a
content-addressed S3 path. At the end, a pruned, snapshot-specific SQLite
database of metadata is created, encrypted, and uploaded alongside the
blobs.
No plaintext file contents ever hit disk. No private key or secret passphrase is needed or stored locally.
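A minimal sketch of that write path, assuming the filippo.io/age and github.com/klauspost/compress/zstd Go APIs (the real packer and S3 uploader are more involved):

```go
package example

import (
	"io"

	"filippo.io/age"
	"github.com/klauspost/compress/zstd"
)

// packBlob streams plaintext chunk data through zstd and then age, so the
// ciphertext written to dst is compressed before encryption and no
// plaintext is ever buffered on disk. dst would be the S3 upload stream.
func packBlob(dst io.Writer, chunks io.Reader, pubkey string) error {
	recipient, err := age.ParseX25519Recipient(pubkey)
	if err != nil {
		return err
	}
	enc, err := age.Encrypt(dst, recipient) // outer layer: age
	if err != nil {
		return err
	}
	zw, err := zstd.NewWriter(enc) // inner layer: zstd, applied first
	if err != nil {
		return err
	}
	if _, err := io.Copy(zw, chunks); err != nil {
		return err
	}
	if err := zw.Close(); err != nil { // flush compressed frames
		return err
	}
	return enc.Close() // finalize the age stream
}
```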
## how

1. install

   ```sh
   go install git.eeqj.de/sneak/vaultik@latest
   ```

2. generate keypair

   ```sh
   age-keygen -o agekey.txt
   grep 'public key:' agekey.txt
   ```

3. write config

   ```yaml
   # Named snapshots - each snapshot can contain multiple paths
   snapshots:
     system:
       paths:
         - /etc
         - /var/lib
       # Snapshot-specific exclusions
       exclude:
         - '*.cache'
     home:
       paths:
         - /home/user/documents
         - /home/user/photos

   # Global exclusions (apply to all snapshots)
   exclude:
     - '*.log'
     - '*.tmp'
     - '.git'
     - 'node_modules'

   age_recipients:
     - age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj

   s3:
     endpoint: https://s3.example.com
     bucket: vaultik-data
     prefix: host1/
     access_key_id: ...
     secret_access_key: ...
     region: us-east-1

   backup_interval: 1h
   full_scan_interval: 24h
   min_time_between_run: 15m
   chunk_size: 10MB
   blob_size_limit: 1GB
   ```

4. run

   ```sh
   # Create all configured snapshots
   vaultik --config /etc/vaultik.yaml snapshot create

   # Create specific snapshots by name
   vaultik --config /etc/vaultik.yaml snapshot create home system

   # Silent mode for cron
   vaultik --config /etc/vaultik.yaml snapshot create --cron
   ```
## cli

### commands

```
vaultik [--config <path>] snapshot create [snapshot-names...] [--cron] [--daemon] [--prune]
vaultik [--config <path>] snapshot list [--json]
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep]
vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--force]
vaultik [--config <path>] snapshot remove <snapshot-id> [--dry-run] [--force]
vaultik [--config <path>] snapshot prune
vaultik [--config <path>] restore <snapshot-id> <target-dir> [paths...]
vaultik [--config <path>] prune [--dry-run] [--force]
vaultik [--config <path>] info
vaultik [--config <path>] store info
```
### environment

- `VAULTIK_AGE_SECRET_KEY`: Required for `restore` and deep `verify`. Contains the age private key for decryption.
- `VAULTIK_CONFIG`: Optional path to config file.
### command details

`snapshot create`: Perform incremental backup of configured snapshots

- Config is located at `/etc/vaultik/config.yml` by default
- Optional snapshot names argument to create specific snapshots (default: all)
- `--cron`: Silent unless error (for crontab)
- `--daemon`: Run continuously with inotify monitoring and periodic scans
- `--prune`: Delete old snapshots and orphaned blobs after backup

`snapshot list`: List all snapshots with their timestamps and sizes

- `--json`: Output in JSON format

`snapshot verify`: Verify snapshot integrity

- `--deep`: Download and verify blob contents (not just existence)

`snapshot purge`: Remove old snapshots based on criteria

- `--keep-latest`: Keep only the most recent snapshot
- `--older-than`: Remove snapshots older than duration (e.g., 30d, 6mo, 1y)
- `--force`: Skip confirmation prompt

`snapshot remove`: Remove a specific snapshot

- `--dry-run`: Show what would be deleted without deleting
- `--force`: Skip confirmation prompt

`snapshot prune`: Clean orphaned data from local database
`restore`: Restore snapshot to target directory

- Requires `VAULTIK_AGE_SECRET_KEY` environment variable with age private key
- Optional path arguments to restore specific files/directories (default: all)
- Downloads and decrypts metadata, fetches required blobs, reconstructs files
- Preserves file permissions, timestamps, and ownership (ownership requires root)
- Handles symlinks and directories
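For example (the snapshot ID and target directory here are illustrative):

```sh
VAULTIK_AGE_SECRET_KEY="$(cat agekey.txt)" \
  vaultik --config /etc/vaultik.yaml restore server1_home_2025-01-01T12:00:00Z /mnt/restore
```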
`prune`: Remove unreferenced blobs from remote storage

- Scans all snapshots for referenced blobs
- Deletes orphaned blobs

`info`: Display system and configuration information

`store info`: Display S3 bucket configuration and storage statistics
## architecture

### s3 bucket layout

```
s3://<bucket>/<prefix>/
├── blobs/
│   └── <aa>/<bb>/<full_blob_hash>
└── metadata/
    └── <snapshot_id>/
        ├── db.zst.age
        └── manifest.json.zst
```

- `blobs/<aa>/<bb>/...`: Two-level directory sharding using first 4 hex chars of blob hash
- `metadata/<snapshot_id>/db.zst.age`: Encrypted, compressed SQLite database
- `metadata/<snapshot_id>/manifest.json.zst`: Unencrypted blob list for pruning
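A hypothetical helper showing how a blob hash maps to its sharded object key:

```go
package example

import "fmt"

// blobKey maps a hex blob hash to its sharded S3 key under the configured
// prefix, e.g. "aa1234..." becomes "<prefix>blobs/aa/12/aa1234...".
func blobKey(prefix, blobHash string) string {
	return fmt.Sprintf("%sblobs/%s/%s/%s",
		prefix, blobHash[:2], blobHash[2:4], blobHash)
}
```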
### blob manifest format

The `manifest.json.zst` file is unencrypted (compressed JSON) to enable pruning without decryption:

```json
{
  "snapshot_id": "hostname_snapshotname_2025-01-01T12:00:00Z",
  "blob_hashes": [
    "aa1234567890abcdef...",
    "bb2345678901bcdef0..."
  ]
}
```

Snapshot IDs follow the format `<hostname>_<snapshot-name>_<timestamp>` (e.g., `server1_home_2025-01-01T12:00:00Z`).
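A sketch of reading one of these manifests during prune, assuming the github.com/klauspost/compress/zstd bindings:

```go
package example

import (
	"encoding/json"
	"io"

	"github.com/klauspost/compress/zstd"
)

// Manifest mirrors the JSON structure above.
type Manifest struct {
	SnapshotID string   `json:"snapshot_id"`
	BlobHashes []string `json:"blob_hashes"`
}

// readManifest decompresses and decodes a manifest.json.zst stream.
// No age key is needed: the manifest is compressed but not encrypted.
func readManifest(r io.Reader) (*Manifest, error) {
	zr, err := zstd.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var m Manifest
	if err := json.NewDecoder(zr).Decode(&m); err != nil {
		return nil, err
	}
	return &m, nil
}
```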
### local sqlite schema

```sql
CREATE TABLE files (
id TEXT PRIMARY KEY,
path TEXT NOT NULL UNIQUE,
mtime INTEGER NOT NULL,
size INTEGER NOT NULL,
mode INTEGER NOT NULL,
uid INTEGER NOT NULL,
gid INTEGER NOT NULL
);
CREATE TABLE file_chunks (
file_id TEXT NOT NULL,
idx INTEGER NOT NULL,
chunk_hash TEXT NOT NULL,
PRIMARY KEY (file_id, idx),
FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
);
CREATE TABLE chunks (
chunk_hash TEXT PRIMARY KEY,
size INTEGER NOT NULL
);
CREATE TABLE blobs (
id TEXT PRIMARY KEY,
blob_hash TEXT NOT NULL UNIQUE,
uncompressed INTEGER NOT NULL,
compressed INTEGER NOT NULL,
uploaded_at INTEGER
);
CREATE TABLE blob_chunks (
blob_hash TEXT NOT NULL,
chunk_hash TEXT NOT NULL,
offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (blob_hash, chunk_hash)
);
CREATE TABLE chunk_files (
chunk_hash TEXT NOT NULL,
file_id TEXT NOT NULL,
file_offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (chunk_hash, file_id)
);
CREATE TABLE snapshots (
id TEXT PRIMARY KEY,
hostname TEXT NOT NULL,
vaultik_version TEXT NOT NULL,
started_at INTEGER NOT NULL,
completed_at INTEGER,
file_count INTEGER NOT NULL,
chunk_count INTEGER NOT NULL,
blob_count INTEGER NOT NULL,
total_size INTEGER NOT NULL,
blob_size INTEGER NOT NULL,
compression_ratio REAL NOT NULL
);
CREATE TABLE snapshot_files (
snapshot_id TEXT NOT NULL,
file_id TEXT NOT NULL,
PRIMARY KEY (snapshot_id, file_id)
);
CREATE TABLE snapshot_blobs (
snapshot_id TEXT NOT NULL,
blob_id TEXT NOT NULL,
blob_hash TEXT NOT NULL,
PRIMARY KEY (snapshot_id, blob_id)
);
```
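For illustration, the join a restore performs against this schema to locate each chunk of one file inside its packed blobs (the path shown is an example):

```sql
-- For each chunk of the file, in index order: the blob that holds it,
-- and the byte range of the chunk within that blob.
SELECT fc.idx, bc.blob_hash, bc.offset, bc.length
FROM files f
JOIN file_chunks fc ON fc.file_id = f.id
JOIN blob_chunks bc ON bc.chunk_hash = fc.chunk_hash
WHERE f.path = '/etc/hostname'
ORDER BY fc.idx;
```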
### data flow

#### backup
- Load config, open local SQLite index
- Walk source directories, check mtime/size against index
- For changed/new files: chunk using content-defined chunking
- For each chunk: hash, check if already uploaded, add to blob packer
- When blob reaches threshold: compress, encrypt, upload to S3
- Build snapshot metadata, compress, encrypt, upload
- Create blob manifest (unencrypted) for pruning support
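A minimal sketch of the mtime/size check against the local index, assuming mtimes are stored as Unix seconds in the `files` table above:

```go
package example

import (
	"database/sql"
	"os"
)

// needsBackup reports whether path must be re-chunked: true for new files
// and for files whose recorded mtime or size no longer matches the stat.
func needsBackup(db *sql.DB, path string, info os.FileInfo) (bool, error) {
	var mtime, size int64
	err := db.QueryRow(
		`SELECT mtime, size FROM files WHERE path = ?`, path,
	).Scan(&mtime, &size)
	if err == sql.ErrNoRows {
		return true, nil // never seen before
	}
	if err != nil {
		return false, err
	}
	return mtime != info.ModTime().Unix() || size != info.Size(), nil
}
```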
#### restore

- Download `metadata/<snapshot_id>/db.zst.age`
- Decrypt and decompress SQLite database
- Query files table (optionally filtered by paths)
- For each file, get ordered chunk list from file_chunks
- Download required blobs, decrypt, decompress
- Extract chunks and reconstruct files
- Restore permissions, mtime, uid/gid
#### prune
- List all snapshot manifests
- Build set of all referenced blob hashes
- List all blobs in storage
- Delete any blob not in referenced set
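A sketch of that set difference (hypothetical helper; object keys laid out as in the bucket layout above):

```go
package example

import "path"

// orphanedBlobKeys returns the object keys of blobs that no snapshot
// manifest references. blobHashesBySnapshot holds each manifest's
// blob_hashes list; blobKeys are the objects listed under blobs/.
func orphanedBlobKeys(blobHashesBySnapshot [][]string, blobKeys []string) []string {
	referenced := make(map[string]bool)
	for _, hashes := range blobHashesBySnapshot {
		for _, h := range hashes {
			referenced[h] = true
		}
	}
	var orphans []string
	for _, key := range blobKeys {
		// keys look like "blobs/aa/bb/<hash>"; the base name is the hash
		if !referenced[path.Base(key)] {
			orphans = append(orphans, key)
		}
	}
	return orphans
}
```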
### chunking
- Content-defined chunking using FastCDC algorithm
- Average chunk size: configurable (default 10MB)
- Deduplication at chunk level
- Multiple chunks packed into blobs for efficiency
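The cut-point selection at the heart of content-defined chunking, in a deliberately simplified sketch; real FastCDC adds normalized chunking and tuned masks on top of this idea:

```go
package example

import "math/rand"

// gear holds one fixed random value per byte; a fixed seed keeps chunk
// boundaries deterministic across runs and hosts.
var gear [256]uint64

func init() {
	r := rand.New(rand.NewSource(1))
	for i := range gear {
		gear[i] = r.Uint64()
	}
}

// nextCut returns the length of the next chunk: the first position after
// minSize where the rolling hash has avgBits trailing zero bits. Because
// the hash depends only on content, an insertion early in a file shifts
// nearby chunk boundaries instead of invalidating every later chunk.
func nextCut(data []byte, minSize int, avgBits uint) int {
	mask := (uint64(1) << avgBits) - 1
	var h uint64
	for i, b := range data {
		h = (h << 1) + gear[b]
		if i >= minSize && h&mask == 0 {
			return i + 1
		}
	}
	return len(data) // no cut point; final chunk
}
```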
### encryption
- Asymmetric encryption using age (X25519 + XChaCha20-Poly1305)
- Only public key needed on source host
- Each blob encrypted independently
- Metadata databases also encrypted
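A sketch of the restore-side decryption, assuming the filippo.io/age Go API and the `VAULTIK_AGE_SECRET_KEY` variable described above:

```go
package example

import (
	"io"
	"os"

	"filippo.io/age"
)

// decryptBlob unwraps the age layer of a downloaded blob using the
// X25519 identity from the environment; zstd decompression follows.
func decryptBlob(ciphertext io.Reader) (io.Reader, error) {
	identity, err := age.ParseX25519Identity(os.Getenv("VAULTIK_AGE_SECRET_KEY"))
	if err != nil {
		return nil, err
	}
	return age.Decrypt(ciphertext, identity)
}
```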
### compression
- zstd compression at configurable level
- Applied before encryption
- Blob-level compression for efficiency
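A sketch of constructing such a writer, assuming the github.com/klauspost/compress/zstd bindings (the option names are that library's, not necessarily what the project uses):

```go
package example

import (
	"io"
	"runtime"

	"github.com/klauspost/compress/zstd"
)

// newCompressor wraps w in a multithreaded zstd writer at the given
// zstd level (e.g. 3); its output then flows into the age layer.
func newCompressor(w io.Writer, level int) (*zstd.Encoder, error) {
	return zstd.NewWriter(w,
		zstd.WithEncoderLevel(zstd.EncoderLevelFromZstd(level)),
		zstd.WithEncoderConcurrency(runtime.NumCPU()),
	)
}
```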
## does not
- Store any secrets on the backed-up machine
- Require mutable remote metadata
- Use tarballs, restic, rsync, or ssh
- Require a symmetric passphrase or password
- Trust the source system with anything
## does
- Incremental deduplicated backup
- Blob-packed chunk encryption
- Content-addressed immutable blobs
- Public-key encryption only
- SQLite-based local and snapshot metadata
- Fully stream-processed storage
## requirements
- Go 1.24 or later
- S3-compatible object storage
- Sufficient disk space for local index (typically <1GB)
## license
## author
Made with love and lots of expensive SOTA AI by sneak in Berlin in the summer of 2025.
Released as a free software gift to the world, no strings attached.
Contact: sneak@sneak.berlin
https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2