sneak 0df07790ba Document complete vaultik architecture and implementation plan
- Expand README with full CLI documentation, architecture details, and features
- Add comprehensive 87-step implementation plan to DESIGN.md
- Document all commands, configuration options, and security considerations
- Define complete API signatures and data structures
2025-07-20 09:04:31 +02:00

vaultik

vaultik is an incremental backup daemon written in Go. It encrypts data using an age public key and uploads each encrypted blob directly to a remote S3-compatible object store. It requires no private keys or decryption secrets on the backed-up system; the only credential stored there is the bucket access key.


what

vaultik walks a set of configured directories and builds a content-addressable chunk map of changed files using deterministic chunking. Each chunk is streamed into a blob packer. Blobs are compressed with zstd, encrypted with age, and uploaded directly to remote storage under a content-addressed S3 path.

No plaintext file contents ever hit disk. No private key is needed or stored locally. All encrypted data is streaming-processed and immediately discarded once uploaded. Metadata is encrypted and pushed with the same mechanism.
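
The packing step can be sketched as follows. This is a minimal illustration, not vaultik's packer: the function name packChunks and the rule that a chunk is never split across blobs are assumptions, and compression, encryption, and upload are left out.

```go
package main

import "fmt"

// packChunks groups chunks into blobs whose payload stays at or below limit
// bytes. A chunk is never split across blobs (an assumption of this sketch),
// so a single chunk larger than limit ends up alone in an oversized blob.
func packChunks(chunks [][]byte, limit int) [][]byte {
	var blobs [][]byte
	var cur []byte
	for _, c := range chunks {
		// Close the current blob if adding this chunk would exceed the limit.
		if len(cur) > 0 && len(cur)+len(c) > limit {
			blobs = append(blobs, cur)
			cur = nil
		}
		cur = append(cur, c...)
	}
	if len(cur) > 0 {
		blobs = append(blobs, cur)
	}
	return blobs
}

func main() {
	chunks := [][]byte{[]byte("aaaa"), []byte("bbbb"), []byte("cc"), []byte("dddddd")}
	for i, b := range packChunks(chunks, 10) {
		fmt.Printf("blob %d: %d bytes\n", i, len(b))
	}
}
```

A real packer would also record each chunk's offset and length inside its blob so that restore can locate it, and would stream rather than buffer whole blobs in memory.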

why

Existing backup software fails under one or more of these conditions:

  • Requires secrets (passwords, private keys) on the source system
  • Depends on symmetric encryption unsuitable for zero-trust environments
  • Stages temporary archives or repositories
  • Writes plaintext metadata or plaintext file paths

vaultik addresses all of these by using:

  • Public-key-only encryption (via age), which requires no secrets (other than the bucket access key) on the source system
  • Blob-level deduplication and batching
  • Local state cache for incremental detection
  • S3-native chunked upload interface
  • Self-contained encrypted snapshot metadata

how

  1. install

    go install git.eeqj.de/sneak/vaultik@latest
    
  2. generate keypair

    age-keygen -o agekey.txt
    grep 'public key:' agekey.txt
    
  3. write config

    source_dirs:
      - /etc
      - /home/user/data
    exclude:
      - '*.log'
      - '*.tmp'
    age_recipient: age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    s3:
      endpoint: https://s3.example.com
      bucket: vaultik-data
      prefix: host1/
      access_key_id: ...
      secret_access_key: ...
      region: us-east-1
    backup_interval: 1h      # only used in daemon mode, not for --cron mode
    full_scan_interval: 24h  # normally we use inotify to mark dirty, but
                             # every 24h we do a full stat() scan
    min_time_between_run: 15m  # again, only for daemon mode
    index_path: /var/lib/vaultik/index.sqlite
    chunk_size: 10MB
    blob_size_limit: 10GB
    index_prefix: index/
    
  4. run

    vaultik backup /etc/vaultik.yaml
    
    vaultik backup /etc/vaultik.yaml --cron # silent unless error
    
    vaultik backup /etc/vaultik.yaml --daemon # runs in background, uses inotify
    

cli

commands

vaultik backup /etc/vaultik.yaml [--cron] [--daemon]
vaultik restore <bucket> <prefix> <snapshot_id> <target_dir>
vaultik prune <bucket> <prefix>
vaultik fetch <bucket> <prefix> <snapshot_id> <filepath> <target_fileordir>
vaultik verify <bucket> <prefix> [<snapshot_id>]

environment

  • VAULTIK_PRIVATE_KEY: Required for restore, prune, fetch, and verify commands. Contains the age private key for decryption.

command details

backup: Perform incremental backup of configured directories

  • --cron: Silent unless error (for crontab)
  • --daemon: Run continuously with inotify monitoring and periodic scans

restore: Restore entire snapshot to target directory

  • Downloads and decrypts metadata
  • Fetches only required blobs
  • Reconstructs directory structure

prune: Remove unreferenced blobs from storage

  • Requires private key
  • Downloads latest snapshot metadata
  • Deletes orphaned blobs

fetch: Extract single file from backup

  • Retrieves specific file without full restore
  • Supports extracting to different filename

verify: Validate backup integrity

  • Checks metadata hash
  • Verifies all referenced blobs exist
  • Validates chunk integrity

architecture

chunking

  • Content-defined chunking using rolling hash (Rabin fingerprint)
  • Average chunk size: 10MB (configurable)
  • Deduplication at chunk level
  • Multiple chunks packed into blobs for efficiency
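
To illustrate content-defined chunking, here is a small sketch. It swaps in a gear-style rolling hash for the Rabin fingerprint because it is shorter to write; the table seed, the function names, and the tiny sizes are all assumptions chosen for the demo, not vaultik's parameters.

```go
package main

import "fmt"

// splitmix64 is a small deterministic generator used to fill gearTable and
// to produce demo data; it is not part of the chunking technique itself.
func splitmix64(state *uint64) uint64 {
	*state += 0x9e3779b97f4a7c15
	z := *state
	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9
	z = (z ^ (z >> 27)) * 0x94d049bb133111eb
	return z ^ (z >> 31)
}

// gearTable maps every byte value to a fixed pseudo-random 64-bit number.
var gearTable = func() [256]uint64 {
	var t [256]uint64
	seed := uint64(1)
	for i := range t {
		t[i] = splitmix64(&seed)
	}
	return t
}()

// splitChunks cuts data wherever the rolling hash has its low maskBits bits
// all zero, never before minSize bytes and always by maxSize bytes.
func splitChunks(data []byte, minSize, maxSize int, maskBits uint) [][]byte {
	mask := uint64(1)<<maskBits - 1
	var chunks [][]byte
	start := 0
	var h uint64
	for i, b := range data {
		// Shifting left ages out old bytes, so h depends on recent input only.
		h = (h << 1) + gearTable[b]
		n := i - start + 1
		if (n >= minSize && h&mask == 0) || n >= maxSize {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
			h = 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

// demoData returns n deterministic pseudo-random bytes for the example.
func demoData(n int) []byte {
	seed := uint64(42)
	out := make([]byte, n)
	for i := range out {
		out[i] = byte(splitmix64(&seed))
	}
	return out
}

func main() {
	data := demoData(1 << 16)
	chunks := splitChunks(data, 64, 4096, 10)
	fmt.Println("chunks:", len(chunks))
}
```

Because boundaries are chosen from the content rather than from fixed offsets, an insertion near the start of a file typically changes only the nearby chunks, which is what makes chunk-level deduplication effective.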

encryption

  • Asymmetric encryption using age (X25519 + ChaCha20-Poly1305)
  • Only public key needed on source host
  • Each blob encrypted independently
  • Metadata databases also encrypted

storage

  • Content-addressed blob storage
  • Immutable append-only design
  • Two-level directory sharding for blobs (aa/bb/hash)
  • Compressed with zstd before encryption
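
The aa/bb/hash layout can be sketched like this. The use of SHA-256 for blob addressing and the names shardedKey and contentHash are assumptions; the source only specifies the two-level sharding shape.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// shardedKey builds "<prefix>aa/bb/<hash>" from a hex digest, using the
// first two byte pairs of the digest as directory levels.
func shardedKey(prefix, hexHash string) (string, error) {
	if len(hexHash) < 4 {
		return "", fmt.Errorf("hash %q too short to shard", hexHash)
	}
	return prefix + hexHash[0:2] + "/" + hexHash[2:4] + "/" + hexHash, nil
}

// contentHash returns the lowercase hex SHA-256 of data.
// SHA-256 is an assumption here, not a documented choice.
func contentHash(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	key, err := shardedKey("host1/", contentHash([]byte("example blob")))
	if err != nil {
		panic(err)
	}
	fmt.Println(key)
}
```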

state tracking

  • Local SQLite database for incremental state
  • Tracks file mtimes and chunk mappings
  • Enables efficient change detection
  • Supports inotify monitoring in daemon mode
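
Change detection against the local index might look roughly like the sketch below. An in-memory map stands in for the SQLite database, and comparing file size in addition to mtime is an assumption; the type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// fileState is what this sketch remembers about a file between runs.
type fileState struct {
	Size    int64
	MTimeNs int64 // modification time, Unix nanoseconds
}

// dirtyPaths returns, sorted, every path in current that is new or whose
// size or mtime differs from the indexed state.
func dirtyPaths(index, current map[string]fileState) []string {
	var dirty []string
	for path, cur := range current {
		if old, ok := index[path]; !ok || old != cur {
			dirty = append(dirty, path)
		}
	}
	sort.Strings(dirty)
	return dirty
}

func main() {
	index := map[string]fileState{
		"/etc/hosts":  {Size: 120, MTimeNs: 1000},
		"/etc/passwd": {Size: 900, MTimeNs: 2000},
	}
	current := map[string]fileState{
		"/etc/hosts":  {Size: 120, MTimeNs: 1000}, // unchanged
		"/etc/passwd": {Size: 950, MTimeNs: 3000}, // modified
		"/etc/motd":   {Size: 10, MTimeNs: 3000},  // new
	}
	fmt.Println(dirtyPaths(index, current))
}
```

Only the paths returned here would need to be re-chunked; everything else can reuse the chunk mappings already in the index.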

does not

  • Store any secrets on the backed-up machine
  • Require mutable remote metadata
  • Use tarballs, restic, rsync, or ssh
  • Require a symmetric passphrase or password
  • Trust the source system with anything

does

  • Incremental deduplicated backup
  • Blob-packed chunk encryption
  • Content-addressed immutable blobs
  • Public-key encryption only
  • SQLite-based local and snapshot metadata
  • Fully stream-processed storage

restore

vaultik restore downloads only the snapshot metadata and required blobs. It never contacts the source system. All restore operations depend only on:

  • VAULTIK_PRIVATE_KEY
  • The bucket

The entire system can be restored from object storage alone.
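
Working out which blobs a restore or fetch needs could be sketched as below. The snapshot layout shown (file to chunk list, chunk to blob) is a hypothetical stand-in for the real metadata schema, which this README does not spell out.

```go
package main

import (
	"fmt"
	"sort"
)

// snapshot is a hypothetical, simplified view of decrypted snapshot metadata.
type snapshot struct {
	FileChunks map[string][]string // file path -> ordered chunk hashes
	ChunkBlob  map[string]string   // chunk hash -> blob hash containing it
}

// requiredBlobs returns the sorted, de-duplicated set of blobs that must be
// downloaded to reconstruct the given paths.
func requiredBlobs(s snapshot, paths []string) ([]string, error) {
	seen := map[string]bool{}
	for _, p := range paths {
		chunks, ok := s.FileChunks[p]
		if !ok {
			return nil, fmt.Errorf("path %q not in snapshot", p)
		}
		for _, c := range chunks {
			blob, ok := s.ChunkBlob[c]
			if !ok {
				return nil, fmt.Errorf("chunk %q has no blob", c)
			}
			seen[blob] = true
		}
	}
	out := make([]string, 0, len(seen))
	for b := range seen {
		out = append(out, b)
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	s := snapshot{
		FileChunks: map[string][]string{
			"/etc/hosts":  {"c1", "c2"},
			"/etc/passwd": {"c2", "c3"},
		},
		ChunkBlob: map[string]string{"c1": "blobA", "c2": "blobA", "c3": "blobB"},
	}
	blobs, err := requiredBlobs(s, []string{"/etc/hosts"})
	fmt.Println(blobs, err)
}
```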


features

daemon mode

  • Continuous background operation
  • inotify-based change detection
  • Respects backup_interval and min_time_between_run
  • Full scan every full_scan_interval (default 24h)
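
One possible reading of how backup_interval and min_time_between_run interact is sketched here. The exact semantics are an assumption: a run starts early when inotify has marked something dirty, but never sooner than min_time_between_run after the previous run, and otherwise once backup_interval has elapsed. The type and method names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// schedule holds the two daemon-mode settings from the config file.
type schedule struct {
	BackupInterval    time.Duration // backup_interval
	MinTimeBetweenRun time.Duration // min_time_between_run
}

// demoSchedule mirrors the values in the example config (1h and 15m).
var demoSchedule = schedule{
	BackupInterval:    time.Hour,
	MinTimeBetweenRun: 15 * time.Minute,
}

// shouldRun reports whether a backup should start, given the time since the
// previous run and whether any watched path has been marked dirty.
func (s schedule) shouldRun(sinceLast time.Duration, dirty bool) bool {
	if sinceLast < s.MinTimeBetweenRun {
		return false // never run more often than min_time_between_run
	}
	if dirty {
		return true // changes are pending and the minimum gap has passed
	}
	return sinceLast >= s.BackupInterval
}

func main() {
	fmt.Println(demoSchedule.shouldRun(5*time.Minute, true))
	fmt.Println(demoSchedule.shouldRun(20*time.Minute, true))
	fmt.Println(demoSchedule.shouldRun(20*time.Minute, false))
	fmt.Println(demoSchedule.shouldRun(time.Hour, false))
}
```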

cron mode

  • Single backup run
  • Silent output unless errors
  • Ideal for scheduled backups

metadata integrity

  • SHA256 hash of metadata stored separately
  • Encrypted hash file for verification
  • Chunked metadata support for large filesystems
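
The hash check can be sketched as follows. Whether the hash covers the plaintext or the encrypted metadata is not stated here, so the sketch simply hashes whatever bytes it is given; the function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// metadataDigest returns the lowercase hex SHA-256 of the metadata bytes.
func metadataDigest(meta []byte) string {
	sum := sha256.Sum256(meta)
	return hex.EncodeToString(sum[:])
}

// verifyMetadata reports whether meta hashes to the expected hex digest.
// The comparison is constant-time; malformed hex is treated as a mismatch.
func verifyMetadata(meta []byte, wantHex string) bool {
	want, err := hex.DecodeString(wantHex)
	if err != nil {
		return false
	}
	got := sha256.Sum256(meta)
	return subtle.ConstantTimeCompare(got[:], want) == 1
}

func main() {
	meta := []byte("snapshot metadata bytes")
	digest := metadataDigest(meta)
	fmt.Println(verifyMetadata(meta, digest))
	fmt.Println(verifyMetadata([]byte("tampered"), digest))
}
```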

exclusion patterns

  • Glob-based file exclusion
  • Configured in YAML
  • Applied during directory walk
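
A glob check of this kind could be written with the standard library as below. Matching each pattern against the file's base name is an assumption; the actual matching rules (full path, directory patterns, and so on) are not specified here.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// excluded reports whether path's base name matches any exclusion pattern.
// A malformed pattern is returned as an error rather than silently ignored.
func excluded(patterns []string, path string) (bool, error) {
	base := filepath.Base(path)
	for _, p := range patterns {
		ok, err := filepath.Match(p, base)
		if err != nil {
			return false, fmt.Errorf("bad exclude pattern %q: %w", p, err)
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	patterns := []string{"*.log", "*.tmp"} // as in the example config
	for _, p := range []string{"/var/log/app.log", "/etc/hosts", "/home/user/data/x.tmp"} {
		skip, err := excluded(patterns, p)
		fmt.Println(p, skip, err)
	}
}
```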

prune

Run vaultik prune on a machine with the private key. It:

  • Downloads the most recent snapshot
  • Decrypts metadata
  • Lists referenced blobs
  • Deletes any blob in the bucket not referenced

This enables garbage collection from immutable storage.
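
The core of the steps above is a set difference, sketched here with plain string slices. Listing the bucket and decrypting the snapshot are left out, and the function name is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// orphanedBlobs returns, sorted, every stored blob hash that the snapshot
// does not reference. These are the candidates for deletion.
func orphanedBlobs(referenced, stored []string) []string {
	ref := make(map[string]struct{}, len(referenced))
	for _, r := range referenced {
		ref[r] = struct{}{}
	}
	var orphans []string
	for _, s := range stored {
		if _, ok := ref[s]; !ok {
			orphans = append(orphans, s)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	referenced := []string{"blobA", "blobC"}
	stored := []string{"blobA", "blobB", "blobC", "blobD"}
	fmt.Println(orphanedBlobs(referenced, stored))
}
```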


license

WTFPL — see LICENSE.


security considerations

  • Source host compromise cannot decrypt backups
  • No replay attacks possible (append-only)
  • Each blob independently encrypted
  • Metadata tampering detectable via hash verification
  • S3 credentials on the source host allow write access to the backup prefix only

performance

  • Streaming processing (no temp files)
  • Parallel blob uploads
  • Deduplication reduces storage and bandwidth
  • Local index enables fast incremental detection
  • Configurable compression levels

requirements

  • Go 1.24.4 or later
  • S3-compatible object storage
  • age command-line tool (for key generation)
  • SQLite3
  • Sufficient disk space for local index

author

sneak sneak@sneak.berlin https://sneak.berlin