
vaultik (ваултик)

vaultik is an incremental backup daemon written in Go. It encrypts data using an age public key and uploads each encrypted blob directly to a remote S3-compatible object store. It requires no private keys or decryption secrets on the backed-up system; the only credential stored there is the remote storage API key.

It includes table-stakes features such as:

  • modern authenticated encryption
  • deduplication
  • incremental backups
  • modern multithreaded zstd compression with configurable levels
  • content-addressed immutable storage
  • local state tracking in a standard SQLite database
  • inotify-based change detection
  • streaming processing of all data, so it needs little RAM or temp file storage
  • no mutable remote metadata
  • no plaintext file paths or metadata stored in remote
  • does not create huge numbers of small files (to keep S3 operation counts down) even if the source system has many small files

what

vaultik walks a set of configured directories and builds a content-addressable chunk map of changed files using deterministic chunking. Each chunk is streamed into a blob packer. Blobs are compressed with zstd, encrypted with age, and uploaded directly to remote storage under a content-addressed S3 path.

No plaintext file contents ever hit disk. No private key or secret passphrase is needed or stored locally. All encrypted data is streaming-processed and immediately discarded once uploaded. Metadata is encrypted and pushed with the same mechanism.
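
The packing step can be sketched roughly as below. This is a minimal in-memory illustration, not vaultik's actual packer: the names `Chunk`, `Packer`, `Add`, and `Flush` are hypothetical, SHA-256 as the chunk address is an assumption, and compression, encryption, and upload are left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// Chunk is a hypothetical chunk identified by the SHA-256 of its plaintext.
type Chunk struct {
	ID   string
	Data []byte
}

// NewChunk derives the content address of a chunk from its bytes.
func NewChunk(data []byte) Chunk {
	sum := sha256.Sum256(data)
	return Chunk{ID: hex.EncodeToString(sum[:]), Data: data}
}

// Packer groups chunks into blobs of at most limit bytes and skips
// chunks it has already seen (deduplication by content address).
type Packer struct {
	limit   int
	seen    map[string]bool
	current []Chunk
	size    int
	Blobs   [][]Chunk // finished blobs, in order
}

// NewPacker returns a packer that closes a blob before it would exceed limit.
func NewPacker(limit int) *Packer {
	return &Packer{limit: limit, seen: make(map[string]bool)}
}

// Add appends a chunk to the open blob, closing that blob first if the
// chunk would push it past the limit. It reports whether the chunk was new.
func (p *Packer) Add(c Chunk) bool {
	if p.seen[c.ID] {
		return false
	}
	p.seen[c.ID] = true
	if p.size > 0 && p.size+len(c.Data) > p.limit {
		p.Flush()
	}
	p.current = append(p.current, c)
	p.size += len(c.Data)
	return true
}

// Flush closes the open blob, if there is one.
func (p *Packer) Flush() {
	if len(p.current) == 0 {
		return
	}
	p.Blobs = append(p.Blobs, p.current)
	p.current = nil
	p.size = 0
}
```

In a streaming design, Flush would hand the finished blob to the compress, encrypt, and upload stages instead of keeping it in memory.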

why

Existing backup software fails under one or more of these conditions:

  • Requires secrets (passwords, private keys) on the source system, which compromises encrypted backups in the case of host system compromise
  • Depends on symmetric encryption unsuitable for zero-trust environments
  • Creates one-blob-per-file, which results in excessive S3 operation counts

vaultik addresses these by using:

  • Public-key-only encryption (via age) requires no secrets (other than the remote storage API key) on the source system
  • Local state cache for incremental detection does not require reading from or decrypting remote storage
  • Content-addressed immutable storage allows efficient deduplication
  • Storage only of large encrypted blobs of configurable size (1G by default) reduces S3 operation counts and improves performance

how

  1. install

    go install git.eeqj.de/sneak/vaultik@latest
    
  2. generate keypair

    age-keygen -o agekey.txt
    grep 'public key:' agekey.txt
    
  3. write config

    source_dirs:
      - /etc
      - /home/user/data
    exclude:
      - '*.log'
      - '*.tmp'
    age_recipient: age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj
    s3:
      # endpoint is optional if using AWS S3, but who even does that?
      endpoint: https://s3.example.com
      bucket: vaultik-data
      prefix: host1/
      access_key_id: ...
      secret_access_key: ...
      region: us-east-1
    backup_interval: 1h      # only used in daemon mode, not for --cron mode
    full_scan_interval: 24h  # normally we use inotify to mark dirty, but
                             # every 24h we do a full stat() scan
    min_time_between_run: 15m  # again, only for daemon mode
    #index_path: /var/lib/vaultik/index.sqlite
    chunk_size: 10MB
    blob_size_limit: 10GB
    
  4. run

    vaultik --config /etc/vaultik.yaml snapshot create
    
    vaultik --config /etc/vaultik.yaml snapshot create --cron # silent unless error
    
    vaultik --config /etc/vaultik.yaml snapshot create --daemon # runs continuously in foreground, uses inotify to detect changes
    
    # TODO
    * make sure daemon mode does not make a snapshot if no files have
      changed, even if the backup_interval has passed
    * in daemon mode, if enough time has passed since the last snapshot and we get
      an inotify event, we should schedule the next snapshot for 10 minutes after
      the mark-dirty event.
    

cli

commands

vaultik [--config <path>] snapshot create [--cron] [--daemon]
vaultik [--config <path>] snapshot list [--json]
vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--force]
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep]
vaultik [--config <path>] store info
# FIXME: remove 'bucket' and 'prefix' and 'snapshot' flags.  it should be
# 'vaultik restore snapshot <snapshot> --target <dir>'.  bucket and prefix are always
# from config file.
vaultik restore --bucket <bucket> --prefix <prefix> --snapshot <id> --target <dir>
# FIXME: remove prune, it's the old version of "snapshot purge"
vaultik prune --bucket <bucket> --prefix <prefix> [--dry-run]
# FIXME: change fetch to 'vaultik restore path <snapshot> <path> --target <path>'
vaultik fetch --bucket <bucket> --prefix <prefix> --snapshot <id> --file <path> --target <path>
# FIXME: remove this, it's redundant with 'snapshot verify'
vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]

environment

  • VAULTIK_PRIVATE_KEY: Required for restore, prune, fetch, and verify commands. Contains the age private key for decryption.
  • VAULTIK_CONFIG: Optional path to config file. If set, config file path doesn't need to be specified on the command line.

command details

snapshot create: Perform incremental backup of configured directories

  • Config is located at /etc/vaultik/config.yml by default
  • --cron: Silent unless error (for crontab)
  • --daemon: Run continuously with inotify monitoring and periodic scans

snapshot list: List all snapshots with their timestamps and sizes

  • --json: Output in JSON format

snapshot purge: Remove old snapshots based on criteria

  • --keep-latest: Keep only the most recent snapshot
  • --older-than: Remove snapshots older than duration (e.g., 30d, 6mo, 1y)
  • --force: Skip confirmation prompt
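
The selection logic behind the two purge criteria might look like the sketch below. The `Snapshot` type and `SelectForPurge` function are hypothetical names for illustration; parsing of duration strings such as 30d or 6mo is not shown, and the duration is assumed to already be a `time.Duration`.

```go
package main

import (
	"sort"
	"time"
)

// Snapshot is a hypothetical record of one snapshot.
type Snapshot struct {
	ID      string
	Created time.Time
}

// SelectForPurge returns the snapshots to delete, newest first.
// With keepLatest it returns everything except the most recent snapshot;
// otherwise it returns snapshots created before now minus olderThan.
func SelectForPurge(snaps []Snapshot, keepLatest bool, olderThan time.Duration, now time.Time) []Snapshot {
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]Snapshot(nil), snaps...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Created.After(sorted[j].Created)
	})
	if keepLatest {
		if len(sorted) <= 1 {
			return nil
		}
		return sorted[1:]
	}
	cutoff := now.Add(-olderThan)
	var out []Snapshot
	for _, s := range sorted {
		if s.Created.Before(cutoff) {
			out = append(out, s)
		}
	}
	return out
}
```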

snapshot verify: Verify snapshot integrity

  • --deep: Download and verify blob hashes (not just existence)

store info: Display S3 bucket configuration and storage statistics

restore: Restore entire snapshot to target directory

  • Downloads and decrypts metadata
  • Fetches only required blobs
  • Reconstructs directory structure

prune: Remove unreferenced blobs from storage

  • Requires private key
  • Downloads latest snapshot metadata
  • Deletes orphaned blobs

fetch: Extract single file from backup

  • Retrieves specific file without full restore
  • Supports extracting to different filename

verify: Validate backup integrity

  • Checks metadata hash
  • Verifies all referenced blobs exist
  • Default: Downloads blobs and validates chunk integrity
  • --quick: Only checks blob existence and S3 content hashes

architecture

chunking

  • Content-defined chunking using rolling hash (Rabin fingerprint)
  • Average chunk size: 10MB (configurable)
  • Deduplication at chunk level
  • Multiple chunks packed into blobs for efficiency
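
To illustrate what content-defined chunking means, here is a small sketch. It uses a gear-style rolling hash as a simpler stand-in for the Rabin fingerprint named above, so it is not vaultik's chunker; `Split` and its parameters are hypothetical, and the sizes in a real configuration would be far larger.

```go
package main

// gear holds one pseudo-random value per byte value. It is filled
// deterministically so that chunk boundaries are reproducible.
var gear [256]uint64

func init() {
	// splitmix64 generator with a fixed seed.
	x := uint64(0x9E3779B97F4A7C15)
	for i := range gear {
		x += 0x9E3779B97F4A7C15
		z := x
		z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9
		z = (z ^ (z >> 27)) * 0x94D049BB133111EB
		gear[i] = z ^ (z >> 31)
	}
}

// Split cuts data into chunks. A boundary is placed where the rolling
// hash has all mask bits clear, so boundaries follow the content rather
// than fixed offsets. min and max bound the chunk length in bytes.
func Split(data []byte, min, max int, mask uint64) [][]byte {
	var chunks [][]byte
	start := 0
	var h uint64
	for i, b := range data {
		// Shifting drops the influence of old bytes; adding mixes in the new one.
		h = (h << 1) + gear[b]
		n := i - start + 1
		if (n >= min && h&mask == 0) || n >= max {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
			h = 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:]) // trailing partial chunk
	}
	return chunks
}
```

Because a boundary depends only on nearby bytes, an insertion early in a file shifts offsets but tends to leave later chunks identical, which is what makes chunk-level deduplication effective.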

encryption

  • Asymmetric encryption using age (X25519 + ChaCha20-Poly1305)
  • Only public key needed on source host
  • Each blob encrypted independently
  • Metadata databases also encrypted

storage

  • Content-addressed blob storage
  • Immutable append-only design
  • Two-level directory sharding for blobs (aa/bb/hash)
  • Compressed with zstd before encryption

state tracking

  • Local SQLite database for incremental state
  • Tracks file mtimes and chunk mappings
  • Enables efficient change detection
  • Supports inotify monitoring in daemon mode
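
A rough sketch of mtime-based change detection follows. An in-memory map stands in for the SQLite index, and `FileState`, `Index`, `Update`, and `Scan` are hypothetical names; comparing size in addition to mtime is also an assumption.

```go
package main

import (
	"io/fs"
	"path/filepath"
	"time"
)

// FileState is what the index remembers about one file.
type FileState struct {
	Size  int64
	MTime time.Time
}

// Index is an in-memory stand-in for the local SQLite index.
type Index map[string]FileState

// Update records cur for path and reports whether it differs from the
// stored state. A path seen for the first time counts as changed.
func (ix Index) Update(path string, cur FileState) bool {
	old, ok := ix[path]
	if ok && old.Size == cur.Size && old.MTime.Equal(cur.MTime) {
		return false
	}
	ix[path] = cur
	return true
}

// Scan walks root, like a periodic full stat() scan, and returns the
// regular files whose state changed since the previous scan.
func (ix Index) Scan(root string) ([]string, error) {
	var changed []string
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil // skip directories, symlinks, devices
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		if ix.Update(p, FileState{Size: info.Size(), MTime: info.ModTime()}) {
			changed = append(changed, p)
		}
		return nil
	})
	return changed, err
}
```

Only the files returned by Scan need to be re-chunked; nothing is read from or decrypted out of remote storage to make that decision.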

does not

  • Store any secrets on the backed-up machine
  • Require mutable remote metadata
  • Use tarballs, restic, rsync, or ssh
  • Require a symmetric passphrase or password
  • Trust the source system with anything

does

  • Incremental deduplicated backup
  • Blob-packed chunk encryption
  • Content-addressed immutable blobs
  • Public-key encryption only
  • SQLite-based local and snapshot metadata
  • Fully stream-processed storage

restore

vaultik restore downloads only the snapshot metadata and required blobs. It never contacts the source system. All restore operations depend only on:

  • VAULTIK_PRIVATE_KEY
  • The bucket

A complete restore needs nothing but the object storage and the private key.


features

daemon mode

  • Continuous background operation
  • inotify-based change detection
  • Respects backup_interval and min_time_between_run
  • Full scan every full_scan_interval (default 24h)
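
One possible reading of how these intervals interact is sketched below. The exact scheduling rules are not spelled out here, so this is an interpretation rather than the daemon's logic: `Schedule`, `State`, and `Next` are hypothetical names, and the settle delay after the first change is an assumed parameter.

```go
package main

import "time"

// Schedule holds the timing settings; field names are illustrative.
type Schedule struct {
	Interval time.Duration // backup_interval
	MinGap   time.Duration // min_time_between_run
	Settle   time.Duration // assumed delay after the first detected change
}

// State is what the daemon knows at decision time.
type State struct {
	LastSnapshot time.Time
	DirtyAt      time.Time // first change since LastSnapshot; zero if none
}

// Next returns when the next snapshot should start. It returns false
// when nothing has changed, in which case no snapshot is due at all.
func (s Schedule) Next(st State) (time.Time, bool) {
	if st.DirtyAt.IsZero() {
		return time.Time{}, false
	}
	// Start from the settle delay, then push later if either interval
	// measured from the last snapshot has not yet elapsed.
	next := st.DirtyAt.Add(s.Settle)
	for _, gap := range []time.Duration{s.Interval, s.MinGap} {
		if t := st.LastSnapshot.Add(gap); t.After(next) {
			next = t
		}
	}
	return next, true
}
```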

cron mode

  • Single backup run
  • Silent output unless errors
  • Ideal for scheduled backups

metadata integrity

  • SHA256 hash of metadata stored separately
  • Encrypted hash file for verification
  • Chunked metadata support for large filesystems
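
Checking metadata against its separately stored hash could look like the sketch below. It assumes the hash covers the metadata chunks concatenated in order and is stored hex-encoded; neither detail is specified here, and `VerifyMetadata` is a hypothetical name.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
)

// VerifyMetadata hashes the metadata chunks in order and compares the
// digest with the expected hex-encoded SHA-256.
func VerifyMetadata(chunks [][]byte, wantHex string) (bool, error) {
	want, err := hex.DecodeString(wantHex)
	if err != nil {
		return false, err // stored hash is not valid hex
	}
	h := sha256.New()
	for _, c := range chunks {
		h.Write(c) // hash.Hash.Write never returns an error
	}
	return bytes.Equal(h.Sum(nil), want), nil
}
```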

exclusion patterns

  • Glob-based file exclusion
  • Configured in YAML
  • Applied during directory walk
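
A minimal sketch of glob exclusion using the standard library is shown below. Matching patterns against the base name only is an assumption, and `Excluded` is a hypothetical helper, not vaultik's API.

```go
package main

import "path/filepath"

// Excluded reports whether path matches any of the glob patterns.
// Patterns are matched against the base name, so "*.log" excludes
// log files in every directory.
func Excluded(path string, patterns []string) (bool, error) {
	base := filepath.Base(path)
	for _, pat := range patterns {
		ok, err := filepath.Match(pat, base)
		if err != nil {
			return false, err // malformed pattern
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}
```

Calling this from the directory walk callback keeps excluded files out of the chunker entirely.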

prune

Run vaultik prune on a machine with the private key. It:

  • Downloads the most recent snapshot
  • Decrypts metadata
  • Lists referenced blobs
  • Deletes any blob in the bucket not referenced

This enables garbage collection from immutable storage.
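
The core of that procedure is a set difference between the bucket listing and the referenced blobs. The sketch below shows only that step; `Orphans` is a hypothetical name, and listing the bucket and decrypting the metadata are not shown.

```go
package main

import "sort"

// Orphans returns the keys that appear in the bucket listing but not in
// the referenced set, sorted for stable output.
func Orphans(listed []string, referenced map[string]bool) []string {
	var out []string
	for _, k := range listed {
		if !referenced[k] {
			out = append(out, k)
		}
	}
	sort.Strings(out)
	return out
}
```

The result can be printed for a dry run or passed to the delete step.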


LICENSE

MIT


requirements

  • Go 1.24.4 or later
  • S3-compatible object storage
  • Sufficient disk space for local index (typically <1GB)

author

Made with love and lots of expensive SOTA AI by sneak in Berlin in the summer of 2025.

Released as a free software gift to the world, no strings attached.

Contact: sneak@sneak.berlin

https://keys.openpgp.org/vks/v1/by-fingerprint/5539AD00DE4C42F3AFE11575052443F4DF2A55C2