# vaultik
`vaultik` is an incremental backup daemon written in Go. It
encrypts data using an `age` public key and uploads each encrypted blob
directly to a remote S3-compatible object store. Aside from S3 write
credentials, no private keys, secrets, or passphrases are stored on the
backed-up system.

---
## what
`vaultik` walks a set of configured directories and builds a
content-addressable chunk map of changed files using deterministic chunking.
Each chunk is streamed into a blob packer. Blobs are compressed with `zstd`,
encrypted with `age`, and uploaded directly to remote storage under a
content-addressed S3 path.

No plaintext file contents ever hit disk. No private key is needed or stored
locally. All encrypted data is streaming-processed and immediately discarded
once uploaded. Metadata is encrypted and pushed with the same mechanism.
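
The write path is a single streaming pipeline. Below is a minimal sketch of
the idea using the `filippo.io/age` and `github.com/klauspost/compress/zstd`
libraries; the function name and the choice to address blobs by the SHA-256
of the ciphertext are illustrative assumptions, not vaultik's actual
internals:

```go
package backup

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"io"

	"filippo.io/age"
	"github.com/klauspost/compress/zstd"
)

// packBlob drains a stream of chunk data into one blob:
// plaintext -> zstd -> age -> ciphertext, returning the hex hash
// used as the blob's content address.
func packBlob(chunks io.Reader, recipient age.Recipient) (string, []byte, error) {
	var buf bytes.Buffer
	enc, err := age.Encrypt(&buf, recipient) // public key only; no secrets needed
	if err != nil {
		return "", nil, err
	}
	zw, err := zstd.NewWriter(enc) // compress before encrypting
	if err != nil {
		return "", nil, err
	}
	if _, err := io.Copy(zw, chunks); err != nil {
		return "", nil, err
	}
	if err := zw.Close(); err != nil { // flush compressor before the encryptor
		return "", nil, err
	}
	if err := enc.Close(); err != nil {
		return "", nil, err
	}
	sum := sha256.Sum256(buf.Bytes())
	return hex.EncodeToString(sum[:]), buf.Bytes(), nil
}
```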
## why
Existing backup software typically suffers from one or more of these problems:
* Requires secrets (passwords, private keys) on the source system
* Depends on symmetric encryption unsuitable for zero-trust environments
* Stages temporary archives or repositories
* Writes plaintext metadata or plaintext file paths

`vaultik` addresses all of these by using:
* Public-key-only encryption (via `age`), requiring no secrets on the
  source system other than the bucket access key
* Blob-level deduplication and batching
* Local state cache for incremental detection
* S3-native chunked upload interface
* Self-contained encrypted snapshot metadata
## how
1. **install**
   ```sh
   go install git.eeqj.de/sneak/vaultik@latest
   ```
2. **generate keypair**
   ```sh
   age-keygen -o agekey.txt
   grep 'public key:' agekey.txt
   ```
3. **write config**
   ```yaml
   source_dirs:
     - /etc
     - /home/user/data
   exclude:
     - '*.log'
     - '*.tmp'
   age_recipient: age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
   s3:
     endpoint: https://s3.example.com
     bucket: vaultik-data
     prefix: host1/
     access_key_id: ...
     secret_access_key: ...
     region: us-east-1
   backup_interval: 1h       # only used in daemon mode, not for --cron mode
   full_scan_interval: 24h   # normally we use inotify to mark dirty, but
                             # every 24h we do a full stat() scan
   min_time_between_run: 15m # again, only for daemon mode
   index_path: /var/lib/vaultik/index.sqlite
   chunk_size: 10MB
   blob_size_limit: 10GB
   index_prefix: index/
   ```
4. **run**
   ```sh
   vaultik backup /etc/vaultik.yaml
   vaultik backup /etc/vaultik.yaml --cron    # silent unless error
   vaultik backup /etc/vaultik.yaml --daemon  # runs in background, uses inotify
   ```
---
## cli
### commands
```sh
vaultik backup /etc/vaultik.yaml [--cron] [--daemon]
vaultik restore <bucket> <prefix> <snapshot_id> <target_dir>
vaultik prune <bucket> <prefix>
vaultik fetch <bucket> <prefix> <snapshot_id> <filepath> <target_fileordir>
vaultik verify <bucket> <prefix> [<snapshot_id>]
```
### environment
* `VAULTIK_PRIVATE_KEY`: Required for `restore`, `prune`, `fetch`, and `verify` commands. Contains the age private key for decryption.
### command details
**backup**: Perform incremental backup of configured directories

* `--cron`: Silent unless error (for crontab)
* `--daemon`: Run continuously with inotify monitoring and periodic scans

**restore**: Restore entire snapshot to target directory

* Downloads and decrypts metadata
* Fetches only required blobs
* Reconstructs directory structure

**prune**: Remove unreferenced blobs from storage

* Requires private key
* Downloads latest snapshot metadata
* Deletes orphaned blobs

**fetch**: Extract single file from backup

* Retrieves specific file without full restore
* Supports extracting to different filename

**verify**: Validate backup integrity

* Checks metadata hash
* Verifies all referenced blobs exist
* Default: Downloads blobs and validates chunk integrity
* `--quick`: Only checks blob existence and S3 content hashes
---
## architecture
### chunking
* Content-defined chunking using a rolling hash (Rabin fingerprint; see the sketch after this list)
* Average chunk size: 10MB (configurable)
* Deduplication at chunk level
* Multiple chunks packed into blobs for efficiency
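
A toy version of the boundary-selection loop (a simplified "gear" rolling
hash standing in for vaultik's Rabin fingerprint; the point is that
boundaries depend on content, not offsets, so an insertion early in a file
does not shift every later chunk):

```go
package backup

// gear is a fixed pseudo-random table; any constant table works.
var gear [256]uint64

func init() {
	var x uint64 = 0x9E3779B97F4A7C15
	for i := range gear {
		x ^= x << 13 // xorshift64: deterministic table fill
		x ^= x >> 7
		x ^= x << 17
		gear[i] = x
	}
}

// cutPoints returns exclusive end offsets of chunks in data. A mask with
// n low bits set gives ~2^n-byte average chunks (n ≈ 23 for ~10 MB).
func cutPoints(data []byte, mask uint64) []int {
	var cuts []int
	var h uint64
	for i, b := range data {
		h = (h << 1) + gear[b] // rolling hash over recent bytes
		if h&mask == 0 {       // content-defined boundary
			cuts = append(cuts, i+1)
			h = 0
		}
	}
	if len(cuts) == 0 || cuts[len(cuts)-1] != len(data) {
		cuts = append(cuts, len(data)) // final partial chunk
	}
	return cuts
}
```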
### encryption
* Asymmetric encryption using age (X25519 + ChaCha20-Poly1305)
* Only public key needed on source host
* Each blob encrypted independently
* Metadata databases also encrypted
### storage
* Content-addressed blob storage
* Immutable append-only design
* Two-level directory sharding for blobs (aa/bb/hash; sketched below)
* Compressed with zstd before encryption
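
The sharded key is a pure function of the blob hash; for illustration (the
`blobs/` sub-prefix here is an assumption):

```go
package backup

// blobKey maps a blob's hex hash to its content-addressed object key
// using two-level aa/bb/hash sharding.
func blobKey(prefix, hexHash string) string {
	return prefix + "blobs/" + hexHash[:2] + "/" + hexHash[2:4] + "/" + hexHash
}
```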
### state tracking
* Local SQLite database for incremental state (schema sketched below)
* Tracks file mtimes and chunk mappings
* Enables efficient change detection
* Supports inotify monitoring in daemon mode
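
A hypothetical shape for that index (illustrative only, not vaultik's
actual schema):

```go
package backup

// indexSchema sketches the local SQLite index: files map to ordered
// chunks, chunks map to the blob that contains them, so an unchanged
// file (same mtime and size) can be skipped without re-reading it.
const indexSchema = `
CREATE TABLE IF NOT EXISTS files (
    path  TEXT PRIMARY KEY,
    mtime INTEGER NOT NULL, -- mtime at last backup
    size  INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS file_chunks (
    path  TEXT NOT NULL REFERENCES files(path),
    idx   INTEGER NOT NULL, -- chunk position within the file
    chunk TEXT NOT NULL,    -- chunk content hash
    PRIMARY KEY (path, idx)
);
CREATE TABLE IF NOT EXISTS chunks (
    chunk TEXT PRIMARY KEY, -- chunk content hash
    blob  TEXT NOT NULL     -- hash of the blob holding this chunk
);`
```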
## does not
* Store any secrets on the backed-up machine
* Require mutable remote metadata
* Use tarballs, restic, rsync, or ssh
* Require a symmetric passphrase or password
* Trust the source system with anything
---
## does
* Incremental deduplicated backup
* Blob-packed chunk encryption
* Content-addressed immutable blobs
* Public-key encryption only
* SQLite-based local and snapshot metadata
* Fully stream-processed storage
---
## restore
`vaultik restore` downloads only the snapshot metadata and required blobs. It
never contacts the source system. All restore operations depend only on:
* `VAULTIK_PRIVATE_KEY`
* The bucket

The entire system is restore-only from object storage.
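
On the restore side the pipeline runs in reverse. A minimal sketch, assuming
`VAULTIK_PRIVATE_KEY` holds a bare age X25519 identity string (the function
name is illustrative):

```go
package restore

import (
	"io"
	"os"

	"filippo.io/age"
	"github.com/klauspost/compress/zstd"
)

// openBlob decrypts and then decompresses a downloaded blob stream,
// inverting the backup path (age first, zstd second).
func openBlob(blob io.Reader) (io.Reader, error) {
	id, err := age.ParseX25519Identity(os.Getenv("VAULTIK_PRIVATE_KEY"))
	if err != nil {
		return nil, err
	}
	dec, err := age.Decrypt(blob, id)
	if err != nil {
		return nil, err
	}
	zr, err := zstd.NewReader(dec)
	if err != nil {
		return nil, err
	}
	return zr.IOReadCloser(), nil
}
```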

---
## features
### daemon mode
* Continuous background operation
* inotify-based change detection (see the sketch after this list)
* Respects `backup_interval` and `min_time_between_run`
* Full scan every `full_scan_interval` (default 24h)
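
A sketch of the inotify wiring, using the `github.com/fsnotify/fsnotify`
package (whether vaultik uses this library is an assumption; fsnotify also
does not watch recursively, so a real implementation must add subdirectories
as it discovers them):

```go
package daemon

import (
	"log"

	"github.com/fsnotify/fsnotify"
)

// watchDirty forwards changed paths so the backup loop only needs to
// re-read dirty files between full scans.
func watchDirty(dirs []string, dirty chan<- string) error {
	w, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	for _, d := range dirs {
		if err := w.Add(d); err != nil {
			return err
		}
	}
	go func() {
		for {
			select {
			case ev, ok := <-w.Events:
				if !ok {
					return
				}
				dirty <- ev.Name // mark path dirty for the next run
			case err, ok := <-w.Errors:
				if !ok {
					return
				}
				log.Println("watch error:", err)
			}
		}
	}()
	return nil
}
```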
### cron mode
* Single backup run
* Silent output unless errors
* Ideal for scheduled backups
### metadata integrity
* SHA256 hash of metadata stored separately (sketched below)
* Encrypted hash file for verification
* Chunked metadata support for large filesystems
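
In effect, verification recomputes the hash over the metadata and compares
it to the separately stored value. A sketch (whether the hash covers
plaintext or ciphertext metadata is an implementation detail not specified
here):

```go
package verify

import (
	"crypto/hmac"
	"crypto/sha256"
)

// verifyMetadata recomputes the SHA256 of the fetched metadata and
// compares it, in constant time, to the separately stored hash.
func verifyMetadata(metadata, storedHash []byte) bool {
	sum := sha256.Sum256(metadata)
	return hmac.Equal(sum[:], storedHash)
}
```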
### exclusion patterns
* Glob-based file exclusion
* Configured in YAML
* Applied during the directory walk (see the sketch below)
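
A sketch of the matching step (assuming, for illustration, that patterns are
matched against base names with `path/filepath` glob semantics):

```go
package backup

import "path/filepath"

// excluded reports whether a path matches any configured exclude glob.
// Matching the base name means '*.log' excludes log files at any depth.
func excluded(path string, patterns []string) bool {
	for _, pat := range patterns {
		if ok, _ := filepath.Match(pat, filepath.Base(path)); ok {
			return true
		}
	}
	return false
}
```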
## prune
Run `vaultik prune` on a machine with the private key. It:
* Downloads the most recent snapshot
* Decrypts metadata
* Lists referenced blobs
* Deletes any blob in the bucket not referenced

This enables garbage collection from immutable storage.
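
In effect this is a mark-and-sweep over the bucket. A sketch of the sweep
using the `github.com/minio/minio-go/v7` client (an assumption; the key
layout follows the aa/bb/hash sharding described above):

```go
package prune

import (
	"context"
	"strings"

	"github.com/minio/minio-go/v7"
)

// sweep deletes every blob object whose hash is not in referenced,
// the set of blob hashes named by the decrypted snapshot metadata.
func sweep(ctx context.Context, c *minio.Client, bucket, prefix string, referenced map[string]bool) error {
	opts := minio.ListObjectsOptions{Prefix: prefix, Recursive: true}
	for obj := range c.ListObjects(ctx, bucket, opts) {
		if obj.Err != nil {
			return obj.Err
		}
		parts := strings.Split(obj.Key, "/")
		hash := parts[len(parts)-1] // keys end in the blob hash
		if !referenced[hash] {
			if err := c.RemoveObject(ctx, bucket, obj.Key, minio.RemoveObjectOptions{}); err != nil {
				return err
			}
		}
	}
	return nil
}
```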
---
## license
WTFPL — see LICENSE.

---
## security considerations
* Source host compromise cannot decrypt backups
* Append-only storage model; existing snapshots are never modified in place
* Each blob independently encrypted
* Metadata tampering detectable via hash verification
* S3 credentials on the source host need only write access to the backup prefix
## performance
* Streaming processing (no temp files)
* Parallel blob uploads
* Deduplication reduces storage and bandwidth
* Local index enables fast incremental detection
* Configurable compression levels
## requirements
* Go 1.24.4 or later
* S3-compatible object storage
* age command-line tool (for key generation)
* SQLite3
* Sufficient disk space for local index
## author
sneak
[sneak@sneak.berlin](mailto:sneak@sneak.berlin)
[https://sneak.berlin](https://sneak.berlin)