Major refactoring: UUID-based storage, streaming architecture, and CLI improvements
This commit represents a significant architectural overhaul of vaultik:

Database Schema Changes:
- Switch files table to use UUID primary keys instead of path-based keys
- Add UUID primary keys to blobs table for immediate chunk association
- Update all foreign key relationships to use UUIDs
- Add comprehensive schema documentation in DATAMODEL.md
- Add SQLite busy timeout handling for concurrent operations

Streaming and Performance Improvements:
- Implement true streaming blob packing without intermediate storage
- Add streaming chunk processing to reduce memory usage
- Improve progress reporting with real-time metrics
- Add upload metrics tracking in new uploads table

CLI Refactoring:
- Restructure CLI to use subcommands: snapshot create/list/purge/verify
- Add store info command for S3 configuration display
- Add custom duration parser supporting days/weeks/months/years
- Remove old backup.go in favor of enhanced snapshot.go
- Add --cron flag for silent operation

Configuration Changes:
- Remove unused index_prefix configuration option
- Add support for snapshot pruning retention policies
- Improve configuration validation and error messages

Testing Improvements:
- Add comprehensive repository tests with edge cases
- Add cascade delete debugging tests
- Fix concurrent operation tests to use SQLite busy timeout
- Remove tolerance for SQLITE_BUSY errors in tests

Documentation:
- Add MIT LICENSE file
- Update README with new command structure
- Add comprehensive DATAMODEL.md explaining database schema
- Update DESIGN.md with UUID-based architecture

Other Changes:
- Add test-config.yml for testing
- Update Makefile with better test output formatting
- Fix various race conditions in concurrent operations
- Improve error handling throughout
@@ -5,7 +5,21 @@ encrypts data using an `age` public key and uploads each encrypted blob
directly to a remote S3-compatible object store. It requires no private
keys, secrets, or credentials stored on the backed-up system.

---

It includes table-stakes features such as:

* modern authenticated encryption
* deduplication
* incremental backups
* modern multithreaded zstd compression with configurable levels
* content-addressed immutable storage
* local state tracking in standard SQLite database
* inotify-based change detection
* streaming processing of all data to not require lots of ram or temp file
  storage
* no mutable remote metadata
* no plaintext file paths or metadata stored in remote
* does not create huge numbers of small files (to keep S3 operation counts
  down) even if the source system has many small files

## what
@@ -15,27 +29,29 @@ Each chunk is streamed into a blob packer. Blobs are compressed with `zstd`,
encrypted with `age`, and uploaded directly to remote storage under a
content-addressed S3 path.

-No plaintext file contents ever hit disk. No private key is needed or stored
-locally. All encrypted data is streaming-processed and immediately discarded
-once uploaded. Metadata is encrypted and pushed with the same mechanism.
+No plaintext file contents ever hit disk. No private key or secret
+passphrase is needed or stored locally. All encrypted data is
+streaming-processed and immediately discarded once uploaded. Metadata is
+encrypted and pushed with the same mechanism.

## why

Existing backup software fails under one or more of these conditions:

-* Requires secrets (passwords, private keys) on the source system
+* Requires secrets (passwords, private keys) on the source system, which
+  compromises encrypted backups in the case of host system compromise
* Depends on symmetric encryption unsuitable for zero-trust environments
* Stages temporary archives or repositories
* Writes plaintext metadata or plaintext file paths
* Creates one-blob-per-file, which results in excessive S3 operation counts

-`vaultik` addresses all of these by using:
+`vaultik` addresses these by using:

* Public-key-only encryption (via `age`) requires no secrets (other than
-  bucket access key) on the source system
-* Blob-level deduplication and batching
-* Local state cache for incremental detection
-* S3-native chunked upload interface
-* Self-contained encrypted snapshot metadata
+  remote storage api key) on the source system
+* Local state cache for incremental detection does not require reading from
+  or decrypting remote storage
+* Content-addressed immutable storage allows efficient deduplication
+* Storage only of large encrypted blobs of configurable size (1G by default)
+  reduces S3 operation counts and improves performance

## how
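The content-addressed S3 path mentioned in the hunk above can be illustrated with a short sketch. The `blobs/` root and the two-level hash fan-out are assumptions for illustration, not necessarily vaultik's actual key layout; what the sketch shows is that the object key is derived solely from the blob's SHA-256, which is what makes the remote store immutable and deduplication-friendly:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blobKey derives an object key from blob content alone. Identical
// blobs always map to the same key (free deduplication on re-upload),
// and a key never changes, so the remote store stays immutable.
// The "blobs/" root and 2-level fan-out are illustrative assumptions.
func blobKey(encryptedBlob []byte) string {
	sum := sha256.Sum256(encryptedBlob)
	h := hex.EncodeToString(sum[:])
	// fan out on the first two byte pairs so no single S3 prefix
	// accumulates millions of keys
	return fmt.Sprintf("blobs/%s/%s/%s", h[:2], h[2:4], h)
}

func main() {
	fmt.Println(blobKey([]byte("hello")))
}
```

Note that the key encodes nothing about source paths or filenames, consistent with the "no plaintext file paths or metadata stored in remote" guarantee above.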
@@ -63,6 +79,7 @@ Existing backup software fails under one or more of these conditions:
      - '*.tmp'
  age_recipient: age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj
  s3:
+    # endpoint is optional if using AWS S3, but who even does that?
    endpoint: https://s3.example.com
    bucket: vaultik-data
    prefix: host1/
@@ -73,24 +90,30 @@ Existing backup software fails under one or more of these conditions:
  full_scan_interval: 24h   # normally we use inotify to mark dirty, but
                            # every 24h we do a full stat() scan
  min_time_between_run: 15m # again, only for daemon mode
-  index_path: /var/lib/vaultik/index.sqlite
+  #index_path: /var/lib/vaultik/index.sqlite
  chunk_size: 10MB
  blob_size_limit: 10GB
-  index_prefix: index/
```

4. **run**

```sh
-vaultik backup /etc/vaultik.yaml
+vaultik --config /etc/vaultik.yaml snapshot create
```

```sh
-vaultik backup /etc/vaultik.yaml --cron # silent unless error
+vaultik --config /etc/vaultik.yaml snapshot create --cron # silent unless error
```

```sh
-vaultik backup /etc/vaultik.yaml --daemon # runs in background, uses inotify
+vaultik --config /etc/vaultik.yaml snapshot daemon # runs continuously in foreground, uses inotify to detect changes
+
+# TODO
+* make sure daemon mode does not make a snapshot if no files have
+  changed, even if the backup_interval has passed
+* in daemon mode, if we are long enough since the last snapshot event, and we get
+  an inotify event, we should schedule the next snapshot creation for 10 minutes from the
+  time of the mark-dirty event.
```

---
@@ -100,26 +123,48 @@ Existing backup software fails under one or more of these conditions:

### commands

```sh
-vaultik backup [--config <path>] [--cron] [--daemon]
+vaultik [--config <path>] snapshot create [--cron] [--daemon]
+vaultik [--config <path>] snapshot list [--json]
+vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--force]
+vaultik [--config <path>] snapshot verify <snapshot-id> [--deep]
+vaultik [--config <path>] store info
+# FIXME: remove 'bucket' and 'prefix' and 'snapshot' flags. it should be
+# 'vaultik restore snapshot <snapshot> --target <dir>'. bucket and prefix are always
+# from config file.
vaultik restore --bucket <bucket> --prefix <prefix> --snapshot <id> --target <dir>
+# FIXME: remove prune, it's the old version of "snapshot purge"
vaultik prune --bucket <bucket> --prefix <prefix> [--dry-run]
+# FIXME: change fetch to 'vaultik restore path <snapshot> <path> --target <path>'
vaultik fetch --bucket <bucket> --prefix <prefix> --snapshot <id> --file <path> --target <path>
+# FIXME: remove this, it's redundant with 'snapshot verify'
vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
```

### environment

* `VAULTIK_PRIVATE_KEY`: Required for `restore`, `prune`, `fetch`, and `verify` commands. Contains the age private key for decryption.
-* `VAULTIK_CONFIG`: Optional path to config file. If set, `vaultik backup` can be run without specifying the config file path.
+* `VAULTIK_CONFIG`: Optional path to config file. If set, config file path doesn't need to be specified on the command line.

### command details

-**backup**: Perform incremental backup of configured directories
+**snapshot create**: Perform incremental backup of configured directories
* Config is located at `/etc/vaultik/config.yml` by default
* `--config`: Override config file path
* `--cron`: Silent unless error (for crontab)
* `--daemon`: Run continuously with inotify monitoring and periodic scans

**snapshot list**: List all snapshots with their timestamps and sizes
* `--json`: Output in JSON format

**snapshot purge**: Remove old snapshots based on criteria
* `--keep-latest`: Keep only the most recent snapshot
* `--older-than`: Remove snapshots older than duration (e.g., 30d, 6mo, 1y)
* `--force`: Skip confirmation prompt

**snapshot verify**: Verify snapshot integrity
* `--deep`: Download and verify blob hashes (not just existence)

**store info**: Display S3 bucket configuration and storage statistics

**restore**: Restore entire snapshot to target directory
* Downloads and decrypts metadata
* Fetches only required blobs
@@ -245,41 +290,23 @@ This enables garbage collection from immutable storage.

---

-## license
+## LICENSE

-WTFPL — see LICENSE.
+[MIT](https://opensource.org/license/mit/)

---

## security considerations

* Source host compromise cannot decrypt backups
* No replay attacks possible (append-only)
* Each blob independently encrypted
* Metadata tampering detectable via hash verification
* S3 credentials only allow write access to backup prefix

## performance

* Streaming processing (no temp files)
* Parallel blob uploads
* Deduplication reduces storage and bandwidth
* Local index enables fast incremental detection
* Configurable compression levels

## requirements

* Go 1.24.4 or later
* S3-compatible object storage
* age command-line tool (for key generation)
* SQLite3
-* Sufficient disk space for local index
+* Sufficient disk space for local index (typically <1GB)

## author

Made with love and lots of expensive SOTA AI by [sneak](https://sneak.berlin) in Berlin in the summer of 2025.

-Released as a free software gift to the world, no strings attached, under the [WTFPL](https://www.wtfpl.net/) license.
+Released as a free software gift to the world, no strings attached.

Contact: [sneak@sneak.berlin](mailto:sneak@sneak.berlin)