Document complete vaultik architecture and implementation plan

- Expand README with full CLI documentation, architecture details, and features
- Add comprehensive 87-step implementation plan to DESIGN.md
- Document all commands, configuration options, and security considerations
- Define complete API signatures and data structures
Jeffrey Paul 2025-07-20 09:04:31 +02:00
parent 67319a4699
commit 0df07790ba
2 changed files with 229 additions and 3 deletions

DESIGN.md

@@ -359,4 +359,119 @@ func RunPrune(bucket, prefix, privateKey string) error
## Implementation TODO
### Phase 1: Core Infrastructure
1. Set up Go module and project structure
2. Create Makefile with test, fmt, and lint targets
3. Set up cobra CLI skeleton with all commands
4. Implement config loading and validation from YAML
5. Create data structures for FileInfo, ChunkInfo, BlobInfo, etc.
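A minimal sketch of what step 4 might look like, using `gopkg.in/yaml.v3`; the field names and validation rules here are illustrative assumptions, not the finalized schema:

```go
package config

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// Config mirrors the YAML config file. Field names are assumptions.
type Config struct {
	AgePublicKey  string   `yaml:"age_public_key"`
	BackupDirs    []string `yaml:"backup_dirs"`
	Exclude       []string `yaml:"exclude"`
	ChunkSize     int64    `yaml:"chunk_size"`
	BlobSizeLimit int64    `yaml:"blob_size_limit"`
	S3            S3Config `yaml:"s3"`
}

type S3Config struct {
	Endpoint  string `yaml:"endpoint"`
	Bucket    string `yaml:"bucket"`
	Prefix    string `yaml:"prefix"`
	AccessKey string `yaml:"access_key_id"`
	SecretKey string `yaml:"secret_access_key"`
}

// Load reads and validates the YAML config file.
func Load(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("reading config: %w", err)
	}
	var cfg Config
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("parsing config: %w", err)
	}
	if cfg.AgePublicKey == "" || len(cfg.BackupDirs) == 0 {
		return nil, fmt.Errorf("age_public_key and backup_dirs are required")
	}
	return &cfg, nil
}
```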
### Phase 2: Local Index Database
6. Implement SQLite schema creation and migrations
7. Create Index type with all database operations
8. Add transaction support and proper locking
9. Implement file tracking (save, lookup, delete)
10. Implement chunk tracking and deduplication
11. Implement blob tracking and chunk-to-blob mapping
12. Write tests for all index operations
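An illustrative schema for steps 6-11. Table and column names are assumptions; the real schema ships with the migration code from step 6:

```go
package index

import (
	"database/sql"

	_ "github.com/mattn/go-sqlite3"
)

// schema covers file tracking, chunk deduplication, and the
// chunk-to-blob mapping. Column names are illustrative.
const schema = `
CREATE TABLE IF NOT EXISTS files (
    path  TEXT PRIMARY KEY,
    mtime INTEGER NOT NULL,
    size  INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS chunks (
    chunk_hash TEXT PRIMARY KEY,  -- SHA256 of plaintext chunk
    size       INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS file_chunks (
    path       TEXT NOT NULL REFERENCES files(path),
    idx        INTEGER NOT NULL, -- position of chunk within file
    chunk_hash TEXT NOT NULL REFERENCES chunks(chunk_hash),
    PRIMARY KEY (path, idx)
);
CREATE TABLE IF NOT EXISTS blobs (
    blob_hash TEXT PRIMARY KEY   -- SHA256 of encrypted blob
);
CREATE TABLE IF NOT EXISTS blob_chunks (
    blob_hash  TEXT NOT NULL REFERENCES blobs(blob_hash),
    chunk_hash TEXT NOT NULL REFERENCES chunks(chunk_hash),
    offset     INTEGER NOT NULL,
    PRIMARY KEY (blob_hash, chunk_hash)
);
`

// OpenIndex opens (or creates) the local index database.
func OpenIndex(path string) (*sql.DB, error) {
	db, err := sql.Open("sqlite3", path)
	if err != nil {
		return nil, err
	}
	if _, err := db.Exec(schema); err != nil {
		db.Close()
		return nil, err
	}
	return db, nil
}
```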
### Phase 3: Chunking and Hashing
13. Implement Rabin fingerprint chunker
14. Create streaming chunk processor
15. Implement SHA256 hashing for chunks
16. Add configurable chunk size parameters
17. Write tests for chunking consistency
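A stand-in for steps 13-16. It uses a simple gear-style rolling hash rather than a true Rabin fingerprint, with illustrative size bounds, but shows the streaming cut-point structure:

```go
package chunker

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"io"
	"math/rand"
)

const (
	minChunk = 2 << 20        // illustrative lower bound
	maxChunk = 32 << 20       // illustrative upper bound
	mask     = (8 << 20) - 1  // ~8 MiB average; the design's 10MB
	                          // target needs a non-power-of-two mask
)

// gear is a fixed random table driving the rolling hash.
var gear [256]uint64

func init() {
	rng := rand.New(rand.NewSource(1))
	for i := range gear {
		gear[i] = rng.Uint64()
	}
}

type Chunk struct {
	Hash string // SHA256 of chunk contents
	Data []byte
}

// Split streams r and emits content-defined chunks.
func Split(r io.Reader, emit func(Chunk) error) error {
	br := bufio.NewReaderSize(r, 1<<20)
	var buf []byte
	var h uint64
	for {
		b, err := br.ReadByte()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		buf = append(buf, b)
		h = (h << 1) + gear[b]
		// Cut when the hash hits the mask (past the minimum size)
		// or the chunk reaches the hard maximum.
		if (len(buf) >= minChunk && h&mask == 0) || len(buf) >= maxChunk {
			if err := flush(&buf, emit); err != nil {
				return err
			}
			h = 0
		}
	}
	if len(buf) > 0 {
		return flush(&buf, emit)
	}
	return nil
}

func flush(buf *[]byte, emit func(Chunk) error) error {
	sum := sha256.Sum256(*buf)
	c := Chunk{
		Hash: hex.EncodeToString(sum[:]),
		Data: append([]byte(nil), *buf...),
	}
	*buf = (*buf)[:0]
	return emit(c)
}
```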
### Phase 4: Compression and Encryption
18. Implement zstd compression wrapper
19. Integrate age encryption library
20. Create Encryptor type for public key encryption
21. Create Decryptor type for private key decryption
22. Implement streaming encrypt/decrypt pipelines
23. Write tests for compression and encryption
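Steps 18-22 compose into a compress-then-encrypt stream. A sketch using `filippo.io/age` and `github.com/klauspost/compress/zstd` (the specific zstd library is an assumption):

```go
package crypto

import (
	"io"

	"filippo.io/age"
	"github.com/klauspost/compress/zstd"
)

// EncryptStream compresses src with zstd, encrypts the result to the
// given age X25519 recipient, and writes the ciphertext to dst.
func EncryptStream(dst io.Writer, src io.Reader, pubkey string) error {
	recipient, err := age.ParseX25519Recipient(pubkey)
	if err != nil {
		return err
	}
	aw, err := age.Encrypt(dst, recipient)
	if err != nil {
		return err
	}
	zw, err := zstd.NewWriter(aw)
	if err != nil {
		return err
	}
	if _, err := io.Copy(zw, src); err != nil {
		return err
	}
	if err := zw.Close(); err != nil { // flush compressed frames
		return err
	}
	return aw.Close() // finalize the age ciphertext
}
```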
### Phase 5: Blob Packing
24. Implement BlobWriter with size limits
25. Add chunk accumulation and flushing
26. Create blob hash calculation
27. Implement proper error handling and rollback
28. Write tests for blob packing scenarios
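One possible shape for step 24's `BlobWriter`; the flush callback signature is an assumption:

```go
package blob

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
)

// BlobWriter accumulates chunks up to a size limit, then hands the
// packed blob to a flush callback for encryption and upload.
type BlobWriter struct {
	limit  int64
	buf    bytes.Buffer
	chunks []string // hashes of chunks packed into the current blob
	flush  func(blobHash string, data []byte, chunkHashes []string) error
}

func NewBlobWriter(limit int64, flush func(string, []byte, []string) error) *BlobWriter {
	return &BlobWriter{limit: limit, flush: flush}
}

// AddChunk appends a chunk; if the blob would exceed the size limit,
// the current blob is flushed first.
func (w *BlobWriter) AddChunk(chunkHash string, data []byte) error {
	if int64(w.buf.Len()+len(data)) > w.limit && w.buf.Len() > 0 {
		if err := w.Flush(); err != nil {
			return err
		}
	}
	w.buf.Write(data)
	w.chunks = append(w.chunks, chunkHash)
	return nil
}

// Flush emits the current blob, identified by the SHA256 of its contents.
func (w *BlobWriter) Flush() error {
	if w.buf.Len() == 0 {
		return nil
	}
	sum := sha256.Sum256(w.buf.Bytes())
	err := w.flush(hex.EncodeToString(sum[:]), w.buf.Bytes(), w.chunks)
	w.buf.Reset()
	w.chunks = nil
	return err
}
```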
### Phase 6: S3 Operations
29. Integrate MinIO client library
30. Implement S3Client wrapper type
31. Add multipart upload support for large blobs
32. Implement retry logic with exponential backoff
33. Add connection pooling and timeout handling
34. Write tests using MinIO container
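A sketch of steps 29-32 with the MinIO client; the retry policy values are illustrative:

```go
package s3

import (
	"bytes"
	"context"
	"time"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

// NewClient builds a MinIO client for any S3-compatible endpoint.
func NewClient(endpoint, accessKey, secretKey string) (*minio.Client, error) {
	return minio.New(endpoint, &minio.Options{
		Creds:  credentials.NewStaticV4(accessKey, secretKey, ""),
		Secure: true,
	})
}

// PutWithRetry uploads data, retrying with exponential backoff.
func PutWithRetry(ctx context.Context, c *minio.Client, bucket, key string, data []byte) error {
	backoff := time.Second
	var err error
	for attempt := 0; attempt < 5; attempt++ {
		_, err = c.PutObject(ctx, bucket, key,
			bytes.NewReader(data), int64(len(data)), minio.PutObjectOptions{})
		if err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
			backoff *= 2 // double the wait after each failure
		}
	}
	return err
}
```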
### Phase 7: Backup Command - Basic
35. Implement directory walking with exclusion patterns
36. Add file change detection using index
37. Integrate chunking pipeline for changed files
38. Implement blob upload coordination
39. Add progress reporting to stderr
40. Write integration tests for backup
### Phase 8: Snapshot Metadata
41. Implement snapshot metadata extraction from index
42. Create SQLite snapshot database builder
43. Add metadata compression and encryption
44. Implement metadata chunking for large snapshots
45. Add hash calculation and verification
46. Implement metadata upload to S3
47. Write tests for metadata operations
### Phase 9: Restore Command
48. Implement snapshot listing and selection
49. Add metadata download and reconstruction
50. Implement hash verification for metadata
51. Create file restoration logic with chunk retrieval
52. Add blob caching for efficiency
53. Implement proper file permissions and mtime restoration
54. Write integration tests for restore
### Phase 10: Prune Command
55. Implement latest snapshot detection
56. Add referenced blob extraction from metadata
57. Create S3 blob listing and comparison
58. Implement safe deletion of unreferenced blobs
59. Add dry-run mode for safety
60. Write tests for prune scenarios
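The heart of steps 55-58 is a set difference between referenced and stored blob hashes:

```go
package prune

// Orphans returns blob hashes present in storage but not referenced
// by the latest snapshot metadata; these are safe to delete.
func Orphans(referenced map[string]bool, stored []string) []string {
	var orphans []string
	for _, h := range stored {
		if !referenced[h] {
			orphans = append(orphans, h)
		}
	}
	return orphans
}
```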
### Phase 11: Verify Command
61. Implement metadata integrity checking
62. Add blob existence verification
63. Create optional deep verification mode
64. Implement detailed error reporting
65. Write tests for verification
### Phase 12: Fetch Command
66. Implement single-file metadata query
67. Add minimal blob downloading for file
68. Create streaming file reconstruction
69. Add support for output redirection
70. Write tests for fetch command
### Phase 13: Daemon Mode
71. Implement inotify watcher for Linux
72. Add dirty path tracking in index
73. Create periodic full scan scheduler
74. Implement backup interval enforcement
75. Add proper signal handling and shutdown
76. Write tests for daemon behavior
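Steps 71-72 sketched with `github.com/fsnotify/fsnotify`, a common Go wrapper over inotify (the library choice is an assumption):

```go
package daemon

import (
	"log"

	"github.com/fsnotify/fsnotify"
)

// Watch marks paths dirty as filesystem events arrive. Note that
// fsnotify watches are non-recursive; real code must also add
// subdirectories as they are discovered or created.
func Watch(root string, markDirty func(path string)) error {
	w, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	defer w.Close()
	if err := w.Add(root); err != nil {
		return err
	}
	for {
		select {
		case ev, ok := <-w.Events:
			if !ok {
				return nil
			}
			markDirty(ev.Name)
		case err, ok := <-w.Errors:
			if !ok {
				return nil
			}
			log.Printf("watch error: %v", err)
		}
	}
}
```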
### Phase 14: Cron Mode
77. Implement silent operation mode
78. Add proper exit codes for cron
79. Implement lock file to prevent concurrent runs
80. Add error summary reporting
81. Write tests for cron mode
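Step 79's lock file can lean on `flock(2)`, so a lock held by a crashed process is released automatically; a Unix-only sketch:

```go
package cron

import (
	"os"
	"syscall"
)

// AcquireLock returns the held lock file, or an error if another
// vaultik process already holds it. The caller keeps the file open
// for the duration of the run.
func AcquireLock(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return nil, err
	}
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		f.Close()
		return nil, err // already locked by a concurrent run
	}
	return f, nil
}
```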
### Phase 15: Finalization
82. Add comprehensive logging throughout
83. Implement proper error wrapping and context
84. Add performance metrics collection
85. Create end-to-end integration tests
86. Write documentation and examples
87. Set up CI/CD pipeline

README.md

@@ -97,17 +97,77 @@ Existing backup software fails under one or more of these conditions:
## cli
### commands
```sh
vaultik backup /etc/vaultik.yaml [--cron] [--daemon]
vaultik restore <bucket> <prefix> <snapshot_id> <target_dir>
vaultik prune <bucket> <prefix>
vaultik fetch <bucket> <prefix> <snapshot_id> <filepath> <target_fileordir>
vaultik verify <bucket> <prefix> [<snapshot_id>]
```
### environment
* `VAULTIK_PRIVATE_KEY`: Required for `restore`, `prune`, `fetch`, and `verify` commands. Contains the age private key for decryption.
### command details
**backup**: Perform incremental backup of configured directories
* `--cron`: Silent unless error (for crontab)
* `--daemon`: Run continuously with inotify monitoring and periodic scans
**restore**: Restore entire snapshot to target directory
* Downloads and decrypts metadata
* Fetches only required blobs
* Reconstructs directory structure
**prune**: Remove unreferenced blobs from storage
* Requires private key
* Downloads latest snapshot metadata
* Deletes orphaned blobs
**fetch**: Extract single file from backup
* Retrieves specific file without full restore
* Supports extracting to different filename
**verify**: Validate backup integrity
* Checks metadata hash
* Verifies all referenced blobs exist
* Validates chunk integrity
---
## architecture
### chunking
* Content-defined chunking using rolling hash (Rabin fingerprint)
* Average chunk size: 10MB (configurable)
* Deduplication at chunk level
* Multiple chunks packed into blobs for efficiency
### encryption
* Asymmetric encryption using age (X25519 + XChaCha20-Poly1305)
* Only public key needed on source host
* Each blob encrypted independently
* Metadata databases also encrypted
### storage
* Content-addressed blob storage
* Immutable append-only design
* Two-level directory sharding for blobs (aa/bb/hash)
* Compressed with zstd before encryption
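The aa/bb/hash sharding maps directly to object keys; a sketch (the `blobs/` path segment is an assumption):

```go
package storage

import "path"

// BlobKey returns the object key for a blob hash, e.g.
// "myprefix/blobs/ab/cd/abcdef...".
func BlobKey(prefix, hash string) string {
	return path.Join(prefix, "blobs", hash[0:2], hash[2:4], hash)
}
```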
### state tracking
* Local SQLite database for incremental state
* Tracks file mtimes and chunk mappings
* Enables efficient change detection
* Supports inotify monitoring in daemon mode
## does not
* Store any secrets on the backed-up machine
@@ -141,6 +201,33 @@ The entire system is restore-only from object storage.
---
## features
### daemon mode
* Continuous background operation
* inotify-based change detection
* Respects `backup_interval` and `min_time_between_run`
* Full scan every `full_scan_interval` (default 24h)
### cron mode
* Single backup run
* Silent output unless errors
* Ideal for scheduled backups
### metadata integrity
* SHA256 hash of metadata stored separately
* Encrypted hash file for verification
* Chunked metadata support for large filesystems
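A minimal sketch of the hash check described above: recompute SHA256 over the fetched metadata and compare against the separately stored digest:

```go
package verify

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// CheckMetadata returns an error if the metadata does not match the
// expected hex-encoded SHA256 stored alongside the snapshot.
func CheckMetadata(metadata []byte, expectedHex string) error {
	sum := sha256.Sum256(metadata)
	got := hex.EncodeToString(sum[:])
	if subtle.ConstantTimeCompare([]byte(got), []byte(expectedHex)) != 1 {
		return fmt.Errorf("metadata hash mismatch: got %s", got)
	}
	return nil
}
```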
### exclusion patterns
* Glob-based file exclusion
* Configured in YAML
* Applied during directory walk
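A sketch of glob exclusion applied during the walk, using the standard library's `path.Match` (richer syntax such as `**` would need a dedicated library):

```go
package walker

import (
	"io/fs"
	"path"
	"path/filepath"
)

// WalkWithExclusions visits files under root, skipping any path that
// matches an exclusion pattern. Patterns match against the
// slash-separated path relative to root.
func WalkWithExclusions(root string, exclude []string, visit func(path string, d fs.DirEntry) error) error {
	return filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		rel, _ := filepath.Rel(root, p)
		for _, pat := range exclude {
			if ok, _ := path.Match(pat, filepath.ToSlash(rel)); ok {
				if d.IsDir() {
					return filepath.SkipDir // prune the whole subtree
				}
				return nil
			}
		}
		return visit(p, d)
	})
}
```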
## prune
Run `vaultik prune` on a machine with the private key. It:
@@ -160,6 +247,30 @@ WTFPL — see LICENSE.
---
## security considerations
* Source host compromise cannot decrypt backups
* No replay attacks possible (append-only)
* Each blob independently encrypted
* Metadata tampering detectable via hash verification
* S3 credentials only allow write access to backup prefix
## performance
* Streaming processing (no temp files)
* Parallel blob uploads
* Deduplication reduces storage and bandwidth
* Local index enables fast incremental detection
* Configurable compression levels
## requirements
* Go 1.24.4 or later
* S3-compatible object storage
* age command-line tool (for key generation)
* SQLite3
* Sufficient disk space for local index
## author
sneak