Document complete vaultik architecture and implementation plan
- Expand README with full CLI documentation, architecture details, and features
- Add comprehensive 87-step implementation plan to DESIGN.md
- Document all commands, configuration options, and security considerations
- Define complete API signatures and data structures
parent 67319a4699
commit 0df07790ba
117
DESIGN.md
@@ -359,4 +359,119 @@ func RunPrune(bucket, prefix, privateKey string) error
## Implementation TODO

### Phase 1: Core Infrastructure

1. Set up Go module and project structure
2. Create Makefile with test, fmt, and lint targets
3. Set up cobra CLI skeleton with all commands
4. Implement config loading and validation from YAML
5. Create data structures for FileInfo, ChunkInfo, BlobInfo, etc.

### Phase 2: Local Index Database

6. Implement SQLite schema creation and migrations
7. Create Index type with all database operations
8. Add transaction support and proper locking
9. Implement file tracking (save, lookup, delete)
10. Implement chunk tracking and deduplication
11. Implement blob tracking and chunk-to-blob mapping
12. Write tests for all index operations

### Phase 3: Chunking and Hashing

13. Implement Rabin fingerprint chunker
14. Create streaming chunk processor
15. Implement SHA256 hashing for chunks
16. Add configurable chunk size parameters
17. Write tests for chunking consistency

### Phase 4: Compression and Encryption

18. Implement zstd compression wrapper
19. Integrate age encryption library
20. Create Encryptor type for public key encryption
21. Create Decryptor type for private key decryption
22. Implement streaming encrypt/decrypt pipelines
23. Write tests for compression and encryption

### Phase 5: Blob Packing

24. Implement BlobWriter with size limits
25. Add chunk accumulation and flushing
26. Create blob hash calculation
27. Implement proper error handling and rollback
28. Write tests for blob packing scenarios
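
Steps 24 to 27 can be sketched as a writer with three jobs: accumulate chunks, record each chunk's offset, and seal a blob once the next chunk would exceed the size limit. This is a hypothetical illustration. The `Blob` and `ChunkRef` types and the flush callback are assumptions. So is naming the blob by the SHA-256 of its packed bytes; whether the real hash covers plaintext or the encrypted bytes is not settled here.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ChunkRef locates one chunk inside a packed blob.
type ChunkRef struct {
	Hash           string
	Offset, Length int
}

// Blob is a sealed batch of chunks ready for upload.
type Blob struct {
	Hash   string // hex SHA-256 of Data (assumed naming scheme)
	Data   []byte
	Chunks []ChunkRef
}

// BlobWriter packs chunks into blobs of at most MaxSize bytes. A single chunk
// larger than MaxSize still gets a blob of its own.
type BlobWriter struct {
	MaxSize int
	flush   func(Blob) error
	buf     bytes.Buffer
	chunks  []ChunkRef
}

func NewBlobWriter(maxSize int, flush func(Blob) error) *BlobWriter {
	return &BlobWriter{MaxSize: maxSize, flush: flush}
}

// Add appends a chunk, sealing the current blob first if the chunk would not fit.
func (w *BlobWriter) Add(hash string, data []byte) error {
	if w.buf.Len() > 0 && w.buf.Len()+len(data) > w.MaxSize {
		if err := w.Flush(); err != nil {
			return err
		}
	}
	w.chunks = append(w.chunks, ChunkRef{Hash: hash, Offset: w.buf.Len(), Length: len(data)})
	w.buf.Write(data)
	return nil
}

// Flush seals pending chunks into a blob. If the callback fails, the pending
// state is kept so the caller can retry or abort (rollback).
func (w *BlobWriter) Flush() error {
	if w.buf.Len() == 0 {
		return nil
	}
	data := append([]byte(nil), w.buf.Bytes()...)
	sum := sha256.Sum256(data)
	b := Blob{Hash: hex.EncodeToString(sum[:]), Data: data, Chunks: append([]ChunkRef(nil), w.chunks...)}
	if err := w.flush(b); err != nil {
		return err
	}
	w.buf.Reset()
	w.chunks = nil
	return nil
}

func main() {
	w := NewBlobWriter(10, func(b Blob) error {
		fmt.Printf("blob %.12s: %d chunks, %d bytes\n", b.Hash, len(b.Chunks), len(b.Data))
		return nil
	})
	for _, c := range []string{"aaaa", "bbbb", "cccc"} {
		if err := w.Add(c, []byte(c)); err != nil { // chunk text doubles as its "hash" here
			fmt.Println("add failed:", err)
			return
		}
	}
	if err := w.Flush(); err != nil { // seal the final partial blob
		fmt.Println("flush failed:", err)
	}
}
```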

### Phase 6: S3 Operations

29. Integrate MinIO client library
30. Implement S3Client wrapper type
31. Add multipart upload support for large blobs
32. Implement retry logic with exponential backoff
33. Add connection pooling and timeout handling
34. Write tests using MinIO container

### Phase 7: Backup Command - Basic

35. Implement directory walking with exclusion patterns
36. Add file change detection using index
37. Integrate chunking pipeline for changed files
38. Implement blob upload coordination
39. Add progress reporting to stderr
40. Write integration tests for backup

### Phase 8: Snapshot Metadata

41. Implement snapshot metadata extraction from index
42. Create SQLite snapshot database builder
43. Add metadata compression and encryption
44. Implement metadata chunking for large snapshots
45. Add hash calculation and verification
46. Implement metadata upload to S3
47. Write tests for metadata operations
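
Steps 44 and 45 could work like this: hash the serialized snapshot database once, then cut it into fixed-size parts for upload. The digest can then be stored separately and checked after reassembly. The `SplitMetadata` name and the part size are assumptions, and compression and encryption are left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// SplitMetadata returns db cut into parts of at most partSize bytes, plus the
// hex SHA-256 of the complete, unsplit db.
func SplitMetadata(db []byte, partSize int) ([][]byte, string) {
	sum := sha256.Sum256(db)
	if partSize <= 0 {
		partSize = len(db) + 1 // treat as "no splitting"
	}
	var parts [][]byte
	for len(db) > partSize {
		parts = append(parts, db[:partSize:partSize]) // cap the slice so parts never overlap
		db = db[partSize:]
	}
	parts = append(parts, db)
	return parts, hex.EncodeToString(sum[:])
}

func main() {
	parts, digest := SplitMetadata([]byte("abc"), 2)
	fmt.Println(len(parts), digest)
	// 2 ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
}
```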

### Phase 9: Restore Command

48. Implement snapshot listing and selection
49. Add metadata download and reconstruction
50. Implement hash verification for metadata
51. Create file restoration logic with chunk retrieval
52. Add blob caching for efficiency
53. Implement proper file permissions and mtime restoration
54. Write integration tests for restore
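
For blob caching (step 52), a byte-bounded LRU cache is one reasonable policy, since chunks from the same file are likely to have been packed into the same blobs. The sketch uses `container/list`. The type names, and the rule that the newest entry is never evicted even if it alone exceeds the limit, are assumptions.

```go
package main

import (
	"container/list"
	"fmt"
)

type cacheEntry struct {
	hash string
	data []byte
}

// BlobCache keeps recently used blobs in memory, up to roughly maxBytes.
type BlobCache struct {
	maxBytes, size int
	order          *list.List               // front = most recently used
	items          map[string]*list.Element // blob hash -> list element
}

func NewBlobCache(maxBytes int) *BlobCache {
	return &BlobCache{maxBytes: maxBytes, order: list.New(), items: map[string]*list.Element{}}
}

// Get returns a cached blob and marks it as recently used.
func (c *BlobCache) Get(hash string) ([]byte, bool) {
	e, ok := c.items[hash]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(e)
	return e.Value.(*cacheEntry).data, true
}

// Put inserts or replaces a blob, then evicts least recently used entries.
func (c *BlobCache) Put(hash string, data []byte) {
	if e, ok := c.items[hash]; ok {
		ent := e.Value.(*cacheEntry)
		c.size += len(data) - len(ent.data)
		ent.data = data
		c.order.MoveToFront(e)
	} else {
		c.items[hash] = c.order.PushFront(&cacheEntry{hash: hash, data: data})
		c.size += len(data)
	}
	// Evict from the back, but never the entry just touched (it is at the front).
	for c.size > c.maxBytes && c.order.Len() > 1 {
		oldest := c.order.Back()
		ent := oldest.Value.(*cacheEntry)
		c.order.Remove(oldest)
		delete(c.items, ent.hash)
		c.size -= len(ent.data)
	}
}

func main() {
	c := NewBlobCache(8)
	c.Put("a", []byte("1111"))
	c.Put("b", []byte("2222"))
	c.Get("a")                 // a becomes most recently used
	c.Put("c", []byte("3333")) // over budget: evicts b
	_, hasB := c.Get("b")
	fmt.Println(hasB) // false
}
```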

### Phase 10: Prune Command

55. Implement latest snapshot detection
56. Add referenced blob extraction from metadata
57. Create S3 blob listing and comparison
58. Implement safe deletion of unreferenced blobs
59. Add dry-run mode for safety
60. Write tests for prune scenarios

### Phase 11: Verify Command

61. Implement metadata integrity checking
62. Add blob existence verification
63. Create optional deep verification mode
64. Implement detailed error reporting
65. Write tests for verification

### Phase 12: Fetch Command

66. Implement single-file metadata query
67. Add minimal blob downloading for file
68. Create streaming file reconstruction
69. Add support for output redirection
70. Write tests for fetch command

### Phase 13: Daemon Mode

71. Implement inotify watcher for Linux
72. Add dirty path tracking in index
73. Create periodic full scan scheduler
74. Implement backup interval enforcement
75. Add proper signal handling and shutdown
76. Write tests for daemon behavior

### Phase 14: Cron Mode

77. Implement silent operation mode
78. Add proper exit codes for cron
79. Implement lock file to prevent concurrent runs
80. Add error summary reporting
81. Write tests for cron mode

### Phase 15: Finalization

82. Add comprehensive logging throughout
83. Implement proper error wrapping and context
84. Add performance metrics collection
85. Create end-to-end integration tests
86. Write documentation and examples
87. Set up CI/CD pipeline
115
README.md
@@ -97,17 +97,77 @@ Existing backup software fails under one or more of these conditions:

## cli

### commands

```sh
vaultik backup /etc/vaultik.yaml [--cron] [--daemon]
vaultik restore <bucket> <prefix> <snapshot_id> <target_dir>
vaultik prune <bucket> <prefix>
vaultik fetch <bucket> <prefix> <snapshot_id> <filepath> <target_fileordir>
vaultik verify <bucket> <prefix> [<snapshot_id>]
```

### environment

* `VAULTIK_PRIVATE_KEY`: Required for `restore`, `prune`, `fetch`, and `verify` commands. Contains the age private key for decryption.

### command details

**backup**: Perform incremental backup of configured directories

* `--cron`: Silent unless error (for crontab)
* `--daemon`: Run continuously with inotify monitoring and periodic scans

**restore**: Restore entire snapshot to target directory

* Downloads and decrypts metadata
* Fetches only required blobs
* Reconstructs directory structure

**prune**: Remove unreferenced blobs from storage

* Requires private key
* Downloads latest snapshot metadata
* Deletes orphaned blobs

**fetch**: Extract single file from backup

* Retrieves specific file without full restore
* Supports extracting to different filename

**verify**: Validate backup integrity

* Checks metadata hash
* Verifies all referenced blobs exist
* Validates chunk integrity

---

## architecture

### chunking

* Content-defined chunking using rolling hash (Rabin fingerprint)
* Average chunk size: 10MB (configurable)
* Deduplication at chunk level
* Multiple chunks packed into blobs for efficiency

### encryption

* Asymmetric encryption using age (X25519 + ChaCha20-Poly1305)
* Only public key needed on source host
* Each blob encrypted independently
* Metadata databases also encrypted

### storage

* Content-addressed blob storage
* Immutable append-only design
* Two-level directory sharding for blobs (aa/bb/hash)
* Compressed with zstd before encryption
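
The `aa/bb/hash` layout turns a blob's hex hash into an object key by using its first two and next two characters as directory levels. The sketch assumes lowercase hex SHA-256 names placed directly under the bucket prefix. Any extra path segment the real layout may use is not shown, and `blobKey`/`contentKey` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
)

// blobKey shards a hex hash into prefix/aa/bb/hash.
func blobKey(prefix, hash string) (string, error) {
	if len(hash) < 4 {
		return "", fmt.Errorf("hash %q too short to shard", hash)
	}
	if _, err := hex.DecodeString(hash); err != nil {
		return "", fmt.Errorf("hash %q is not hex: %w", hash, err)
	}
	// path (not filepath): object keys always use forward slashes.
	return path.Join(prefix, hash[0:2], hash[2:4], hash), nil
}

// contentKey names data by its own SHA-256, making storage content-addressed.
func contentKey(prefix string, data []byte) string {
	sum := sha256.Sum256(data)
	key, _ := blobKey(prefix, hex.EncodeToString(sum[:])) // a SHA-256 hex string always shards
	return key
}

func main() {
	fmt.Println(contentKey("backups/host1", []byte("abc")))
	// backups/host1/ba/78/ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
}
```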

### state tracking

* Local SQLite database for incremental state
* Tracks file mtimes and chunk mappings
* Enables efficient change detection
* Supports inotify monitoring in daemon mode

## does not

* Store any secrets on the backed-up machine
@@ -141,6 +201,33 @@ The entire system is restore-only from object storage.

---

## features

### daemon mode

* Continuous background operation
* inotify-based change detection
* Respects `backup_interval` and `min_time_between_run`
* Full scan every `full_scan_interval` (default 24h)

### cron mode

* Single backup run
* Silent output unless errors
* Ideal for scheduled backups

### metadata integrity

* SHA256 hash of metadata stored separately
* Encrypted hash file for verification
* Chunked metadata support for large filesystems

### exclusion patterns

* Glob-based file exclusion
* Configured in YAML
* Applied during directory walk

## prune

Run `vaultik prune` on a machine with the private key. It:
@@ -160,6 +247,30 @@ WTFPL — see LICENSE.

---

## security considerations

* Source host compromise cannot decrypt backups
* No replay attacks possible (append-only)
* Each blob independently encrypted
* Metadata tampering detectable via hash verification
* S3 credentials only allow write access to backup prefix

## performance

* Streaming processing (no temp files)
* Parallel blob uploads
* Deduplication reduces storage and bandwidth
* Local index enables fast incremental detection
* Configurable compression levels

## requirements

* Go 1.24.4 or later
* S3-compatible object storage
* age command-line tool (for key generation)
* SQLite3
* Sufficient disk space for local index

## author

sneak