diff --git a/DESIGN.md b/DESIGN.md
index f1675ae..be42233 100644
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -359,4 +359,119 @@ func RunPrune(bucket, prefix, privateKey string) error
 
 ## Implementation TODO
 
-To be completed by claude
+### Phase 1: Core Infrastructure
+1. Set up Go module and project structure
+2. Create Makefile with test, fmt, and lint targets
+3. Set up cobra CLI skeleton with all commands
+4. Implement config loading and validation from YAML
+5. Create data structures for FileInfo, ChunkInfo, BlobInfo, etc.
+
+### Phase 2: Local Index Database
+6. Implement SQLite schema creation and migrations
+7. Create Index type with all database operations
+8. Add transaction support and proper locking
+9. Implement file tracking (save, lookup, delete)
+10. Implement chunk tracking and deduplication
+11. Implement blob tracking and chunk-to-blob mapping
+12. Write tests for all index operations
+
+### Phase 3: Chunking and Hashing
+13. Implement Rabin fingerprint chunker
+14. Create streaming chunk processor
+15. Implement SHA256 hashing for chunks
+16. Add configurable chunk size parameters
+17. Write tests for chunking consistency
+
+### Phase 4: Compression and Encryption
+18. Implement zstd compression wrapper
+19. Integrate age encryption library
+20. Create Encryptor type for public key encryption
+21. Create Decryptor type for private key decryption
+22. Implement streaming encrypt/decrypt pipelines
+23. Write tests for compression and encryption
+
+### Phase 5: Blob Packing
+24. Implement BlobWriter with size limits
+25. Add chunk accumulation and flushing
+26. Create blob hash calculation
+27. Implement proper error handling and rollback
+28. Write tests for blob packing scenarios
+
+### Phase 6: S3 Operations
+29. Integrate MinIO client library
+30. Implement S3Client wrapper type
+31. Add multipart upload support for large blobs
+32. Implement retry logic with exponential backoff
+33. Add connection pooling and timeout handling
+34. Write tests using MinIO container
+
+### Phase 7: Backup Command - Basic
+35. Implement directory walking with exclusion patterns
+36. Add file change detection using index
+37. Integrate chunking pipeline for changed files
+38. Implement blob upload coordination
+39. Add progress reporting to stderr
+40. Write integration tests for backup
+
+### Phase 8: Snapshot Metadata
+41. Implement snapshot metadata extraction from index
+42. Create SQLite snapshot database builder
+43. Add metadata compression and encryption
+44. Implement metadata chunking for large snapshots
+45. Add hash calculation and verification
+46. Implement metadata upload to S3
+47. Write tests for metadata operations
+
+### Phase 9: Restore Command
+48. Implement snapshot listing and selection
+49. Add metadata download and reconstruction
+50. Implement hash verification for metadata
+51. Create file restoration logic with chunk retrieval
+52. Add blob caching for efficiency
+53. Implement proper file permissions and mtime restoration
+54. Write integration tests for restore
+
+### Phase 10: Prune Command
+55. Implement latest snapshot detection
+56. Add referenced blob extraction from metadata
+57. Create S3 blob listing and comparison
+58. Implement safe deletion of unreferenced blobs
+59. Add dry-run mode for safety
+60. Write tests for prune scenarios
+
+### Phase 11: Verify Command
+61. Implement metadata integrity checking
+62. Add blob existence verification
+63. Create optional deep verification mode
+64. Implement detailed error reporting
+65. Write tests for verification
+
+### Phase 12: Fetch Command
+66. Implement single-file metadata query
+67. Add minimal blob downloading for file
+68. Create streaming file reconstruction
+69. Add support for output redirection
+70. Write tests for fetch command
+
+### Phase 13: Daemon Mode
+71. Implement inotify watcher for Linux
+72. Add dirty path tracking in index
+73. Create periodic full scan scheduler
+74. Implement backup interval enforcement
+75. Add proper signal handling and shutdown
+76. Write tests for daemon behavior
+
+### Phase 14: Cron Mode
+77. Implement silent operation mode
+78. Add proper exit codes for cron
+79. Implement lock file to prevent concurrent runs
+80. Add error summary reporting
+81. Write tests for cron mode
+
+### Phase 15: Finalization
+82. Add comprehensive logging throughout
+83. Implement proper error wrapping and context
+84. Add performance metrics collection
+85. Create end-to-end integration tests
+86. Write documentation and examples
+87. Set up CI/CD pipeline
diff --git a/README.md b/README.md
index a8ca991..acafc2f 100644
--- a/README.md
+++ b/README.md
@@ -97,17 +97,77 @@ Existing backup software fails under one or more of these conditions:
 
 ## cli
 
+### commands
+
 ```sh
-vaultik backup /etc/vaultik.yaml
+vaultik backup /etc/vaultik.yaml [--cron] [--daemon]
 vaultik restore
 vaultik prune
 vaultik fetch
+vaultik verify []
 ```
 
-* `VAULTIK_PRIVATE_KEY` must be available in environment for `restore` and `prune`
+### environment
+
+* `VAULTIK_PRIVATE_KEY`: Required for `restore`, `prune`, `fetch`, and `verify` commands. Contains the age private key for decryption.
+
+### command details
+
+**backup**: Perform incremental backup of configured directories
+* `--cron`: Silent unless error (for crontab)
+* `--daemon`: Run continuously with inotify monitoring and periodic scans
+
+**restore**: Restore entire snapshot to target directory
+* Downloads and decrypts metadata
+* Fetches only required blobs
+* Reconstructs directory structure
+
+**prune**: Remove unreferenced blobs from storage
+* Requires private key
+* Downloads latest snapshot metadata
+* Deletes orphaned blobs
+
+**fetch**: Extract single file from backup
+* Retrieves specific file without full restore
+* Supports extracting to different filename
+
+**verify**: Validate backup integrity
+* Checks metadata hash
+* Verifies all referenced blobs exist
+* Validates chunk integrity
 
 ---
 
+## architecture
+
+### chunking
+
+* Content-defined chunking using rolling hash (Rabin fingerprint)
+* Average chunk size: 10MB (configurable)
+* Deduplication at chunk level
+* Multiple chunks packed into blobs for efficiency
+
+### encryption
+
+* Asymmetric encryption using age (X25519 + XChaCha20-Poly1305)
+* Only public key needed on source host
+* Each blob encrypted independently
+* Metadata databases also encrypted
+
+### storage
+
+* Content-addressed blob storage
+* Immutable append-only design
+* Two-level directory sharding for blobs (aa/bb/hash)
+* Compressed with zstd before encryption
+
+### state tracking
+
+* Local SQLite database for incremental state
+* Tracks file mtimes and chunk mappings
+* Enables efficient change detection
+* Supports inotify monitoring in daemon mode
+
 ## does not
 
 * Store any secrets on the backed-up machine
@@ -141,6 +201,33 @@ The entire system is restore-only from object storage.
 
 ---
 
+## features
+
+### daemon mode
+
+* Continuous background operation
+* inotify-based change detection
+* Respects `backup_interval` and `min_time_between_run`
+* Full scan every `full_scan_interval` (default 24h)
+
+### cron mode
+
+* Single backup run
+* Silent output unless errors
+* Ideal for scheduled backups
+
+### metadata integrity
+
+* SHA256 hash of metadata stored separately
+* Encrypted hash file for verification
+* Chunked metadata support for large filesystems
+
+### exclusion patterns
+
+* Glob-based file exclusion
+* Configured in YAML
+* Applied during directory walk
+
 ## prune
 
 Run `vaultik prune` on a machine with the private key. It:
@@ -160,6 +247,30 @@ WTFPL — see LICENSE.
 
 ---
 
+## security considerations
+
+* Source host compromise cannot decrypt backups
+* No replay attacks possible (append-only)
+* Each blob independently encrypted
+* Metadata tampering detectable via hash verification
+* S3 credentials only allow write access to backup prefix
+
+## performance
+
+* Streaming processing (no temp files)
+* Parallel blob uploads
+* Deduplication reduces storage and bandwidth
+* Local index enables fast incremental detection
+* Configurable compression levels
+
+## requirements
+
+* Go 1.24.4 or later
+* S3-compatible object storage
+* age command-line tool (for key generation)
+* SQLite3
+* Sufficient disk space for local index
+
 ## author
 
 sneak