Document complete vaultik architecture and implementation plan

- Expand README with full CLI documentation, architecture details, and features
- Add comprehensive 87-step implementation plan to DESIGN.md
- Document all commands, configuration options, and security considerations
- Define complete API signatures and data structures
Jeffrey Paul 2025-07-20 09:04:31 +02:00
parent 67319a4699
commit 0df07790ba
2 changed files with 229 additions and 3 deletions

DESIGN.md

@@ -359,4 +359,119 @@ func RunPrune(bucket, prefix, privateKey string) error
## Implementation TODO
### Phase 1: Core Infrastructure
1. Set up Go module and project structure
2. Create Makefile with test, fmt, and lint targets
3. Set up cobra CLI skeleton with all commands
4. Implement config loading and validation from YAML
5. Create data structures for FileInfo, ChunkInfo, BlobInfo, etc.
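A minimal sketch of what step 4 might look like, using `gopkg.in/yaml.v3`; the field names and validation rules here are illustrative assumptions, not the finalized schema:

```go
package config

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// Config mirrors the YAML config file. Field names are assumptions.
type Config struct {
	AgePublicKey  string   `yaml:"age_public_key"`
	BackupDirs    []string `yaml:"backup_dirs"`
	Exclude       []string `yaml:"exclude"`
	ChunkSize     int64    `yaml:"chunk_size"`
	BlobSizeLimit int64    `yaml:"blob_size_limit"`
	S3            S3Config `yaml:"s3"`
}

type S3Config struct {
	Endpoint  string `yaml:"endpoint"`
	Bucket    string `yaml:"bucket"`
	Prefix    string `yaml:"prefix"`
	AccessKey string `yaml:"access_key_id"`
	SecretKey string `yaml:"secret_access_key"`
}

// Load reads and validates the YAML config file.
func Load(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("reading config: %w", err)
	}
	var cfg Config
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("parsing config: %w", err)
	}
	if cfg.AgePublicKey == "" || len(cfg.BackupDirs) == 0 {
		return nil, fmt.Errorf("age_public_key and backup_dirs are required")
	}
	return &cfg, nil
}
```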
### Phase 2: Local Index Database
6. Implement SQLite schema creation and migrations
7. Create Index type with all database operations
8. Add transaction support and proper locking
9. Implement file tracking (save, lookup, delete)
10. Implement chunk tracking and deduplication
11. Implement blob tracking and chunk-to-blob mapping
12. Write tests for all index operations
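An illustrative schema for steps 6-11. Table and column names are assumptions; the real schema ships with the migration code from step 6:

```go
package index

import (
	"database/sql"

	_ "github.com/mattn/go-sqlite3"
)

// schema covers file tracking, chunk deduplication, and the
// chunk-to-blob mapping. Column names are illustrative.
const schema = `
CREATE TABLE IF NOT EXISTS files (
    path  TEXT PRIMARY KEY,
    mtime INTEGER NOT NULL,
    size  INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS chunks (
    chunk_hash TEXT PRIMARY KEY,  -- SHA256 of plaintext chunk
    size       INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS file_chunks (
    path       TEXT NOT NULL REFERENCES files(path),
    idx        INTEGER NOT NULL, -- position of chunk within file
    chunk_hash TEXT NOT NULL REFERENCES chunks(chunk_hash),
    PRIMARY KEY (path, idx)
);
CREATE TABLE IF NOT EXISTS blobs (
    blob_hash TEXT PRIMARY KEY   -- SHA256 of encrypted blob
);
CREATE TABLE IF NOT EXISTS blob_chunks (
    blob_hash  TEXT NOT NULL REFERENCES blobs(blob_hash),
    chunk_hash TEXT NOT NULL REFERENCES chunks(chunk_hash),
    offset     INTEGER NOT NULL,
    PRIMARY KEY (blob_hash, chunk_hash)
);
`

// OpenIndex opens (or creates) the local index database.
func OpenIndex(path string) (*sql.DB, error) {
	db, err := sql.Open("sqlite3", path)
	if err != nil {
		return nil, err
	}
	if _, err := db.Exec(schema); err != nil {
		db.Close()
		return nil, err
	}
	return db, nil
}
```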
### Phase 3: Chunking and Hashing
13. Implement Rabin fingerprint chunker
14. Create streaming chunk processor
15. Implement SHA256 hashing for chunks
16. Add configurable chunk size parameters
17. Write tests for chunking consistency
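A stand-in for steps 13-16. It uses a simple gear-style rolling hash rather than a true Rabin fingerprint, with illustrative size bounds, but shows the streaming cut-point structure:

```go
package chunker

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"io"
	"math/rand"
)

const (
	minChunk = 2 << 20        // illustrative lower bound
	maxChunk = 32 << 20       // illustrative upper bound
	mask     = (8 << 20) - 1  // ~8 MiB average; the design's 10MB
	                          // target needs a non-power-of-two mask
)

// gear is a fixed random table driving the rolling hash.
var gear [256]uint64

func init() {
	rng := rand.New(rand.NewSource(1))
	for i := range gear {
		gear[i] = rng.Uint64()
	}
}

type Chunk struct {
	Hash string // SHA256 of chunk contents
	Data []byte
}

// Split streams r and emits content-defined chunks.
func Split(r io.Reader, emit func(Chunk) error) error {
	br := bufio.NewReaderSize(r, 1<<20)
	var buf []byte
	var h uint64
	for {
		b, err := br.ReadByte()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		buf = append(buf, b)
		h = (h << 1) + gear[b]
		// Cut when the hash hits the mask (past the minimum size)
		// or the chunk reaches the hard maximum.
		if (len(buf) >= minChunk && h&mask == 0) || len(buf) >= maxChunk {
			if err := flush(&buf, emit); err != nil {
				return err
			}
			h = 0
		}
	}
	if len(buf) > 0 {
		return flush(&buf, emit)
	}
	return nil
}

func flush(buf *[]byte, emit func(Chunk) error) error {
	sum := sha256.Sum256(*buf)
	c := Chunk{
		Hash: hex.EncodeToString(sum[:]),
		Data: append([]byte(nil), *buf...),
	}
	*buf = (*buf)[:0]
	return emit(c)
}
```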
### Phase 4: Compression and Encryption
18. Implement zstd compression wrapper
19. Integrate age encryption library
20. Create Encryptor type for public key encryption
21. Create Decryptor type for private key decryption
22. Implement streaming encrypt/decrypt pipelines
23. Write tests for compression and encryption
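Steps 18-22 compose into a compress-then-encrypt stream. A sketch using `filippo.io/age` and `github.com/klauspost/compress/zstd` (the specific zstd library is an assumption):

```go
package crypto

import (
	"io"

	"filippo.io/age"
	"github.com/klauspost/compress/zstd"
)

// EncryptStream compresses src with zstd, encrypts the result to the
// given age X25519 recipient, and writes the ciphertext to dst.
func EncryptStream(dst io.Writer, src io.Reader, pubkey string) error {
	recipient, err := age.ParseX25519Recipient(pubkey)
	if err != nil {
		return err
	}
	aw, err := age.Encrypt(dst, recipient)
	if err != nil {
		return err
	}
	zw, err := zstd.NewWriter(aw)
	if err != nil {
		return err
	}
	if _, err := io.Copy(zw, src); err != nil {
		return err
	}
	if err := zw.Close(); err != nil { // flush compressed frames
		return err
	}
	return aw.Close() // finalize the age ciphertext
}
```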
### Phase 5: Blob Packing
24. Implement BlobWriter with size limits
25. Add chunk accumulation and flushing
26. Create blob hash calculation
27. Implement proper error handling and rollback
28. Write tests for blob packing scenarios
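One possible shape for step 24's `BlobWriter`; the flush callback signature is an assumption:

```go
package blob

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
)

// BlobWriter accumulates chunks up to a size limit, then hands the
// packed blob to a flush callback for encryption and upload.
type BlobWriter struct {
	limit  int64
	buf    bytes.Buffer
	chunks []string // hashes of chunks packed into the current blob
	flush  func(blobHash string, data []byte, chunkHashes []string) error
}

func NewBlobWriter(limit int64, flush func(string, []byte, []string) error) *BlobWriter {
	return &BlobWriter{limit: limit, flush: flush}
}

// AddChunk appends a chunk; if the blob would exceed the size limit,
// the current blob is flushed first.
func (w *BlobWriter) AddChunk(chunkHash string, data []byte) error {
	if int64(w.buf.Len()+len(data)) > w.limit && w.buf.Len() > 0 {
		if err := w.Flush(); err != nil {
			return err
		}
	}
	w.buf.Write(data)
	w.chunks = append(w.chunks, chunkHash)
	return nil
}

// Flush emits the current blob, identified by the SHA256 of its contents.
func (w *BlobWriter) Flush() error {
	if w.buf.Len() == 0 {
		return nil
	}
	sum := sha256.Sum256(w.buf.Bytes())
	err := w.flush(hex.EncodeToString(sum[:]), w.buf.Bytes(), w.chunks)
	w.buf.Reset()
	w.chunks = nil
	return err
}
```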
### Phase 6: S3 Operations
29. Integrate MinIO client library
30. Implement S3Client wrapper type
31. Add multipart upload support for large blobs
32. Implement retry logic with exponential backoff
33. Add connection pooling and timeout handling
34. Write tests using MinIO container
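A sketch of steps 29-32 with the MinIO client; the retry policy values are illustrative:

```go
package s3

import (
	"bytes"
	"context"
	"time"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

// NewClient builds a MinIO client for any S3-compatible endpoint.
func NewClient(endpoint, accessKey, secretKey string) (*minio.Client, error) {
	return minio.New(endpoint, &minio.Options{
		Creds:  credentials.NewStaticV4(accessKey, secretKey, ""),
		Secure: true,
	})
}

// PutWithRetry uploads data, retrying with exponential backoff.
func PutWithRetry(ctx context.Context, c *minio.Client, bucket, key string, data []byte) error {
	backoff := time.Second
	var err error
	for attempt := 0; attempt < 5; attempt++ {
		_, err = c.PutObject(ctx, bucket, key,
			bytes.NewReader(data), int64(len(data)), minio.PutObjectOptions{})
		if err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
			backoff *= 2 // double the wait after each failure
		}
	}
	return err
}
```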
### Phase 7: Backup Command - Basic
35. Implement directory walking with exclusion patterns
36. Add file change detection using index
37. Integrate chunking pipeline for changed files
38. Implement blob upload coordination
39. Add progress reporting to stderr
40. Write integration tests for backup
### Phase 8: Snapshot Metadata
41. Implement snapshot metadata extraction from index
42. Create SQLite snapshot database builder
43. Add metadata compression and encryption
44. Implement metadata chunking for large snapshots
45. Add hash calculation and verification
46. Implement metadata upload to S3
47. Write tests for metadata operations
### Phase 9: Restore Command
48. Implement snapshot listing and selection
49. Add metadata download and reconstruction
50. Implement hash verification for metadata
51. Create file restoration logic with chunk retrieval
52. Add blob caching for efficiency
53. Implement proper file permissions and mtime restoration
54. Write integration tests for restore
### Phase 10: Prune Command
55. Implement latest snapshot detection
56. Add referenced blob extraction from metadata
57. Create S3 blob listing and comparison
58. Implement safe deletion of unreferenced blobs
59. Add dry-run mode for safety
60. Write tests for prune scenarios
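The heart of steps 55-58 is a set difference between referenced and stored blob hashes:

```go
package prune

// Orphans returns blob hashes present in storage but not referenced
// by the latest snapshot metadata; these are safe to delete.
func Orphans(referenced map[string]bool, stored []string) []string {
	var orphans []string
	for _, h := range stored {
		if !referenced[h] {
			orphans = append(orphans, h)
		}
	}
	return orphans
}
```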
### Phase 11: Verify Command
61. Implement metadata integrity checking
62. Add blob existence verification
63. Create optional deep verification mode
64. Implement detailed error reporting
65. Write tests for verification
### Phase 12: Fetch Command
66. Implement single-file metadata query
67. Add minimal blob downloading for file
68. Create streaming file reconstruction
69. Add support for output redirection
70. Write tests for fetch command
### Phase 13: Daemon Mode
71. Implement inotify watcher for Linux
72. Add dirty path tracking in index
73. Create periodic full scan scheduler
74. Implement backup interval enforcement
75. Add proper signal handling and shutdown
76. Write tests for daemon behavior
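Steps 71-72 sketched with `github.com/fsnotify/fsnotify`, a common Go wrapper over inotify (the library choice is an assumption):

```go
package daemon

import (
	"log"

	"github.com/fsnotify/fsnotify"
)

// Watch marks paths dirty as filesystem events arrive. Note that
// fsnotify watches are non-recursive; real code must also add
// subdirectories as they are discovered or created.
func Watch(root string, markDirty func(path string)) error {
	w, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	defer w.Close()
	if err := w.Add(root); err != nil {
		return err
	}
	for {
		select {
		case ev, ok := <-w.Events:
			if !ok {
				return nil
			}
			markDirty(ev.Name)
		case err, ok := <-w.Errors:
			if !ok {
				return nil
			}
			log.Printf("watch error: %v", err)
		}
	}
}
```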
### Phase 14: Cron Mode
77. Implement silent operation mode
78. Add proper exit codes for cron
79. Implement lock file to prevent concurrent runs
80. Add error summary reporting
81. Write tests for cron mode
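Step 79's lock file can lean on `flock(2)`, so a lock held by a crashed process is released automatically; a Unix-only sketch:

```go
package cron

import (
	"os"
	"syscall"
)

// AcquireLock returns the held lock file, or an error if another
// vaultik process already holds it. The caller keeps the file open
// for the duration of the run.
func AcquireLock(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return nil, err
	}
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		f.Close()
		return nil, err // already locked by a concurrent run
	}
	return f, nil
}
```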
### Phase 15: Finalization
82. Add comprehensive logging throughout
83. Implement proper error wrapping and context
84. Add performance metrics collection
85. Create end-to-end integration tests
86. Write documentation and examples
87. Set up CI/CD pipeline

README.md

@@ -97,17 +97,77 @@ Existing backup software fails under one or more of these conditions:
## cli
### commands
```sh
vaultik backup /etc/vaultik.yaml [--cron] [--daemon]
vaultik restore <bucket> <prefix> <snapshot_id> <target_dir>
vaultik prune <bucket> <prefix>
vaultik fetch <bucket> <prefix> <snapshot_id> <filepath> <target_fileordir>
vaultik verify <bucket> <prefix> [<snapshot_id>]
```
### environment
* `VAULTIK_PRIVATE_KEY`: Required for `restore`, `prune`, `fetch`, and `verify` commands. Contains the age private key for decryption.
### command details
**backup**: Perform incremental backup of configured directories
* `--cron`: Silent unless error (for crontab)
* `--daemon`: Run continuously with inotify monitoring and periodic scans
**restore**: Restore entire snapshot to target directory
* Downloads and decrypts metadata
* Fetches only required blobs
* Reconstructs directory structure
**prune**: Remove unreferenced blobs from storage
* Requires private key
* Downloads latest snapshot metadata
* Deletes orphaned blobs
**fetch**: Extract single file from backup
* Retrieves specific file without full restore
* Supports extracting to different filename
**verify**: Validate backup integrity
* Checks metadata hash
* Verifies all referenced blobs exist
* Validates chunk integrity
---
## architecture
### chunking
* Content-defined chunking using rolling hash (Rabin fingerprint)
* Average chunk size: 10MB (configurable)
* Deduplication at chunk level
* Multiple chunks packed into blobs for efficiency
### encryption
* Asymmetric encryption using age (X25519 + XChaCha20-Poly1305)
* Only public key needed on source host
* Each blob encrypted independently
* Metadata databases also encrypted
### storage
* Content-addressed blob storage
* Immutable append-only design
* Two-level directory sharding for blobs (aa/bb/hash)
* Compressed with zstd before encryption
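The aa/bb/hash sharding maps directly to object keys; a sketch (the `blobs/` path segment is an assumption):

```go
package storage

import "path"

// BlobKey returns the object key for a blob hash, e.g.
// "myprefix/blobs/ab/cd/abcdef...".
func BlobKey(prefix, hash string) string {
	return path.Join(prefix, "blobs", hash[0:2], hash[2:4], hash)
}
```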
### state tracking
* Local SQLite database for incremental state
* Tracks file mtimes and chunk mappings
* Enables efficient change detection
* Supports inotify monitoring in daemon mode
## does not
* Store any secrets on the backed-up machine
@@ -141,6 +201,33 @@ The entire system is restore-only from object storage.
---
## features
### daemon mode
* Continuous background operation
* inotify-based change detection
* Respects `backup_interval` and `min_time_between_run`
* Full scan every `full_scan_interval` (default 24h)
### cron mode
* Single backup run
* Silent output unless errors
* Ideal for scheduled backups
### metadata integrity
* SHA256 hash of metadata stored separately
* Encrypted hash file for verification
* Chunked metadata support for large filesystems
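A minimal sketch of the hash check described above: recompute SHA256 over the fetched metadata and compare against the separately stored digest:

```go
package verify

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// CheckMetadata returns an error if the metadata does not match the
// expected hex-encoded SHA256 stored alongside the snapshot.
func CheckMetadata(metadata []byte, expectedHex string) error {
	sum := sha256.Sum256(metadata)
	got := hex.EncodeToString(sum[:])
	if subtle.ConstantTimeCompare([]byte(got), []byte(expectedHex)) != 1 {
		return fmt.Errorf("metadata hash mismatch: got %s", got)
	}
	return nil
}
```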
### exclusion patterns
* Glob-based file exclusion
* Configured in YAML
* Applied during directory walk
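A sketch of glob exclusion applied during the walk, using the standard library's `path.Match` (richer syntax such as `**` would need a dedicated library):

```go
package walker

import (
	"io/fs"
	"path"
	"path/filepath"
)

// WalkWithExclusions visits files under root, skipping any path that
// matches an exclusion pattern. Patterns match against the
// slash-separated path relative to root.
func WalkWithExclusions(root string, exclude []string, visit func(path string, d fs.DirEntry) error) error {
	return filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		rel, _ := filepath.Rel(root, p)
		for _, pat := range exclude {
			if ok, _ := path.Match(pat, filepath.ToSlash(rel)); ok {
				if d.IsDir() {
					return filepath.SkipDir // prune the whole subtree
				}
				return nil
			}
		}
		return visit(p, d)
	})
}
```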
## prune
Run `vaultik prune` on a machine with the private key. It:
@@ -160,6 +247,30 @@ WTFPL — see LICENSE.
---
## security considerations
* Source host compromise cannot decrypt backups
* No replay attacks possible (append-only)
* Each blob independently encrypted
* Metadata tampering detectable via hash verification
* S3 credentials only allow write access to backup prefix
## performance
* Streaming processing (no temp files)
* Parallel blob uploads
* Deduplication reduces storage and bandwidth
* Local index enables fast incremental detection
* Configurable compression levels
## requirements
* Go 1.24.4 or later
* S3-compatible object storage
* age command-line tool (for key generation)
* SQLite3
* Sufficient disk space for local index
## author
sneak