- Implement exclude patterns with anchored pattern support:
  - Patterns starting with / only match from root of source dir
  - Unanchored patterns match anywhere in path
  - Support for glob patterns (*.log, .*, **/*.pack)
  - Directory patterns skip entire subtrees
  - Add gobwas/glob dependency for pattern matching
  - Add 16 comprehensive tests for exclude functionality
- Add snapshot prune command to clean orphaned data:
  - Removes incomplete snapshots from database
  - Cleans orphaned files, chunks, and blobs
  - Runs automatically at backup start for consistency
- Add snapshot remove command for deleting snapshots
- Add VAULTIK_AGE_SECRET_KEY environment variable support
- Fix duplicate fx module provider in restore command
- Change snapshot ID format to hostname_YYYY-MM-DDTHH:MM:SSZ
177 lines
6.9 KiB
Markdown
# Implementation TODO

## Proposed: Store and Snapshot Commands

### Overview
Reorganize commands to provide better visibility into stored data and snapshots.

### Command Structure

#### `vaultik store` - Storage information commands
- `vaultik store info`
  - Lists S3 bucket configuration
  - Shows total number of snapshots (from `metadata/` listing)
  - Shows total number of blobs (from `blobs/` listing)
  - Shows total size of all blobs
  - **No decryption required** - uses S3 listing only

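As a rough illustration of how `store info` could derive its totals from a listing alone, the sketch below tallies entries by key prefix. The `object` and `storeInfo` types and the `summarize` function are hypothetical names invented for this example; they stand in for whatever the S3 client actually returns.

```go
package main

import (
	"fmt"
	"strings"
)

// object is a hypothetical stand-in for one entry of an S3 listing.
type object struct {
	Key  string
	Size int64
}

// storeInfo holds the totals that `vaultik store info` would report.
type storeInfo struct {
	Snapshots int
	Blobs     int
	BlobBytes int64
}

// summarize tallies snapshots and blobs from listing entries alone,
// so no object is downloaded or decrypted.
func summarize(objects []object) storeInfo {
	var info storeInfo
	seen := make(map[string]bool) // snapshot IDs already counted
	for _, o := range objects {
		switch {
		case strings.HasPrefix(o.Key, "blobs/"):
			info.Blobs++
			info.BlobBytes += o.Size
		case strings.HasPrefix(o.Key, "metadata/"):
			// Keys look like metadata/{snapshot-id}/...; count each ID once.
			rest := strings.TrimPrefix(o.Key, "metadata/")
			id, _, _ := strings.Cut(rest, "/")
			if id != "" && !seen[id] {
				seen[id] = true
				info.Snapshots++
			}
		}
	}
	return info
}

func main() {
	info := summarize([]object{
		{Key: "metadata/host_2024-01-15T14:30:52Z/manifest.json.zst", Size: 512},
		{Key: "blobs/ab/cd/abcd01", Size: 1000},
		{Key: "blobs/ab/ef/abef02", Size: 2500},
	})
	fmt.Printf("snapshots=%d blobs=%d bytes=%d\n", info.Snapshots, info.Blobs, info.BlobBytes)
}
```

Counting distinct snapshot IDs rather than objects keeps the total correct if a snapshot's metadata directory holds more than one file.
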
#### `vaultik snapshot` - Snapshot management commands
- `vaultik snapshot create [path]`
  - Renamed from `vaultik backup`
  - Same functionality as current backup command

- `vaultik snapshot list [--json]`
  - Lists all snapshots with:
    - Snapshot ID
    - Creation timestamp (parsed from snapshot ID)
    - Compressed size (sum of referenced blob sizes from manifest)
  - **No decryption required** - uses blob manifests only
  - `--json` flag outputs in JSON format instead of table

- `vaultik snapshot purge`
  - Requires one of:
    - `--keep-latest` - keeps only the most recent snapshot
    - `--older-than <duration>` - removes snapshots older than duration (e.g., "30d", "6m", "1y")
  - Removes snapshot metadata and runs pruning to clean up unreferenced blobs
  - Shows what would be deleted and requires confirmation

- `vaultik snapshot verify [--deep] <snapshot-id>`
  - Basic mode: Verifies all blobs referenced in manifest exist in S3
  - `--deep` mode: Downloads each blob and verifies its hash matches the stored hash
  - **Stub implementation for now**

- `vaultik snapshot remove <snapshot-id>` (alias: `rm`)
  - Removes a snapshot and any blobs that become orphaned
  - Algorithm:
    1. Validate target snapshot exists in storage
    2. List all snapshots in storage
    3. Download manifests from all OTHER snapshots to build "in-use" blob set
    4. Download target snapshot's manifest to get its blob hashes
    5. Identify orphaned blobs: target blobs NOT in the in-use set
    6. Delete orphaned blobs from storage
    7. Delete snapshot metadata using existing `deleteSnapshot()` helper
  - Flags:
    - `--force` / `-f`: Skip confirmation prompt
    - `--dry-run`: Show what would be deleted without deleting
  - Files to modify:
    - `internal/cli/snapshot.go`: Add `newSnapshotRemoveCommand()`
    - `internal/vaultik/snapshot.go`: Add `RemoveSnapshot()` method
  - Reuse existing code:
    - Snapshot enumeration pattern from `PruneBlobs()` in `prune.go`
    - `v.downloadManifest(snapshotID)` for manifest downloading
    - Blob path format: `blobs/{hash[:2]}/{hash[2:4]}/{hash}`
    - `v.deleteSnapshot(snapshotID)` for metadata deletion

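Steps 3 to 5 of the `snapshot remove` algorithm amount to a set difference, and the blob path format is a simple prefix split. The sketch below shows both under stated assumptions: manifests are reduced to plain slices of blob hashes, and `orphanedBlobs` and `blobPath` are hypothetical helper names, not existing vaultik functions.

```go
package main

import (
	"fmt"
	"sort"
)

// blobPath maps a blob hash to its storage key using the layout
// blobs/{hash[:2]}/{hash[2:4]}/{hash}. The hash is assumed to be
// at least four characters long.
func blobPath(hash string) string {
	return "blobs/" + hash[:2] + "/" + hash[2:4] + "/" + hash
}

// orphanedBlobs returns the target snapshot's blob hashes that no
// other snapshot references, sorted and without duplicates.
func orphanedBlobs(target []string, others [][]string) []string {
	// Step 3: build the "in-use" set from every OTHER manifest.
	inUse := make(map[string]struct{})
	for _, manifest := range others {
		for _, h := range manifest {
			inUse[h] = struct{}{}
		}
	}
	// Step 5: keep target blobs that are not in the in-use set.
	seen := make(map[string]struct{})
	var orphans []string
	for _, h := range target {
		if _, used := inUse[h]; used {
			continue
		}
		if _, dup := seen[h]; dup {
			continue
		}
		seen[h] = struct{}{}
		orphans = append(orphans, h)
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	target := []string{"aabbcc", "ddeeff", "112233"}
	others := [][]string{{"ddeeff"}, {"999999"}}
	// With --dry-run, the command would only print these keys.
	for _, h := range orphanedBlobs(target, others) {
		fmt.Println("would delete", blobPath(h))
	}
}
```

Building the in-use set before touching storage means a `--dry-run` and a real run share the same computation and differ only in whether the delete calls are issued.
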
### Implementation Notes

1. **No Decryption Required**: All commands work with unencrypted blob manifests
2. **Blob Manifests**: Located at `metadata/{snapshot-id}/manifest.json.zst`
3. **S3 Operations**: Use S3 ListObjects to enumerate snapshots and blobs
4. **Size Calculations**: Sum blob sizes from S3 object metadata
5. **Timestamp Parsing**: Extract from snapshot ID format (e.g., `hostname_2024-01-15T14:30:52Z`)
6. **S3 Metadata**: Only used for `snapshot verify` command

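A minimal sketch of the timestamp parsing in note 5, assuming the `hostname_YYYY-MM-DDTHH:MM:SSZ` snapshot ID format named in the commit message at the top of this page. `parseSnapshotTime` is a hypothetical helper; it splits on the last underscore so that hostnames containing underscores still parse.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parseSnapshotTime extracts the creation time from a snapshot ID of the
// assumed form hostname_YYYY-MM-DDTHH:MM:SSZ.
func parseSnapshotTime(id string) (time.Time, error) {
	i := strings.LastIndex(id, "_")
	if i < 0 {
		return time.Time{}, fmt.Errorf("snapshot ID %q has no '_' separator", id)
	}
	// The timestamp part is RFC 3339 in UTC.
	return time.Parse(time.RFC3339, id[i+1:])
}

func main() {
	ts, err := parseSnapshotTime("backup_host_2024-01-15T14:30:52Z")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ts.Format(time.RFC3339))
}
```

Because the timestamp lives in the ID, `snapshot list` and `snapshot purge --older-than` can sort and filter snapshots from a listing alone.
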
### Benefits
- Users can see storage usage without decryption keys
- Snapshot management doesn't require access to encrypted metadata
- Clean separation between storage info and snapshot operations

## Chunking and Hashing
1. ~~Implement content-defined chunking~~ (done with FastCDC)
1. ~~Create streaming chunk processor~~ (done in chunker)
1. ~~Implement SHA256 hashing for chunks~~ (done in scanner)
1. ~~Add configurable chunk size parameters~~ (done in scanner)
1. ~~Write tests for chunking consistency~~ (done)

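For readers unfamiliar with content-addressed chunks, the following sketch shows the hashing step in isolation using only the Go standard library. It is not the scanner's actual code; `chunkID` is a hypothetical name, and the chunk boundaries (produced by FastCDC in the real pipeline) are taken as given.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkID returns the hex-encoded SHA-256 digest of one chunk's bytes.
func chunkID(chunk []byte) string {
	sum := sha256.Sum256(chunk)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Identical content always yields the same ID, which is what lets
	// repeated chunks be stored once.
	a := chunkID([]byte("same bytes"))
	b := chunkID([]byte("same bytes"))
	fmt.Println("ids equal:", a == b)
}
```
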
## Compression and Encryption
1. ~~Implement compression~~ (done with zlib in blob packer)
1. ~~Integrate age encryption library~~ (done in crypto package)
1. ~~Create Encryptor type for public key encryption~~ (done)
1. ~~Implement streaming encrypt/decrypt pipelines~~ (done in packer)
1. ~~Write tests for compression and encryption~~ (done)

## Blob Packing
1. ~~Implement BlobWriter with size limits~~ (done in packer)
1. ~~Add chunk accumulation and flushing~~ (done)
1. ~~Create blob hash calculation~~ (done)
1. ~~Implement proper error handling and rollback~~ (done with transactions)
1. ~~Write tests for blob packing scenarios~~ (done)

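The accumulate-and-flush behaviour listed above can be sketched as follows. This is an in-memory illustration only: the `packer` type is hypothetical and leaves out the compression, encryption, hashing, and transaction handling that the real packer performs.

```go
package main

import "fmt"

// packer accumulates chunks into the current blob and starts a new blob
// whenever adding a chunk would push the current one past maxSize.
type packer struct {
	maxSize int
	current []byte
	blobs   [][]byte
}

// add appends a chunk, flushing first if the size limit would be exceeded.
// A single chunk larger than maxSize still gets a blob of its own.
func (p *packer) add(chunk []byte) {
	if len(p.current) > 0 && len(p.current)+len(chunk) > p.maxSize {
		p.flush()
	}
	p.current = append(p.current, chunk...)
}

// flush finalizes the current blob, if it holds any data.
func (p *packer) flush() {
	if len(p.current) == 0 {
		return
	}
	p.blobs = append(p.blobs, p.current)
	p.current = nil // the next add allocates a fresh buffer
}

func main() {
	p := &packer{maxSize: 10}
	for _, c := range []string{"aaaa", "bbbb", "cccc"} {
		p.add([]byte(c))
	}
	p.flush() // flush the final partial blob
	for i, b := range p.blobs {
		fmt.Printf("blob %d: %d bytes\n", i, len(b))
	}
}
```
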
## S3 Operations
1. ~~Integrate MinIO client library~~ (done in s3 package)
1. ~~Implement S3Client wrapper type~~ (done)
1. ~~Add multipart upload support for large blobs~~ (done - using standard upload)
1. ~~Implement retry logic~~ (handled by MinIO client)
1. ~~Write tests using MinIO container~~ (done with testcontainers)

## Backup Command - Basic
1. ~~Implement directory walking with exclusion patterns~~ (done with afero)
1. Add file change detection using index
1. ~~Integrate chunking pipeline for changed files~~ (done in scanner)
1. Implement blob upload coordination to S3
1. Add progress reporting to stderr
1. Write integration tests for backup

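The commit message at the top of this page describes the exclusion rules: patterns starting with `/` match only from the root of the source directory, and unanchored patterns match anywhere in the path. The sketch below illustrates that distinction with the standard library's `path.Match`, swapped in here for gobwas/glob, so `**` patterns are not supported. The `excluded` function is hypothetical and is not the project's implementation.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// excluded reports whether rel (a slash-separated path relative to the
// source directory) matches the exclude pattern.
func excluded(pattern, rel string) bool {
	rel = strings.Trim(rel, "/")
	if strings.HasPrefix(pattern, "/") {
		// Anchored: the pattern must match the path from the root.
		ok, _ := path.Match(strings.TrimPrefix(pattern, "/"), rel)
		return ok
	}
	// Unanchored: try the pattern against every suffix of the path that
	// starts at a component boundary.
	parts := strings.Split(rel, "/")
	for i := range parts {
		if ok, _ := path.Match(pattern, strings.Join(parts[i:], "/")); ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(excluded("/build", "build"))     // anchored, at root
	fmt.Println(excluded("/build", "src/build")) // anchored, not at root
	fmt.Println(excluded("*.log", "a/b/c.log"))  // unanchored, nested file
}
```

A directory walker that calls such a check on each directory before descending gets the "skip entire subtrees" behaviour by not recursing into a matching directory.
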
## Snapshot Metadata
1. Implement snapshot metadata extraction from index
1. Create SQLite snapshot database builder
1. Add metadata compression and encryption
1. Implement metadata chunking for large snapshots
1. Add hash calculation and verification
1. Implement metadata upload to S3
1. Write tests for metadata operations

## Restore Command
1. Implement snapshot listing and selection
1. Add metadata download and reconstruction
1. Implement hash verification for metadata
1. Create file restoration logic with chunk retrieval
1. Add blob caching for efficiency
1. Implement proper file permissions and mtime restoration
1. Write integration tests for restore

## Prune Command
1. Implement latest snapshot detection
1. Add referenced blob extraction from metadata
1. Create S3 blob listing and comparison
1. Implement safe deletion of unreferenced blobs
1. Add dry-run mode for safety
1. Write tests for prune scenarios

## Verify Command
1. Implement metadata integrity checking
1. Add blob existence verification
1. Implement quick mode (S3 hash checking)
1. Implement deep mode (download and verify chunks)
1. Add detailed error reporting
1. Write tests for verification

## Fetch Command
1. Implement single-file metadata query
1. Add minimal blob downloading for file
1. Create streaming file reconstruction
1. Add support for output redirection
1. Write tests for fetch command

## Daemon Mode
1. Implement inotify watcher for Linux
1. Add dirty path tracking in index
1. Create periodic full scan scheduler
1. Implement backup interval enforcement
1. Add proper signal handling and shutdown
1. Write tests for daemon behavior

## Cron Mode
1. Implement silent operation mode
1. Add proper exit codes for cron
1. Implement lock file to prevent concurrent runs
1. Add error summary reporting
1. Write tests for cron mode

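One possible shape for the lock file item is an exclusive create, sketched below. This is an assumption about the approach, not a decided design: `acquireLock`, `errLocked`, and the lock path are hypothetical, and the sketch does not detect stale locks left behind by a crashed run.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// errLocked signals that another run already holds the lock.
var errLocked = errors.New("another run holds the lock")

// acquireLock creates the lock file exclusively and returns a function
// that removes it. O_EXCL makes the create fail if the file exists.
func acquireLock(path string) (release func() error, err error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o600)
	if err != nil {
		if errors.Is(err, os.ErrExist) {
			return nil, errLocked
		}
		return nil, err
	}
	// Record the PID so an operator can inspect a leftover lock.
	fmt.Fprintf(f, "%d\n", os.Getpid())
	if err := f.Close(); err != nil {
		os.Remove(path)
		return nil, err
	}
	return func() error { return os.Remove(path) }, nil
}

func main() {
	lockPath := filepath.Join(os.TempDir(), "vaultik-example.lock")
	release, err := acquireLock(lockPath)
	if err != nil {
		// A non-zero exit code lets cron report the skipped run.
		fmt.Fprintln(os.Stderr, "cannot start:", err)
		os.Exit(1)
	}
	defer release()
	fmt.Println("lock held, running backup")
}
```
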
## Finalization
1. Add comprehensive logging throughout
1. Implement proper error wrapping and context
1. Add performance metrics collection
1. Create end-to-end integration tests
1. Write documentation and examples
1. Set up CI/CD pipeline