Add custom types, version command, and restore --verify flag

- Add internal/types package with type-safe wrappers for IDs, hashes,
  paths, and credentials (FileID, BlobID, ChunkHash, etc.)
- Implement driver.Valuer and sql.Scanner for UUID-based types
- Add `vaultik version` command showing version, commit, and Go version
- Add `--verify` flag to the restore command that checksums all restored
  files against expected chunk hashes, with a progress bar
- Remove fetch.go (dead code, functionality in restore)
- Clean up TODO.md, remove completed items
- Update all database and snapshot code to use new custom types
2026-01-14 17:11:52 -08:00
parent 2afd54d693
commit 417b25a5f5
53 changed files with 2330 additions and 1581 deletions
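
As context for the `internal/types` change described in the commit message, here is a minimal sketch of what a UUID-backed ID implementing `driver.Valuer` and `sql.Scanner` might look like. The field layout, the `github.com/google/uuid` dependency, and string-based storage are assumptions for illustration, not the commit's actual code.

```go
package types

import (
	"database/sql/driver"
	"fmt"

	"github.com/google/uuid"
)

// FileID is a hypothetical type-safe wrapper: embedding the UUID keeps
// file IDs from being confused with BlobID or other ID kinds at compile time.
type FileID struct{ uuid.UUID }

// Value implements driver.Valuer, storing the ID as its canonical string.
func (id FileID) Value() (driver.Value, error) {
	return id.UUID.String(), nil
}

// Scan implements sql.Scanner, accepting TEXT or BLOB columns.
func (id *FileID) Scan(src any) error {
	switch v := src.(type) {
	case string:
		u, err := uuid.Parse(v)
		if err != nil {
			return err
		}
		id.UUID = u
	case []byte:
		u, err := uuid.ParseBytes(v)
		if err != nil {
			return err
		}
		id.UUID = u
	default:
		return fmt.Errorf("cannot scan %T into FileID", src)
	}
	return nil
}
```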

TODO.md

@@ -1,177 +1,107 @@
# Implementation TODO
# Vaultik 1.0 TODO
## Proposed: Store and Snapshot Commands
### Overview
Reorganize commands to provide better visibility into stored data and snapshots.
### Command Structure
#### `vaultik store` - Storage information commands
- `vaultik store info`
- Lists S3 bucket configuration
- Shows total number of snapshots (from metadata/ listing)
- Shows total number of blobs (from blobs/ listing)
- Shows total size of all blobs
- **No decryption required** - uses S3 listing only
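
A rough sketch of how `store info` could gather blob counts and total size from a plain S3 listing using the MinIO client the project already uses; the package, function name, and prefix handling are illustrative only.

```go
package vaultik

import (
	"context"

	"github.com/minio/minio-go/v7"
)

// storeBlobStats counts blobs and sums their sizes by listing the
// blobs/ prefix; nothing is downloaded or decrypted.
func storeBlobStats(ctx context.Context, mc *minio.Client, bucket string) (int, int64, error) {
	var count int
	var totalSize int64
	opts := minio.ListObjectsOptions{Prefix: "blobs/", Recursive: true}
	for obj := range mc.ListObjects(ctx, bucket, opts) {
		if obj.Err != nil {
			return 0, 0, obj.Err
		}
		count++
		totalSize += obj.Size
	}
	return count, totalSize, nil
}
```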
#### `vaultik snapshot` - Snapshot management commands
- `vaultik snapshot create [path]`
- Renamed from `vaultik backup`
- Same functionality as current backup command
- `vaultik snapshot list [--json]`
- Lists all snapshots with:
- Snapshot ID
- Creation timestamp (parsed from snapshot ID)
- Compressed size (sum of referenced blob sizes from manifest)
- **No decryption required** - uses blob manifests only
- `--json` flag outputs in JSON format instead of table
- `vaultik snapshot purge`
- Requires one of:
- `--keep-latest` - keeps only the most recent snapshot
- `--older-than <duration>` - removes snapshots older than duration (e.g., "30d", "6m", "1y")
- Removes snapshot metadata and runs pruning to clean up unreferenced blobs
- Shows what would be deleted and requires confirmation
- `vaultik snapshot verify [--deep] <snapshot-id>`
- Basic mode: Verifies all blobs referenced in manifest exist in S3
- `--deep` mode: Downloads each blob and verifies its hash matches the stored hash
- **Stub implementation for now**
- `vaultik snapshot remove <snapshot-id>` (alias: `rm`)
- Removes a snapshot and any blobs that become orphaned
- Algorithm (a Go sketch of steps 3-5 follows this command list):
1. Validate target snapshot exists in storage
2. List all snapshots in storage
3. Download manifests from all OTHER snapshots to build the "in-use" blob set
4. Download target snapshot's manifest to get its blob hashes
5. Identify orphaned blobs: target blobs NOT in the in-use set
6. Delete orphaned blobs from storage
7. Delete snapshot metadata using existing `deleteSnapshot()` helper
- Flags:
- `--force` / `-f`: Skip confirmation prompt
- `--dry-run`: Show what would be deleted without deleting
- Files to modify:
- `internal/cli/snapshot.go`: Add `newSnapshotRemoveCommand()`
- `internal/vaultik/snapshot.go`: Add `RemoveSnapshot()` method
- Reuse existing code:
- Snapshot enumeration pattern from `PruneBlobs()` in `prune.go`
- `v.downloadManifest(snapshotID)` for manifest downloading
- Blob path format: `blobs/{hash[:2]}/{hash[2:4]}/{hash}`
- `v.deleteSnapshot(snapshotID)` for metadata deletion
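
A sketch of the orphan-detection portion of the `snapshot remove` algorithm above. The `Manifest` shape is a minimal stand-in, and `download` stands in for the existing `downloadManifest` helper; only the set-difference logic is the point.

```go
package vaultik

import "fmt"

// Manifest is a minimal stand-in for the real blob manifest type.
type Manifest struct {
	BlobHashes []string
}

// findOrphanedBlobs follows steps 3-5 above: collect every blob
// referenced by OTHER snapshots, then return the target's blobs that
// appear nowhere else.
func findOrphanedBlobs(target string, all []string, download func(id string) (*Manifest, error)) ([]string, error) {
	inUse := make(map[string]struct{})
	for _, id := range all {
		if id == target {
			continue // only other snapshots contribute to the in-use set
		}
		m, err := download(id)
		if err != nil {
			return nil, err
		}
		for _, h := range m.BlobHashes {
			inUse[h] = struct{}{}
		}
	}
	tm, err := download(target)
	if err != nil {
		return nil, err
	}
	var orphaned []string
	for _, h := range tm.BlobHashes {
		if _, ok := inUse[h]; !ok {
			orphaned = append(orphaned, h)
		}
	}
	return orphaned, nil
}

// blobPath mirrors the layout noted above: blobs/{hash[:2]}/{hash[2:4]}/{hash}.
func blobPath(hash string) string {
	return fmt.Sprintf("blobs/%s/%s/%s", hash[:2], hash[2:4], hash)
}
```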
### Implementation Notes
1. **No Decryption Required**: All commands work with unencrypted blob manifests
2. **Blob Manifests**: Located at `metadata/{snapshot-id}/manifest.json.zst`
3. **S3 Operations**: Use S3 ListObjects to enumerate snapshots and blobs
4. **Size Calculations**: Sum blob sizes from S3 object metadata
5. **Timestamp Parsing**: Extract from snapshot ID format (e.g., `2024-01-15-143052-hostname`)
6. **S3 Metadata**: Only used for `snapshot verify` command
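
For note 5, a small sketch of extracting the creation time from a snapshot ID; the layout string is inferred from the example ID and may not match the real format exactly.

```go
package vaultik

import (
	"fmt"
	"strings"
	"time"
)

// snapshotTime parses IDs like "2024-01-15-143052-hostname" into a
// time.Time; the layout is inferred from the example in note 5.
func snapshotTime(id string) (time.Time, error) {
	parts := strings.SplitN(id, "-", 5)
	if len(parts) < 4 {
		return time.Time{}, fmt.Errorf("malformed snapshot id %q", id)
	}
	return time.Parse("2006-01-02-150405", strings.Join(parts[:4], "-"))
}
```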
### Benefits
- Users can see storage usage without decryption keys
- Snapshot management doesn't require access to encrypted metadata
- Clean separation between storage info and snapshot operations
## Chunking and Hashing
1. ~~Implement content-defined chunking~~ (done with FastCDC)
1. ~~Create streaming chunk processor~~ (done in chunker)
1. ~~Implement SHA256 hashing for chunks~~ (done in scanner)
1. ~~Add configurable chunk size parameters~~ (done in scanner)
1. ~~Write tests for chunking consistency~~ (done)
## Compression and Encryption
1. ~~Implement compression~~ (done with zlib in blob packer)
1. ~~Integrate age encryption library~~ (done in crypto package)
1. ~~Create Encryptor type for public key encryption~~ (done)
1. ~~Implement streaming encrypt/decrypt pipelines~~ (done in packer)
1. ~~Write tests for compression and encryption~~ (done)
## Blob Packing
1. ~~Implement BlobWriter with size limits~~ (done in packer)
1. ~~Add chunk accumulation and flushing~~ (done)
1. ~~Create blob hash calculation~~ (done)
1. ~~Implement proper error handling and rollback~~ (done with transactions)
1. ~~Write tests for blob packing scenarios~~ (done)
## S3 Operations
1. ~~Integrate MinIO client library~~ (done in s3 package)
1. ~~Implement S3Client wrapper type~~ (done)
1. ~~Add multipart upload support for large blobs~~ (done - using standard upload)
1. ~~Implement retry logic~~ (handled by MinIO client)
1. ~~Write tests using MinIO container~~ (done with testcontainers)
## Backup Command - Basic
1. ~~Implement directory walking with exclusion patterns~~ (done with afero)
1. Add file change detection using index
1. ~~Integrate chunking pipeline for changed files~~ (done in scanner)
1. Implement blob upload coordination to S3
1. Add progress reporting to stderr
1. Write integration tests for backup
## Snapshot Metadata
1. Implement snapshot metadata extraction from index
1. Create SQLite snapshot database builder
1. Add metadata compression and encryption
1. Implement metadata chunking for large snapshots
1. Add hash calculation and verification
1. Implement metadata upload to S3
1. Write tests for metadata operations
Linear list of tasks to complete before 1.0 release.
## Restore Command
1. Implement snapshot listing and selection
1. Add metadata download and reconstruction
1. Implement hash verification for metadata
1. Create file restoration logic with chunk retrieval
1. Add blob caching for efficiency
1. Implement proper file permissions and mtime restoration
1. Write integration tests for restore
## Prune Command
1. Implement latest snapshot detection
1. Add referenced blob extraction from metadata
1. Create S3 blob listing and comparison
1. Implement safe deletion of unreferenced blobs
1. Add dry-run mode for safety
1. Write tests for prune scenarios
## Verify Command
1. Implement metadata integrity checking
1. Add blob existence verification
1. Implement quick mode (S3 hash checking)
1. Implement deep mode (download and verify chunks)
1. Add detailed error reporting
1. Write tests for verification
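
Deep mode as described above amounts to re-downloading each blob and comparing a fresh SHA-256 against the recorded hash; a hedged sketch using the MinIO client (function name and manifest wiring are assumptions):

```go
package vaultik

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"

	"github.com/minio/minio-go/v7"
)

// verifyBlobDeep downloads one blob and checks its SHA-256 against the
// expected hash; quick mode would stop at an existence/metadata check instead.
func verifyBlobDeep(ctx context.Context, mc *minio.Client, bucket, key, wantHex string) error {
	obj, err := mc.GetObject(ctx, bucket, key, minio.GetObjectOptions{})
	if err != nil {
		return err
	}
	defer func() { _ = obj.Close() }()

	h := sha256.New()
	if _, err := io.Copy(h, obj); err != nil {
		return err
	}
	if got := hex.EncodeToString(h.Sum(nil)); got != wantHex {
		return fmt.Errorf("blob %s: hash mismatch (got %s, want %s)", key, got, wantHex)
	}
	return nil
}
```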
## Fetch Command
1. Implement single-file metadata query
1. Add minimal blob downloading for file
1. Create streaming file reconstruction
1. Add support for output redirection
1. Write tests for fetch command
1. Write integration tests for restore command
## Daemon Mode
1. Implement inotify watcher for Linux
1. Add dirty path tracking in index
1. Create periodic full scan scheduler
1. Implement backup interval enforcement
1. Add proper signal handling and shutdown
1. Write tests for daemon behavior
## Cron Mode
1. Implement silent operation mode
1. Add proper exit codes for cron
1. Implement lock file to prevent concurrent runs
1. Add error summary reporting
1. Write tests for cron mode
1. Implement inotify file watcher for Linux (see the sketch after the daemon items below)
- Watch source directories for changes
- Track dirty paths in memory
## Finalization
1. Add comprehensive logging throughout
1. Implement proper error wrapping and context
1. Add performance metrics collection
1. Create end-to-end integration tests
1. Write documentation and examples
1. Set up CI/CD pipeline
1. Implement FSEvents watcher for macOS
- Watch source directories for changes
- Track dirty paths in memory
1. Implement backup scheduler in daemon mode
- Respect backup_interval config
- Trigger backup when dirty paths exist and interval elapsed
- Implement full_scan_interval for periodic full scans
1. Add proper signal handling for daemon
- Graceful shutdown on SIGTERM/SIGINT
- Complete in-progress backup before exit
1. Write tests for daemon mode
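
One way the inotify watcher and in-memory dirty-path tracking above could fit together, sketched with github.com/fsnotify/fsnotify; the library choice and function shape are assumptions, not what the daemon will necessarily use.

```go
package daemon

import (
	"context"
	"log"

	"github.com/fsnotify/fsnotify"
)

// collectDirtyPaths watches the given roots and records any path that
// changes until the context is cancelled. Recursing into subdirectories
// is omitted here; inotify watches are per-directory.
func collectDirtyPaths(ctx context.Context, roots []string) (map[string]struct{}, error) {
	w, err := fsnotify.NewWatcher()
	if err != nil {
		return nil, err
	}
	defer func() { _ = w.Close() }()

	for _, r := range roots {
		if err := w.Add(r); err != nil {
			return nil, err
		}
	}

	dirty := make(map[string]struct{})
	for {
		select {
		case <-ctx.Done():
			return dirty, nil
		case ev, ok := <-w.Events:
			if !ok {
				return dirty, nil
			}
			if ev.Op&(fsnotify.Create|fsnotify.Write|fsnotify.Remove|fsnotify.Rename) != 0 {
				dirty[ev.Name] = struct{}{}
			}
		case werr, ok := <-w.Errors:
			if !ok {
				return dirty, nil
			}
			log.Printf("watch error: %v", werr)
		}
	}
}
```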
## CLI Polish
1. Add `--quiet` flag to all commands
- Suppress non-error output
- Useful for scripting
1. Add `--json` output flag to more commands
- `snapshot verify` - output verification results as JSON
- `snapshot remove` - output deletion stats as JSON
- `prune` - output pruning stats as JSON
1. Improve error messages throughout
- Ensure all errors include actionable context
- Add suggestions for common issues
## Testing
1. Write end-to-end integration test
- Create backup
- Verify backup
- Restore backup
- Compare restored files to originals
1. Add tests for edge cases
- Empty directories
- Symlinks
- Special characters in filenames
- Very large files (multi-GB)
- Many small files (100k+)
1. Add tests for error conditions
- Network failures during upload
- Disk full during restore
- Corrupted blobs
- Missing blobs
## Documentation
1. Add man page or --help improvements
- Detailed help for each command
- Examples in help output
## Performance
1. Profile and optimize restore performance
- Parallel blob downloads
- Streaming decompression/decryption
- Efficient chunk reassembly
1. Add bandwidth limiting option
- `--bwlimit` flag for upload/download speed limiting
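
A possible shape for `--bwlimit`: wrap the upload or download stream in a reader throttled by golang.org/x/time/rate. The package choice and exact wiring are assumptions.

```go
package ratelimit

import (
	"context"
	"io"

	"golang.org/x/time/rate"
)

// limitedReader throttles reads to the limiter's rate, capping effective
// bandwidth for whatever stream it wraps.
type limitedReader struct {
	r   io.Reader
	lim *rate.Limiter
}

func (l *limitedReader) Read(p []byte) (int, error) {
	// Never request more than the burst, or WaitN would fail outright.
	if b := l.lim.Burst(); len(p) > b {
		p = p[:b]
	}
	n, err := l.r.Read(p)
	if n > 0 {
		if werr := l.lim.WaitN(context.Background(), n); werr != nil {
			return n, werr
		}
	}
	return n, err
}

// NewLimitedReader caps r at roughly bytesPerSec bytes per second.
func NewLimitedReader(r io.Reader, bytesPerSec int) io.Reader {
	return &limitedReader{r: r, lim: rate.NewLimiter(rate.Limit(bytesPerSec), bytesPerSec)}
}
```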
## Security
1. Audit encryption implementation
- Verify age encryption is used correctly
- Ensure no plaintext leaks in logs or errors
1. Add config file permission check
- Warn if config file is world-readable (contains secrets)
1. Secure memory handling for secrets
- Clear age_secret_key from memory after use
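
A minimal sketch of the world-readable config check above; the exact mode bits to warn on and the message wording are choices, not established behaviour.

```go
package config

import (
	"fmt"
	"os"
)

// warnIfWorldReadable prints a warning when the config file grants read
// access to other users, since it contains secrets such as the age
// secret key.
func warnIfWorldReadable(path string) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	if info.Mode().Perm()&0o004 != 0 {
		fmt.Fprintf(os.Stderr,
			"warning: %s is world-readable (mode %04o) and contains secrets; consider chmod 600\n",
			path, info.Mode().Perm())
	}
	return nil
}
```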
## Final Polish
1. Ensure version is set correctly in releases
1. Create release process
- Binary releases for supported platforms
- Checksums for binaries
- Release notes template
1. Final code review
- Remove debug statements
- Ensure consistent code style
1. Tag and release v1.0.0