# Implementation TODO

## Proposed: Store and Snapshot Commands

### Overview

Reorganize commands to provide better visibility into stored data and snapshots.

### Command Structure
`vaultik store` - Storage information commands

- `vaultik store info`
  - Lists the S3 bucket configuration
  - Shows the total number of snapshots (from the `metadata/` listing)
  - Shows the total number of blobs (from the `blobs/` listing)
  - Shows the total size of all blobs
  - No decryption required - uses S3 listings only (see the sketch below)
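
A minimal sketch of how `store info` could gather these numbers from bucket listings alone, assuming the minio-go v7 client already used by the s3 package; the function shape and names are illustrative, not the actual vaultik code:

```go
// Sketch: derive "vaultik store info" output purely from S3 listings.
package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/minio/minio-go/v7"
)

func storeInfo(ctx context.Context, mc *minio.Client, bucket string) error {
	// One metadata/{snapshot-id}/ prefix exists per snapshot.
	snapshots := map[string]bool{}
	for obj := range mc.ListObjects(ctx, bucket, minio.ListObjectsOptions{Prefix: "metadata/", Recursive: true}) {
		if obj.Err != nil {
			return obj.Err
		}
		rest := strings.TrimPrefix(obj.Key, "metadata/")
		if id, _, ok := strings.Cut(rest, "/"); ok {
			snapshots[id] = true
		}
	}

	// Blob count and total size come from object metadata in the listing;
	// nothing is downloaded or decrypted.
	var blobCount, totalSize int64
	for obj := range mc.ListObjects(ctx, bucket, minio.ListObjectsOptions{Prefix: "blobs/", Recursive: true}) {
		if obj.Err != nil {
			return obj.Err
		}
		blobCount++
		totalSize += obj.Size
	}

	fmt.Printf("snapshots: %d\nblobs: %d\ntotal blob size: %d bytes\n",
		len(snapshots), blobCount, totalSize)
	return nil
}
```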
`vaultik snapshot` - Snapshot management commands

- `vaultik snapshot create [path]`
  - Renamed from `vaultik backup`
  - Same functionality as the current backup command
- `vaultik snapshot list [--json]`
  - Lists all snapshots with:
    - Snapshot ID
    - Creation timestamp (parsed from the snapshot ID)
    - Compressed size (sum of referenced blob sizes from the manifest)
  - No decryption required - uses blob manifests only
  - The `--json` flag outputs JSON instead of a table
- `vaultik snapshot purge`
  - Requires one of:
    - `--keep-latest` - keeps only the most recent snapshot
    - `--older-than <duration>` - removes snapshots older than the given duration (e.g., "30d", "6m", "1y"; see the parsing sketch after this list)
  - Removes snapshot metadata and runs pruning to clean up unreferenced blobs
  - Shows what would be deleted and requires confirmation
- `vaultik snapshot verify [--deep] <snapshot-id>`
  - Basic mode: verifies that all blobs referenced in the manifest exist in S3
  - `--deep` mode: downloads each blob and verifies that its hash matches the stored hash
  - Stub implementation for now
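
Go's `time.ParseDuration` stops at hours, so `--older-than` values like "30d" need a small custom parser. A minimal sketch, with approximate calendar units (1d = 24h, 1m = 30d, 1y = 365d) as a stated assumption:

```go
// parseRetention converts "--older-than" strings such as "30d", "6m",
// or "1y" into a time.Duration using approximate calendar units.
package main

import (
	"fmt"
	"strconv"
	"time"
)

func parseRetention(s string) (time.Duration, error) {
	if len(s) < 2 {
		return 0, fmt.Errorf("invalid duration %q", s)
	}
	n, err := strconv.Atoi(s[:len(s)-1])
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid duration %q", s)
	}
	day := 24 * time.Hour
	switch s[len(s)-1] {
	case 'd':
		return time.Duration(n) * day, nil
	case 'm':
		return time.Duration(n) * 30 * day, nil // assumption: 1m = 30 days
	case 'y':
		return time.Duration(n) * 365 * day, nil // assumption: 1y = 365 days
	default:
		return 0, fmt.Errorf("unknown unit in %q (want d, m, or y)", s)
	}
}
```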
### Implementation Notes

- No Decryption Required: all commands work with unencrypted blob manifests
- Blob Manifests: located at `metadata/{snapshot-id}/manifest.json.zst`
- S3 Operations: use S3 ListObjects to enumerate snapshots and blobs
- Size Calculations: sum blob sizes from S3 object metadata
- Timestamp Parsing: extract from the snapshot ID format (e.g., `2024-01-15-143052-hostname`; see the sketch below)
- S3 Metadata: only used by the `snapshot verify` command
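
Since the snapshot ID begins with a fixed-width timestamp, `snapshot list` can recover the creation time without touching S3 metadata. A minimal sketch, assuming the `2024-01-15-143052-hostname` format above:

```go
// snapshotTime parses the leading YYYY-MM-DD-HHMMSS timestamp out of a
// snapshot ID such as "2024-01-15-143052-hostname".
package main

import (
	"fmt"
	"time"
)

const snapshotTimeLayout = "2006-01-02-150405" // Go reference time in the ID's layout

func snapshotTime(id string) (time.Time, error) {
	if len(id) < len(snapshotTimeLayout) {
		return time.Time{}, fmt.Errorf("snapshot ID %q too short", id)
	}
	return time.Parse(snapshotTimeLayout, id[:len(snapshotTimeLayout)])
}

func main() {
	t, _ := snapshotTime("2024-01-15-143052-hostname")
	fmt.Println(t.Format(time.RFC3339)) // 2024-01-15T14:30:52Z
}
```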
### Benefits
- Users can see storage usage without decryption keys
- Snapshot management doesn't require access to encrypted metadata
- Clean separation between storage info and snapshot operations
## Chunking and Hashing

- Implement content-defined chunking (done with FastCDC)
- Create streaming chunk processor (done in chunker; see the sketch below)
- Implement SHA256 hashing for chunks (done in scanner)
- Add configurable chunk size parameters (done in scanner)
- Write tests for chunking consistency (done)
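
A minimal sketch of the streaming shape described above: chunks are pulled one at a time and hashed, so memory stays bounded by the maximum chunk size regardless of file size. The `Chunker` interface is a hypothetical stand-in for the FastCDC-based chunker package:

```go
// hashChunks drains a content-defined chunker, hashing each chunk with
// SHA256 and handing the result to a caller-supplied emit function.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
)

// Chunker is a stand-in: Next returns the next chunk, or io.EOF when the
// underlying reader is exhausted.
type Chunker interface {
	Next() ([]byte, error)
}

func hashChunks(ch Chunker, emit func(hash string, data []byte) error) error {
	for {
		data, err := ch.Next()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		sum := sha256.Sum256(data)
		if err := emit(hex.EncodeToString(sum[:]), data); err != nil {
			return err
		}
	}
}
```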
## Compression and Encryption

- Implement compression (done with zlib in the blob packer)
- Integrate age encryption library (done in crypto package)
- Create Encryptor type for public key encryption (done)
- Implement streaming encrypt/decrypt pipelines (done in packer; see the sketch below)
- Write tests for compression and encryption (done)
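
A minimal sketch of the compress-then-encrypt direction of the pipeline, assuming `compress/zlib` and `filippo.io/age` as named above; data streams plaintext -> zlib -> age -> destination, so nothing is buffered whole:

```go
// compressEncrypt streams src through zlib compression and age public-key
// encryption into dst.
package main

import (
	"compress/zlib"
	"io"

	"filippo.io/age"
)

func compressEncrypt(dst io.Writer, src io.Reader, recipient age.Recipient) error {
	ageW, err := age.Encrypt(dst, recipient) // outermost layer: ciphertext
	if err != nil {
		return err
	}
	zw := zlib.NewWriter(ageW) // inner layer: compression before encryption
	if _, err := io.Copy(zw, src); err != nil {
		return err
	}
	if err := zw.Close(); err != nil { // flush the compressed stream
		return err
	}
	return ageW.Close() // finalize the age ciphertext
}
```

The layering order matters: ciphertext is incompressible, so compression must happen before encryption.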
## Blob Packing

- Implement BlobWriter with size limits (done in packer; sketched below)
- Add chunk accumulation and flushing (done)
- Create blob hash calculation (done)
- Implement proper error handling and rollback (done with transactions)
- Write tests for blob packing scenarios (done)
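
A hypothetical sketch of the accumulate-and-flush idea behind BlobWriter; the real packer also compresses, encrypts, records per-chunk offsets, and hashes the finished blob:

```go
// BlobWriter packs chunks into a buffer and flushes a blob whenever the
// next chunk would push the buffer past maxSize.
package main

import "bytes"

type BlobWriter struct {
	buf     bytes.Buffer
	maxSize int
	flush   func(blob []byte) error // commit/upload callback (illustrative)
}

func (w *BlobWriter) AddChunk(chunk []byte) error {
	if w.buf.Len() > 0 && w.buf.Len()+len(chunk) > w.maxSize {
		if err := w.Flush(); err != nil {
			return err
		}
	}
	_, err := w.buf.Write(chunk)
	return err
}

// Flush hands the packed blob to the callback and resets the buffer.
func (w *BlobWriter) Flush() error {
	if w.buf.Len() == 0 {
		return nil
	}
	err := w.flush(w.buf.Bytes())
	w.buf.Reset()
	return err
}
```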
## S3 Operations

- Integrate MinIO client library (done in s3 package)
- Implement S3Client wrapper type (done; sketched below)
- Add multipart upload support for large blobs (done - using standard upload)
- Implement retry logic (handled by MinIO client)
- Write tests using MinIO container (done with testcontainers)
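
A minimal sketch of the wrapper shape, assuming minio-go v7; "standard upload" suffices because `PutObject` switches to multipart internally for large objects, and the client retries transient failures on its own:

```go
// S3Client is a thin wrapper that pins the bucket and exposes the few
// operations vaultik needs. The method shown is illustrative.
package main

import (
	"context"
	"io"

	"github.com/minio/minio-go/v7"
)

type S3Client struct {
	mc     *minio.Client
	bucket string
}

func (c *S3Client) PutBlob(ctx context.Context, key string, r io.Reader, size int64) error {
	_, err := c.mc.PutObject(ctx, c.bucket, key, r, size,
		minio.PutObjectOptions{ContentType: "application/octet-stream"})
	return err
}
```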
## Backup Command - Basic

- Implement directory walking with exclusion patterns (done with afero)
- Add file change detection using index (see the sketch after this list)
- Integrate chunking pipeline for changed files (done in scanner)
- Implement blob upload coordination to S3
- Add progress reporting to stderr
- Write integration tests for backup
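
For the change-detection item, a hypothetical sketch of the usual size-plus-mtime comparison against the local index; `IndexEntry` and the lookup result are illustrative, not the real schema:

```go
// changed reports whether a file needs rechunking: any file missing from
// the index, or whose size or mtime differs, is considered changed.
package main

import (
	"io/fs"
	"time"
)

type IndexEntry struct {
	Size  int64
	MTime time.Time
}

func changed(info fs.FileInfo, entry IndexEntry, inIndex bool) bool {
	if !inIndex {
		return true
	}
	return info.Size() != entry.Size || !info.ModTime().Equal(entry.MTime)
}
```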
## Snapshot Metadata
- Implement snapshot metadata extraction from index
- Create SQLite snapshot database builder
- Add metadata compression and encryption
- Implement metadata chunking for large snapshots
- Add hash calculation and verification
- Implement metadata upload to S3
- Write tests for metadata operations
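
Alongside the encrypted metadata, each snapshot carries the unencrypted blob manifest mentioned in the implementation notes. A minimal sketch of producing `manifest.json.zst`, assuming `github.com/klauspost/compress/zstd`; `BlobRef` is illustrative:

```go
// writeManifest JSON-encodes the referenced blob list and zstd-compresses
// it, producing the bytes stored at metadata/{snapshot-id}/manifest.json.zst.
package main

import (
	"encoding/json"
	"io"

	"github.com/klauspost/compress/zstd"
)

type BlobRef struct {
	ID   string `json:"id"`
	Size int64  `json:"size"`
}

func writeManifest(dst io.Writer, blobs []BlobRef) error {
	zw, err := zstd.NewWriter(dst)
	if err != nil {
		return err
	}
	if err := json.NewEncoder(zw).Encode(blobs); err != nil {
		zw.Close()
		return err
	}
	return zw.Close() // flush and finish the zstd frame
}
```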
## Restore Command
- Implement snapshot listing and selection
- Add metadata download and reconstruction
- Implement hash verification for metadata
- Create file restoration logic with chunk retrieval
- Add blob caching for efficiency
- Implement proper file permissions and mtime restoration
- Write integration tests for restore
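
For the permissions and mtime item, a minimal sketch of the final per-file step, applied after the file's chunks have been written; `FileMeta` is a hypothetical stand-in for the restored metadata record:

```go
// applyMeta reapplies the recorded mode and modification time to a
// freshly restored file.
package main

import (
	"os"
	"time"
)

type FileMeta struct {
	Mode  os.FileMode
	MTime time.Time
}

func applyMeta(path string, meta FileMeta) error {
	if err := os.Chmod(path, meta.Mode); err != nil {
		return err
	}
	// os.Chtimes takes atime and mtime; the recorded mtime is used for both.
	return os.Chtimes(path, meta.MTime, meta.MTime)
}
```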
## Prune Command
- Implement latest snapshot detection
- Add referenced blob extraction from metadata
- Create S3 blob listing and comparison
- Implement safe deletion of unreferenced blobs
- Add dry-run mode for safety
- Write tests for prune scenarios
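
The core of prune is a set difference between what S3 stores and what the latest snapshot references, with dry-run short-circuiting the deletes. A minimal sketch; the deletion callback is illustrative:

```go
// prune deletes (or, in dry-run mode, merely reports) every stored blob
// key that no snapshot references.
package main

import "fmt"

func prune(stored []string, referenced map[string]bool, dryRun bool,
	del func(key string) error) error {
	for _, key := range stored {
		if referenced[key] {
			continue
		}
		if dryRun {
			fmt.Println("would delete:", key)
			continue
		}
		if err := del(key); err != nil {
			return err
		}
	}
	return nil
}
```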
## Verify Command
- Implement metadata integrity checking
- Add blob existence verification
- Implement quick mode (S3 hash checking)
- Implement deep mode (download and verify chunks)
- Add detailed error reporting
- Write tests for verification
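
For the existence check underlying quick mode, a minimal sketch using one `StatObject` per referenced blob, assuming minio-go v7; nothing is downloaded:

```go
// verifyExists confirms every referenced blob key is present in the
// bucket, collecting the missing ones for the error report.
package main

import (
	"context"
	"fmt"

	"github.com/minio/minio-go/v7"
)

func verifyExists(ctx context.Context, mc *minio.Client, bucket string, keys []string) error {
	var missing []string
	for _, key := range keys {
		if _, err := mc.StatObject(ctx, bucket, key, minio.StatObjectOptions{}); err != nil {
			missing = append(missing, key)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("%d referenced blobs missing (first: %s)", len(missing), missing[0])
	}
	return nil
}
```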
## Fetch Command
- Implement single-file metadata query
- Add minimal blob downloading for file
- Create streaming file reconstruction
- Add support for output redirection
- Write tests for fetch command
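
A minimal sketch of the streaming reconstruction: the chunks for the one requested file are opened in order and copied straight to the writer (stdout when redirected), so the file never sits in memory; `openChunk` is an illustrative callback returning the decrypted, decompressed chunk stream:

```go
// fetchFile streams a single file by concatenating its chunks in order.
package main

import "io"

func fetchFile(w io.Writer, chunkIDs []string,
	openChunk func(id string) (io.ReadCloser, error)) error {
	for _, id := range chunkIDs {
		rc, err := openChunk(id)
		if err != nil {
			return err
		}
		_, err = io.Copy(w, rc)
		rc.Close()
		if err != nil {
			return err
		}
	}
	return nil
}
```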
## Daemon Mode
- Implement inotify watcher for Linux
- Add dirty path tracking in index
- Create periodic full scan scheduler
- Implement backup interval enforcement
- Add proper signal handling and shutdown
- Write tests for daemon behavior
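
A minimal sketch of the watch loop, assuming `github.com/fsnotify/fsnotify` (which wraps inotify on Linux); every event marks its path dirty so the next backup pass rescans only those paths. `markDirty` is illustrative:

```go
// watch forwards filesystem events for root to the dirty-path tracker
// until the watcher is closed.
package main

import (
	"log"

	"github.com/fsnotify/fsnotify"
)

func watch(root string, markDirty func(path string)) error {
	w, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	defer w.Close()
	// Note: inotify watches are per-directory, not recursive; real code
	// would add a watch for each directory under root.
	if err := w.Add(root); err != nil {
		return err
	}
	for {
		select {
		case ev, ok := <-w.Events:
			if !ok {
				return nil
			}
			markDirty(ev.Name)
		case err, ok := <-w.Errors:
			if !ok {
				return nil
			}
			log.Println("watch error:", err)
		}
	}
}
```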
## Cron Mode
- Implement silent operation mode
- Add proper exit codes for cron
- Implement lock file to prevent concurrent runs
- Add error summary reporting
- Write tests for cron mode
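
For the lock-file item, a minimal sketch: `O_CREATE|O_EXCL` makes acquisition atomic, so a concurrent run fails fast instead of racing. The lock path is illustrative, and a crashed run would leave a stale lock that real code should detect (e.g., by storing and checking a PID):

```go
// acquireLock creates the lock file exclusively and returns a release
// function; a second concurrent run gets an error instead.
package main

import (
	"fmt"
	"os"
)

func acquireLock(path string) (release func(), err error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o644)
	if err != nil {
		return nil, fmt.Errorf("another run appears to be active: %w", err)
	}
	f.Close()
	return func() { os.Remove(path) }, nil
}
```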
## Finalization
- Add comprehensive logging throughout
- Implement proper error wrapping and context
- Add performance metrics collection
- Create end-to-end integration tests
- Write documentation and examples
- Set up CI/CD pipeline
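
For the error-wrapping item, a minimal sketch of the intended style: `fmt.Errorf` with `%w` adds context at each layer while keeping the root cause inspectable via `errors.Is`/`errors.As`:

```go
// readIndex demonstrates wrapping: the caller still sees fs.ErrNotExist
// through the added context.
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

func readIndex(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return fmt.Errorf("open index %s: %w", path, err)
	}
	defer f.Close()
	return nil
}

func main() {
	err := readIndex("/nonexistent/index.db")
	fmt.Println(errors.Is(err, fs.ErrNotExist)) // true: cause preserved
}
```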