# Implementation TODO

## Proposed: Store and Snapshot Commands

### Overview

Reorganize commands to provide better visibility into stored data and snapshots.

### Command Structure

#### `vaultik store` - Storage information commands

- `vaultik store info`
  - Lists S3 bucket configuration
  - Shows total number of snapshots (from `metadata/` listing)
  - Shows total number of blobs (from `blobs/` listing)
  - Shows total size of all blobs
  - **No decryption required** - uses S3 listing only

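The totals reported by `vaultik store info` could be derived from object keys and sizes alone. A minimal sketch, assuming the listing has already been fetched into a slice; `ObjectInfo`, `StoreInfo`, and `Summarize` are hypothetical names, and the `metadata/{snapshot-id}/...` and `blobs/...` key layout is taken from the notes in this document:

```go
package main

import "strings"

// ObjectInfo is a hypothetical stand-in for one entry of an S3 listing.
type ObjectInfo struct {
	Key  string
	Size int64
}

// StoreInfo holds the totals that `vaultik store info` would print.
type StoreInfo struct {
	Snapshots int
	Blobs     int
	BlobBytes int64
}

// Summarize counts distinct snapshot IDs under metadata/ and sums object
// sizes under blobs/. It looks only at keys and sizes, so nothing is decrypted.
func Summarize(objects []ObjectInfo) StoreInfo {
	var info StoreInfo
	seen := make(map[string]struct{})
	for _, o := range objects {
		switch {
		case strings.HasPrefix(o.Key, "metadata/"):
			// The first path component after metadata/ is assumed to be the snapshot ID.
			id, _, _ := strings.Cut(strings.TrimPrefix(o.Key, "metadata/"), "/")
			if id == "" {
				continue
			}
			if _, ok := seen[id]; !ok {
				seen[id] = struct{}{}
				info.Snapshots++
			}
		case strings.HasPrefix(o.Key, "blobs/"):
			info.Blobs++
			info.BlobBytes += o.Size
		}
	}
	return info
}
```

Keeping the summary separate from the S3 client makes it testable without a bucket; the real command would feed it from the listing call.
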
#### `vaultik snapshot` - Snapshot management commands

- `vaultik snapshot create [path]`
  - Renamed from `vaultik backup`
  - Same functionality as current backup command

- `vaultik snapshot list [--json]`
  - Lists all snapshots with:
    - Snapshot ID
    - Creation timestamp (parsed from snapshot ID)
    - Compressed size (sum of referenced blob sizes from manifest)
  - **No decryption required** - uses blob manifests only
  - `--json` flag outputs JSON instead of a table

- `vaultik snapshot purge`
  - Requires one of:
    - `--keep-latest` - keeps only the most recent snapshot
    - `--older-than <duration>` - removes snapshots older than duration (e.g., "30d", "6m", "1y")
  - Removes snapshot metadata and runs pruning to clean up unreferenced blobs
  - Shows what would be deleted and requires confirmation

- `vaultik snapshot verify [--deep] <snapshot-id>`
  - Basic mode: Verifies all blobs referenced in manifest exist in S3
  - `--deep` mode: Downloads each blob and verifies its hash matches the stored hash
  - **Stub implementation for now**

### Implementation Notes

1. **No Decryption Required**: All commands work with unencrypted blob manifests
2. **Blob Manifests**: Located at `metadata/{snapshot-id}/manifest.json.zst`
3. **S3 Operations**: Use S3 ListObjects to enumerate snapshots and blobs
4. **Size Calculations**: Sum blob sizes from S3 object metadata
5. **Timestamp Parsing**: Extract from snapshot ID format (e.g., `2024-01-15-143052-hostname`)
6. **S3 Metadata**: Only used for `snapshot verify` command

### Benefits

- Users can see storage usage without decryption keys
- Snapshot management doesn't require access to encrypted metadata
- Clean separation between storage info and snapshot operations

## Chunking and Hashing

1. ~~Implement content-defined chunking~~ (done with FastCDC)
1. ~~Create streaming chunk processor~~ (done in chunker)
1. ~~Implement SHA256 hashing for chunks~~ (done in scanner)
1. ~~Add configurable chunk size parameters~~ (done in scanner)
1. ~~Write tests for chunking consistency~~ (done)

## Compression and Encryption

1. ~~Implement compression~~ (done with zlib in blob packer)
1. ~~Integrate age encryption library~~ (done in crypto package)
1. ~~Create Encryptor type for public key encryption~~ (done)
1. ~~Implement streaming encrypt/decrypt pipelines~~ (done in packer)
1. ~~Write tests for compression and encryption~~ (done)

## Blob Packing

1. ~~Implement BlobWriter with size limits~~ (done in packer)
1. ~~Add chunk accumulation and flushing~~ (done)
1. ~~Create blob hash calculation~~ (done)
1. ~~Implement proper error handling and rollback~~ (done with transactions)
1. ~~Write tests for blob packing scenarios~~ (done)

## S3 Operations

1. ~~Integrate MinIO client library~~ (done in s3 package)
1. ~~Implement S3Client wrapper type~~ (done)
1. ~~Add multipart upload support for large blobs~~ (done - using standard upload)
1. ~~Implement retry logic~~ (handled by MinIO client)
1. ~~Write tests using MinIO container~~ (done with testcontainers)

## Backup Command - Basic

1. ~~Implement directory walking with exclusion patterns~~ (done with afero)
1. Add file change detection using index
1. ~~Integrate chunking pipeline for changed files~~ (done in scanner)
1. Implement blob upload coordination to S3
1. Add progress reporting to stderr
1. Write integration tests for backup

## Snapshot Metadata

1. Implement snapshot metadata extraction from index
1. Create SQLite snapshot database builder
1. Add metadata compression and encryption
1. Implement metadata chunking for large snapshots
1. Add hash calculation and verification
1. Implement metadata upload to S3
1. Write tests for metadata operations

## Restore Command

1. Implement snapshot listing and selection
1. Add metadata download and reconstruction
1. Implement hash verification for metadata
1. Create file restoration logic with chunk retrieval
1. Add blob caching for efficiency
1. Implement proper file permissions and mtime restoration
1. Write integration tests for restore

## Prune Command

1. Implement latest snapshot detection
1. Add referenced blob extraction from metadata
1. Create S3 blob listing and comparison
1. Implement safe deletion of unreferenced blobs
1. Add dry-run mode for safety
1. Write tests for prune scenarios

## Verify Command

1. Implement metadata integrity checking
1. Add blob existence verification
1. Implement quick mode (S3 hash checking)
1. Implement deep mode (download and verify chunks)
1. Add detailed error reporting
1. Write tests for verification

## Fetch Command

1. Implement single-file metadata query
1. Add minimal blob downloading for file
1. Create streaming file reconstruction
1. Add support for output redirection
1. Write tests for fetch command

## Daemon Mode

1. Implement inotify watcher for Linux
1. Add dirty path tracking in index
1. Create periodic full scan scheduler
1. Implement backup interval enforcement
1. Add proper signal handling and shutdown
1. Write tests for daemon behavior

## Cron Mode

1. Implement silent operation mode
1. Add proper exit codes for cron
1. Implement lock file to prevent concurrent runs
1. Add error summary reporting
1. Write tests for cron mode

## Finalization

1. Add comprehensive logging throughout
1. Implement proper error wrapping and context
1. Add performance metrics collection
1. Create end-to-end integration tests
1. Write documentation and examples
1. Set up CI/CD pipeline