Refactor blob storage to use UUID primary keys and implement streaming chunking

- Changed blob table to use ID (UUID) as primary key instead of hash
- Blob records are now created at packing start, enabling immediate chunk associations
- Implemented streaming chunking to process large files without memory exhaustion
- Fixed blob manifest generation to include all referenced blobs
- Updated all foreign key references from blob_hash to blob_id
- Added progress reporting and improved error handling
- Enforced encryption requirement for all blob packing
- Updated tests to use test encryption keys
- Added Cyrillic transliteration to README
2025-07-22 07:43:39 +02:00
parent 26db096913
commit 86b533d6ee
49 changed files with 5709 additions and 324 deletions
--- a/TODO.md
+++ b/TODO.md
@@ -1,40 +1,92 @@
 # Implementation TODO

+## Proposed: Store and Snapshot Commands
+
+### Overview
+Reorganize commands to provide better visibility into stored data and snapshots.
+
+### Command Structure
+
+#### `vaultik store` - Storage information commands
+- `vaultik store info`
+  - Lists S3 bucket configuration
+  - Shows total number of snapshots (from metadata/ listing)
+  - Shows total number of blobs (from blobs/ listing)
+  - Shows total size of all blobs
+  - **No decryption required** - uses S3 listing only
+
+#### `vaultik snapshot` - Snapshot management commands  
+- `vaultik snapshot create [path]`
+  - Renamed from `vaultik backup`
+  - Same functionality as current backup command
+  
+- `vaultik snapshot list [--json]`
+  - Lists all snapshots with:
+    - Snapshot ID
+    - Creation timestamp (parsed from snapshot ID)
+    - Compressed size (sum of referenced blob sizes from manifest)
+  - **No decryption required** - uses blob manifests only
+  - `--json` flag outputs in JSON format instead of table
+  
+- `vaultik snapshot purge`
+  - Requires one of:
+    - `--keep-latest` - keeps only the most recent snapshot
+    - `--older-than <duration>` - removes snapshots older than duration (e.g., "30d", "6m", "1y")
+  - Removes snapshot metadata and runs pruning to clean up unreferenced blobs
+  - Shows what would be deleted and requires confirmation
+
+- `vaultik snapshot verify [--deep] <snapshot-id>`
+  - Basic mode: Verifies all blobs referenced in manifest exist in S3
+  - `--deep` mode: Downloads each blob and verifies its hash matches the stored hash
+  - **Stub implementation for now**
+
+### Implementation Notes
+
+1. **No Decryption Required**: All commands work with unencrypted blob manifests
+2. **Blob Manifests**: Located at `metadata/{snapshot-id}/manifest.json.zst`
+3. **S3 Operations**: Use S3 ListObjects to enumerate snapshots and blobs
+4. **Size Calculations**: Sum blob sizes from S3 object metadata
+5. **Timestamp Parsing**: Extract from snapshot ID format (e.g., `2024-01-15-143052-hostname`)
+6. **S3 Metadata**: Only used for `snapshot verify` command
+
+### Benefits
+- Users can see storage usage without decryption keys
+- Snapshot management doesn't require access to encrypted metadata
+- Clean separation between storage info and snapshot operations
+
 ## Chunking and Hashing
-1. Implement Rabin fingerprint chunker
-1. Create streaming chunk processor  
+1. ~~Implement content-defined chunking~~ (done with FastCDC)
+1. ~~Create streaming chunk processor~~ (done in chunker)
 1. ~~Implement SHA256 hashing for chunks~~ (done in scanner)
 1. ~~Add configurable chunk size parameters~~ (done in scanner)
-1. Write tests for chunking consistency
+1. ~~Write tests for chunking consistency~~ (done)

 ## Compression and Encryption
-1. Implement zstd compression wrapper
-1. Integrate age encryption library
-1. Create Encryptor type for public key encryption
-1. Create Decryptor type for private key decryption
-1. Implement streaming encrypt/decrypt pipelines
-1. Write tests for compression and encryption
+1. ~~Implement compression~~ (done with zlib in blob packer)
+1. ~~Integrate age encryption library~~ (done in crypto package)
+1. ~~Create Encryptor type for public key encryption~~ (done)
+1. ~~Implement streaming encrypt/decrypt pipelines~~ (done in packer)
+1. ~~Write tests for compression and encryption~~ (done)

 ## Blob Packing
-1. Implement BlobWriter with size limits
-1. Add chunk accumulation and flushing
-1. Create blob hash calculation
-1. Implement proper error handling and rollback
-1. Write tests for blob packing scenarios
+1. ~~Implement BlobWriter with size limits~~ (done in packer)
+1. ~~Add chunk accumulation and flushing~~ (done)
+1. ~~Create blob hash calculation~~ (done)
+1. ~~Implement proper error handling and rollback~~ (done with transactions)
+1. ~~Write tests for blob packing scenarios~~ (done)

 ## S3 Operations
-1. Integrate MinIO client library
-1. Implement S3Client wrapper type
-1. Add multipart upload support for large blobs
-1. Implement retry logic with exponential backoff
-1. Add connection pooling and timeout handling
-1. Write tests using MinIO container
+1. ~~Integrate MinIO client library~~ (done in s3 package)
+1. ~~Implement S3Client wrapper type~~ (done)
+1. ~~Add multipart upload support for large blobs~~ (done - using standard upload)
+1. ~~Implement retry logic~~ (handled by MinIO client)
+1. ~~Write tests using MinIO container~~ (done with testcontainers)

 ## Backup Command - Basic
 1. ~~Implement directory walking with exclusion patterns~~ (done with afero)
 1. Add file change detection using index
 1. ~~Integrate chunking pipeline for changed files~~ (done in scanner)
-1. Implement blob upload coordination
+1. Implement blob upload coordination to S3
 1. Add progress reporting to stderr
 1. Write integration tests for backup