
Implementation TODO

Proposed: Store and Snapshot Commands

Overview

Reorganize commands to provide better visibility into stored data and snapshots.

Command Structure

vaultik store - Storage information commands

  • vaultik store info
    • Lists S3 bucket configuration
    • Shows total number of snapshots (from metadata/ listing)
    • Shows total number of blobs (from blobs/ listing)
    • Shows total size of all blobs
    • No decryption required - uses S3 listing only
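Since everything here comes from an S3 listing, the aggregation is a simple pass over object keys and sizes. A minimal sketch — the `objectInfo` struct and key layout (`metadata/{snapshot-id}/...`, `blobs/...`) are assumptions standing in for whatever the MinIO client's `ListObjects` actually returns:

```go
package main

import (
	"fmt"
	"strings"
)

// objectInfo mirrors the key/size fields of an S3 listing entry.
// Illustrative only - not vaultik's actual types.
type objectInfo struct {
	Key  string
	Size int64
}

// summarize counts snapshots (distinct metadata/ prefixes) and blobs
// (blobs/ keys) and totals blob sizes - listing data only, no decryption.
func summarize(objects []objectInfo) (snapshots, blobs int, totalSize int64) {
	seen := map[string]bool{}
	for _, o := range objects {
		switch {
		case strings.HasPrefix(o.Key, "metadata/"):
			// metadata/{snapshot-id}/... - count each snapshot ID once
			parts := strings.SplitN(o.Key, "/", 3)
			if len(parts) >= 2 && !seen[parts[1]] {
				seen[parts[1]] = true
				snapshots++
			}
		case strings.HasPrefix(o.Key, "blobs/"):
			blobs++
			totalSize += o.Size
		}
	}
	return
}

func main() {
	objs := []objectInfo{
		{"metadata/2024-01-15-143052-host/manifest.json.zst", 1024},
		{"blobs/aa/aabbcc", 4 << 20},
		{"blobs/cc/ccddee", 2 << 20},
	}
	s, b, sz := summarize(objs)
	fmt.Printf("snapshots=%d blobs=%d size=%d\n", s, b, sz) // snapshots=1 blobs=2 size=6291456
}
```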

vaultik snapshot - Snapshot management commands

  • vaultik snapshot create [path]

    • Renamed from vaultik backup
    • Same functionality as current backup command
  • vaultik snapshot list [--json]

    • Lists all snapshots with:
      • Snapshot ID
      • Creation timestamp (parsed from snapshot ID)
      • Compressed size (sum of referenced blob sizes from manifest)
    • No decryption required - uses blob manifests only
    • --json flag outputs in JSON format instead of table
  • vaultik snapshot purge

    • Requires one of:
      • --keep-latest - keeps only the most recent snapshot
      • --older-than <duration> - removes snapshots older than duration (e.g., "30d", "6m", "1y")
    • Removes snapshot metadata and runs pruning to clean up unreferenced blobs
    • Shows what would be deleted and requires confirmation
  • vaultik snapshot verify [--deep] <snapshot-id>

    • Basic mode: Verifies all blobs referenced in manifest exist in S3
    • --deep mode: Downloads each blob and verifies its hash matches the stored hash
    • Stub implementation for now

Implementation Notes

  1. No Decryption Required: All commands work with unencrypted blob manifests
  2. Blob Manifests: Located at metadata/{snapshot-id}/manifest.json.zst
  3. S3 Operations: Use S3 ListObjects to enumerate snapshots and blobs
  4. Size Calculations: Sum blob sizes from S3 object metadata
  5. Timestamp Parsing: Extract from snapshot ID format (e.g., 2024-01-15-143052-hostname)
  6. S3 Metadata: Only used for snapshot verify command

Benefits

  • Users can see storage usage without decryption keys
  • Snapshot management doesn't require access to encrypted metadata
  • Clean separation between storage info and snapshot operations

Chunking and Hashing

  1. Implement content-defined chunking (done with FastCDC)
  2. Create streaming chunk processor (done in chunker)
  3. Implement SHA256 hashing for chunks (done in scanner)
  4. Add configurable chunk size parameters (done in scanner)
  5. Write tests for chunking consistency (done)
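The shape of the pipeline above — find content-defined boundaries, then SHA256 each chunk — can be sketched with a toy rolling hash. This is *not* FastCDC (which uses a gear table and normalized chunking); it only illustrates min/avg/max cut-point logic:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunks splits data at content-defined boundaries using a naive
// shift-and-add rolling hash. A cut happens when the hash's low bits
// are zero (after minSize) or when maxSize is reached.
func chunks(data []byte, minSize, avgMask, maxSize int) [][]byte {
	var out [][]byte
	start := 0
	var hash uint32
	for i, b := range data {
		hash = hash<<1 + uint32(b)
		size := i - start + 1
		if size >= minSize && (int(hash)&avgMask == 0 || size >= maxSize) {
			out = append(out, data[start:i+1])
			start = i + 1
			hash = 0
		}
	}
	if start < len(data) {
		out = append(out, data[start:]) // trailing partial chunk
	}
	return out
}

func main() {
	data := make([]byte, 10000)
	for i := range data {
		data[i] = byte(i * 31)
	}
	for _, c := range chunks(data, 256, 0x3FF, 4096) {
		sum := sha256.Sum256(c) // per-chunk content hash, as in the scanner
		fmt.Printf("%5d bytes  %s...\n", len(c), hex.EncodeToString(sum[:8]))
	}
}
```

Because boundaries depend only on content, an insertion early in a file shifts at most the chunks around it — the property that makes dedup across snapshots work.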

Compression and Encryption

  1. Implement compression (done with zlib in blob packer)
  2. Integrate age encryption library (done in crypto package)
  3. Create Encryptor type for public key encryption (done)
  4. Implement streaming encrypt/decrypt pipelines (done in packer)
  5. Write tests for compression and encryption (done)

Blob Packing

  1. Implement BlobWriter with size limits (done in packer)
  2. Add chunk accumulation and flushing (done)
  3. Create blob hash calculation (done)
  4. Implement proper error handling and rollback (done with transactions)
  5. Write tests for blob packing scenarios (done)
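The accumulate-and-flush behavior in points 1–2 can be sketched as follows. Names and the callback shape are illustrative, not vaultik's actual packer API, and the real writer compresses and encrypts before flushing:

```go
package main

import (
	"bytes"
	"fmt"
)

// blobWriter accumulates chunks and flushes a finished blob whenever
// adding the next chunk would push the buffer past the size limit.
type blobWriter struct {
	limit int
	buf   bytes.Buffer
	flush func([]byte) // upload/commit callback (hypothetical)
}

func (w *blobWriter) addChunk(chunk []byte) {
	if w.buf.Len() > 0 && w.buf.Len()+len(chunk) > w.limit {
		w.Flush()
	}
	w.buf.Write(chunk)
}

// Flush emits the current blob, if any, and resets the buffer.
func (w *blobWriter) Flush() {
	if w.buf.Len() == 0 {
		return
	}
	w.flush(append([]byte(nil), w.buf.Bytes()...))
	w.buf.Reset()
}

func main() {
	var sizes []int
	w := &blobWriter{limit: 100, flush: func(b []byte) { sizes = append(sizes, len(b)) }}
	for i := 0; i < 10; i++ {
		w.addChunk(make([]byte, 40)) // ten 40-byte chunks
	}
	w.Flush() // don't forget the final partial blob
	fmt.Println(sizes) // [80 80 80 80 80]
}
```

Flushing *before* the limit is exceeded (rather than splitting a chunk) keeps each chunk whole inside exactly one blob, which is what makes the chunk→blob association in the index straightforward.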

S3 Operations

  1. Integrate MinIO client library (done in s3 package)
  2. Implement S3Client wrapper type (done)
  3. Add multipart upload support for large blobs (done - using standard upload)
  4. Implement retry logic (handled by MinIO client)
  5. Write tests using MinIO container (done with testcontainers)

Backup Command - Basic

  1. Implement directory walking with exclusion patterns (done with afero)
  2. Add file change detection using index
  3. Integrate chunking pipeline for changed files (done in scanner)
  4. Implement blob upload coordination to S3
  5. Add progress reporting to stderr
  6. Write integration tests for backup

Snapshot Metadata

  1. Implement snapshot metadata extraction from index
  2. Create SQLite snapshot database builder
  3. Add metadata compression and encryption
  4. Implement metadata chunking for large snapshots
  5. Add hash calculation and verification
  6. Implement metadata upload to S3
  7. Write tests for metadata operations

Restore Command

  1. Implement snapshot listing and selection
  2. Add metadata download and reconstruction
  3. Implement hash verification for metadata
  4. Create file restoration logic with chunk retrieval
  5. Add blob caching for efficiency
  6. Implement proper file permissions and mtime restoration
  7. Write integration tests for restore

Prune Command

  1. Implement latest snapshot detection
  2. Add referenced blob extraction from metadata
  3. Create S3 blob listing and comparison
  4. Implement safe deletion of unreferenced blobs
  5. Add dry-run mode for safety
  6. Write tests for prune scenarios
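Points 2–4 boil down to a set difference: blobs in storage minus blobs referenced by any manifest. A minimal sketch with illustrative key names:

```go
package main

import "fmt"

// unreferenced returns blob keys present in storage but absent from
// the union of blobs referenced by any snapshot manifest - the
// deletion candidates (printed, not deleted, under --dry-run).
func unreferenced(stored []string, referenced map[string]bool) []string {
	var orphans []string
	for _, key := range stored {
		if !referenced[key] {
			orphans = append(orphans, key)
		}
	}
	return orphans
}

func main() {
	stored := []string{"blobs/a", "blobs/b", "blobs/c"}
	referenced := map[string]bool{"blobs/a": true, "blobs/c": true}
	fmt.Println(unreferenced(stored, referenced)) // [blobs/b]
}
```

The safety property lives in the ordering: manifests for *all* retained snapshots must be read before the listing is diffed, otherwise a blob referenced only by an unread snapshot looks like an orphan.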

Verify Command

  1. Implement metadata integrity checking
  2. Add blob existence verification
  3. Implement quick mode (S3 hash checking)
  4. Implement deep mode (download and verify chunks)
  5. Add detailed error reporting
  6. Write tests for verification

Fetch Command

  1. Implement single-file metadata query
  2. Add minimal blob downloading for file
  3. Create streaming file reconstruction
  4. Add support for output redirection
  5. Write tests for fetch command

Daemon Mode

  1. Implement inotify watcher for Linux
  2. Add dirty path tracking in index
  3. Create periodic full scan scheduler
  4. Implement backup interval enforcement
  5. Add proper signal handling and shutdown
  6. Write tests for daemon behavior

Cron Mode

  1. Implement silent operation mode
  2. Add proper exit codes for cron
  3. Implement lock file to prevent concurrent runs
  4. Add error summary reporting
  5. Write tests for cron mode

Finalization

  1. Add comprehensive logging throughout
  2. Implement proper error wrapping and context
  3. Add performance metrics collection
  4. Create end-to-end integration tests
  5. Write documentation and examples
  6. Set up CI/CD pipeline