
Implementation TODO

Proposed: Store and Snapshot Commands

Overview

Reorganize commands to provide better visibility into stored data and snapshots.

Command Structure

vaultik store - Storage information commands

  • vaultik store info
    • Lists S3 bucket configuration
    • Shows total number of snapshots (from metadata/ listing)
    • Shows total number of blobs (from blobs/ listing)
    • Shows total size of all blobs
    • No decryption required - uses S3 listing only
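Since everything here comes from an S3 listing, the aggregation is a simple pass over object keys and sizes. A minimal sketch — the `objectInfo` struct and key layout (`metadata/{snapshot-id}/...`, `blobs/...`) are assumptions standing in for whatever the MinIO client's `ListObjects` actually returns:

```go
package main

import (
	"fmt"
	"strings"
)

// objectInfo mirrors the key/size fields of an S3 listing entry.
// Illustrative only - not vaultik's actual types.
type objectInfo struct {
	Key  string
	Size int64
}

// summarize counts snapshots (distinct metadata/ prefixes) and blobs
// (blobs/ keys) and totals blob sizes - listing data only, no decryption.
func summarize(objects []objectInfo) (snapshots, blobs int, totalSize int64) {
	seen := map[string]bool{}
	for _, o := range objects {
		switch {
		case strings.HasPrefix(o.Key, "metadata/"):
			// metadata/{snapshot-id}/... - count each snapshot ID once
			parts := strings.SplitN(o.Key, "/", 3)
			if len(parts) >= 2 && !seen[parts[1]] {
				seen[parts[1]] = true
				snapshots++
			}
		case strings.HasPrefix(o.Key, "blobs/"):
			blobs++
			totalSize += o.Size
		}
	}
	return
}

func main() {
	objs := []objectInfo{
		{"metadata/2024-01-15-143052-host/manifest.json.zst", 1024},
		{"blobs/aa/aabbcc", 4 << 20},
		{"blobs/cc/ccddee", 2 << 20},
	}
	s, b, sz := summarize(objs)
	fmt.Printf("snapshots=%d blobs=%d size=%d\n", s, b, sz) // snapshots=1 blobs=2 size=6291456
}
```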

vaultik snapshot - Snapshot management commands

  • vaultik snapshot create [path]

    • Renamed from vaultik backup
    • Same functionality as current backup command
  • vaultik snapshot list [--json]

    • Lists all snapshots with:
      • Snapshot ID
      • Creation timestamp (parsed from snapshot ID)
      • Compressed size (sum of referenced blob sizes from manifest)
    • No decryption required - uses blob manifests only
    • --json flag outputs in JSON format instead of table
  • vaultik snapshot purge

    • Requires one of:
      • --keep-latest - keeps only the most recent snapshot
      • --older-than <duration> - removes snapshots older than duration (e.g., "30d", "6m", "1y")
    • Removes snapshot metadata and runs pruning to clean up unreferenced blobs
    • Shows what would be deleted and requires confirmation
  • vaultik snapshot verify [--deep] <snapshot-id>

    • Basic mode: Verifies all blobs referenced in manifest exist in S3
    • --deep mode: Downloads each blob and verifies its hash matches the stored hash
    • Stub implementation for now

Implementation Notes

  1. No Decryption Required: All commands work with unencrypted blob manifests
  2. Blob Manifests: Located at metadata/{snapshot-id}/manifest.json.zst
  3. S3 Operations: Use S3 ListObjects to enumerate snapshots and blobs
  4. Size Calculations: Sum blob sizes from S3 object metadata
  5. Timestamp Parsing: Extract from snapshot ID format (e.g., 2024-01-15-143052-hostname)
  6. S3 Metadata: Only used for snapshot verify command

Benefits

  • Users can see storage usage without decryption keys
  • Snapshot management doesn't require access to encrypted metadata
  • Clean separation between storage info and snapshot operations

Chunking and Hashing

  1. Implement content-defined chunking (done with FastCDC)
  2. Create streaming chunk processor (done in chunker)
  3. Implement SHA256 hashing for chunks (done in scanner)
  4. Add configurable chunk size parameters (done in scanner)
  5. Write tests for chunking consistency (done)
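The shape of the pipeline above — find content-defined boundaries, then SHA256 each chunk — can be sketched with a toy rolling hash. This is *not* FastCDC (which uses a gear table and normalized chunking); it only illustrates min/avg/max cut-point logic:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunks splits data at content-defined boundaries using a naive
// shift-and-add rolling hash. A cut happens when the hash's low bits
// are zero (after minSize) or when maxSize is reached.
func chunks(data []byte, minSize, avgMask, maxSize int) [][]byte {
	var out [][]byte
	start := 0
	var hash uint32
	for i, b := range data {
		hash = hash<<1 + uint32(b)
		size := i - start + 1
		if size >= minSize && (int(hash)&avgMask == 0 || size >= maxSize) {
			out = append(out, data[start:i+1])
			start = i + 1
			hash = 0
		}
	}
	if start < len(data) {
		out = append(out, data[start:]) // trailing partial chunk
	}
	return out
}

func main() {
	data := make([]byte, 10000)
	for i := range data {
		data[i] = byte(i * 31)
	}
	for _, c := range chunks(data, 256, 0x3FF, 4096) {
		sum := sha256.Sum256(c) // per-chunk content hash, as in the scanner
		fmt.Printf("%5d bytes  %s...\n", len(c), hex.EncodeToString(sum[:8]))
	}
}
```

Because boundaries depend only on content, an insertion early in a file shifts at most the chunks around it — the property that makes dedup across snapshots work.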

Compression and Encryption

  1. Implement compression (done with zlib in blob packer)
  2. Integrate age encryption library (done in crypto package)
  3. Create Encryptor type for public key encryption (done)
  4. Implement streaming encrypt/decrypt pipelines (done in packer)
  5. Write tests for compression and encryption (done)

Blob Packing

  1. Implement BlobWriter with size limits (done in packer)
  2. Add chunk accumulation and flushing (done)
  3. Create blob hash calculation (done)
  4. Implement proper error handling and rollback (done with transactions)
  5. Write tests for blob packing scenarios (done)
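The accumulate-and-flush behavior in points 1–2 can be sketched as follows. Names and the callback shape are illustrative, not vaultik's actual packer API, and the real writer compresses and encrypts before flushing:

```go
package main

import (
	"bytes"
	"fmt"
)

// blobWriter accumulates chunks and flushes a finished blob whenever
// adding the next chunk would push the buffer past the size limit.
type blobWriter struct {
	limit int
	buf   bytes.Buffer
	flush func([]byte) // upload/commit callback (hypothetical)
}

func (w *blobWriter) addChunk(chunk []byte) {
	if w.buf.Len() > 0 && w.buf.Len()+len(chunk) > w.limit {
		w.Flush()
	}
	w.buf.Write(chunk)
}

// Flush emits the current blob, if any, and resets the buffer.
func (w *blobWriter) Flush() {
	if w.buf.Len() == 0 {
		return
	}
	w.flush(append([]byte(nil), w.buf.Bytes()...))
	w.buf.Reset()
}

func main() {
	var sizes []int
	w := &blobWriter{limit: 100, flush: func(b []byte) { sizes = append(sizes, len(b)) }}
	for i := 0; i < 10; i++ {
		w.addChunk(make([]byte, 40)) // ten 40-byte chunks
	}
	w.Flush() // don't forget the final partial blob
	fmt.Println(sizes) // [80 80 80 80 80]
}
```

Flushing *before* the limit is exceeded (rather than splitting a chunk) keeps each chunk whole inside exactly one blob, which is what makes the chunk→blob association in the index straightforward.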

S3 Operations

  1. Integrate MinIO client library (done in s3 package)
  2. Implement S3Client wrapper type (done)
  3. Add multipart upload support for large blobs (done - using standard upload)
  4. Implement retry logic (handled by MinIO client)
  5. Write tests using MinIO container (done with testcontainers)

Backup Command - Basic

  1. Implement directory walking with exclusion patterns (done with afero)
  2. Add file change detection using index
  3. Integrate chunking pipeline for changed files (done in scanner)
  4. Implement blob upload coordination to S3
  5. Add progress reporting to stderr
  6. Write integration tests for backup

Snapshot Metadata

  1. Implement snapshot metadata extraction from index
  2. Create SQLite snapshot database builder
  3. Add metadata compression and encryption
  4. Implement metadata chunking for large snapshots
  5. Add hash calculation and verification
  6. Implement metadata upload to S3
  7. Write tests for metadata operations

Restore Command

  1. Implement snapshot listing and selection
  2. Add metadata download and reconstruction
  3. Implement hash verification for metadata
  4. Create file restoration logic with chunk retrieval
  5. Add blob caching for efficiency
  6. Implement proper file permissions and mtime restoration
  7. Write integration tests for restore

Prune Command

  1. Implement latest snapshot detection
  2. Add referenced blob extraction from metadata
  3. Create S3 blob listing and comparison
  4. Implement safe deletion of unreferenced blobs
  5. Add dry-run mode for safety
  6. Write tests for prune scenarios
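Points 2–4 boil down to a set difference: blobs in storage minus blobs referenced by any manifest. A minimal sketch with illustrative key names:

```go
package main

import "fmt"

// unreferenced returns blob keys present in storage but absent from
// the union of blobs referenced by any snapshot manifest - the
// deletion candidates (printed, not deleted, under --dry-run).
func unreferenced(stored []string, referenced map[string]bool) []string {
	var orphans []string
	for _, key := range stored {
		if !referenced[key] {
			orphans = append(orphans, key)
		}
	}
	return orphans
}

func main() {
	stored := []string{"blobs/a", "blobs/b", "blobs/c"}
	referenced := map[string]bool{"blobs/a": true, "blobs/c": true}
	fmt.Println(unreferenced(stored, referenced)) // [blobs/b]
}
```

The safety property lives in the ordering: manifests for *all* retained snapshots must be read before the listing is diffed, otherwise a blob referenced only by an unread snapshot looks like an orphan.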

Verify Command

  1. Implement metadata integrity checking
  2. Add blob existence verification
  3. Implement quick mode (S3 hash checking)
  4. Implement deep mode (download and verify chunks)
  5. Add detailed error reporting
  6. Write tests for verification

Fetch Command

  1. Implement single-file metadata query
  2. Add minimal blob downloading for file
  3. Create streaming file reconstruction
  4. Add support for output redirection
  5. Write tests for fetch command

Daemon Mode

  1. Implement inotify watcher for Linux
  2. Add dirty path tracking in index
  3. Create periodic full scan scheduler
  4. Implement backup interval enforcement
  5. Add proper signal handling and shutdown
  6. Write tests for daemon behavior

Cron Mode

  1. Implement silent operation mode
  2. Add proper exit codes for cron
  3. Implement lock file to prevent concurrent runs
  4. Add error summary reporting
  5. Write tests for cron mode

Finalization

  1. Add comprehensive logging throughout
  2. Implement proper error wrapping and context
  3. Add performance metrics collection
  4. Create end-to-end integration tests
  5. Write documentation and examples
  6. Set up CI/CD pipeline