
# Implementation TODO

## Proposed: Store and Snapshot Commands

### Overview
Reorganize commands to provide better visibility into stored data and snapshots.

### Command Structure

#### `vaultik store` - Storage information commands
- `vaultik store info`
  - Lists S3 bucket configuration
  - Shows total number of snapshots (from metadata/ listing)
  - Shows total number of blobs (from blobs/ listing)
  - Shows total size of all blobs
  - **No decryption required** - uses S3 listing only
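
A minimal sketch of the `store info` aggregation. It assumes the S3 listing has already been fetched into key/size pairs; the `objectInfo` type and `summarize` function are hypothetical, not existing vaultik code. Snapshot IDs come from the first path segment under `metadata/`, so several files under one snapshot count only once. The second key under the first snapshot is a placeholder.

```go
package main

import (
	"fmt"
	"strings"
)

// objectInfo is a hypothetical stand-in for one entry of an S3 listing.
type objectInfo struct {
	Key  string
	Size int64
}

// storeStats holds the totals `vaultik store info` would report.
type storeStats struct {
	Snapshots int
	Blobs     int
	BlobBytes int64
}

// summarize counts distinct snapshot IDs under metadata/ and all objects
// under blobs/, summing blob sizes. Nothing is downloaded or decrypted.
func summarize(objects []objectInfo) storeStats {
	var st storeStats
	seen := make(map[string]bool)
	for _, o := range objects {
		switch {
		case strings.HasPrefix(o.Key, "metadata/"):
			// metadata/{snapshot-id}/... -> keep the snapshot-id segment
			rest := strings.TrimPrefix(o.Key, "metadata/")
			if id, _, ok := strings.Cut(rest, "/"); ok && id != "" && !seen[id] {
				seen[id] = true
				st.Snapshots++
			}
		case strings.HasPrefix(o.Key, "blobs/"):
			st.Blobs++
			st.BlobBytes += o.Size
		}
	}
	return st
}

func main() {
	objs := []objectInfo{
		{"metadata/2024-01-15-143052-host/manifest.json.zst", 120},
		{"metadata/2024-01-15-143052-host/placeholder", 900},
		{"metadata/2024-01-16-090000-host/manifest.json.zst", 130},
		{"blobs/ab/cd/abcd1234", 4096},
		{"blobs/ef/01/ef015678", 2048},
	}
	st := summarize(objs)
	fmt.Printf("snapshots=%d blobs=%d bytes=%d\n", st.Snapshots, st.Blobs, st.BlobBytes)
	// prints: snapshots=2 blobs=2 bytes=6144
}
```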

#### `vaultik snapshot` - Snapshot management commands
- `vaultik snapshot create [path]`
  - Renamed from `vaultik backup`
  - Same functionality as current backup command

- `vaultik snapshot list [--json]`
  - Lists all snapshots with:
    - Snapshot ID
    - Creation timestamp (parsed from snapshot ID)
    - Compressed size (sum of referenced blob sizes from manifest)
  - **No decryption required** - uses blob manifests only
  - `--json` flag outputs in JSON format instead of table

- `vaultik snapshot purge`
  - Requires one of:
    - `--keep-latest` - keeps only the most recent snapshot
    - `--older-than <duration>` - removes snapshots older than duration (e.g., "30d", "6m", "1y")
  - Removes snapshot metadata and runs pruning to clean up unreferenced blobs
  - Shows what would be deleted and requires confirmation

- `vaultik snapshot verify [--deep] <snapshot-id>`
  - Basic mode: Verifies all blobs referenced in manifest exist in S3
  - `--deep` mode: Downloads each blob and verifies its hash matches the stored hash
  - **Stub implementation for now**

- `vaultik snapshot remove <snapshot-id>` (alias: `rm`)
  - Removes a snapshot and any blobs that become orphaned
  - Algorithm:
    1. Validate target snapshot exists in storage
    2. List all snapshots in storage
    3. Download manifests from all OTHER snapshots to build "in-use" blob set
    4. Download target snapshot's manifest to get its blob hashes
    5. Identify orphaned blobs: target blobs NOT in the in-use set
    6. Delete orphaned blobs from storage
    7. Delete snapshot metadata using existing `deleteSnapshot()` helper
  - Flags:
    - `--force` / `-f`: Skip confirmation prompt
    - `--dry-run`: Show what would be deleted without deleting
  - Files to modify:
    - `internal/cli/snapshot.go`: Add `newSnapshotRemoveCommand()`
    - `internal/vaultik/snapshot.go`: Add `RemoveSnapshot()` method
  - Reuse existing code:
    - Snapshot enumeration pattern from `PruneBlobs()` in `prune.go`
    - `v.downloadManifest(snapshotID)` for manifest downloading
    - Blob path format: `blobs/{hash[:2]}/{hash[2:4]}/{hash}`
    - `v.deleteSnapshot(snapshotID)` for metadata deletion
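
Steps 3 to 5 of `snapshot remove` reduce to a set difference. The sketch below assumes the manifests have already been downloaded and reduced to lists of blob hashes; `orphanedBlobs` and `blobPath` are hypothetical helpers, not the existing `RemoveSnapshot()` code. It models `--dry-run` by only printing the keys it would delete.

```go
package main

import "fmt"

// blobPath maps a blob hash to its storage key using the documented layout
// blobs/{hash[:2]}/{hash[2:4]}/{hash}. It assumes len(hash) >= 4.
func blobPath(hash string) string {
	return fmt.Sprintf("blobs/%s/%s/%s", hash[:2], hash[2:4], hash)
}

// orphanedBlobs returns the hashes referenced by the target snapshot that no
// other snapshot references. others maps each remaining snapshot ID to the
// blob hashes in its manifest. Duplicates in target are reported once.
func orphanedBlobs(target []string, others map[string][]string) []string {
	inUse := make(map[string]bool)
	for _, hashes := range others {
		for _, h := range hashes {
			inUse[h] = true
		}
	}
	seen := make(map[string]bool)
	var orphans []string
	for _, h := range target {
		if inUse[h] || seen[h] {
			continue
		}
		seen[h] = true
		orphans = append(orphans, h)
	}
	return orphans
}

func main() {
	target := []string{"aa11bb22", "cc33dd44", "ee55ff66"}
	others := map[string][]string{
		"2024-01-16-090000-host": {"cc33dd44"},
		"2024-01-17-090000-host": {"ee55ff66", "0011aabb"},
	}
	dryRun := true // stands in for the --dry-run flag
	for _, h := range orphanedBlobs(target, others) {
		if dryRun {
			fmt.Println("would delete", blobPath(h))
			continue
		}
		// Real code would delete blobPath(h) here (step 6), then remove
		// the snapshot metadata (step 7).
	}
}
```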

### Implementation Notes

1. **No Decryption Required**: All commands work with unencrypted blob manifests
2. **Blob Manifests**: Located at `metadata/{snapshot-id}/manifest.json.zst`
3. **S3 Operations**: Use S3 ListObjects to enumerate snapshots and blobs
4. **Size Calculations**: Sum blob sizes from the S3 object listing (`store info`) or from the referenced blobs in each manifest (`snapshot list`)
5. **Timestamp Parsing**: Extract from snapshot ID format (e.g., `2024-01-15-143052-hostname`)
6. **S3 Metadata**: Only used for `snapshot verify` command

### Benefits
- Users can see storage usage without decryption keys
- Snapshot management doesn't require access to encrypted metadata
- Clean separation between storage info and snapshot operations

## Chunking and Hashing
1. ~~Implement content-defined chunking~~ (done with FastCDC)
1. ~~Create streaming chunk processor~~ (done in chunker)
1. ~~Implement SHA256 hashing for chunks~~ (done in scanner)
1. ~~Add configurable chunk size parameters~~ (done in scanner)
1. ~~Write tests for chunking consistency~~ (done)

## Compression and Encryption
1. ~~Implement compression~~ (done with zlib in blob packer)
1. ~~Integrate age encryption library~~ (done in crypto package)
1. ~~Create Encryptor type for public key encryption~~ (done)
1. ~~Implement streaming encrypt/decrypt pipelines~~ (done in packer)
1. ~~Write tests for compression and encryption~~ (done)

## Blob Packing
1. ~~Implement BlobWriter with size limits~~ (done in packer)
1. ~~Add chunk accumulation and flushing~~ (done)
1. ~~Create blob hash calculation~~ (done)
1. ~~Implement proper error handling and rollback~~ (done with transactions)
1. ~~Write tests for blob packing scenarios~~ (done)

## S3 Operations
1. ~~Integrate MinIO client library~~ (done in s3 package)
1. ~~Implement S3Client wrapper type~~ (done)
1. ~~Add multipart upload support for large blobs~~ (done - using standard upload)
1. ~~Implement retry logic~~ (handled by MinIO client)
1. ~~Write tests using MinIO container~~ (done with testcontainers)

## Backup Command - Basic
1. ~~Implement directory walking with exclusion patterns~~ (done with afero)
1. Add file change detection using index
1. ~~Integrate chunking pipeline for changed files~~ (done in scanner)
1. Implement blob upload coordination to S3
1. Add progress reporting to stderr
1. Write integration tests for backup

## Snapshot Metadata
1. Implement snapshot metadata extraction from index
1. Create SQLite snapshot database builder
1. Add metadata compression and encryption
1. Implement metadata chunking for large snapshots
1. Add hash calculation and verification
1. Implement metadata upload to S3
1. Write tests for metadata operations

## Restore Command
1. Implement snapshot listing and selection
1. Add metadata download and reconstruction
1. Implement hash verification for metadata
1. Create file restoration logic with chunk retrieval
1. Add blob caching for efficiency
1. Implement proper file permissions and mtime restoration
1. Write integration tests for restore

## Prune Command
1. Implement latest snapshot detection
1. Add referenced blob extraction from metadata
1. Create S3 blob listing and comparison
1. Implement safe deletion of unreferenced blobs
1. Add dry-run mode for safety
1. Write tests for prune scenarios
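
The listing-and-comparison step of prune is the mirror image of `snapshot remove`: every listed blob that no retained snapshot references is a deletion candidate. A sketch, assuming listed keys follow the `blobs/{hash[:2]}/{hash[2:4]}/{hash}` layout and the referenced hashes have already been extracted from metadata; `unreferencedBlobs` is a hypothetical name.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// unreferencedBlobs returns the listed blob keys whose hash (the final path
// element) is not in the referenced set, sorted for stable dry-run output.
func unreferencedBlobs(listedKeys []string, referenced map[string]bool) []string {
	var out []string
	for _, key := range listedKeys {
		if !referenced[path.Base(key)] {
			out = append(out, key)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	listed := []string{"blobs/ab/cd/abcd01", "blobs/ef/01/ef0123", "blobs/99/88/998877"}
	referenced := map[string]bool{"abcd01": true}
	for _, key := range unreferencedBlobs(listed, referenced) {
		// In dry-run mode only report; otherwise the key would be deleted.
		fmt.Println("dry-run: would delete", key)
	}
}
```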

## Verify Command
1. Implement metadata integrity checking
1. Add blob existence verification
1. Implement quick mode (S3 hash checking)
1. Implement deep mode (download and verify chunks)
1. Add detailed error reporting
1. Write tests for verification

## Fetch Command
1. Implement single-file metadata query
1. Add minimal blob downloading for file
1. Create streaming file reconstruction
1. Add support for output redirection
1. Write tests for fetch command

## Daemon Mode
1. Implement inotify watcher for Linux
1. Add dirty path tracking in index
1. Create periodic full scan scheduler
1. Implement backup interval enforcement
1. Add proper signal handling and shutdown
1. Write tests for daemon behavior

## Cron Mode
1. Implement silent operation mode
1. Add proper exit codes for cron
1. Implement lock file to prevent concurrent runs
1. Add error summary reporting
1. Write tests for cron mode

## Finalization
1. Add comprehensive logging throughout
1. Implement proper error wrapping and context
1. Add performance metrics collection
1. Create end-to-end integration tests
1. Write documentation and examples
1. Set up CI/CD pipeline