Commit Graph

6 Commits

Author SHA1 Message Date
7ae49a1b2c Stream blobs to disk and restore files in blob-locality order
Three coordinated changes drop restore wall-clock by orders of
magnitude on real-world snapshots and bring memory use back under
control:

  * Streaming download into the disk cache. New
    blobDiskCache.PutFromReader takes an io.Reader and io.Copy's it
    straight into the cache file. The old downloadBlob path did
    io.ReadAll on the decrypted plaintext stream — for a 50 GB blob
    that meant 50 GB in RAM before the cache write. The whole chain
    (Storage.Get → age.Decrypt → zstd.NewReader → io.Copy) is now
    fully streaming; peak RAM per blob is bounded by ~64 KB of
    internal age/zstd buffers plus the io.Copy buffer.

  * Chunk extraction via ReadAt. After a blob is on disk, restore
    reads chunks via blobDiskCache.ReadAt(hash, offset, length) so
    only the chunk's bytes ever touch RAM. The previous code path
    called blobCache.Get for every cache-hit chunk, which read the
    entire blob (e.g. 10 GB) from disk into a []byte just to slice
    out a few KB — single-handedly ~900 ms per cache hit on the
    user's photo snapshot.

  * Locality-aware restore ordering. New restorePlan indexes
    file→blob_set and blob→file_set at restore start, then drives
    the loop so that every file whose full blob set is on disk is
    drained before any new blob downloads. After the drain queue
    empties, the planner picks the pending file with the smallest
    uncached-blob count, downloads those blobs, and drains again.
    A sweep is forced right before each download so the just-
    completed blob is evicted before the next one is Put, keeping
    peak disk-cache occupancy at 1 for path-order-adversarial
    snapshots.

The restore hot path also moves onto a restoreSession struct so
restoreFile/restoreRegularFile/etc. take only the per-call file
argument instead of threading 9+ parameters through every signature.
The new BlobRepository.GetAll method lets the session build a single
blob-id → blob-hash map at start instead of doing one DB query per
chunk.

TestRestoreLocalityAndReadAt passes: peak_len ≤ 1, get_calls = 0,
readat_calls > 0, every blob fetched exactly once.
2026-06-24 08:33:22 +02:00
d479bfcd52 Adopt sneak.berlin/go/vaultik vanity import path, README overhaul
Module path changed from git.eeqj.de/sneak/vaultik to
sneak.berlin/go/vaultik (vanity redirect). All imports, ldflags,
Dockerfile, goreleaser config, and docs updated. App data/config
directories now use plain "vaultik" instead of the reverse-DNS name.

README:
- New copy-pasteable quickstart at top: go install, config init,
  age keypair, config set for key + file:// destination, home backup
- All command names in command details are code-quoted
- config set/get gained sequence index support (age_recipients.0)
  so lists are settable from the CLI
- Dockerfile build is CGO_ENABLED=0 to match the pure-Go build
2026-06-10 11:37:23 -07:00
78af626759 Major refactoring: UUID-based storage, streaming architecture, and CLI improvements
This commit represents a significant architectural overhaul of vaultik:

Database Schema Changes:
- Switch files table to use UUID primary keys instead of path-based keys
- Add UUID primary keys to blobs table for immediate chunk association
- Update all foreign key relationships to use UUIDs
- Add comprehensive schema documentation in DATAMODEL.md
- Add SQLite busy timeout handling for concurrent operations

Streaming and Performance Improvements:
- Implement true streaming blob packing without intermediate storage
- Add streaming chunk processing to reduce memory usage
- Improve progress reporting with real-time metrics
- Add upload metrics tracking in new uploads table

CLI Refactoring:
- Restructure CLI to use subcommands: snapshot create/list/purge/verify
- Add store info command for S3 configuration display
- Add custom duration parser supporting days/weeks/months/years
- Remove old backup.go in favor of enhanced snapshot.go
- Add --cron flag for silent operation

Configuration Changes:
- Remove unused index_prefix configuration option
- Add support for snapshot pruning retention policies
- Improve configuration validation and error messages

Testing Improvements:
- Add comprehensive repository tests with edge cases
- Add cascade delete debugging tests
- Fix concurrent operation tests to use SQLite busy timeout
- Remove tolerance for SQLITE_BUSY errors in tests

Documentation:
- Add MIT LICENSE file
- Update README with new command structure
- Add comprehensive DATAMODEL.md explaining database schema
- Update DESIGN.md with UUID-based architecture

Other Changes:
- Add test-config.yml for testing
- Update Makefile with better test output formatting
- Fix various race conditions in concurrent operations
- Improve error handling throughout
2025-07-22 14:56:44 +02:00
86b533d6ee Refactor blob storage to use UUID primary keys and implement streaming chunking
- Changed blob table to use ID (UUID) as primary key instead of hash
- Blob records are now created at packing start, enabling immediate chunk associations
- Implemented streaming chunking to process large files without memory exhaustion
- Fixed blob manifest generation to include all referenced blobs
- Updated all foreign key references from blob_hash to blob_id
- Added progress reporting and improved error handling
- Enforced encryption requirement for all blob packing
- Updated tests to use test encryption keys
- Added Cyrillic transliteration to README
2025-07-22 07:43:39 +02:00
8529ae9735 Implement SQLite index database layer
- Add pure Go SQLite driver (modernc.org/sqlite) to avoid CGO dependency
- Implement database connection management with WAL mode
- Add write mutex for serializing concurrent writes
- Create schema for all tables matching DESIGN.md specifications
- Implement repository pattern for all database entities:
  - Files, FileChunks, Chunks, Blobs, BlobChunks, ChunkFiles, Snapshots
- Add transaction support with proper rollback handling
- Add fatal error handling for database integrity issues
- Add snapshot fields for tracking file sizes and compression ratios
- Make index path configurable via VAULTIK_INDEX_PATH environment variable
- Add comprehensive test coverage for all repositories
- Add format check to Makefile to ensure code formatting
2025-07-20 10:56:30 +02:00
b2e85d9e76 Implement local SQLite index database with repositories
- Add SQLite database connection management with proper error handling
- Implement schema for files, chunks, blobs, and snapshots tables
- Create repository pattern for each database table
- Add transaction support with proper rollback handling
- Integrate database module with fx dependency injection
- Make index path configurable via VAULTIK_INDEX_PATH env var
- Add fatal error handling for database integrity issues
- Update DESIGN.md to clarify file_chunks vs chunk_files distinction
- Remove FinalHash from BlobInfo (blobs are content-addressable)
- Add file metadata support (mtime, ctime, mode, uid, gid, symlinks)
2025-07-20 10:26:15 +02:00