32 Commits

Author SHA1 Message Date
24c5e8c5a6 Refactor: Create file records only after successful chunking
- Scan phase now only collects files to process, no DB writes
- Unchanged files get snapshot_files associations via batch (no new records)
- New/changed files get records created during processing after chunking
- Reduces DB writes significantly (only changed files need new records)
- Avoids orphaned file records if backup is interrupted mid-way
2025-12-19 12:40:45 +07:00
40fff09594 Update progress output format with compact file counts
New format: Progress [5.7k/610k] 6.7 GB/44 GB (15.4%), 106 MB/sec, 500 files/sec, running for 1m30s, ETA: 5m49s

- Compact file counts with k/M suffixes in brackets
- Bytes processed/total with percentage
- Both byte rate and file rate
- Elapsed time shown as "running for X"
2025-12-19 12:33:38 +07:00
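
The compact file counts in the progress line above (for example `[5.7k/610k]`) could be produced by a small helper like the one below. This is a minimal sketch, not vaultik's actual code: the names `formatCount` and `withSuffix` and the rounding rules (one decimal below 10 units, none above) are assumptions chosen to reproduce the sample output.

```go
package main

import "fmt"

// formatCount renders a count compactly, e.g. 5700 -> "5.7k".
// Hypothetical helper; the rounding rules are an assumption.
func formatCount(n int64) string {
	switch {
	case n < 1_000:
		return fmt.Sprintf("%d", n)
	case n < 1_000_000:
		return withSuffix(float64(n)/1_000, "k")
	default:
		return withSuffix(float64(n)/1_000_000, "M")
	}
}

// withSuffix keeps one decimal for small values (5.7k) and none for
// larger ones (610k).
func withSuffix(v float64, suffix string) string {
	if v < 10 {
		return fmt.Sprintf("%.1f%s", v, suffix)
	}
	return fmt.Sprintf("%.0f%s", v, suffix)
}

func main() {
	fmt.Printf("Progress [%s/%s]\n", formatCount(5_700), formatCount(610_000))
}
```
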
8a8651c690 Fix foreign key error when deleting incomplete snapshots
Delete uploads table entries before deleting the snapshot itself.
The uploads table has a foreign key to snapshots(id) without CASCADE,
so we must explicitly delete upload records first.
2025-12-19 12:27:05 +07:00
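
The deletion order described in this fix can be sketched as follows. The table and column names (`uploads.snapshot_id`, `snapshots.id`) come from the commit messages in this log; everything else (`execer`, `deleteSnapshot`, the `recorder` fake) is hypothetical and exists only so the example runs without a SQLite driver. In real code both statements would likely run inside one transaction.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.Tx / *sql.DB used here, so the sketch can
// run against a fake instead of a real database.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// deleteSnapshot removes child rows in uploads before the parent snapshot
// row, because the foreign key has no ON DELETE CASCADE.
func deleteSnapshot(ctx context.Context, db execer, snapshotID string) error {
	if _, err := db.ExecContext(ctx, `DELETE FROM uploads WHERE snapshot_id = ?`, snapshotID); err != nil {
		return fmt.Errorf("deleting uploads: %w", err)
	}
	if _, err := db.ExecContext(ctx, `DELETE FROM snapshots WHERE id = ?`, snapshotID); err != nil {
		return fmt.Errorf("deleting snapshot: %w", err)
	}
	return nil
}

// recorder is a fake execer that only records the statements it receives.
type recorder struct{ queries []string }

func (r *recorder) ExecContext(_ context.Context, query string, _ ...any) (sql.Result, error) {
	r.queries = append(r.queries, query)
	return nil, nil
}

// demoOrder runs deleteSnapshot against the fake and returns the
// statements in the order they were issued.
func demoOrder() []string {
	rec := &recorder{}
	if err := deleteSnapshot(context.Background(), rec, "snap-1"); err != nil {
		panic(err)
	}
	return rec.queries
}

func main() {
	for i, q := range demoOrder() {
		fmt.Printf("%d: %s\n", i+1, q)
	}
}
```
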
a1d559c30d Improve processing progress output with bytes and blob messages
- Show bytes processed/total instead of just files
- Display data rate in bytes/sec
- Calculate ETA based on bytes (more accurate than files)
- Print message when each blob is stored with size and speed
2025-12-19 12:24:55 +07:00
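
A byte-based ETA of the kind described above can be sketched as the remaining bytes divided by the average byte rate so far. The function name `etaFromBytes` and the choice of a simple whole-run average (rather than a moving window) are assumptions; the log does not say how vaultik smooths its rate.

```go
package main

import (
	"fmt"
	"time"
)

// etaFromBytes estimates the remaining time from the average byte rate
// observed so far. It returns false when no rate can be computed yet.
func etaFromBytes(processed, total int64, elapsed time.Duration) (time.Duration, bool) {
	if processed <= 0 || elapsed <= 0 || total < processed {
		return 0, false
	}
	rate := float64(processed) / elapsed.Seconds() // bytes per second
	remaining := float64(total - processed)
	return time.Duration(remaining / rate * float64(time.Second)), true
}

func main() {
	// 25 MiB of 100 MiB done after 10 seconds.
	eta, ok := etaFromBytes(25<<20, 100<<20, 10*time.Second)
	fmt.Println(eta, ok)
}
```

Estimating from bytes rather than file counts avoids the skew caused by a few very large files among many small ones.
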
88e2508dc7 Eliminate redundant filesystem traversal in scan phase
Remove the separate enumerateFiles() function that was doing a full
directory walk using Readdir(), which calls stat() on every file.
Instead, build the existingFiles map during the scan phase walk,
and detect deleted files afterward.

This eliminates one full filesystem traversal, significantly speeding
up the scan phase for large directories.
2025-12-19 12:15:13 +07:00
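
The single-pass approach described above (record every file during the walk, then derive deletions from what was not seen) might look roughly like this. It is a sketch using the standard library's `fs.WalkDir` over an in-memory `fstest.MapFS`, not vaultik's scanner; `scanOnce` and `demoScan` are hypothetical names, and only `existingFiles` as a concept comes from the commit message.

```go
package main

import (
	"fmt"
	"io/fs"
	"sort"
	"testing/fstest"
)

// scanOnce walks fsys a single time, recording every regular file it sees,
// then reports which previously known paths were not seen (deleted).
func scanOnce(fsys fs.FS, known map[string]struct{}) (existing map[string]struct{}, deleted []string, err error) {
	existing = make(map[string]struct{})
	err = fs.WalkDir(fsys, ".", func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if d.Type().IsRegular() {
			existing[path] = struct{}{}
		}
		return nil
	})
	if err != nil {
		return nil, nil, err
	}
	// Deletion detection needs no further filesystem access.
	for path := range known {
		if _, ok := existing[path]; !ok {
			deleted = append(deleted, path)
		}
	}
	sort.Strings(deleted)
	return existing, deleted, nil
}

// demoScan runs scanOnce over a small in-memory tree.
func demoScan() (int, []string) {
	fsys := fstest.MapFS{
		"a.txt":     {Data: []byte("a")},
		"dir/b.txt": {Data: []byte("b")},
	}
	known := map[string]struct{}{"a.txt": {}, "old/gone.txt": {}}
	existing, deleted, err := scanOnce(fsys, known)
	if err != nil {
		panic(err)
	}
	return len(existing), deleted
}

func main() {
	n, deleted := demoScan()
	fmt.Println(n, deleted)
}
```
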
c3725e745e Optimize scan phase: in-memory change detection and batched DB writes
Performance improvements:
- Load all known files from DB into memory at startup
- Check file changes against in-memory map (no per-file DB queries)
- Batch database writes in groups of 1000 files per transaction
- Scan phase now only counts regular files, not directories

This should improve scan speed from ~600 files/sec to potentially
10,000+ files/sec by eliminating per-file database round trips.
2025-12-19 12:08:47 +07:00
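
The two ideas in this commit, checking files against an in-memory map and writing in batches of 1000, can be illustrated with the sketch below. The `fileMeta` fields, the comparison rule, and the function names are assumptions for illustration; each call to `flush` stands in for one database transaction.

```go
package main

import "fmt"

// fileMeta holds the fields compared for change detection (an assumed subset).
type fileMeta struct {
	Size    int64
	ModTime int64 // Unix nanoseconds
}

// changedOrNew checks a scanned file against the in-memory map loaded from
// the database, so no per-file query is needed.
func changedOrNew(known map[string]fileMeta, path string, cur fileMeta) bool {
	prev, ok := known[path]
	return !ok || prev != cur
}

// inBatches calls flush once per group of at most size items.
func inBatches(paths []string, size int, flush func(batch []string) error) error {
	if size <= 0 {
		return fmt.Errorf("batch size must be positive, got %d", size)
	}
	for start := 0; start < len(paths); start += size {
		end := start + size
		if end > len(paths) {
			end = len(paths)
		}
		if err := flush(paths[start:end]); err != nil {
			return err
		}
	}
	return nil
}

// demoBatchSizes returns the sizes of the batches produced for n paths.
func demoBatchSizes(n, size int) []int {
	paths := make([]string, n)
	var sizes []int
	_ = inBatches(paths, size, func(b []string) error {
		sizes = append(sizes, len(b))
		return nil
	})
	return sizes
}

func main() {
	known := map[string]fileMeta{"a.txt": {Size: 10, ModTime: 1}}
	fmt.Println(changedOrNew(known, "a.txt", fileMeta{Size: 10, ModTime: 1})) // unchanged
	fmt.Println(changedOrNew(known, "b.txt", fileMeta{Size: 5, ModTime: 2}))  // new
	fmt.Println(demoBatchSizes(2500, 1000))
}
```
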
badc0c07e0 Add pluggable storage backend, PID locking, and improved scan progress
Storage backend:
- Add internal/storage package with Storer interface
- Implement FileStorer for local filesystem storage (file:// URLs)
- Implement S3Storer wrapping existing s3.Client
- Support storage_url config field (s3:// or file://)
- Migrate all consumers to use storage.Storer interface

PID locking:
- Add internal/pidlock package to prevent concurrent instances
- Acquire lock before app start, release on exit
- Detect stale locks from crashed processes

Scan progress improvements:
- Add fast file enumeration pass before stat() phase
- Use enumerated set for deletion detection (no extra filesystem access)
- Show progress with percentage, files/sec, elapsed time, and ETA
- Change "changed" to "changed/new" for clarity

Config improvements:
- Add tilde expansion for paths (~/)
- Use xdg library for platform-specific default index path
2025-12-19 11:52:51 +07:00
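
Selecting a backend from the `storage_url` scheme could be sketched as below. Only the `s3://` and `file://` schemes come from the commit message; the `target` struct, the `parseStorageURL` function, and the interpretation of host as bucket and path as prefix are assumptions, and the real `Storer` interface is not shown in this log.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// target describes where blobs would be stored. Field names are hypothetical.
type target struct {
	Kind   string // "s3" or "file"
	Bucket string // S3 only
	Path   string // key prefix for S3, directory for file
}

// parseStorageURL maps a storage_url value onto a backend choice.
func parseStorageURL(raw string) (target, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return target{}, fmt.Errorf("parsing storage_url: %w", err)
	}
	switch u.Scheme {
	case "s3":
		if u.Host == "" {
			return target{}, fmt.Errorf("s3 URL %q has no bucket", raw)
		}
		return target{Kind: "s3", Bucket: u.Host, Path: strings.TrimPrefix(u.Path, "/")}, nil
	case "file":
		if u.Path == "" {
			return target{}, fmt.Errorf("file URL %q has no path", raw)
		}
		return target{Kind: "file", Path: u.Path}, nil
	default:
		return target{}, fmt.Errorf("unsupported storage scheme %q", u.Scheme)
	}
}

func main() {
	for _, raw := range []string{"s3://my-bucket/backups", "file:///var/backups/vaultik"} {
		t, err := parseStorageURL(raw)
		fmt.Printf("%+v %v\n", t, err)
	}
}
```
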
cda0cf865a Add ARCHITECTURE.md documenting internal design
Document the data model, type instantiation flow, and module
responsibilities. Covers chunker, packer, vaultik, cli, snapshot,
and database modules with detailed explanations of relationships
between File, Chunk, Blob, and Snapshot entities.
2025-12-18 19:49:42 -08:00
0736bd070b Add godoc documentation to exported types and methods
Add proper godoc comments to exported items in:
- internal/globals: Appname, Version, Commit variables; Globals type; New function
- internal/log: LogLevel type; level constants; Config type; Initialize, Fatal,
  Error, Warn, Notice, Info, Debug functions and variants; TTYHandler type and
  methods; Module variable; LogOptions type
2025-12-18 18:51:52 -08:00
d7cd9aac27 Add end-to-end integration tests for Vaultik
- Create comprehensive integration tests with mock S3 client
- Add in-memory filesystem and SQLite database support for testing
- Test full backup workflow including chunking, packing, and uploading
- Add test to verify encrypted blob content
- Fix scanner to use afero filesystem for temp file cleanup
- Demonstrate successful backup and verification with mock dependencies
2025-07-26 15:52:23 +02:00
bb38f8c5d6 Integrate afero filesystem abstraction library
- Add afero.Fs field to Vaultik struct for filesystem operations
- Vaultik now owns and manages the filesystem instance
- SnapshotManager receives filesystem via SetFilesystem() setter
- Update blob packer to use afero for temporary files
- Convert all filesystem operations to use afero abstraction
- Remove filesystem module; Vaultik manages the filesystem directly
- Update tests: remove symlink test (unsupported by afero memfs)
- Fix TestMultipleFileChanges to handle scanner examining directories

This enables full end-to-end testing without touching disk by using
memory-backed filesystems. Database operations continue to use the real
filesystem, as SQLite requires actual files.
2025-07-26 15:33:18 +02:00
e29a995120 Refactor: Move Vaultik struct and methods to internal/vaultik package
- Created new internal/vaultik package with unified Vaultik struct
- Moved all command methods (snapshot, info, prune, verify) from CLI to vaultik package
- Implemented single constructor that handles crypto capabilities automatically
- Added CanDecrypt() method to check if decryption is available
- Updated all CLI commands to use the new vaultik.Vaultik struct
- Removed old fragmented App structs and WithCrypto wrapper
- Fixed context management - Vaultik now owns its context lifecycle
- Cleaned up package imports and dependencies

This creates a cleaner separation between CLI/Cobra code and business logic,
with all vaultik operations now centralized in the internal/vaultik package.
2025-07-26 14:47:26 +02:00
5c70405a85 Fix snapshot list to fail on manifest errors
- Remove error suppression for manifest decoding errors
- Manifest read/deserialize errors now fail immediately with clear error messages
- This ensures we catch format mismatches and other issues early
2025-07-26 03:31:09 +02:00
a544fa80f2 Major refactoring: Updated manifest format and renamed backup to snapshot
- Created manifest.go with proper Manifest structure including blob sizes
- Updated manifest generation to include compressed size for each blob
- Added TotalCompressedSize field to manifest for quick access
- Renamed backup package to snapshot for clarity
- Updated snapshot list to show all remote snapshots
- For remote snapshots missing from the local DB, fetch the manifest to get the size
- Local snapshots not in remote are automatically deleted
- Removed backwards compatibility code (pre-1.0, no users)
- Fixed prune command to use new manifest format
- Updated all imports and references from backup to snapshot
2025-07-26 03:27:47 +02:00
c07d8eec0a Fix snapshot list to not download manifests
- Removed unnecessary manifest downloads from snapshot list command
- Removed blob size calculation from listing operation
- Removed COMPRESSED SIZE column from output since we're not calculating it
- This makes snapshot list much faster and avoids 404 errors for old snapshots
2025-07-26 03:16:18 +02:00
0cbb5aa0a6 Update snapshot list to sync with remote
- Added syncWithRemote method that lists remote snapshots from S3
- Removes local snapshots that don't exist in remote storage
- Ensures local database stays in sync with actual remote state
- This prevents showing snapshots that have been deleted from S3
2025-07-26 03:14:20 +02:00
fb220685a2 Fix manifest generation to not encrypt manifests
- Manifests are now only compressed (not encrypted) so pruning operations can work without private keys
- Updated generateBlobManifest to use zstd compression directly
- Updated prune command to handle unencrypted manifests
- Updated snapshot list command to handle new manifest format
- Updated documentation to reflect manifest.json.zst (not .age)
- Removed unnecessary VAULTIK_PRIVATE_KEY check from prune command
2025-07-26 02:54:52 +02:00
1d027bde57 Fix prune command to use config file for bucket and prefix
- Remove --bucket and --prefix command line flags
- Use bucket and prefix from S3 configuration in config file
- Update command to follow same pattern as other commands
- Maintain consistency that all configuration comes from config file
2025-07-26 02:41:00 +02:00
bb2292de7f Fix file content change handling and improve log messages
- Delete old file_chunks and chunk_files when file content changes
- Add DeleteByFileID method to ChunkFileRepository
- Add tests to verify old chunks are properly disassociated
- Make log messages more precise throughout scanner and snapshot
- Support metadata-only snapshots when no files have changed
- Add periodic status output during scan and snapshot operations
- Improve scan summary output with clearer information
2025-07-26 02:38:50 +02:00
d3afa65420 Fix foreign key constraints and improve snapshot tracking
- Add unified compression/encryption package in internal/blobgen
- Update DATAMODEL.md to reflect current schema implementation
- Refactor snapshot cleanup into well-named methods for clarity
- Add snapshot_id to uploads table to track new blobs per snapshot
- Fix blob count reporting for incremental backups
- Add DeleteOrphaned method to BlobChunkRepository
- Fix cleanup order to respect foreign key constraints
- Update tests to reflect schema changes
2025-07-26 02:22:25 +02:00
78af626759 Major refactoring: UUID-based storage, streaming architecture, and CLI improvements
This commit represents a significant architectural overhaul of vaultik:

Database Schema Changes:
- Switch files table to use UUID primary keys instead of path-based keys
- Add UUID primary keys to blobs table for immediate chunk association
- Update all foreign key relationships to use UUIDs
- Add comprehensive schema documentation in DATAMODEL.md
- Add SQLite busy timeout handling for concurrent operations

Streaming and Performance Improvements:
- Implement true streaming blob packing without intermediate storage
- Add streaming chunk processing to reduce memory usage
- Improve progress reporting with real-time metrics
- Add upload metrics tracking in new uploads table

CLI Refactoring:
- Restructure CLI to use subcommands: snapshot create/list/purge/verify
- Add store info command for S3 configuration display
- Add custom duration parser supporting days/weeks/months/years
- Remove old backup.go in favor of enhanced snapshot.go
- Add --cron flag for silent operation

Configuration Changes:
- Remove unused index_prefix configuration option
- Add support for snapshot pruning retention policies
- Improve configuration validation and error messages

Testing Improvements:
- Add comprehensive repository tests with edge cases
- Add cascade delete debugging tests
- Fix concurrent operation tests to use SQLite busy timeout
- Remove tolerance for SQLITE_BUSY errors in tests

Documentation:
- Add MIT LICENSE file
- Update README with new command structure
- Add comprehensive DATAMODEL.md explaining database schema
- Update DESIGN.md with UUID-based architecture

Other Changes:
- Add test-config.yml for testing
- Update Makefile with better test output formatting
- Fix various race conditions in concurrent operations
- Improve error handling throughout
2025-07-22 14:56:44 +02:00
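
A duration parser that accepts days, weeks, months, and years on top of Go's standard units might be sketched as follows. The suffixes (`d`, `w`, `mo`, `y`), the fixed lengths of 30 days per month and 365 days per year, and the name `parseDuration` are all assumptions; the log does not specify the syntax vaultik accepts.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
	"time"
)

const day = 24 * time.Hour

// units lists the extra suffixes. "mo" is checked first and is used for
// months because "m" already means minutes in time.ParseDuration.
var units = []struct {
	suffix string
	length time.Duration
}{
	{"mo", 30 * day}, // assumed month length
	{"d", day},
	{"w", 7 * day},
	{"y", 365 * day}, // assumed year length
}

// parseDuration handles "<integer><unit>" for the extra units and falls
// back to time.ParseDuration for everything else (e.g. "90m", "1h30m").
func parseDuration(s string) (time.Duration, error) {
	for _, u := range units {
		if !strings.HasSuffix(s, u.suffix) {
			continue
		}
		n, err := strconv.Atoi(strings.TrimSuffix(s, u.suffix))
		if err != nil || n < 0 {
			return 0, fmt.Errorf("invalid duration %q", s)
		}
		if time.Duration(n) > time.Duration(math.MaxInt64)/u.length {
			return 0, fmt.Errorf("duration %q is too large", s)
		}
		return time.Duration(n) * u.length, nil
	}
	return time.ParseDuration(s)
}

func main() {
	for _, s := range []string{"30d", "2w", "3mo", "1y", "90m"} {
		d, err := parseDuration(s)
		fmt.Println(s, d, err)
	}
}
```
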
86b533d6ee Refactor blob storage to use UUID primary keys and implement streaming chunking
- Changed blob table to use ID (UUID) as primary key instead of hash
- Blob records are now created at packing start, enabling immediate chunk associations
- Implemented streaming chunking to process large files without memory exhaustion
- Fixed blob manifest generation to include all referenced blobs
- Updated all foreign key references from blob_hash to blob_id
- Added progress reporting and improved error handling
- Enforced encryption requirement for all blob packing
- Updated tests to use test encryption keys
- Added Cyrillic transliteration to README
2025-07-22 07:43:39 +02:00
26db096913 Move StartTime initialization to application startup hook
- Remove StartTime initialization from globals.New()
- Add setupGlobals function in app.go to set StartTime during fx OnStart
- Simplify globals package to be just a key/value store
- Remove fx dependencies from globals test
2025-07-20 12:05:24 +02:00
36c59cb7b3 Set up S3 testing infrastructure for backup implementation
- Add gofakes3 for in-process S3-compatible test server
- Create test server that runs on localhost:9999 with temp directory
- Implement basic S3 client wrapper with standard operations
- Add comprehensive tests for blob and metadata storage patterns
- Test cleanup properly removes temporary directories
- Use AWS SDK v2 for S3 operations with proper error handling
2025-07-20 11:19:16 +02:00
9c072166fa Add blob manifest for pruning without decryption
- Update bucket structure to include unencrypted blob manifest files
- Add <snapshot_id>.manifest.json.zst containing list of referenced blobs
- This enables pruning operations without requiring decryption keys
- Add snapshot management commands: list, rm, latest (stubs)
- Add --prune flag to backup command for automatic cleanup
- Update DESIGN.md to document manifest format and updated prune flow
2025-07-20 11:03:53 +02:00
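
Pruning from manifests alone, as described above, reduces to a set difference: any stored blob that no retained manifest references can be removed, and no decryption key is involved. The sketch below assumes a simplified `manifest` struct and a hypothetical `unreferencedBlobs` function; the actual manifest format is documented in DESIGN.md, not in this log.

```go
package main

import (
	"fmt"
	"sort"
)

// manifest lists the blobs one snapshot references. This struct is a
// stand-in for the real manifest format.
type manifest struct {
	SnapshotID string
	Blobs      []string
}

// unreferencedBlobs returns stored blobs that no retained manifest mentions.
func unreferencedBlobs(stored []string, retained []manifest) []string {
	live := make(map[string]struct{})
	for _, m := range retained {
		for _, b := range m.Blobs {
			live[b] = struct{}{}
		}
	}
	var orphans []string
	for _, b := range stored {
		if _, ok := live[b]; !ok {
			orphans = append(orphans, b)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	stored := []string{"blob-a", "blob-b", "blob-c", "blob-d"}
	retained := []manifest{
		{SnapshotID: "snap-1", Blobs: []string{"blob-a", "blob-b"}},
		{SnapshotID: "snap-2", Blobs: []string{"blob-b", "blob-d"}},
	}
	fmt.Println(unreferencedBlobs(stored, retained))
}
```
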
8529ae9735 Implement SQLite index database layer
- Add pure Go SQLite driver (modernc.org/sqlite) to avoid CGO dependency
- Implement database connection management with WAL mode
- Add write mutex for serializing concurrent writes
- Create schema for all tables matching DESIGN.md specifications
- Implement repository pattern for all database entities:
  - Files, FileChunks, Chunks, Blobs, BlobChunks, ChunkFiles, Snapshots
- Add transaction support with proper rollback handling
- Add fatal error handling for database integrity issues
- Add snapshot fields for tracking file sizes and compression ratios
- Make index path configurable via VAULTIK_INDEX_PATH environment variable
- Add comprehensive test coverage for all repositories
- Add format check to Makefile to ensure code formatting
2025-07-20 10:56:30 +02:00
b2e85d9e76 Implement local SQLite index database with repositories
- Add SQLite database connection management with proper error handling
- Implement schema for files, chunks, blobs, and snapshots tables
- Create repository pattern for each database table
- Add transaction support with proper rollback handling
- Integrate database module with fx dependency injection
- Make index path configurable via VAULTIK_INDEX_PATH env var
- Add fatal error handling for database integrity issues
- Update DESIGN.md to clarify file_chunks vs chunk_files distinction
- Remove FinalHash from BlobInfo (blobs are content-addressable)
- Add file metadata support (mtime, ctime, mode, uid, gid, symlinks)
2025-07-20 10:26:15 +02:00
9de439a0a4 Move TODO list to separate file and update verify command
- Extract implementation TODO from DESIGN.md to TODO.md
- Remove completed Phase 1 tasks
- Add --quick option to verify command for S3 hash checking
- Update documentation to reflect deep verification as default
2025-07-20 09:49:10 +02:00
bcbc186286 Refactor CLI to use flags instead of positional arguments
- Change all commands to use flags (--bucket, --prefix, etc.)
- Add --config flag to backup command
- Support VAULTIK_CONFIG environment variable for config path
- Use /etc/vaultik/config.yml as default config location
- Add test/config.yaml for testing
- Update tests to use environment variable for config path
- Add .gitignore for build artifacts and local configs
- Update documentation to reflect new CLI syntax
2025-07-20 09:45:24 +02:00
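
Resolving the config path from the three sources named above could look like this. The environment variable name and default path come from the commit message; the precedence order (flag, then `VAULTIK_CONFIG`, then the default) and the function `resolveConfigPath` are assumptions.

```go
package main

import (
	"fmt"
	"os"
)

const defaultConfigPath = "/etc/vaultik/config.yml"

// resolveConfigPath picks the config file location. The precedence used
// here (flag, then VAULTIK_CONFIG, then default) is an assumption.
// getenv is injected so the lookup can be exercised without touching the
// process environment.
func resolveConfigPath(flagValue string, getenv func(string) string) string {
	if flagValue != "" {
		return flagValue
	}
	if v := getenv("VAULTIK_CONFIG"); v != "" {
		return v
	}
	return defaultConfigPath
}

func main() {
	fmt.Println(resolveConfigPath("", os.Getenv))
}
```
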
3e8b98dec6 Implement CLI skeleton with cobra and fx dependency injection
- Set up cobra CLI with all commands (backup, restore, prune, verify, fetch)
- Integrate uber/fx for dependency injection and lifecycle management
- Add globals package with build-time variables (Version, Commit)
- Implement config loading from YAML with validation
- Create core data models (FileInfo, ChunkInfo, BlobInfo, Snapshot)
- Add Makefile with build, test, lint, and clean targets
- Include minimal test suite for compilation verification
- Update documentation with --quick flag for verify command
- Fix markdown numbering in implementation TODO
2025-07-20 09:34:14 +02:00
0df07790ba Document complete vaultik architecture and implementation plan
- Expand README with full CLI documentation, architecture details, and features
- Add comprehensive 87-step implementation plan to DESIGN.md
- Document all commands, configuration options, and security considerations
- Define complete API signatures and data structures
2025-07-20 09:04:31 +02:00
67319a4699 initial design
2025-07-20 08:51:38 +02:00