Major refactoring: UUID-based storage, streaming architecture, and CLI improvements
This commit represents a significant architectural overhaul of vaultik.

Database Schema Changes:
- Switch files table to use UUID primary keys instead of path-based keys
- Add UUID primary keys to blobs table for immediate chunk association
- Update all foreign key relationships to use UUIDs
- Add comprehensive schema documentation in DATAMODEL.md
- Add SQLite busy timeout handling for concurrent operations

Streaming and Performance Improvements:
- Implement true streaming blob packing without intermediate storage
- Add streaming chunk processing to reduce memory usage
- Improve progress reporting with real-time metrics
- Add upload metrics tracking in new uploads table

CLI Refactoring:
- Restructure CLI to use subcommands: snapshot create/list/purge/verify
- Add store info command for S3 configuration display
- Add custom duration parser supporting days/weeks/months/years
- Remove old backup.go in favor of enhanced snapshot.go
- Add --cron flag for silent operation

Configuration Changes:
- Remove unused index_prefix configuration option
- Add support for snapshot pruning retention policies
- Improve configuration validation and error messages

Testing Improvements:
- Add comprehensive repository tests with edge cases
- Add cascade delete debugging tests
- Fix concurrent operation tests to use SQLite busy timeout
- Remove tolerance for SQLITE_BUSY errors in tests

Documentation:
- Add MIT LICENSE file
- Update README with new command structure
- Add comprehensive DATAMODEL.md explaining database schema
- Update DESIGN.md with UUID-based architecture

Other Changes:
- Add test-config.yml for testing
- Update Makefile with better test output formatting
- Fix various race conditions in concurrent operations
- Improve error handling throughout
@@ -1,9 +1,15 @@
// Package database provides data models and repository interfaces for the Vaultik backup system.
// It includes types for files, chunks, blobs, snapshots, and their relationships.
package database

import "time"

// File represents a file or directory in the backup system.
// It stores metadata about files including timestamps, permissions, ownership,
// and symlink targets. This information is used to restore files with their
// original attributes.
type File struct {
	ID    string // UUID primary key
	Path  string
	MTime time.Time
	CTime time.Time
@@ -14,37 +20,52 @@ type File struct {
	LinkTarget string // empty for regular files, target path for symlinks
}

// IsSymlink returns true if this file is a symbolic link.
// A file is considered a symlink if it has a non-empty LinkTarget.
func (f *File) IsSymlink() bool {
	return f.LinkTarget != ""
}

// FileChunk represents the mapping between files and their constituent chunks.
// Large files are split into multiple chunks for efficient deduplication and storage.
// The Idx field maintains the order of chunks within a file.
type FileChunk struct {
	Path      string
	FileID    string
	Idx       int
	ChunkHash string
}

// Chunk represents a data chunk in the deduplication system.
// Files are split into chunks which are content-addressed by their hash.
// The ChunkHash is used for deduplication, while SHA256 provides
// an additional verification hash.
type Chunk struct {
	ChunkHash string
	SHA256    string
	Size      int64
}
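The content addressing described in the Chunk doc comment can be illustrated with a short sketch. The `chunkID` helper is hypothetical; vaultik may use a different hash or encoding for ChunkHash, but the principle is the same.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkID returns a content address for a chunk: the hex-encoded
// SHA-256 of the chunk's raw bytes. Identical chunks always map to
// the same ID, which is what makes deduplication possible.
func chunkID(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	a := chunkID([]byte("hello"))
	b := chunkID([]byte("hello"))
	fmt.Println(a == b) // true: same content, same address
}
```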

// Blob represents a blob record in the database.
// A blob is Vaultik's final storage unit - a large file (up to 10GB) containing
// many compressed and encrypted chunks from multiple source files.
// Blobs are content-addressed, meaning their filename in S3 is derived from
// the SHA256 hash of their compressed and encrypted content.
// The blob creation process is: chunks are accumulated -> compressed with zstd
// -> encrypted with age -> hashed -> uploaded to S3 with the hash as filename.
type Blob struct {
	ID               string     // UUID assigned when blob creation starts
	Hash             string     // SHA256 of final compressed+encrypted content (empty until finalized)
	CreatedTS        time.Time  // When blob creation started
	FinishedTS       *time.Time // When blob was finalized (nil if still packing)
	UncompressedSize int64      // Total size of raw chunks before compression
	CompressedSize   int64      // Size after compression and encryption
	UploadedTS       *time.Time // When blob was uploaded to S3 (nil if not uploaded)
}
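The accumulate -> compress -> hash pipeline from the Blob doc comment can be sketched with the standard library alone. Here gzip stands in for zstd and the age encryption step is omitted, so this is illustrative only, not vaultik's implementation.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// packBlob accumulates chunks into one compressed buffer and returns
// the packed bytes plus their content address. vaultik compresses with
// zstd and then encrypts with age before hashing; gzip is a stand-in.
func packBlob(chunks [][]byte) (data []byte, hash string, err error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	for _, c := range chunks {
		if _, err = zw.Write(c); err != nil {
			return nil, "", err
		}
	}
	if err = zw.Close(); err != nil {
		return nil, "", err
	}
	// The address is the hash of the *packed* content, so the S3 key
	// is not known until the blob is finalized - matching the empty
	// Hash field on Blob until FinishedTS is set.
	sum := sha256.Sum256(buf.Bytes())
	return buf.Bytes(), hex.EncodeToString(sum[:]), nil
}

func main() {
	_, hash, err := packBlob([][]byte{[]byte("chunk-a"), []byte("chunk-b")})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(hash)) // 64
}
```

This ordering also explains why Blob carries both a UUID (assigned immediately) and a Hash (filled in only at finalization).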

// BlobChunk represents the mapping between blobs and the chunks they contain.
// This allows tracking which chunks are stored in which blobs, along with
// their position and size within the blob. The offset and length fields
// enable extracting specific chunks from a blob without processing the entire blob.
type BlobChunk struct {
	BlobID    string
	ChunkHash string
@@ -52,27 +73,34 @@ type BlobChunk struct {
	Length int64
}

// ChunkFile represents the reverse mapping showing which files contain a specific chunk.
// This is used during deduplication to identify all files that share a chunk,
// which is important for garbage collection and integrity verification.
type ChunkFile struct {
	ChunkHash  string
	FilePath   string
	FileID     string
	FileOffset int64
	Length     int64
}

// Snapshot represents a snapshot record in the database
type Snapshot struct {
	ID                   string
	Hostname             string
	VaultikVersion       string
	VaultikGitRevision   string
	StartedAt            time.Time
	CompletedAt          *time.Time // nil if still in progress
	FileCount            int64
	ChunkCount           int64
	BlobCount            int64
	TotalSize            int64   // Total size of all referenced files
	BlobSize             int64   // Total size of all referenced blobs (compressed and encrypted)
	BlobUncompressedSize int64   // Total uncompressed size of all referenced blobs
	CompressionRatio     float64 // Compression ratio (BlobSize / BlobUncompressedSize)
	CompressionLevel     int     // Compression level used for this snapshot
	UploadBytes          int64   // Total bytes uploaded during this snapshot
	UploadDurationMs     int64   // Total milliseconds spent uploading to S3
}
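The CompressionRatio field is derived from the two blob size totals on Snapshot. A guarded computation avoids division by zero for an empty snapshot; this helper is illustrative, not vaultik's actual code.

```go
package main

import "fmt"

// compressionRatio computes BlobSize / BlobUncompressedSize as stored
// in Snapshot.CompressionRatio. It returns 0 when no uncompressed
// bytes were recorded, so an empty snapshot cannot divide by zero.
func compressionRatio(blobSize, blobUncompressedSize int64) float64 {
	if blobUncompressedSize == 0 {
		return 0
	}
	return float64(blobSize) / float64(blobUncompressedSize)
}

func main() {
	// 1000 bytes of raw chunks packed into 250 bytes of blob data.
	fmt.Println(compressionRatio(250, 1000)) // 0.25
}
```

A ratio below 1.0 means the blobs are smaller than the raw chunk data they hold; values near 1.0 indicate incompressible input.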

// IsComplete returns true if the snapshot has completed
@@ -83,7 +111,7 @@ func (s *Snapshot) IsComplete() bool {
// SnapshotFile represents the mapping between snapshots and files
type SnapshotFile struct {
	SnapshotID string
	FilePath   string
	FileID     string
}

// SnapshotBlob represents the mapping between snapshots and blobs