Vaultik Architecture
This document describes the internal architecture of Vaultik, focusing on the data model, type instantiation, and the relationships between core modules.
Overview
Vaultik is a backup system that uses content-defined chunking for deduplication and packs chunks into large, compressed, encrypted blobs for efficient cloud storage. The system is built around dependency injection using uber-go/fx.
Data Flow
Source Files
│
▼
┌─────────────────┐
│ Scanner │ Walks directories, detects changed files
└────────┬────────┘
│
▼
┌─────────────────┐
│ Chunker │ Splits files into variable-size chunks (FastCDC)
└────────┬────────┘
│
▼
┌─────────────────┐
│ Packer │ Accumulates chunks, compresses (zstd), encrypts (age)
└────────┬────────┘
│
▼
┌─────────────────┐
│ S3 Client │ Uploads blobs to remote storage
└─────────────────┘
Data Model
Core Entities
The database tracks five primary entities and their relationships:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Snapshot │────▶│ File │────▶│ Chunk │
└──────────────┘ └──────────────┘ └──────────────┘
│ │
│ │
▼ ▼
┌──────────────┐ ┌──────────────┐
│ Blob │◀─────────────────────────│ BlobChunk │
└──────────────┘ └──────────────┘
Entity Descriptions
File (database.File)
Represents a file or directory in the backup system. Stores metadata needed for restoration:
- Path, timestamps (mtime, ctime)
- Size, mode, ownership (uid, gid)
- Symlink target (if applicable)
Chunk (database.Chunk)
A content-addressed unit of data. Files are split into variable-size chunks using the FastCDC algorithm:
- ChunkHash: SHA256 hash of chunk content (primary key)
- Size: Chunk size in bytes
Chunk sizes vary between avgChunkSize/4 and avgChunkSize*4 (typically 16KB-256KB for 64KB average).
FileChunk (database.FileChunk)
Maps files to their constituent chunks:
- FileID: Reference to the file
- Idx: Position of this chunk within the file (0-indexed)
- ChunkHash: Reference to the chunk
Blob (database.Blob)
The final storage unit uploaded to S3. Contains many chunks, packed together and then compressed and encrypted as a unit:
- ID: UUID assigned at creation
- Hash: SHA256 of final compressed+encrypted content
- UncompressedSize: Total raw chunk data before compression
- CompressedSize: Size after zstd compression and age encryption
- CreatedTS, FinishedTS, UploadedTS: Lifecycle timestamps
Blob creation process:
- Chunks are accumulated (up to MaxBlobSize, typically 10GB)
- Compressed with zstd
- Encrypted with age (recipients configured in config)
- SHA256 hash computed → becomes filename in S3
- Uploaded to blobs/{hash[0:2]}/{hash[2:4]}/{hash}
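For illustration, here is a minimal sketch of that pipeline, assuming the github.com/klauspost/compress/zstd and filippo.io/age libraries; finalizeBlob and its signature are illustrative, not the actual Packer internals:

// Hypothetical sketch of the compress → encrypt → hash pipeline; not the real Packer code.
// Assumes github.com/klauspost/compress/zstd and filippo.io/age.
func finalizeBlob(raw []byte, recipients []age.Recipient, level int) (s3Key string, sealed []byte, err error) {
	var buf bytes.Buffer
	hasher := sha256.New()

	// Everything written through the age encryptor is also hashed, so the final
	// hash covers the compressed+encrypted bytes, matching Blob.Hash above.
	enc, err := age.Encrypt(io.MultiWriter(&buf, hasher), recipients...)
	if err != nil {
		return "", nil, err
	}
	zw, err := zstd.NewWriter(enc, zstd.WithEncoderLevel(zstd.EncoderLevelFromZstd(level)))
	if err != nil {
		return "", nil, err
	}
	if _, err := zw.Write(raw); err != nil {
		return "", nil, err
	}
	if err := zw.Close(); err != nil { // flush compressed data into the encryptor
		return "", nil, err
	}
	if err := enc.Close(); err != nil { // finalize the age stream
		return "", nil, err
	}

	hash := hex.EncodeToString(hasher.Sum(nil))
	return fmt.Sprintf("blobs/%s/%s/%s", hash[:2], hash[2:4], hash), buf.Bytes(), nil
}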
BlobChunk (database.BlobChunk)
Maps chunks to their position within blobs:
- BlobID: Reference to the blob
- ChunkHash: Reference to the chunk
- Offset: Byte offset within the uncompressed blob
- Length: Chunk size
Snapshot (database.Snapshot)
Represents a point-in-time backup:
- ID: Format is {hostname}-{YYYYMMDD}-{HHMMSS}Z
- Tracks file count, chunk count, blob count, sizes, compression ratio
CompletedAt: Null until snapshot finishes successfully
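A small sketch of how an ID in that format can be derived (illustrative only; the actual formatting lives in the snapshot package):

// Illustrative only: builds an ID of the form {hostname}-{YYYYMMDD}-{HHMMSS}Z.
hostname, _ := os.Hostname()
id := fmt.Sprintf("%s-%sZ", hostname, time.Now().UTC().Format("20060102-150405"))
// e.g. "myhost-20250101-130501Z"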
SnapshotFile / SnapshotBlob
Join tables linking snapshots to their files and blobs.
Relationship Summary
Snapshot 1──────────▶ N SnapshotFile N ◀────────── 1 File
Snapshot 1──────────▶ N SnapshotBlob N ◀────────── 1 Blob
File 1──────────▶ N FileChunk N ◀────────── 1 Chunk
Blob 1──────────▶ N BlobChunk N ◀────────── 1 Chunk
Type Instantiation
Application Startup
The CLI uses fx for dependency injection. Here's the instantiation order:
// cli/app.go: NewApp()
fx.New(
fx.Supply(config.ConfigPath(opts.ConfigPath)), // 1. Config path
fx.Supply(opts.LogOptions), // 2. Log options
fx.Provide(globals.New), // 3. Globals
fx.Provide(log.New), // 4. Logger config
config.Module, // 5. Config
database.Module, // 6. Database + Repositories
log.Module, // 7. Logger initialization
s3.Module, // 8. S3 client
snapshot.Module, // 9. SnapshotManager + ScannerFactory
fx.Provide(vaultik.New), // 10. Vaultik orchestrator
)
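Each of the module values above (config.Module, database.Module, and so on) is an fx.Option that bundles that package's constructors. As a rough sketch of the pattern (the real definitions may register more providers), database.Module could be assembled as:

// Hypothetical wiring, for illustration only; assumes go.uber.org/fx.
var Module = fx.Module("database",
	fx.Provide(New),             // constructs *database.DB
	fx.Provide(NewRepositories), // constructs *database.Repositories
)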
Key Type Instantiation Points
1. Config (config.Config)
- Created by: config.Module via config.LoadConfig()
- When: Application startup (fx DI)
- Contains: All configuration from YAML file (S3 credentials, encryption keys, paths, etc.)
2. Database (database.DB)
- Created by: database.Module via database.New()
- When: Application startup (fx DI)
- Contains: SQLite connection, path reference
3. Repositories (database.Repositories)
- Created by: database.Module via database.NewRepositories()
- When: Application startup (fx DI)
- Contains: All repository interfaces (Files, Chunks, Blobs, Snapshots, etc.)
4. Vaultik (vaultik.Vaultik)
- Created by: vaultik.New(VaultikParams)
- When: Application startup (fx DI)
- Contains: All dependencies for backup operations
type Vaultik struct {
Globals *globals.Globals
Config *config.Config
DB *database.DB
Repositories *database.Repositories
S3Client *s3.Client
ScannerFactory snapshot.ScannerFactory
SnapshotManager *snapshot.SnapshotManager
Shutdowner fx.Shutdowner
Fs afero.Fs
ctx context.Context
cancel context.CancelFunc
}
5. SnapshotManager (snapshot.SnapshotManager)
- Created by: snapshot.Module via snapshot.NewSnapshotManager()
- When: Application startup (fx DI)
- Responsibility: Creates/completes snapshots, exports metadata to S3
6. Scanner (snapshot.Scanner)
- Created by: ScannerFactory(ScannerParams)
- When: Each CreateSnapshot() call
- Contains: Chunker, Packer, progress reporter
// vaultik/snapshot.go: CreateSnapshot()
scanner := v.ScannerFactory(snapshot.ScannerParams{
EnableProgress: !opts.Cron,
Fs: v.Fs,
})
7. Chunker (chunker.Chunker)
- Created by: chunker.NewChunker(avgChunkSize)
- When: Inside snapshot.NewScanner()
- Configuration:
  - avgChunkSize: From config (typically 64KB)
  - minChunkSize: avgChunkSize / 4
  - maxChunkSize: avgChunkSize * 4
8. Packer (blob.Packer)
- Created by: blob.NewPacker(PackerConfig)
- When: Inside snapshot.NewScanner()
- Configuration:
  - MaxBlobSize: Maximum blob size before finalization (typically 10GB)
  - CompressionLevel: zstd level (1-19)
  - Recipients: age public keys for encryption
// snapshot/scanner.go: NewScanner()
packerCfg := blob.PackerConfig{
MaxBlobSize: cfg.MaxBlobSize,
CompressionLevel: cfg.CompressionLevel,
Recipients: cfg.AgeRecipients,
Repositories: cfg.Repositories,
Fs: cfg.FS,
}
packer, err := blob.NewPacker(packerCfg)
Module Responsibilities
internal/cli
Entry point for fx application. Combines all modules and handles signal interrupts.
Key functions:
- NewApp(AppOptions) → Creates fx.App with all modules
- RunApp(ctx, app) → Starts app, handles graceful shutdown
- RunWithApp(ctx, opts) → Convenience wrapper
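For orientation, a minimal entry point using these functions might look like the following sketch; the cli import path, the config file location, and any AppOptions fields beyond ConfigPath are assumptions:

// Hypothetical entry point; option fields and error handling are illustrative.
func main() {
	ctx := context.Background()
	err := cli.RunWithApp(ctx, cli.AppOptions{
		ConfigPath: "/etc/vaultik/config.yaml", // assumed default location
	})
	if err != nil {
		os.Exit(1)
	}
}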
internal/vaultik
Main orchestrator containing all dependencies and command implementations.
Key methods:
- New(VaultikParams) → Constructor (fx DI)
- CreateSnapshot(opts) → Main backup operation
- ListSnapshots(jsonOutput) → List available snapshots
- VerifySnapshot(id, deep) → Verify snapshot integrity
- PurgeSnapshots(...) → Remove old snapshots
internal/chunker
Content-defined chunking using FastCDC algorithm.
Key types:
- Chunk → Hash, Data, Offset, Size
- Chunker → avgChunkSize, minChunkSize, maxChunkSize
Key methods:
- NewChunker(avgChunkSize) → Constructor
- ChunkReaderStreaming(reader, callback) → Stream chunks with callback (preferred)
- ChunkReader(reader) → Return all chunks at once (memory-intensive)
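A sketch of the streaming interface in use; the callback parameter type and the Chunk field types are assumed from the lists above:

// Hypothetical usage; c is a chunker from NewChunker. Prints each chunk instead of packing it.
f, err := os.Open(path)
if err != nil {
	return err
}
defer f.Close()

err = c.ChunkReaderStreaming(f, func(chunk chunker.Chunk) error {
	fmt.Printf("chunk %s: %d bytes at offset %d\n", chunk.Hash, chunk.Size, chunk.Offset)
	return nil
})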
internal/blob
Blob packing: accumulates chunks, compresses, encrypts, tracks metadata.
Key types:
- Packer → Thread-safe blob accumulator
- ChunkRef → Hash + Data for adding to packer
- FinishedBlob → Completed blob ready for upload
- BlobWithReader → FinishedBlob + io.Reader for streaming upload
Key methods:
- NewPacker(PackerConfig) → Constructor
- AddChunk(ChunkRef) → Add chunk to current blob
- FinalizeBlob() → Compress, encrypt, hash current blob
- Flush() → Finalize any in-progress blob
- SetBlobHandler(func) → Set callback for upload
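Putting those together, a hedged sketch of how a caller might drive the packer; the handler signature and the uploadBlob helper are assumptions standing in for the real S3 upload path:

// Hypothetical usage of the packer lifecycle.
packer.SetBlobHandler(func(b *blob.BlobWithReader) error {
	// Called whenever a blob is finalized; stream b to S3 here.
	return uploadBlob(ctx, b)
})

for _, chunk := range chunks {
	if err := packer.AddChunk(blob.ChunkRef{Hash: chunk.Hash, Data: chunk.Data}); err != nil {
		return err
	}
}
// Finalize and hand off whatever remains in the current blob.
if err := packer.Flush(); err != nil {
	return err
}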
internal/snapshot
Scanner
Orchestrates the backup process for a directory.
Key methods:
- NewScanner(ScannerConfig) → Constructor (creates Chunker + Packer)
- Scan(ctx, path, snapshotID) → Main scan operation
Scan phases:
- Phase 0: Detect deleted files from previous snapshots
- Phase 1: Walk directory, identify files needing processing
- Phase 2: Process files (chunk → pack → upload)
SnapshotManager
Manages snapshot lifecycle and metadata export.
Key methods:
- CreateSnapshot(ctx, hostname, version, commit) → Create snapshot record
- CompleteSnapshot(ctx, snapshotID) → Mark snapshot complete
- ExportSnapshotMetadata(ctx, dbPath, snapshotID) → Export to S3
- CleanupIncompleteSnapshots(ctx, hostname) → Remove failed snapshots
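During a backup these are invoked roughly in the following order (a sketch; return values and error handling are assumed):

// Hypothetical sequence of SnapshotManager calls during CreateSnapshot.
snapshotID, err := sm.CreateSnapshot(ctx, hostname, version, commit)
if err != nil {
	return err
}
// ... scanner.Scan() runs for each source directory ...
if err := sm.CompleteSnapshot(ctx, snapshotID); err != nil {
	return err
}
if err := sm.ExportSnapshotMetadata(ctx, dbPath, snapshotID); err != nil {
	return err
}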
internal/database
SQLite database for local index. Single-writer mode for thread safety.
Key types:
- DB → Database connection wrapper
- Repositories → Collection of all repository interfaces
Repository interfaces:
- FilesRepository → CRUD for File records
- ChunksRepository → CRUD for Chunk records
- BlobsRepository → CRUD for Blob records
- SnapshotsRepository → CRUD for Snapshot records
- Plus join table repositories (FileChunks, BlobChunks, etc.)
Snapshot Creation Flow
CreateSnapshot(opts)
│
├─► CleanupIncompleteSnapshots() // Critical: avoid dedup errors
│
├─► SnapshotManager.CreateSnapshot() // Create DB record
│
├─► For each source directory:
│ │
│ ├─► scanner.Scan(ctx, path, snapshotID)
│ │ │
│ │ ├─► Phase 0: detectDeletedFiles()
│ │ │
│ │ ├─► Phase 1: scanPhase()
│ │ │ Walk directory
│ │ │ Check file metadata changes
│ │ │ Build list of files to process
│ │ │
│ │ └─► Phase 2: processPhase()
│ │ For each file:
│ │ chunker.ChunkReaderStreaming()
│ │ For each chunk:
│ │ packer.AddChunk()
│ │ If blob full → FinalizeBlob()
│ │ → handleBlobReady()
│ │ → s3Client.PutObjectWithProgress()
│ │ packer.Flush() // Final blob
│ │
│ └─► Accumulate statistics
│
├─► SnapshotManager.UpdateSnapshotStatsExtended()
│
├─► SnapshotManager.CompleteSnapshot()
│
└─► SnapshotManager.ExportSnapshotMetadata()
│
├─► Copy database to temp file
├─► Clean to only current snapshot data
├─► Dump to SQL
├─► Compress with zstd
├─► Encrypt with age
├─► Upload db.zst.age to S3
└─► Upload manifest.json.zst to S3
Deduplication Strategy
- File-level: Files unchanged since the last backup are skipped (metadata comparison: size, mtime, mode, uid, gid).
- Chunk-level: Chunks are content-addressed by SHA256 hash. If a chunk hash already exists in the database, the chunk data is not re-uploaded.
- Blob-level: Blobs contain only unique chunks. Duplicate chunks within a blob are skipped.
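A sketch of the chunk-level check as it might appear in the scanner; the Exists method name is hypothetical:

// Hypothetical dedup check; only previously unseen chunks reach the packer.
exists, err := repos.Chunks.Exists(ctx, chunkHash)
if err != nil {
	return err
}
if !exists {
	if err := packer.AddChunk(blob.ChunkRef{Hash: chunkHash, Data: data}); err != nil {
		return err
	}
}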
Storage Layout in S3
bucket/
├── blobs/
│ └── {hash[0:2]}/
│ └── {hash[2:4]}/
│ └── {full-hash} # Compressed+encrypted blob
│
└── metadata/
└── {snapshot-id}/
├── db.zst.age # Encrypted database dump
└── manifest.json.zst # Blob list (for verification)
Thread Safety
- Packer: Thread-safe via mutex. Multiple goroutines can call AddChunk().
- Scanner: Uses a packerMu mutex to coordinate blob finalization.
- Database: Single-writer mode (MaxOpenConns=1) ensures SQLite thread safety.
- Repositories.WithTx(): Handles transaction lifecycle automatically.
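A sketch of WithTx in use; the callback signature and the Create method names are assumptions:

// Hypothetical transactional write via the repositories.
err := repos.WithTx(ctx, func(ctx context.Context, tx *database.Repositories) error {
	if err := tx.Files.Create(ctx, file); err != nil {
		return err
	}
	return tx.FileChunks.Create(ctx, fileChunk)
})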