# Vaultik Architecture
This document describes the internal architecture of Vaultik, focusing on the data model, type instantiation, and the relationships between core modules.
## Overview
Vaultik is a backup system that uses content-defined chunking for deduplication and packs chunks into large, compressed, encrypted blobs for efficient cloud storage. The system is built around dependency injection using uber-go/fx.
## Data Flow

```
Source Files
      │
      ▼
┌─────────────────┐
│     Scanner     │  Walks directories, detects changed files
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Chunker     │  Splits files into variable-size chunks (FastCDC)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Packer      │  Accumulates chunks, compresses (zstd), encrypts (age)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    S3 Client    │  Uploads blobs to remote storage
└─────────────────┘
```
## Data Model

### Core Entities

The database tracks five primary entities and their relationships:
```
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│   Snapshot   │─────▶│     File     │─────▶│    Chunk     │
└──────────────┘      └──────────────┘      └──────────────┘
        │                                           │
        │                                           │
        ▼                                           ▼
┌──────────────┐                           ┌──────────────┐
│     Blob     │◀──────────────────────────│  BlobChunk   │
└──────────────┘                           └──────────────┘
```
### Entity Descriptions

#### File (`database.File`)

Represents a file or directory in the backup system. Stores metadata needed for restoration:

- Path, mtime
- Size, mode, ownership (uid, gid)
- Symlink target (if applicable)
#### Chunk (`database.Chunk`)

A content-addressed unit of data. Files are split into variable-size chunks using the FastCDC algorithm:

- `ChunkHash`: SHA256 hash of chunk content (primary key)
- `Size`: Chunk size in bytes

Chunk sizes vary between `avgChunkSize/4` and `avgChunkSize*4` (typically 16KB-256KB for a 64KB average).
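The real chunker is FastCDC in `internal/chunker`; as a rough illustration of how content-defined boundaries interact with the min/max bounds, here is a toy chunker. The rolling sum and mask below are simplifications invented for this sketch, not FastCDC itself:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// splitChunks is a simplified content-defined chunking sketch (NOT the
// real FastCDC implementation): a toy rolling sum declares a boundary
// when its low bits are all zero (≈1/avg probability for power-of-two
// avg), subject to the avg/4 .. avg*4 size bounds described above.
func splitChunks(data []byte, avg int) [][]byte {
	minSize, maxSize := avg/4, avg*4
	mask := uint32(avg - 1) // assumes avg is a power of two
	var chunks [][]byte
	start, sum := 0, uint32(0)
	for i, b := range data {
		sum = sum<<1 + uint32(b) // toy rolling hash, illustration only
		n := i - start + 1
		if (n >= minSize && sum&mask == 0) || n >= maxSize {
			chunks = append(chunks, data[start:i+1])
			start, sum = i+1, 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:]) // trailing remainder
	}
	return chunks
}

func main() {
	data := make([]byte, 10_000)
	for i := range data {
		data[i] = byte(i * 31)
	}
	for _, c := range splitChunks(data, 1024)[:3] {
		h := sha256.Sum256(c) // chunks are addressed by SHA256 content hash
		fmt.Printf("len=%d hash=%s…\n", len(c), hex.EncodeToString(h[:4]))
	}
}
```

The key property, shared with FastCDC, is that boundaries depend on content rather than offsets, so an insertion near the start of a file shifts only nearby chunk boundaries instead of all of them.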
#### FileChunk (`database.FileChunk`)

Maps files to their constituent chunks:

- `FileID`: Reference to the file
- `Idx`: Position of this chunk within the file (0-indexed)
- `ChunkHash`: Reference to the chunk
#### Blob (`database.Blob`)

The final storage unit uploaded to S3. Contains many compressed and encrypted chunks:

- `ID`: UUID assigned at creation
- `Hash`: SHA256 of final compressed+encrypted content
- `UncompressedSize`: Total raw chunk data before compression
- `CompressedSize`: Size after zstd compression and age encryption
- `CreatedTS`, `FinishedTS`, `UploadedTS`: Lifecycle timestamps

Blob creation process:

- Chunks are accumulated (up to `MaxBlobSize`, typically 10GB)
- Compressed with zstd
- Encrypted with age (recipients configured in config)
- SHA256 hash computed → becomes filename in S3
- Uploaded to `blobs/{hash[0:2]}/{hash[2:4]}/{hash}`
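The fan-out key in the last step can be sketched as follows; `blobKey` is a hypothetical helper name for illustration, not the actual function in the codebase:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blobKey derives the S3 object key from the blob's SHA256 hex digest,
// following the blobs/{hash[0:2]}/{hash[2:4]}/{hash} layout above.
// The two prefix levels spread objects across many key prefixes.
func blobKey(hash string) string {
	return fmt.Sprintf("blobs/%s/%s/%s", hash[0:2], hash[2:4], hash)
}

func main() {
	sum := sha256.Sum256([]byte("example blob content"))
	fmt.Println(blobKey(hex.EncodeToString(sum[:])))
}
```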
#### BlobChunk (`database.BlobChunk`)

Maps chunks to their position within blobs:

- `BlobID`: Reference to the blob
- `ChunkHash`: Reference to the chunk
- `Offset`: Byte offset within the uncompressed blob
- `Length`: Chunk size
#### Snapshot (`database.Snapshot`)

Represents a point-in-time backup:

- `ID`: Format is `{hostname}-{YYYYMMDD}-{HHMMSS}Z`
- Tracks file count, chunk count, blob count, sizes, compression ratio
- `CompletedAt`: Null until snapshot finishes successfully
#### SnapshotFile / SnapshotBlob

Join tables linking snapshots to their files and blobs.
### Relationship Summary

```
Snapshot 1──────────▶ N SnapshotFile N ◀────────── 1 File
Snapshot 1──────────▶ N SnapshotBlob N ◀────────── 1 Blob
File     1──────────▶ N FileChunk    N ◀────────── 1 Chunk
Blob     1──────────▶ N BlobChunk    N ◀────────── 1 Chunk
```
## Type Instantiation

### Application Startup

The CLI uses fx for dependency injection. Here's the instantiation order:

```go
// cli/app.go: NewApp()
fx.New(
	fx.Supply(config.ConfigPath(opts.ConfigPath)), // 1. Config path
	fx.Supply(opts.LogOptions),                    // 2. Log options
	fx.Provide(globals.New),                       // 3. Globals
	fx.Provide(log.New),                           // 4. Logger config
	config.Module,                                 // 5. Config
	database.Module,                               // 6. Database + Repositories
	log.Module,                                    // 7. Logger initialization
	s3.Module,                                     // 8. S3 client
	snapshot.Module,                               // 9. SnapshotManager + ScannerFactory
	fx.Provide(vaultik.New),                       // 10. Vaultik orchestrator
)
```
### Key Type Instantiation Points

#### 1. Config (`config.Config`)

- Created by: `config.Module` via `config.LoadConfig()`
- When: Application startup (fx DI)
- Contains: All configuration from YAML file (S3 credentials, encryption keys, paths, etc.)
#### 2. Database (`database.DB`)

- Created by: `database.Module` via `database.New()`
- When: Application startup (fx DI)
- Contains: SQLite connection, path reference
#### 3. Repositories (`database.Repositories`)

- Created by: `database.Module` via `database.NewRepositories()`
- When: Application startup (fx DI)
- Contains: All repository interfaces (Files, Chunks, Blobs, Snapshots, etc.)
#### 4. Vaultik (`vaultik.Vaultik`)

- Created by: `vaultik.New(VaultikParams)`
- When: Application startup (fx DI)
- Contains: All dependencies for backup operations

```go
type Vaultik struct {
	Globals         *globals.Globals
	Config          *config.Config
	DB              *database.DB
	Repositories    *database.Repositories
	S3Client        *s3.Client
	ScannerFactory  snapshot.ScannerFactory
	SnapshotManager *snapshot.SnapshotManager
	Shutdowner      fx.Shutdowner
	Fs              afero.Fs

	ctx    context.Context
	cancel context.CancelFunc
}
```
#### 5. SnapshotManager (`snapshot.SnapshotManager`)

- Created by: `snapshot.Module` via `snapshot.NewSnapshotManager()`
- When: Application startup (fx DI)
- Responsibility: Creates/completes snapshots, exports metadata to S3
#### 6. Scanner (`snapshot.Scanner`)

- Created by: `ScannerFactory(ScannerParams)`
- When: Each `CreateSnapshot()` call
- Contains: Chunker, Packer, progress reporter

```go
// vaultik/snapshot.go: CreateSnapshot()
scanner := v.ScannerFactory(snapshot.ScannerParams{
	EnableProgress: !opts.Cron,
	Fs:             v.Fs,
})
```
#### 7. Chunker (`chunker.Chunker`)

- Created by: `chunker.NewChunker(avgChunkSize)`
- When: Inside `snapshot.NewScanner()`
- Configuration:
  - `avgChunkSize`: From config (typically 64KB)
  - `minChunkSize`: avgChunkSize / 4
  - `maxChunkSize`: avgChunkSize * 4
#### 8. Packer (`blob.Packer`)

- Created by: `blob.NewPacker(PackerConfig)`
- When: Inside `snapshot.NewScanner()`
- Configuration:
  - `MaxBlobSize`: Maximum blob size before finalization (typically 10GB)
  - `CompressionLevel`: zstd level (1-19)
  - `Recipients`: age public keys for encryption

```go
// snapshot/scanner.go: NewScanner()
packerCfg := blob.PackerConfig{
	MaxBlobSize:      cfg.MaxBlobSize,
	CompressionLevel: cfg.CompressionLevel,
	Recipients:       cfg.AgeRecipients,
	Repositories:     cfg.Repositories,
	Fs:               cfg.FS,
}
packer, err := blob.NewPacker(packerCfg)
```
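The finalize step the packer performs (compress → encrypt → hash) can be sketched with standard-library stand-ins. Since neither zstd nor age is in Go's standard library, this sketch substitutes gzip for zstd and AES-GCM for age purely for illustration; the ordering and the hash-of-ciphertext naming are the points being demonstrated, not the specific codecs:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// finalize sketches the FinalizeBlob pipeline: compress the accumulated
// chunk data, encrypt the compressed stream, then hash the ciphertext.
// The hex digest becomes the blob's name (and S3 key suffix).
func finalize(raw, key []byte) (blob []byte, name string, err error) {
	// 1. Compress (gzip here as a stand-in for zstd).
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err = zw.Write(raw); err != nil {
		return nil, "", err
	}
	if err = zw.Close(); err != nil {
		return nil, "", err
	}
	// 2. Encrypt (AES-GCM here as a stand-in for age).
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, "", err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err = rand.Read(nonce); err != nil {
		return nil, "", err
	}
	blob = gcm.Seal(nonce, nonce, buf.Bytes(), nil)
	// 3. Hash the final compressed+encrypted bytes.
	sum := sha256.Sum256(blob)
	return blob, hex.EncodeToString(sum[:]), nil
}

func main() {
	key := make([]byte, 32) // demo key; real recipients come from config
	blob, name, err := finalize([]byte("chunk data accumulated by the packer"), key)
	if err != nil {
		panic(err)
	}
	fmt.Printf("blob bytes=%d name=%s…\n", len(blob), name[:8])
}
```

Note that hashing happens last: the blob's identity is the hash of what is actually uploaded, which lets verification check S3 objects without decrypting them.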
## Module Responsibilities

### internal/cli

Entry point for the fx application. Combines all modules and handles signal interrupts.

Key functions:

- `NewApp(AppOptions)` → Creates fx.App with all modules
- `RunApp(ctx, app)` → Starts app, handles graceful shutdown
- `RunWithApp(ctx, opts)` → Convenience wrapper
### internal/vaultik

Main orchestrator containing all dependencies and command implementations.

Key methods:

- `New(VaultikParams)` → Constructor (fx DI)
- `CreateSnapshot(opts)` → Main backup operation
- `ListSnapshots(jsonOutput)` → List available snapshots
- `VerifySnapshot(id, deep)` → Verify snapshot integrity
- `PurgeSnapshots(...)` → Remove old snapshots
### internal/chunker

Content-defined chunking using the FastCDC algorithm.

Key types:

- `Chunk` → Hash, Data, Offset, Size
- `Chunker` → avgChunkSize, minChunkSize, maxChunkSize

Key methods:

- `NewChunker(avgChunkSize)` → Constructor
- `ChunkReaderStreaming(reader, callback)` → Stream chunks with callback (preferred)
- `ChunkReader(reader)` → Return all chunks at once (memory-intensive)
### internal/blob

Blob packing: accumulates chunks, compresses, encrypts, tracks metadata.

Key types:

- `Packer` → Thread-safe blob accumulator
- `ChunkRef` → Hash + Data for adding to packer
- `FinishedBlob` → Completed blob ready for upload
- `BlobWithReader` → FinishedBlob + io.Reader for streaming upload

Key methods:

- `NewPacker(PackerConfig)` → Constructor
- `AddChunk(ChunkRef)` → Add chunk to current blob
- `FinalizeBlob()` → Compress, encrypt, hash current blob
- `Flush()` → Finalize any in-progress blob
- `SetBlobHandler(func)` → Set callback for upload
### internal/snapshot

#### Scanner

Orchestrates the backup process for a directory.

Key methods:

- `NewScanner(ScannerConfig)` → Constructor (creates Chunker + Packer)
- `Scan(ctx, path, snapshotID)` → Main scan operation

Scan phases:

- Phase 0: Detect deleted files from previous snapshots
- Phase 1: Walk directory, identify files needing processing
- Phase 2: Process files (chunk → pack → upload)
#### SnapshotManager

Manages snapshot lifecycle and metadata export.

Key methods:

- `CreateSnapshot(ctx, hostname, version, commit)` → Create snapshot record
- `CompleteSnapshot(ctx, snapshotID)` → Mark snapshot complete
- `ExportSnapshotMetadata(ctx, dbPath, snapshotID)` → Export to S3
- `CleanupIncompleteSnapshots(ctx, hostname)` → Remove failed snapshots
### internal/database

SQLite database for the local index. Single-writer mode for thread safety.

Key types:

- `DB` → Database connection wrapper
- `Repositories` → Collection of all repository interfaces

Repository interfaces:

- `FilesRepository` → CRUD for File records
- `ChunksRepository` → CRUD for Chunk records
- `BlobsRepository` → CRUD for Blob records
- `SnapshotsRepository` → CRUD for Snapshot records
- Plus join table repositories (FileChunks, BlobChunks, etc.)
## Snapshot Creation Flow

```
CreateSnapshot(opts)
│
├─► CleanupIncompleteSnapshots()          // Critical: avoid dedup errors
│
├─► SnapshotManager.CreateSnapshot()      // Create DB record
│
├─► For each source directory:
│   │
│   ├─► scanner.Scan(ctx, path, snapshotID)
│   │   │
│   │   ├─► Phase 0: detectDeletedFiles()
│   │   │
│   │   ├─► Phase 1: scanPhase()
│   │   │       Walk directory
│   │   │       Check file metadata changes
│   │   │       Build list of files to process
│   │   │
│   │   └─► Phase 2: processPhase()
│   │           For each file:
│   │             chunker.ChunkReaderStreaming()
│   │             For each chunk:
│   │               packer.AddChunk()
│   │               If blob full → FinalizeBlob()
│   │                 → handleBlobReady()
│   │                 → s3Client.PutObjectWithProgress()
│   │           packer.Flush()            // Final blob
│   │
│   └─► Accumulate statistics
│
├─► SnapshotManager.UpdateSnapshotStatsExtended()
│
├─► SnapshotManager.CompleteSnapshot()
│
└─► SnapshotManager.ExportSnapshotMetadata()
        │
        ├─► Copy database to temp file
        ├─► Clean to only current snapshot data
        ├─► Dump to SQL
        ├─► Compress with zstd
        ├─► Encrypt with age
        ├─► Upload db.zst.age to S3
        └─► Upload manifest.json.zst to S3
```
## Deduplication Strategy

- File-level: Files unchanged since the last backup are skipped (metadata comparison: size, mtime, mode, uid, gid)
- Chunk-level: Chunks are content-addressed by SHA256 hash. If a chunk hash already exists in the database, the chunk data is not re-uploaded.
- Blob-level: Blobs contain only unique chunks. Duplicate chunks within a blob are skipped.
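Chunk-level dedup reduces to a membership test on the content hash. A minimal in-memory sketch follows; in Vaultik the index is the SQLite chunks table, not a map, and `chunkIndex` is a name invented for this example:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chunkIndex is a toy stand-in for the chunks table: it records which
// content hashes have been seen, so identical chunk data is never
// stored (or uploaded) twice.
type chunkIndex struct {
	seen     map[[32]byte]struct{}
	newBytes int
}

// add returns true when the chunk is new and must be packed/uploaded,
// false when its hash is already in the index.
func (ci *chunkIndex) add(data []byte) bool {
	h := sha256.Sum256(data)
	if _, ok := ci.seen[h]; ok {
		return false // duplicate content: skip
	}
	ci.seen[h] = struct{}{}
	ci.newBytes += len(data)
	return true
}

func main() {
	ci := &chunkIndex{seen: make(map[[32]byte]struct{})}
	for _, c := range [][]byte{[]byte("alpha"), []byte("beta"), []byte("alpha")} {
		fmt.Printf("%q new=%v\n", c, ci.add(c))
	}
	fmt.Println("unique bytes stored:", ci.newBytes) // 5 + 4 = 9
}
```

Because chunk boundaries are content-defined, identical regions in different files (or in successive versions of the same file) tend to produce identical chunks, which this lookup then collapses.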
## Storage Layout in S3

```
bucket/
├── blobs/
│   └── {hash[0:2]}/
│       └── {hash[2:4]}/
│           └── {full-hash}          # Compressed+encrypted blob
│
└── metadata/
    └── {snapshot-id}/
        ├── db.zst.age               # Encrypted database dump
        └── manifest.json.zst        # Blob list (for verification)
```
## Thread Safety

- `Packer`: Thread-safe via mutex. Multiple goroutines can call `AddChunk()`.
- `Scanner`: Uses a `packerMu` mutex to coordinate blob finalization.
- `Database`: Single-writer mode (`MaxOpenConns=1`) ensures SQLite thread safety.
- `Repositories.WithTx()`: Handles transaction lifecycle automatically.