Major refactoring: UUID-based storage, streaming architecture, and CLI improvements

This commit represents a significant architectural overhaul of vaultik:

Database Schema Changes:
- Switch files table to use UUID primary keys instead of path-based keys
- Add UUID primary keys to blobs table for immediate chunk association
- Update all foreign key relationships to use UUIDs
- Add comprehensive schema documentation in DATAMODEL.md
- Add SQLite busy timeout handling for concurrent operations

Streaming and Performance Improvements:
- Implement true streaming blob packing without intermediate storage
- Add streaming chunk processing to reduce memory usage
- Improve progress reporting with real-time metrics
- Add upload metrics tracking in new uploads table

CLI Refactoring:
- Restructure CLI to use subcommands: snapshot create/list/purge/verify
- Add store info command for S3 configuration display
- Add custom duration parser supporting days/weeks/months/years
- Remove old backup.go in favor of enhanced snapshot.go
- Add --cron flag for silent operation

Configuration Changes:
- Remove unused index_prefix configuration option
- Add support for snapshot pruning retention policies
- Improve configuration validation and error messages

Testing Improvements:
- Add comprehensive repository tests with edge cases
- Add cascade delete debugging tests
- Fix concurrent operation tests to use SQLite busy timeout
- Remove tolerance for SQLITE_BUSY errors in tests

Documentation:
- Add MIT LICENSE file
- Update README with new command structure
- Add comprehensive DATAMODEL.md explaining database schema
- Update DESIGN.md with UUID-based architecture

Other Changes:
- Add test-config.yml for testing
- Update Makefile with better test output formatting
- Fix various race conditions in concurrent operations
- Improve error handling throughout
2025-07-22 14:54:37 +02:00
parent 86b533d6ee
commit 78af626759
54 changed files with 5525 additions and 1109 deletions


@@ -31,4 +31,8 @@ Read the rules in AGENTS.md and follow them.
deleting the local state file and doing a full backup to re-create it.
* When testing on a 2.5Gbit/s ethernet to an s3 server backed by 2000MB/sec SSD,
estimate about 4 seconds per gigabyte of backup time.
estimate about 4 seconds per gigabyte of backup time.
* When running tests, don't run individual tests, or grep the output. run the entire test suite every time and read the full output.
* When running tests, don't run individual tests, or try to grep the output. never run "go test". only ever run "make test" to run the full test suite, and examine the full output.

246
DATAMODEL.md Normal file

@@ -0,0 +1,246 @@
# Vaultik Data Model
## Overview
Vaultik uses a local SQLite database to track file metadata, chunk mappings, and blob associations during the backup process. This database serves as an index for incremental backups and enables efficient deduplication.
**Important Notes:**
- **No Migration Support**: Vaultik does not support database schema migrations. If the schema changes, the local database must be deleted and recreated by performing a full backup.
- **Version Compatibility**: In rare cases, you may need to use the same version of Vaultik to restore a backup as was used to create it. This ensures compatibility with the metadata format stored in S3.
## Database Tables
### 1. `files`
Stores metadata about files in the filesystem being backed up.
**Columns:**
- `id` (TEXT PRIMARY KEY) - UUID for the file record
- `path` (TEXT UNIQUE) - Absolute file path
- `mtime` (INTEGER) - Modification time as Unix timestamp
- `ctime` (INTEGER) - Change time as Unix timestamp
- `size` (INTEGER) - File size in bytes
- `mode` (INTEGER) - Unix file permissions and type
- `uid` (INTEGER) - User ID of file owner
- `gid` (INTEGER) - Group ID of file owner
- `link_target` (TEXT) - Symlink target path (empty for regular files)
**Purpose:** Tracks file metadata to detect changes between backup runs. Used for incremental backup decisions. The UUID primary key provides stable references that don't change if files are moved.
### 2. `chunks`
Stores information about content-defined chunks created from files.
**Columns:**
- `chunk_hash` (TEXT PRIMARY KEY) - SHA256 hash of chunk content
- `sha256` (TEXT) - SHA256 hash (currently same as chunk_hash)
- `size` (INTEGER) - Chunk size in bytes
**Purpose:** Enables deduplication by tracking unique chunks across all files.
### 3. `file_chunks`
Maps files to their constituent chunks in order.
**Columns:**
- `file_id` (TEXT) - File ID (FK to files.id)
- `idx` (INTEGER) - Chunk index within file (0-based)
- `chunk_hash` (TEXT) - Chunk hash (FK to chunks.chunk_hash)
- PRIMARY KEY (`file_id`, `idx`)
**Purpose:** Allows reconstruction of files from chunks during restore.
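As a rough illustration of why the ordered `idx` column matters, the following Go sketch reassembles a file from rows shaped like `file_chunks`. The `FileChunk` type, the `chunkData` map, and `Reassemble` are hypothetical names invented for this example, not Vaultik's API; a real restore would obtain chunk bytes from decrypted blobs rather than from an in-memory map.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// FileChunk mirrors one row of the file_chunks table (hypothetical type).
type FileChunk struct {
	FileID    string
	Idx       int
	ChunkHash string
}

// Reassemble concatenates chunk contents in idx order.
// chunkData stands in for whatever store returns chunk bytes by hash.
func Reassemble(rows []FileChunk, chunkData map[string][]byte) ([]byte, error) {
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]FileChunk(nil), rows...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Idx < sorted[j].Idx })

	var out bytes.Buffer
	for _, r := range sorted {
		data, ok := chunkData[r.ChunkHash]
		if !ok {
			return nil, fmt.Errorf("missing chunk %s at idx %d", r.ChunkHash, r.Idx)
		}
		out.Write(data)
	}
	return out.Bytes(), nil
}

func main() {
	rows := []FileChunk{
		{FileID: "f1", Idx: 1, ChunkHash: "bbb"},
		{FileID: "f1", Idx: 0, ChunkHash: "aaa"},
	}
	data := map[string][]byte{"aaa": []byte("hello "), "bbb": []byte("world")}
	content, err := Reassemble(rows, data)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(content)) // prints "hello world"
}
```

Rows may arrive in any order; sorting by `idx` restores the original byte sequence, and a missing chunk is reported as an error rather than silently skipped.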
### 4. `chunk_files`
Reverse mapping showing which files contain each chunk.
**Columns:**
- `chunk_hash` (TEXT) - Chunk hash (FK to chunks.chunk_hash)
- `file_id` (TEXT) - File ID (FK to files.id)
- `file_offset` (INTEGER) - Byte offset of chunk within file
- `length` (INTEGER) - Length of chunk in bytes
- PRIMARY KEY (`chunk_hash`, `file_id`)
**Purpose:** Supports efficient queries for chunk usage and deduplication statistics.
### 5. `blobs`
Stores information about packed, compressed, and encrypted blob files.
**Columns:**
- `id` (TEXT PRIMARY KEY) - UUID assigned when blob creation starts
- `hash` (TEXT) - SHA256 hash of final blob (empty until finalized)
- `created_ts` (INTEGER) - Creation timestamp
- `finished_ts` (INTEGER) - Finalization timestamp (NULL if in progress)
- `uncompressed_size` (INTEGER) - Total size of chunks before compression
- `compressed_size` (INTEGER) - Size after compression and encryption
- `uploaded_ts` (INTEGER) - Upload completion timestamp (NULL if not uploaded)
**Purpose:** Tracks blob lifecycle from creation through upload. The UUID primary key allows immediate association of chunks with blobs.
### 6. `blob_chunks`
Maps chunks to the blobs that contain them.
**Columns:**
- `blob_id` (TEXT) - Blob ID (FK to blobs.id)
- `chunk_hash` (TEXT) - Chunk hash (FK to chunks.chunk_hash)
- `offset` (INTEGER) - Byte offset of chunk within blob (before compression)
- `length` (INTEGER) - Length of chunk in bytes
- PRIMARY KEY (`blob_id`, `chunk_hash`)
**Purpose:** Enables chunk retrieval from blobs during restore operations.
### 7. `snapshots`
Tracks backup snapshots.
**Columns:**
- `id` (TEXT PRIMARY KEY) - Snapshot ID (format: hostname-YYYYMMDD-HHMMSSZ)
- `hostname` (TEXT) - Hostname where backup was created
- `vaultik_version` (TEXT) - Version of Vaultik used
- `vaultik_git_revision` (TEXT) - Git revision of Vaultik used
- `started_at` (INTEGER) - Start timestamp
- `completed_at` (INTEGER) - Completion timestamp (NULL if in progress)
- `file_count` (INTEGER) - Number of files in snapshot
- `chunk_count` (INTEGER) - Number of unique chunks
- `blob_count` (INTEGER) - Number of blobs referenced
- `total_size` (INTEGER) - Total size of all files
- `blob_size` (INTEGER) - Total size of all blobs (compressed)
- `blob_uncompressed_size` (INTEGER) - Total uncompressed size of all referenced blobs
- `compression_ratio` (REAL) - Compression ratio achieved
- `compression_level` (INTEGER) - Compression level used for this snapshot
- `upload_bytes` (INTEGER) - Total bytes uploaded during this snapshot
- `upload_duration_ms` (INTEGER) - Total milliseconds spent uploading to S3
**Purpose:** Provides snapshot metadata and statistics including version tracking for compatibility.
### 8. `snapshot_files`
Maps snapshots to the files they contain.
**Columns:**
- `snapshot_id` (TEXT) - Snapshot ID (FK to snapshots.id)
- `file_id` (TEXT) - File ID (FK to files.id)
- PRIMARY KEY (`snapshot_id`, `file_id`)
**Purpose:** Records which files are included in each snapshot.
### 9. `snapshot_blobs`
Maps snapshots to the blobs they reference.
**Columns:**
- `snapshot_id` (TEXT) - Snapshot ID (FK to snapshots.id)
- `blob_id` (TEXT) - Blob ID (FK to blobs.id)
- `blob_hash` (TEXT) - Denormalized blob hash for manifest generation
- PRIMARY KEY (`snapshot_id`, `blob_id`)
**Purpose:** Tracks blob dependencies for snapshots and enables manifest generation.
### 10. `uploads`
Tracks blob upload metrics.
**Columns:**
- `blob_hash` (TEXT PRIMARY KEY) - Hash of uploaded blob
- `uploaded_at` (INTEGER) - Upload timestamp
- `size` (INTEGER) - Size of uploaded blob
- `duration_ms` (INTEGER) - Upload duration in milliseconds
**Purpose:** Performance monitoring and upload tracking.
## Data Flow and Operations
### 1. Backup Process
1. **File Scanning**
- `INSERT OR REPLACE INTO files` - Update file metadata
- `SELECT * FROM files WHERE path = ?` - Check if file has changed
- `INSERT INTO snapshot_files` - Add file to current snapshot
2. **Chunking** (for changed files)
- `INSERT OR IGNORE INTO chunks` - Store new chunks
- `INSERT INTO file_chunks` - Map chunks to file
- `INSERT INTO chunk_files` - Create reverse mapping
3. **Blob Packing**
- `INSERT INTO blobs` - Create blob record with UUID (hash empty)
- `INSERT INTO blob_chunks` - Associate chunks with blob immediately
- `UPDATE blobs SET hash = ?, finished_ts = ?` - Finalize blob after packing
4. **Upload**
- `UPDATE blobs SET uploaded_ts = ?` - Mark blob as uploaded
- `INSERT INTO uploads` - Record upload metrics
- `INSERT INTO snapshot_blobs` - Associate blob with snapshot
5. **Snapshot Completion**
- `UPDATE snapshots SET completed_at = ?, stats...` - Finalize snapshot
- Generate and upload blob manifest from `snapshot_blobs`
### 2. Incremental Backup
1. **Change Detection**
- `SELECT * FROM files WHERE path = ?` - Get previous file metadata
- Compare mtime, size, mode to detect changes
- Skip unchanged files but still add to `snapshot_files`
2. **Chunk Reuse**
- `SELECT * FROM blob_chunks WHERE chunk_hash = ?` - Find existing chunks
- `INSERT INTO snapshot_blobs` - Reference existing blobs for unchanged files
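The change-detection step can be pictured with a small Go sketch. `FileMeta` and `Changed` are hypothetical names for illustration only; the sketch compares just the three fields named above (mtime, size, mode) and treats a path with no previous row as changed, which is an assumption about how new files are handled.

```go
package main

import "fmt"

// FileMeta holds the subset of files columns used for comparison (hypothetical type).
type FileMeta struct {
	Path  string
	Mtime int64 // Unix timestamp
	Size  int64 // bytes
	Mode  uint32
}

// Changed reports whether a file needs to be re-chunked.
// prev is nil when the path has no row in the files table yet.
func Changed(prev *FileMeta, cur FileMeta) bool {
	if prev == nil {
		return true
	}
	return prev.Mtime != cur.Mtime || prev.Size != cur.Size || prev.Mode != cur.Mode
}

func main() {
	old := &FileMeta{Path: "/etc/hosts", Mtime: 1700000000, Size: 120, Mode: 0o644}
	same := FileMeta{Path: "/etc/hosts", Mtime: 1700000000, Size: 120, Mode: 0o644}
	touched := FileMeta{Path: "/etc/hosts", Mtime: 1700000500, Size: 120, Mode: 0o644}

	fmt.Println(Changed(old, same))    // prints "false"
	fmt.Println(Changed(old, touched)) // prints "true"
	fmt.Println(Changed(nil, same))    // prints "true"
}
```

Whatever the comparison returns, the file is still recorded in `snapshot_files`; only the chunking work is skipped for unchanged files.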
### 3. Restore Process
The restore process doesn't use the local database. Instead:
1. Downloads snapshot metadata from S3
2. Downloads required blobs based on manifest
3. Reconstructs files from decrypted and decompressed chunks
### 4. Pruning
1. **Identify Unreferenced Blobs**
- Query blobs not referenced by any remaining snapshot
- Delete from S3 and local database
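Identifying unreferenced blobs is essentially a set difference. The Go sketch below shows the idea on in-memory data; `UnreferencedBlobs` and its arguments are hypothetical, and the real query presumably runs in SQL against `blobs` and `snapshot_blobs` instead.

```go
package main

import (
	"fmt"
	"sort"
)

// UnreferencedBlobs returns the blob IDs that no remaining snapshot references.
// snapshotBlobs maps snapshot ID to blob IDs, like rows of snapshot_blobs.
func UnreferencedBlobs(allBlobs []string, snapshotBlobs map[string][]string) []string {
	referenced := make(map[string]struct{})
	for _, blobs := range snapshotBlobs {
		for _, id := range blobs {
			referenced[id] = struct{}{}
		}
	}

	var orphans []string
	for _, id := range allBlobs {
		if _, ok := referenced[id]; !ok {
			orphans = append(orphans, id)
		}
	}
	sort.Strings(orphans) // deterministic order for display and testing
	return orphans
}

func main() {
	all := []string{"blob-a", "blob-b", "blob-c"}
	remaining := map[string][]string{
		"host1-20250722-125437Z": {"blob-a", "blob-c"},
	}
	fmt.Println(UnreferencedBlobs(all, remaining)) // prints "[blob-b]"
}
```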
## Repository Pattern
Vaultik uses a repository pattern for database access:
- `FileRepository` - CRUD operations for files
- `ChunkRepository` - CRUD operations for chunks
- `FileChunkRepository` - Manage file-chunk mappings
- `BlobRepository` - Manage blob lifecycle
- `BlobChunkRepository` - Manage blob-chunk associations
- `SnapshotRepository` - Manage snapshots
- `UploadRepository` - Track upload metrics
Each repository provides methods like:
- `Create()` - Insert new record
- `GetByID()` / `GetByPath()` / `GetByHash()` - Retrieve records
- `Update()` - Update existing records
- `Delete()` - Remove records
- Specialized queries for each entity type
## Transaction Management
All database operations that modify multiple tables are wrapped in transactions:
```go
err := repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
    // Multiple repository operations using tx
    // (assumed: returning a non-nil error rolls the transaction back)
    return nil
})
```
This ensures consistency, which is especially important for operations like:
- Creating file-chunk mappings
- Associating chunks with blobs
- Updating snapshot statistics
## Performance Considerations
1. **Indexes**: Primary keys are automatically indexed. Additional indexes may be needed for:
- `blobs.hash` for lookup performance
- `blob_chunks.chunk_hash` for chunk location queries
2. **Prepared Statements**: All queries use prepared statements for performance and security
3. **Batch Operations**: Where possible, operations are batched within transactions
4. **Write-Ahead Logging**: SQLite WAL mode is enabled for better concurrency
## Data Integrity
1. **Foreign Keys**: Enforced at the application level through repository methods
2. **Unique Constraints**: Chunk hashes and file paths are unique
3. **Null Handling**: Nullable fields clearly indicate in-progress operations
4. **Timestamp Tracking**: All major operations record timestamps for auditing


@@ -125,7 +125,8 @@ This allows pruning operations to determine which blobs are referenced without r
```sql
CREATE TABLE files (
path TEXT PRIMARY KEY,
id TEXT PRIMARY KEY, -- UUID
path TEXT NOT NULL UNIQUE,
mtime INTEGER NOT NULL,
size INTEGER NOT NULL
);
@@ -133,10 +134,10 @@ CREATE TABLE files (
-- Maps files to their constituent chunks in sequence order
-- Used for reconstructing files from chunks during restore
CREATE TABLE file_chunks (
path TEXT NOT NULL,
file_id TEXT NOT NULL,
idx INTEGER NOT NULL,
chunk_hash TEXT NOT NULL,
PRIMARY KEY (path, idx)
PRIMARY KEY (file_id, idx)
);
CREATE TABLE chunks (
@@ -163,16 +164,17 @@ CREATE TABLE blob_chunks (
-- Used for deduplication and tracking chunk usage across files
CREATE TABLE chunk_files (
chunk_hash TEXT NOT NULL,
file_path TEXT NOT NULL,
file_id TEXT NOT NULL,
file_offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (chunk_hash, file_path)
PRIMARY KEY (chunk_hash, file_id)
);
CREATE TABLE snapshots (
id TEXT PRIMARY KEY,
hostname TEXT NOT NULL,
vaultik_version TEXT NOT NULL,
vaultik_git_revision TEXT NOT NULL,
created_ts INTEGER NOT NULL,
file_count INTEGER NOT NULL,
chunk_count INTEGER NOT NULL,

21
LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2025 Jeffrey Paul sneak@sneak.berlin
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@@ -1,19 +1,27 @@
.PHONY: test fmt lint build clean all
# Version number
VERSION := 0.0.1
# Build variables
VERSION := $(shell git describe --tags --always --dirty 2>/dev/null || echo "dev")
COMMIT := $(shell git rev-parse HEAD 2>/dev/null || echo "unknown")
GIT_REVISION := $(shell git rev-parse HEAD 2>/dev/null || echo "unknown")
# Linker flags
LDFLAGS := -X 'git.eeqj.de/sneak/vaultik/internal/globals.Version=$(VERSION)' \
-X 'git.eeqj.de/sneak/vaultik/internal/globals.Commit=$(COMMIT)'
-X 'git.eeqj.de/sneak/vaultik/internal/globals.Commit=$(GIT_REVISION)'
# Default target
all: test
# Run tests
test: lint fmt-check
go test -v ./...
@echo "Running tests..."
@if ! go test -v -timeout 10s ./... 2>&1; then \
echo ""; \
echo "TEST FAILURES DETECTED"; \
echo "Run 'go test -v ./internal/database' to see database test details"; \
exit 1; \
fi
# Check if code is formatted
fmt-check:

115
README.md

@@ -5,7 +5,21 @@ encrypts data using an `age` public key and uploads each encrypted blob
directly to a remote S3-compatible object store. It requires no private
keys, secrets, or credentials stored on the backed-up system.
---
It includes table-stakes features such as:
* modern authenticated encryption
* deduplication
* incremental backups
* modern multithreaded zstd compression with configurable levels
* content-addressed immutable storage
* local state tracking in standard SQLite database
* inotify-based change detection
* streaming processing of all data, so it does not require lots of RAM or
temp file storage
* no mutable remote metadata
* no plaintext file paths or metadata stored in remote
* does not create huge numbers of small files (to keep S3 operation counts
down) even if the source system has many small files
## what
@@ -15,27 +29,29 @@ Each chunk is streamed into a blob packer. Blobs are compressed with `zstd`,
encrypted with `age`, and uploaded directly to remote storage under a
content-addressed S3 path.
No plaintext file contents ever hit disk. No private key is needed or stored
locally. All encrypted data is streaming-processed and immediately discarded
once uploaded. Metadata is encrypted and pushed with the same mechanism.
No plaintext file contents ever hit disk. No private key or secret
passphrase is needed or stored locally. All encrypted data is
streaming-processed and immediately discarded once uploaded. Metadata is
encrypted and pushed with the same mechanism.
## why
Existing backup software fails under one or more of these conditions:
* Requires secrets (passwords, private keys) on the source system
* Requires secrets (passwords, private keys) on the source system, which
compromises encrypted backups in the case of host system compromise
* Depends on symmetric encryption unsuitable for zero-trust environments
* Stages temporary archives or repositories
* Writes plaintext metadata or plaintext file paths
* Creates one-blob-per-file, which results in excessive S3 operation counts
`vaultik` addresses all of these by using:
`vaultik` addresses these by using:
* Public-key-only encryption (via `age`) requires no secrets (other than
bucket access key) on the source system
* Blob-level deduplication and batching
* Local state cache for incremental detection
* S3-native chunked upload interface
* Self-contained encrypted snapshot metadata
remote storage api key) on the source system
* Local state cache for incremental detection does not require reading from
or decrypting remote storage
* Content-addressed immutable storage allows efficient deduplication
* Storage only of large encrypted blobs of configurable size (1G by default)
reduces S3 operation counts and improves performance
## how
@@ -63,6 +79,7 @@ Existing backup software fails under one or more of these conditions:
- '*.tmp'
age_recipient: age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj
s3:
# endpoint is optional if using AWS S3, but who even does that?
endpoint: https://s3.example.com
bucket: vaultik-data
prefix: host1/
@@ -73,24 +90,30 @@ Existing backup software fails under one or more of these conditions:
full_scan_interval: 24h # normally we use inotify to mark dirty, but
# every 24h we do a full stat() scan
min_time_between_run: 15m # again, only for daemon mode
index_path: /var/lib/vaultik/index.sqlite
#index_path: /var/lib/vaultik/index.sqlite
chunk_size: 10MB
blob_size_limit: 10GB
index_prefix: index/
```
4. **run**
```sh
vaultik backup /etc/vaultik.yaml
vaultik --config /etc/vaultik.yaml snapshot create
```
```sh
vaultik backup /etc/vaultik.yaml --cron # silent unless error
vaultik --config /etc/vaultik.yaml snapshot create --cron # silent unless error
```
```sh
vaultik backup /etc/vaultik.yaml --daemon # runs in background, uses inotify
vaultik --config /etc/vaultik.yaml snapshot daemon # runs continuously in foreground, uses inotify to detect changes
# TODO
* make sure daemon mode does not make a snapshot if no files have
changed, even if the backup_interval has passed
* in daemon mode, if we are long enough since the last snapshot event, and we get
an inotify event, we should schedule the next snapshot creation for 10 minutes from the
time of the mark-dirty event.
```
---
@@ -100,26 +123,48 @@ Existing backup software fails under one or more of these conditions:
### commands
```sh
vaultik backup [--config <path>] [--cron] [--daemon]
vaultik [--config <path>] snapshot create [--cron] [--daemon]
vaultik [--config <path>] snapshot list [--json]
vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--force]
vaultik [--config <path>] snapshot verify <snapshot-id> [--deep]
vaultik [--config <path>] store info
# FIXME: remove 'bucket' and 'prefix' and 'snapshot' flags. it should be
# 'vaultik restore snapshot <snapshot> --target <dir>'. bucket and prefix are always
# from config file.
vaultik restore --bucket <bucket> --prefix <prefix> --snapshot <id> --target <dir>
# FIXME: remove prune, it's the old version of "snapshot purge"
vaultik prune --bucket <bucket> --prefix <prefix> [--dry-run]
# FIXME: change fetch to 'vaultik restore path <snapshot> <path> --target <path>'
vaultik fetch --bucket <bucket> --prefix <prefix> --snapshot <id> --file <path> --target <path>
# FIXME: remove this, it's redundant with 'snapshot verify'
vaultik verify --bucket <bucket> --prefix <prefix> [--snapshot <id>] [--quick]
```
### environment
* `VAULTIK_PRIVATE_KEY`: Required for `restore`, `prune`, `fetch`, and `verify` commands. Contains the age private key for decryption.
* `VAULTIK_CONFIG`: Optional path to config file. If set, `vaultik backup` can be run without specifying the config file path.
* `VAULTIK_CONFIG`: Optional path to config file. If set, the config file path doesn't need to be specified on the command line.
### command details
**backup**: Perform incremental backup of configured directories
**snapshot create**: Perform incremental backup of configured directories
* Config is located at `/etc/vaultik/config.yml` by default
* `--config`: Override config file path
* `--cron`: Silent unless error (for crontab)
* `--daemon`: Run continuously with inotify monitoring and periodic scans
**snapshot list**: List all snapshots with their timestamps and sizes
* `--json`: Output in JSON format
**snapshot purge**: Remove old snapshots based on criteria
* `--keep-latest`: Keep only the most recent snapshot
* `--older-than`: Remove snapshots older than duration (e.g., 30d, 6mo, 1y)
* `--force`: Skip confirmation prompt
**snapshot verify**: Verify snapshot integrity
* `--deep`: Download and verify blob hashes (not just existence)
**store info**: Display S3 bucket configuration and storage statistics
**restore**: Restore entire snapshot to target directory
* Downloads and decrypts metadata
* Fetches only required blobs
@@ -245,41 +290,23 @@ This enables garbage collection from immutable storage.
---
## license
## LICENSE
WTFPL — see LICENSE.
[MIT](https://opensource.org/license/mit/)
---
## security considerations
* Source host compromise cannot decrypt backups
* No replay attacks possible (append-only)
* Each blob independently encrypted
* Metadata tampering detectable via hash verification
* S3 credentials only allow write access to backup prefix
## performance
* Streaming processing (no temp files)
* Parallel blob uploads
* Deduplication reduces storage and bandwidth
* Local index enables fast incremental detection
* Configurable compression levels
## requirements
* Go 1.24.4 or later
* S3-compatible object storage
* age command-line tool (for key generation)
* SQLite3
* Sufficient disk space for local index
* Sufficient disk space for local index (typically <1GB)
## author
Made with love and lots of expensive SOTA AI by [sneak](https://sneak.berlin) in Berlin in the summer of 2025.
Released as a free software gift to the world, no strings attached, under the [WTFPL](https://www.wtfpl.net/) license.
Released as a free software gift to the world, no strings attached.
Contact: [sneak@sneak.berlin](mailto:sneak@sneak.berlin)


@@ -118,10 +118,6 @@ s3:
# Default: /var/lib/vaultik/index.sqlite
#index_path: /var/lib/vaultik/index.sqlite
# Prefix for index/metadata files in S3
# Default: index/
#index_prefix: index/
# Average chunk size for content-defined chunking
# Smaller chunks = better deduplication but more metadata
# Supports: 10MB, 5M, 1GB, 500KB, 64MiB, etc.

41
go.mod

@@ -3,19 +3,30 @@ module git.eeqj.de/sneak/vaultik
go 1.24.4
require (
filippo.io/age v1.2.1
github.com/aws/aws-sdk-go-v2 v1.36.6
github.com/aws/aws-sdk-go-v2/config v1.29.18
github.com/aws/aws-sdk-go-v2/credentials v1.17.71
github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.17.85
github.com/aws/aws-sdk-go-v2/service/s3 v1.84.1
github.com/aws/smithy-go v1.22.4
github.com/dustin/go-humanize v1.0.1
github.com/google/uuid v1.6.0
github.com/johannesboyne/gofakes3 v0.0.0-20250603205740-ed9094be7668
github.com/jotfs/fastcdc-go v0.2.0
github.com/klauspost/compress v1.18.0
github.com/spf13/afero v1.14.0
github.com/spf13/cobra v1.9.1
github.com/stretchr/testify v1.9.0
go.uber.org/fx v1.24.0
golang.org/x/term v0.33.0
gopkg.in/yaml.v3 v3.0.1
modernc.org/sqlite v1.38.0
)
require (
filippo.io/age v1.2.1 // indirect
github.com/aws/aws-sdk-go v1.44.256 // indirect
github.com/aws/aws-sdk-go-v2 v1.36.6 // indirect
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.6.11 // indirect
github.com/aws/aws-sdk-go-v2/config v1.29.18 // indirect
github.com/aws/aws-sdk-go-v2/credentials v1.17.71 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.33 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.37 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.37 // indirect
@@ -25,42 +36,24 @@ require (
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.7.5 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.18 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.18.18 // indirect
github.com/aws/aws-sdk-go-v2/service/s3 v1.84.1 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.25.6 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.30.4 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.34.1 // indirect
github.com/aws/smithy-go v1.22.4 // indirect
github.com/dustin/go-humanize v1.0.1 // indirect
github.com/go-ini/ini v1.67.0 // indirect
github.com/goccy/go-json v0.10.5 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect
github.com/johannesboyne/gofakes3 v0.0.0-20250603205740-ed9094be7668 // indirect
github.com/jotfs/fastcdc-go v0.2.0 // indirect
github.com/klauspost/compress v1.18.0 // indirect
github.com/klauspost/cpuid/v2 v2.2.10 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/minio/crc64nvme v1.0.1 // indirect
github.com/minio/md5-simd v1.1.2 // indirect
github.com/minio/minio-go/v7 v7.0.94 // indirect
github.com/ncruces/go-strftime v0.1.9 // indirect
github.com/philhofer/fwd v1.1.3-0.20240916144458-20a13a1f6b7c // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
github.com/rs/xid v1.6.0 // indirect
github.com/ryszard/goskiplist v0.0.0-20150312221310-2dfbae5fcf46 // indirect
github.com/spf13/afero v1.14.0 // indirect
github.com/spf13/pflag v1.0.6 // indirect
github.com/tinylib/msgp v1.3.0 // indirect
github.com/zeebo/blake3 v0.2.4 // indirect
go.shabbyrobe.org/gocovmerge v0.0.0-20230507111327-fa4f82cfbf4d // indirect
go.uber.org/dig v1.19.0 // indirect
go.uber.org/multierr v1.10.0 // indirect
go.uber.org/zap v1.26.0 // indirect
golang.org/x/crypto v0.38.0 // indirect
golang.org/x/exp v0.0.0-20250408133849-7e4ce0ab07d0 // indirect
golang.org/x/net v0.40.0 // indirect
golang.org/x/sys v0.34.0 // indirect
golang.org/x/term v0.33.0 // indirect
golang.org/x/text v0.25.0 // indirect
golang.org/x/tools v0.33.0 // indirect
modernc.org/libc v1.65.10 // indirect

39
go.sum

@@ -1,3 +1,5 @@
c2sp.org/CCTV/age v0.0.0-20240306222714-3ec4d716e805 h1:u2qwJeEvnypw+OCPUHmoZE3IqwfuN5kgDfo5MLzpNM0=
c2sp.org/CCTV/age v0.0.0-20240306222714-3ec4d716e805/go.mod h1:FomMrUJ2Lxt5jCLmZkG3FHa72zUprnhd3v/Z18Snm4w=
filippo.io/age v1.2.1 h1:X0TZjehAZylOIj4DubWYU1vWQxv9bJpo+Uu2/LGhi1o=
filippo.io/age v1.2.1/go.mod h1:JL9ew2lTN+Pyft4RiNGguFfOpewKwSHm5ayKD/A4004=
github.com/aws/aws-sdk-go v1.44.256 h1:O8VH+bJqgLDguqkH/xQBFz5o/YheeZqgcOYIgsTVWY4=
@@ -12,6 +14,8 @@ github.com/aws/aws-sdk-go-v2/credentials v1.17.71 h1:r2w4mQWnrTMJjOyIsZtGp3R3XGY
github.com/aws/aws-sdk-go-v2/credentials v1.17.71/go.mod h1:E7VF3acIup4GB5ckzbKFrCK0vTvEQxOxgdq4U3vcMCY=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.33 h1:D9ixiWSG4lyUBL2DDNK924Px9V/NBVpML90MHqyTADY=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.33/go.mod h1:caS/m4DI+cij2paz3rtProRBI4s/+TCiWoaWZuQ9010=
github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.17.85 h1:AfpstoiaenxGSCUheWiicgZE5XXS5Fi4CcQ4PA/x+Qw=
github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.17.85/go.mod h1:HxiF0Fd6WHWjdjOffLkCauq7JqzWqMMq0iUVLS7cPQc=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.37 h1:osMWfm/sC/L4tvEdQ65Gri5ZZDCUpuYJZbTTDrsn4I0=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.37/go.mod h1:ZV2/1fbjOPr4G4v38G3Ww5TBT4+hmsK45s/rxu1fGy0=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.37 h1:v+X21AvTb2wZ+ycg1gx+orkB/9U6L7AOp93R7qYxsxM=
@@ -38,6 +42,7 @@ github.com/aws/aws-sdk-go-v2/service/sts v1.34.1 h1:aUrLQwJfZtwv3/ZNG2xRtEen+NqI
github.com/aws/aws-sdk-go-v2/service/sts v1.34.1/go.mod h1:3wFBZKoWnX3r+Sm7in79i54fBmNfwhdNdQuscCw7QIk=
github.com/aws/smithy-go v1.22.4 h1:uqXzVZNuNexwc/xrh6Tb56u89WDlJY6HS+KC0S4QSjw=
github.com/aws/smithy-go v1.22.4/go.mod h1:t1ufH5HMublsJYulve2RKmHDC15xu1f26kHCp/HgceI=
github.com/cevatbarisyilmaz/ara v0.0.4 h1:SGH10hXpBJhhTlObuZzTuFn1rrdmjQImITXnZVPSodc=
github.com/cevatbarisyilmaz/ara v0.0.4/go.mod h1:BfFOxnUd6Mj6xmcvRxHN3Sr21Z1T3U2MYkYOmoQe4Ts=
github.com/cpuguy83/go-md2man/v2 v2.0.6/go.mod h1:oOW0eioCTA6cOiMLiUPZOpcVxMig6NIQQ7OS05n1F4g=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
@@ -45,16 +50,13 @@ github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
github.com/go-ini/ini v1.67.0 h1:z6ZrTEZqSWOTyH2FlglNbNgARyHG8oLW9gMELqKr06A=
github.com/go-ini/ini v1.67.0/go.mod h1:ByCAeIL28uOIIG0E3PJtZPDL8WnHpFKFOtgjp+3Ies8=
github.com/goccy/go-json v0.10.5 h1:Fq85nIqj+gXn/S5ahsiTlK3TmC85qgirsdTP/+DeaC4=
github.com/goccy/go-json v0.10.5/go.mod h1:oq7eo15ShAhp70Anwd5lgX2pLfOS3QCiwU/PULtXL6M=
github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e h1:ijClszYn+mADRFY17kjQEVQ1XRhq2/JR1M3sGqeJoxs=
github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e/go.mod h1:boTsfXsheKC2y+lKOCMpSfarhxDeIzfZG1jqGcPl3cA=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8=
github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw=
github.com/jmespath/go-jmespath v0.4.0 h1:BEgLn5cpjn8UN1mAw4NjwDrS35OdebyEtFe+9YPoQUg=
github.com/jmespath/go-jmespath v0.4.0/go.mod h1:T8mJZnbsbmF+m6zOOFylbeCJqk5+pHWvzYPziyZiYoo=
github.com/jmespath/go-jmespath/internal/testify v1.5.1/go.mod h1:L3OGu8Wl2/fWfCI6z80xFu9LTZmf1ZRjMHUOPmWr69U=
github.com/johannesboyne/gofakes3 v0.0.0-20250603205740-ed9094be7668 h1:+Mn8Sj5VzjOTuzyBCxfUnEcS+Iky4/5piUraOC3E5qQ=
@@ -63,28 +65,15 @@ github.com/jotfs/fastcdc-go v0.2.0 h1:WHYIGk3k9NumGWfp4YMsemEcx/s4JKpGAa6tpCpHJO
github.com/jotfs/fastcdc-go v0.2.0/go.mod h1:PGFBIloiASFbiKnkCd/hmHXxngxYDYtisyurJ/zyDNM=
github.com/klauspost/compress v1.18.0 h1:c/Cqfb0r+Yi+JtIEq73FWXVkRonBlf0CRNYc8Zttxdo=
github.com/klauspost/compress v1.18.0/go.mod h1:2Pp+KzxcywXVXMr50+X0Q/Lsb43OQHYWRCY2AiWywWQ=
github.com/klauspost/cpuid/v2 v2.0.1/go.mod h1:FInQzS24/EEf25PyTYn52gqo7WaD8xa0213Md/qVLRg=
github.com/klauspost/cpuid/v2 v2.2.10 h1:tBs3QSyvjDyFTq3uoc/9xFpCuOsJQFNPiAhYdw2skhE=
github.com/klauspost/cpuid/v2 v2.2.10/go.mod h1:hqwkgyIinND0mEev00jJYCxPNVRVXFQeu1XKlok6oO0=
github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
github.com/minio/crc64nvme v1.0.1 h1:DHQPrYPdqK7jQG/Ls5CTBZWeex/2FMS3G5XGkycuFrY=
github.com/minio/crc64nvme v1.0.1/go.mod h1:eVfm2fAzLlxMdUGc0EEBGSMmPwmXD5XiNRpnu9J3bvg=
github.com/minio/md5-simd v1.1.2 h1:Gdi1DZK69+ZVMoNHRXJyNcxrMA4dSxoYHZSQbirFg34=
github.com/minio/md5-simd v1.1.2/go.mod h1:MzdKDxYpY2BT9XQFocsiZf/NKVtR7nkE4RoEpN+20RM=
github.com/minio/minio-go/v7 v7.0.94 h1:1ZoksIKPyaSt64AVOyaQvhDOgVC3MfZsWM6mZXRUGtM=
github.com/minio/minio-go/v7 v7.0.94/go.mod h1:71t2CqDt3ThzESgZUlU1rBN54mksGGlkLcFgguDnnAc=
github.com/ncruces/go-strftime v0.1.9 h1:bY0MQC28UADQmHmaF5dgpLmImcShSi2kHU9XLdhx/f4=
github.com/ncruces/go-strftime v0.1.9/go.mod h1:Fwc5htZGVVkseilnfgOVb9mKy6w1naJmn9CehxcKcls=
github.com/philhofer/fwd v1.1.3-0.20240916144458-20a13a1f6b7c h1:dAMKvw0MlJT1GshSTtih8C2gDs04w8dReiOGXrGLNoY=
github.com/philhofer/fwd v1.1.3-0.20240916144458-20a13a1f6b7c/go.mod h1:RqIHx9QI14HlwKwm98g9Re5prTQ6LdeRQn+gXJFxsJM=
github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec h1:W09IVJc94icq4NjY3clb7Lk8O1qJ8BdBEF8z0ibU0rE=
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec/go.mod h1:qqbHyh8v60DhA7CoWK5oRCqLrMHRGoxYCSS9EjAz6Eo=
github.com/rs/xid v1.6.0 h1:fV591PaemRlL6JfRxGDEPl69wICngIQ3shQtzfy2gxU=
github.com/rs/xid v1.6.0/go.mod h1:7XoLgs4eV+QndskICGsho+ADou8ySMSjJKDIan90Nz0=
github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
github.com/ryszard/goskiplist v0.0.0-20150312221310-2dfbae5fcf46 h1:GHRpF1pTW19a8tTFrMLUcfWwyC0pnifVo2ClaLq+hP8=
github.com/ryszard/goskiplist v0.0.0-20150312221310-2dfbae5fcf46/go.mod h1:uAQ5PCi+MFsC7HjREoAz1BU+Mq60+05gifQSsHSDG/8=
@@ -97,14 +86,9 @@ github.com/spf13/pflag v1.0.6 h1:jFzHGLGAlb3ruxLB8MhbI6A8+AQX/2eW4qeyNZXNp2o=
github.com/spf13/pflag v1.0.6/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/testify v1.5.1/go.mod h1:5W2xD1RspED5o8YsWQXVCued0rvSQ+mT+I5cxcmMvtA=
github.com/stretchr/testify v1.8.1 h1:w7B6lhMri9wdJUVmEZPGGhZzrYTPvgJArz7wNPgYKsk=
github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg=
github.com/tinylib/msgp v1.3.0 h1:ULuf7GPooDaIlbyvgAxBV/FI7ynli6LZ1/nVUNu+0ww=
github.com/tinylib/msgp v1.3.0/go.mod h1:ykjzy2wzgrlvpDCRc4LA8UXy6D8bzMSuAF3WD57Gok0=
github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
github.com/zeebo/blake3 v0.2.4 h1:KYQPkhpRtcqh0ssGYcKLG1JYvddkEA8QwCM/yBqhaZI=
github.com/zeebo/blake3 v0.2.4/go.mod h1:7eeQ6d2iXWRGF6npfaxl2CU+xy2Fjo2gxeyZGCRUjcE=
go.etcd.io/bbolt v1.3.5/go.mod h1:G5EMThwa9y8QZGBClrRx5EY+Yw9kAhnjy3bSjsnlVTQ=
go.shabbyrobe.org/gocovmerge v0.0.0-20230507111327-fa4f82cfbf4d h1:Ns9kd1Rwzw7t0BR8XMphenji4SmIoNZPn8zhYmaVKP8=
go.shabbyrobe.org/gocovmerge v0.0.0-20230507111327-fa4f82cfbf4d/go.mod h1:92Uoe3l++MlthCm+koNi0tcUCX3anayogF0Pa/sp24k=
@@ -120,8 +104,6 @@ go.uber.org/zap v1.26.0 h1:sI7k6L95XOKS281NhVKOFCUNIvv9e0w4BF8N3u+tCRo=
go.uber.org/zap v1.26.0/go.mod h1:dtElttAiwGvoJ/vj4IwHBS/gXsEu/pZ50mUIRWuG0so=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
golang.org/x/crypto v0.36.0 h1:AnAEvhDddvBdpY+uR+MyHmuZzzNqXSe/GvuDeob5L34=
golang.org/x/crypto v0.36.0/go.mod h1:Y4J0ReaxCR1IMaabaSMugxJES1EpwhBHhv2bDHklZvc=
golang.org/x/crypto v0.38.0 h1:jt+WWG8IZlBnVbomuhg2Mdq0+BBQaHbtqHEFEigjUV8=
golang.org/x/crypto v0.38.0/go.mod h1:MvrbAqul58NNYPKnOra203SB9vpuZW0e+RRZV+Ggqjw=
golang.org/x/exp v0.0.0-20250408133849-7e4ce0ab07d0 h1:R84qjqJb5nVJMxqWYb3np9L5ZsaDtB+a39EqjV0JSUM=
@@ -137,9 +119,6 @@ golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug
golang.org/x/net v0.1.0/go.mod h1:Cx3nUiGt4eDBEyega/BKRp+/AlGL8hYe7U9odMt2Cco=
golang.org/x/net v0.6.0/go.mod h1:2Tu9+aMcznHK/AK1HMvgo6xiTLG5rD5rZLDS+rp2Bjs=
golang.org/x/net v0.9.0/go.mod h1:d48xBJpPfHeWQsugry2m+kC02ZBRGRgulfHnEXEuWns=
golang.org/x/net v0.38.0 h1:vRMAPTMaeGqVhG5QyLJHqNDwecKTomGeqbnfZyKlBI8=
golang.org/x/net v0.38.0/go.mod h1:ivrbrMbzFq5J41QOQh0siUuly180yBYtLp+CKbEaFx8=
golang.org/x/net v0.40.0/go.mod h1:y0hY0exeL2Pku80/zKK7tpntoX23cqL3Oa6njdgRtds=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.1.0/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
@@ -155,8 +134,6 @@ golang.org/x/sys v0.1.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.7.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.33.0 h1:q3i8TbbEz+JRD9ywIRlyRAQbM0qF7hu24q3teo2hbuw=
golang.org/x/sys v0.33.0/go.mod h1:BJP2sWEmIv4KK5OTEluFJCKSidICx8ciO85XgH3Ak8k=
golang.org/x/sys v0.34.0 h1:H5Y5sJ2L2JRdyv7ROF1he/lPdvFsd0mJHFw2ThKHxLA=
golang.org/x/sys v0.34.0/go.mod h1:BJP2sWEmIv4KK5OTEluFJCKSidICx8ciO85XgH3Ak8k=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
@@ -172,8 +149,6 @@ golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ=
golang.org/x/text v0.4.0/go.mod h1:mrYo+phRRbMaCq/xk9113O4dZlRixOauAjOtrjsXDZ8=
golang.org/x/text v0.7.0/go.mod h1:mrYo+phRRbMaCq/xk9113O4dZlRixOauAjOtrjsXDZ8=
golang.org/x/text v0.9.0/go.mod h1:e1OnstbJyHTd6l/uOt8jFFHp6TRDWZR/bV3emEE/zU8=
golang.org/x/text v0.23.0 h1:D71I7dUrlY+VX0gQShAThNGHFxZ13dGLBHQLVl1mJlY=
golang.org/x/text v0.23.0/go.mod h1:/BLNzu4aZCJ1+kcD0DNRotWKage4q2rGVAg4o22unh4=
golang.org/x/text v0.25.0 h1:qVyWApTSYLk/drJRO5mDlNYskwQznZmkpV2c8q9zls4=
golang.org/x/text v0.25.0/go.mod h1:WEdwpYrmk1qmdHvhkSTNPm3app7v4rsT8F2UD6+VHIA=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=

@@ -338,97 +338,103 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
return nil
}
// Process this file in a transaction
// Create file record in a short transaction
file := &database.File{
Path: path,
Size: info.Size(),
Mode: uint32(info.Mode()),
MTime: info.ModTime(),
CTime: info.ModTime(), // Use mtime as ctime for test
UID: 1000, // Default UID for test
GID: 1000, // Default GID for test
}
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
// Create file record
file := &database.File{
Path: path,
Size: info.Size(),
Mode: uint32(info.Mode()),
MTime: info.ModTime(),
CTime: info.ModTime(), // Use mtime as ctime for test
UID: 1000, // Default UID for test
GID: 1000, // Default GID for test
}
return b.repos.Files.Create(ctx, tx, file)
})
if err != nil {
return err
}
if err := b.repos.Files.Create(ctx, tx, file); err != nil {
fileCount++
totalSize += info.Size()
// Read and process file in chunks
f, err := fsys.Open(path)
if err != nil {
return err
}
defer func() {
if err := f.Close(); err != nil {
// Log but don't fail since we're already in an error path potentially
fmt.Fprintf(os.Stderr, "Failed to close file: %v\n", err)
}
}()
// Process file in chunks
chunkIndex := 0
buffer := make([]byte, defaultChunkSize)
for {
n, err := f.Read(buffer)
if err != nil && err != io.EOF {
return err
}
fileCount++
totalSize += info.Size()
// Read and process file in chunks
f, err := fsys.Open(path)
if err != nil {
return err
if n == 0 {
break
}
defer func() {
if err := f.Close(); err != nil {
// Log but don't fail since we're already in an error path potentially
fmt.Fprintf(os.Stderr, "Failed to close file: %v\n", err)
}
}()
// Process file in chunks
chunkIndex := 0
buffer := make([]byte, defaultChunkSize)
chunkData := buffer[:n]
chunkHash := calculateHash(chunkData)
for {
n, err := f.Read(buffer)
if err != nil && err != io.EOF {
return err
}
if n == 0 {
break
}
chunkData := buffer[:n]
chunkHash := calculateHash(chunkData)
// Check if chunk already exists
existingChunk, _ := b.repos.Chunks.GetByHash(ctx, chunkHash)
if existingChunk == nil {
// Create new chunk
// Check if chunk already exists (outside of transaction)
existingChunk, _ := b.repos.Chunks.GetByHash(ctx, chunkHash)
if existingChunk == nil {
// Create new chunk in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
chunk := &database.Chunk{
ChunkHash: chunkHash,
SHA256: chunkHash,
Size: int64(n),
}
if err := b.repos.Chunks.Create(ctx, tx, chunk); err != nil {
return err
}
processedChunks[chunkHash] = true
return b.repos.Chunks.Create(ctx, tx, chunk)
})
if err != nil {
return err
}
processedChunks[chunkHash] = true
}
// Create file-chunk mapping
// Create file-chunk mapping in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
fileChunk := &database.FileChunk{
Path: path,
FileID: file.ID,
Idx: chunkIndex,
ChunkHash: chunkHash,
}
if err := b.repos.FileChunks.Create(ctx, tx, fileChunk); err != nil {
return err
}
return b.repos.FileChunks.Create(ctx, tx, fileChunk)
})
if err != nil {
return err
}
// Create chunk-file mapping
// Create chunk-file mapping in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
chunkFile := &database.ChunkFile{
ChunkHash: chunkHash,
FilePath: path,
FileID: file.ID,
FileOffset: int64(chunkIndex * defaultChunkSize),
Length: int64(n),
}
if err := b.repos.ChunkFiles.Create(ctx, tx, chunkFile); err != nil {
return err
}
chunkIndex++
return b.repos.ChunkFiles.Create(ctx, tx, chunkFile)
})
if err != nil {
return err
}
return nil
})
chunkIndex++
}
return err
return nil
})
if err != nil {
@@ -436,61 +442,64 @@ func (b *BackupEngine) Backup(ctx context.Context, fsys fs.FS, root string) (str
}
// After all files are processed, create blobs for new chunks
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
for chunkHash := range processedChunks {
// Get chunk data
chunk, err := b.repos.Chunks.GetByHash(ctx, chunkHash)
if err != nil {
return err
}
for chunkHash := range processedChunks {
// Get chunk data (outside of transaction)
chunk, err := b.repos.Chunks.GetByHash(ctx, chunkHash)
if err != nil {
return "", err
}
chunkCount++
chunkCount++
// In a real system, blobs would contain multiple chunks and be encrypted
// For testing, we'll create a blob with a "blob-" prefix to differentiate
blobHash := "blob-" + chunkHash
// In a real system, blobs would contain multiple chunks and be encrypted
// For testing, we'll create a blob with a "blob-" prefix to differentiate
blobHash := "blob-" + chunkHash
// For the test, we'll create dummy data since we don't have the original
dummyData := []byte(chunkHash)
// For the test, we'll create dummy data since we don't have the original
dummyData := []byte(chunkHash)
// Upload to S3 as a blob
if err := b.s3Client.PutBlob(ctx, blobHash, dummyData); err != nil {
return err
}
// Upload to S3 as a blob
if err := b.s3Client.PutBlob(ctx, blobHash, dummyData); err != nil {
return "", err
}
// Create blob entry
// Create blob entry in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
blob := &database.Blob{
ID: "test-blob-" + blobHash[:8],
Hash: blobHash,
CreatedTS: time.Now(),
}
if err := b.repos.Blobs.Create(ctx, tx, blob); err != nil {
return err
}
blobCount++
blobSize += chunk.Size
return b.repos.Blobs.Create(ctx, tx, blob)
})
if err != nil {
return "", err
}
// Create blob-chunk mapping
blobCount++
blobSize += chunk.Size
// Create blob-chunk mapping in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
blobChunk := &database.BlobChunk{
BlobID: blob.ID,
BlobID: "test-blob-" + blobHash[:8],
ChunkHash: chunkHash,
Offset: 0,
Length: chunk.Size,
}
if err := b.repos.BlobChunks.Create(ctx, tx, blobChunk); err != nil {
return err
}
// Add blob to snapshot
if err := b.repos.Snapshots.AddBlob(ctx, tx, snapshotID, blob.ID, blob.Hash); err != nil {
return err
}
return b.repos.BlobChunks.Create(ctx, tx, blobChunk)
})
if err != nil {
return "", err
}
return nil
})
if err != nil {
return "", err
// Add blob to snapshot in a short transaction
err = b.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
return b.repos.Snapshots.AddBlob(ctx, tx, snapshotID, "test-blob-"+blobHash[:8], blobHash)
})
if err != nil {
return "", err
}
}
// Update snapshot with final counts
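
The hunks above move reads outside of transactions and wrap each write in its own short `WithTx` call, instead of holding one transaction across all per-file work. A minimal sketch of that pattern follows. It assumes a hypothetical in-memory `fakeStore` in place of the real repository layer, and `storeRecords` and `demo` are illustrative names that do not appear in the commit.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// fakeStore is a hypothetical stand-in for the repository layer. It only
// counts transactions and records committed writes.
type fakeStore struct {
	txCount   int
	pending   []string
	committed []string
}

// WithTx runs fn as one transaction: writes made by fn are committed when
// fn returns nil and discarded when it returns an error.
func (s *fakeStore) WithTx(ctx context.Context, fn func(ctx context.Context) error) error {
	s.txCount++
	s.pending = nil
	if err := fn(ctx); err != nil {
		s.pending = nil // roll back
		return err
	}
	s.committed = append(s.committed, s.pending...)
	s.pending = nil
	return nil
}

// write stages a record inside the current transaction.
func (s *fakeStore) write(rec string) { s.pending = append(s.pending, rec) }

// storeRecords writes each record in its own short transaction, so no
// transaction stays open while the loop does other work.
func storeRecords(ctx context.Context, s *fakeStore, recs []string) error {
	for _, rec := range recs {
		rec := rec
		err := s.WithTx(ctx, func(ctx context.Context) error {
			if rec == "" {
				return errors.New("empty record")
			}
			s.write(rec)
			return nil
		})
		if err != nil {
			return fmt.Errorf("storing %q: %w", rec, err)
		}
	}
	return nil
}

// demo runs the loop with a failing third record.
func demo() (int, []string, error) {
	s := &fakeStore{}
	err := storeRecords(context.Background(), s, []string{"a", "b", "", "c"})
	return s.txCount, s.committed, err
}

func main() {
	n, committed, err := demo()
	fmt.Println(n, committed, err)
}
```

One consequence worth noting: with per-write transactions, a failure partway through leaves the earlier records committed, so the sequence is no longer atomic as a whole.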

@@ -13,7 +13,9 @@ type ScannerParams struct {
EnableProgress bool
}
// Module exports backup functionality
// Module exports backup functionality as an fx module.
// It provides a ScannerFactory that can create Scanner instances
// with custom parameters while sharing common dependencies.
var Module = fx.Module("backup",
fx.Provide(
provideScannerFactory,

@@ -15,9 +15,13 @@ import (
)
const (
// Progress reporting intervals
SummaryInterval = 10 * time.Second // One-line status updates
DetailInterval = 60 * time.Second // Multi-line detailed status
// SummaryInterval defines how often one-line status updates are printed.
// These updates show current progress, ETA, and the file being processed.
SummaryInterval = 10 * time.Second
// DetailInterval defines how often multi-line detailed status reports are printed.
// These reports include comprehensive statistics about files, chunks, blobs, and uploads.
DetailInterval = 60 * time.Second
)
// ProgressStats holds atomic counters for progress tracking
@@ -32,6 +36,7 @@ type ProgressStats struct {
BlobsCreated atomic.Int64
BlobsUploaded atomic.Int64
BytesUploaded atomic.Int64
UploadDurationMs atomic.Int64 // Total milliseconds spent uploading to S3
CurrentFile atomic.Value // stores string
TotalSize atomic.Int64 // Total size to process (set after scan phase)
TotalFiles atomic.Int64 // Total files to process in phase 2
@@ -66,8 +71,8 @@ type ProgressReporter struct {
// NewProgressReporter creates a new progress reporter
func NewProgressReporter() *ProgressReporter {
stats := &ProgressStats{
StartTime: time.Now(),
lastDetailTime: time.Now(),
StartTime: time.Now().UTC(),
lastDetailTime: time.Now().UTC(),
}
stats.CurrentFile.Store("")
@@ -115,7 +120,7 @@ func (pr *ProgressReporter) GetStats() *ProgressStats {
// SetTotalSize sets the total size to process (after scan phase)
func (pr *ProgressReporter) SetTotalSize(size int64) {
pr.stats.TotalSize.Store(size)
pr.stats.ProcessStartTime.Store(time.Now())
pr.stats.ProcessStartTime.Store(time.Now().UTC())
}
// run is the main progress reporting loop
@@ -186,7 +191,7 @@ func (pr *ProgressReporter) printSummaryStatus() {
filesProcessed := pr.stats.FilesProcessed.Load()
totalFiles := pr.stats.TotalFiles.Load()
status := fmt.Sprintf("Progress: %d/%d files, %s/%s (%.1f%%), %s/s%s",
status := fmt.Sprintf("Snapshot progress: %d/%d files, %s/%s (%.1f%%), %s/s%s",
filesProcessed,
totalFiles,
humanize.Bytes(uint64(bytesProcessed)),
@@ -206,7 +211,7 @@ func (pr *ProgressReporter) printSummaryStatus() {
// printDetailedStatus prints a multi-line detailed status
func (pr *ProgressReporter) printDetailedStatus() {
pr.stats.mu.Lock()
pr.stats.lastDetailTime = time.Now()
pr.stats.lastDetailTime = time.Now().UTC()
pr.stats.mu.Unlock()
elapsed := time.Since(pr.stats.StartTime)
@@ -225,7 +230,7 @@ func (pr *ProgressReporter) printDetailedStatus() {
totalBytes := bytesScanned + bytesSkipped
rate := float64(totalBytes) / elapsed.Seconds()
log.Notice("=== Backup Progress Report ===")
log.Notice("=== Snapshot Progress Report ===")
log.Info("Elapsed time", "duration", formatDuration(elapsed))
// Calculate and show ETA if we have data
@@ -264,7 +269,7 @@ func (pr *ProgressReporter) printDetailedStatus() {
"created", blobsCreated,
"uploaded", blobsUploaded,
"pending", blobsCreated-blobsUploaded)
log.Info("Upload progress",
log.Info("Total uploaded to S3",
"uploaded", humanize.Bytes(uint64(bytesUploaded)),
"compression_ratio", formatRatio(bytesUploaded, bytesScanned))
if currentFile != "" {
@@ -313,31 +318,8 @@ func truncatePath(path string, maxLen int) string {
// printUploadProgress prints upload progress
func (pr *ProgressReporter) printUploadProgress(info *UploadInfo) {
elapsed := time.Since(info.StartTime)
if elapsed < time.Millisecond {
elapsed = time.Millisecond // Avoid division by zero
}
bytesPerSec := float64(info.Size) / elapsed.Seconds()
bitsPerSec := bytesPerSec * 8
// Format speed in bits/second
var speedStr string
if bitsPerSec >= 1e9 {
speedStr = fmt.Sprintf("%.1fGbit/sec", bitsPerSec/1e9)
} else if bitsPerSec >= 1e6 {
speedStr = fmt.Sprintf("%.0fMbit/sec", bitsPerSec/1e6)
} else if bitsPerSec >= 1e3 {
speedStr = fmt.Sprintf("%.0fKbit/sec", bitsPerSec/1e3)
} else {
speedStr = fmt.Sprintf("%.0fbit/sec", bitsPerSec)
}
log.Info("Uploading blob",
"hash", info.BlobHash[:8]+"...",
"size", humanize.Bytes(uint64(info.Size)),
"elapsed", formatDuration(elapsed),
"speed", speedStr)
// This function is called repeatedly during upload, not just at start
// Don't print anything here - the actual progress is shown by ReportUploadProgress
}
// ReportUploadStart marks the beginning of a blob upload
@@ -345,7 +327,7 @@ func (pr *ProgressReporter) ReportUploadStart(blobHash string, size int64) {
info := &UploadInfo{
BlobHash: blobHash,
Size: size,
StartTime: time.Now(),
StartTime: time.Now().UTC(),
}
pr.stats.CurrentUpload.Store(info)
}
@@ -355,6 +337,9 @@ func (pr *ProgressReporter) ReportUploadComplete(blobHash string, size int64, du
// Clear current upload
pr.stats.CurrentUpload.Store((*UploadInfo)(nil))
// Add to total upload duration
pr.stats.UploadDurationMs.Add(duration.Milliseconds())
// Calculate speed
if duration < time.Millisecond {
duration = time.Millisecond
@@ -374,7 +359,7 @@ func (pr *ProgressReporter) ReportUploadComplete(blobHash string, size int64, du
speedStr = fmt.Sprintf("%.0fbit/sec", bitsPerSec)
}
log.Info("Blob uploaded",
log.Info("Blob upload completed",
"hash", blobHash[:8]+"...",
"size", humanize.Bytes(uint64(size)),
"duration", formatDuration(duration),
@@ -384,6 +369,44 @@ func (pr *ProgressReporter) ReportUploadComplete(blobHash string, size int64, du
// UpdateChunkingActivity updates the last chunking time
func (pr *ProgressReporter) UpdateChunkingActivity() {
pr.stats.mu.Lock()
pr.stats.lastChunkingTime = time.Now()
pr.stats.lastChunkingTime = time.Now().UTC()
pr.stats.mu.Unlock()
}
// ReportUploadProgress reports current upload progress with instantaneous speed
func (pr *ProgressReporter) ReportUploadProgress(blobHash string, bytesUploaded, totalSize int64, instantSpeed float64) {
// Update the current upload info with progress
if uploadInfo, ok := pr.stats.CurrentUpload.Load().(*UploadInfo); ok && uploadInfo != nil {
// Format speed in bits/second
bitsPerSec := instantSpeed * 8
var speedStr string
if bitsPerSec >= 1e9 {
speedStr = fmt.Sprintf("%.1fGbit/sec", bitsPerSec/1e9)
} else if bitsPerSec >= 1e6 {
speedStr = fmt.Sprintf("%.0fMbit/sec", bitsPerSec/1e6)
} else if bitsPerSec >= 1e3 {
speedStr = fmt.Sprintf("%.0fKbit/sec", bitsPerSec/1e3)
} else {
speedStr = fmt.Sprintf("%.0fbit/sec", bitsPerSec)
}
percent := float64(bytesUploaded) / float64(totalSize) * 100
// Calculate ETA based on current speed
etaStr := "unknown"
if instantSpeed > 0 && bytesUploaded < totalSize {
remainingBytes := totalSize - bytesUploaded
remainingSeconds := float64(remainingBytes) / instantSpeed
eta := time.Duration(remainingSeconds * float64(time.Second))
etaStr = formatDuration(eta)
}
log.Info("Blob upload progress",
"hash", blobHash[:8]+"...",
"progress", fmt.Sprintf("%.1f%%", percent),
"uploaded", humanize.Bytes(uint64(bytesUploaded)),
"total", humanize.Bytes(uint64(totalSize)),
"speed", speedStr,
"eta", etaStr)
}
}
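
The speed formatting and ETA arithmetic used by `ReportUploadProgress` can be isolated into pure functions. The sketch below mirrors the thresholds and formulas in the diff. The helper names `formatBitRate`, `uploadETA`, and `percentDone` are hypothetical, and the zero-total guard in `percentDone` is an addition that the diff itself does not contain.

```go
package main

import (
	"fmt"
	"time"
)

// formatBitRate converts a rate in bytes/second into a bits/second string,
// using the same thresholds as the diff above.
func formatBitRate(bytesPerSec float64) string {
	bitsPerSec := bytesPerSec * 8
	switch {
	case bitsPerSec >= 1e9:
		return fmt.Sprintf("%.1fGbit/sec", bitsPerSec/1e9)
	case bitsPerSec >= 1e6:
		return fmt.Sprintf("%.0fMbit/sec", bitsPerSec/1e6)
	case bitsPerSec >= 1e3:
		return fmt.Sprintf("%.0fKbit/sec", bitsPerSec/1e3)
	default:
		return fmt.Sprintf("%.0fbit/sec", bitsPerSec)
	}
}

// uploadETA estimates the remaining upload time from the current speed.
// The boolean is false when no estimate can be made.
func uploadETA(uploaded, total int64, bytesPerSec float64) (time.Duration, bool) {
	if bytesPerSec <= 0 || uploaded >= total {
		return 0, false
	}
	remainingSeconds := float64(total-uploaded) / bytesPerSec
	return time.Duration(remainingSeconds * float64(time.Second)), true
}

// percentDone returns upload progress as a percentage.
// Assumption: a non-positive total is reported as 0% (guard not in the diff).
func percentDone(uploaded, total int64) float64 {
	if total <= 0 {
		return 0
	}
	return float64(uploaded) / float64(total) * 100
}

func main() {
	fmt.Println(formatBitRate(125_000_000))
	eta, ok := uploadETA(25, 100, 25)
	fmt.Println(eta, ok)
	fmt.Println(percentDone(25, 100))
}
```

Keeping these as pure functions makes the thresholds easy to unit-test without a running upload.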

@@ -15,6 +15,7 @@ import (
"git.eeqj.de/sneak/vaultik/internal/crypto"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/s3"
"github.com/dustin/go-humanize"
"github.com/spf13/afero"
)
@@ -49,6 +50,8 @@ type Scanner struct {
// S3Client interface for blob storage operations
type S3Client interface {
PutObject(ctx context.Context, key string, data io.Reader) error
PutObjectWithProgress(ctx context.Context, key string, data io.Reader, size int64, progress s3.ProgressCallback) error
StatObject(ctx context.Context, key string) (*s3.ObjectInfo, error)
}
// ScannerConfig contains configuration for the scanner
@@ -125,7 +128,7 @@ func (s *Scanner) Scan(ctx context.Context, path string, snapshotID string) (*Sc
s.snapshotID = snapshotID
s.scanCtx = ctx
result := &ScanResult{
StartTime: time.Now(),
StartTime: time.Now().UTC(),
}
// Set blob handler for concurrent upload
@@ -143,7 +146,7 @@ func (s *Scanner) Scan(ctx context.Context, path string, snapshotID string) (*Sc
}
// Phase 1: Scan directory and collect files to process
log.Info("Phase 1: Scanning directory structure")
log.Info("Phase 1/3: Scanning directory structure")
filesToProcess, err := s.scanPhase(ctx, path, result)
if err != nil {
return nil, fmt.Errorf("scan phase failed: %w", err)
@@ -169,7 +172,7 @@ func (s *Scanner) Scan(ctx context.Context, path string, snapshotID string) (*Sc
// Phase 2: Process files and create chunks
if len(filesToProcess) > 0 {
log.Info("Phase 2: Processing files and creating chunks")
log.Info("Phase 2/3: Creating snapshot (chunking, compressing, encrypting, and uploading blobs)")
if err := s.processPhase(ctx, filesToProcess, result); err != nil {
return nil, fmt.Errorf("process phase failed: %w", err)
}
@@ -179,7 +182,7 @@ func (s *Scanner) Scan(ctx context.Context, path string, snapshotID string) (*Sc
blobs := s.packer.GetFinishedBlobs()
result.BlobsCreated += len(blobs)
result.EndTime = time.Now()
result.EndTime = time.Now().UTC()
return result, nil
}
@@ -290,21 +293,12 @@ func (s *Scanner) checkFileAndUpdateMetadata(ctx context.Context, path string, i
default:
}
var file *database.File
var needsProcessing bool
// Use a short transaction just for the database operations
err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
var err error
file, needsProcessing, err = s.checkFile(txCtx, tx, path, info, result)
return err
})
return file, needsProcessing, err
// Process file without holding a long transaction
return s.checkFile(ctx, path, info, result)
}
// checkFile checks if a file needs processing and updates metadata within a transaction
func (s *Scanner) checkFile(ctx context.Context, tx *sql.Tx, path string, info os.FileInfo, result *ScanResult) (*database.File, bool, error) {
// checkFile checks if a file needs processing and updates metadata
func (s *Scanner) checkFile(ctx context.Context, path string, info os.FileInfo, result *ScanResult) (*database.File, bool, error) {
// Get file stats
stat, ok := info.Sys().(interface {
Uid() uint32
@@ -338,25 +332,31 @@ func (s *Scanner) checkFile(ctx context.Context, tx *sql.Tx, path string, info o
LinkTarget: linkTarget,
}
// Check if file has changed since last backup
// Check if file has changed since last backup (no transaction needed for read)
log.Debug("Checking if file exists in database", "path", path)
existingFile, err := s.repos.Files.GetByPathTx(ctx, tx, path)
existingFile, err := s.repos.Files.GetByPath(ctx, path)
if err != nil {
return nil, false, fmt.Errorf("checking existing file: %w", err)
}
fileChanged := existingFile == nil || s.hasFileChanged(existingFile, file)
// Always update file metadata
// Update file metadata in a short transaction
log.Debug("Updating file metadata", "path", path, "changed", fileChanged)
if err := s.repos.Files.Create(ctx, tx, file); err != nil {
err = s.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
return s.repos.Files.Create(ctx, tx, file)
})
if err != nil {
return nil, false, err
}
log.Debug("File metadata updated", "path", path)
// Add file to snapshot
// Add file to snapshot in a short transaction
log.Debug("Adding file to snapshot", "path", path, "snapshot", s.snapshotID)
if err := s.repos.Snapshots.AddFile(ctx, tx, s.snapshotID, path); err != nil {
err = s.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
return s.repos.Snapshots.AddFile(ctx, tx, s.snapshotID, path)
})
if err != nil {
return nil, false, fmt.Errorf("adding file to snapshot: %w", err)
}
log.Debug("File added to snapshot", "path", path)
@@ -381,7 +381,7 @@ func (s *Scanner) checkFile(ctx context.Context, tx *sql.Tx, path string, info o
}
// File hasn't changed, but we still need to associate existing chunks with this snapshot
log.Debug("File hasn't changed, associating existing chunks", "path", path)
if err := s.associateExistingChunks(ctx, tx, path); err != nil {
if err := s.associateExistingChunks(ctx, path); err != nil {
return nil, false, fmt.Errorf("associating existing chunks: %w", err)
}
log.Debug("Existing chunks associated", "path", path)
@@ -421,25 +421,25 @@ func (s *Scanner) hasFileChanged(existingFile, newFile *database.File) bool {
}
// associateExistingChunks links existing chunks from an unchanged file to the current snapshot
func (s *Scanner) associateExistingChunks(ctx context.Context, tx *sql.Tx, path string) error {
func (s *Scanner) associateExistingChunks(ctx context.Context, path string) error {
log.Debug("associateExistingChunks start", "path", path)
// Get existing file chunks
// Get existing file chunks (no transaction needed for read)
log.Debug("Getting existing file chunks", "path", path)
fileChunks, err := s.repos.FileChunks.GetByFileTx(ctx, tx, path)
fileChunks, err := s.repos.FileChunks.GetByFile(ctx, path)
if err != nil {
return fmt.Errorf("getting existing file chunks: %w", err)
}
log.Debug("Got file chunks", "path", path, "count", len(fileChunks))
// For each chunk, find its blob and associate with current snapshot
processedBlobs := make(map[string]bool)
// Collect unique blob IDs that need to be added to snapshot
blobsToAdd := make(map[string]string) // blob ID -> blob hash
for i, fc := range fileChunks {
log.Debug("Processing chunk", "path", path, "chunk_index", i, "chunk_hash", fc.ChunkHash)
// Find which blob contains this chunk
// Find which blob contains this chunk (no transaction needed for read)
log.Debug("Finding blob for chunk", "chunk_hash", fc.ChunkHash)
blobChunk, err := s.repos.BlobChunks.GetByChunkHashTx(ctx, tx, fc.ChunkHash)
blobChunk, err := s.repos.BlobChunks.GetByChunkHash(ctx, fc.ChunkHash)
if err != nil {
return fmt.Errorf("finding blob for chunk %s: %w", fc.ChunkHash, err)
}
@@ -449,28 +449,39 @@ func (s *Scanner) associateExistingChunks(ctx context.Context, tx *sql.Tx, path
}
log.Debug("Found blob for chunk", "chunk_hash", fc.ChunkHash, "blob_id", blobChunk.BlobID)
// Get blob to find its hash
blob, err := s.repos.Blobs.GetByID(ctx, blobChunk.BlobID)
if err != nil {
return fmt.Errorf("getting blob %s: %w", blobChunk.BlobID, err)
}
if blob == nil {
log.Warn("Blob record not found", "blob_id", blobChunk.BlobID)
continue
}
// Add blob to snapshot if not already processed
if !processedBlobs[blobChunk.BlobID] {
log.Debug("Adding blob to snapshot", "blob_id", blobChunk.BlobID, "blob_hash", blob.Hash, "snapshot", s.snapshotID)
if err := s.repos.Snapshots.AddBlob(ctx, tx, s.snapshotID, blobChunk.BlobID, blob.Hash); err != nil {
return fmt.Errorf("adding existing blob to snapshot: %w", err)
}
log.Debug("Added blob to snapshot", "blob_id", blobChunk.BlobID)
processedBlobs[blobChunk.BlobID] = true
// Track blob ID for later processing
if _, exists := blobsToAdd[blobChunk.BlobID]; !exists {
blobsToAdd[blobChunk.BlobID] = "" // We'll get the hash later
}
}
log.Debug("associateExistingChunks complete", "path", path, "blobs_processed", len(processedBlobs))
// Now get blob hashes outside of transaction operations
for blobID := range blobsToAdd {
blob, err := s.repos.Blobs.GetByID(ctx, blobID)
if err != nil {
return fmt.Errorf("getting blob %s: %w", blobID, err)
}
if blob == nil {
log.Warn("Blob record not found", "blob_id", blobID)
delete(blobsToAdd, blobID)
continue
}
blobsToAdd[blobID] = blob.Hash
}
// Add blobs to snapshot using short transactions
for blobID, blobHash := range blobsToAdd {
log.Debug("Adding blob to snapshot", "blob_id", blobID, "blob_hash", blobHash, "snapshot", s.snapshotID)
err := s.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
return s.repos.Snapshots.AddBlob(ctx, tx, s.snapshotID, blobID, blobHash)
})
if err != nil {
return fmt.Errorf("adding existing blob to snapshot: %w", err)
}
log.Debug("Added blob to snapshot", "blob_id", blobID)
}
log.Debug("associateExistingChunks complete", "path", path, "blobs_processed", len(blobsToAdd))
return nil
}
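
The reworked `associateExistingChunks` runs in stages: collect the unique blob IDs for the file's chunks, resolve each blob's hash, then add each blob to the snapshot in its own short transaction. The first two stages can be sketched with plain maps standing in for the repository lookups. `uniqueBlobs` and both map parameters are hypothetical, and skipping chunks that have no blob mapping is an assumption about the lines elided from the hunk.

```go
package main

import (
	"fmt"
	"sort"
)

// uniqueBlobs returns blob ID -> blob hash for the blobs that hold the
// given chunks. chunkToBlob and blobHashes stand in for database lookups.
func uniqueBlobs(chunkHashes []string, chunkToBlob, blobHashes map[string]string) map[string]string {
	// Stage 1: collect each blob ID once, with the hash still unknown.
	blobs := make(map[string]string)
	for _, chunkHash := range chunkHashes {
		blobID, ok := chunkToBlob[chunkHash]
		if !ok {
			continue // assumption: chunks without a blob are skipped
		}
		if _, seen := blobs[blobID]; !seen {
			blobs[blobID] = ""
		}
	}

	// Stage 2: resolve hashes and drop blobs that have no record.
	// Deleting entries while ranging over a map is permitted in Go.
	for blobID := range blobs {
		hash, ok := blobHashes[blobID]
		if !ok {
			delete(blobs, blobID)
			continue
		}
		blobs[blobID] = hash
	}
	return blobs
}

func main() {
	blobs := uniqueBlobs(
		[]string{"c1", "c2", "c3", "c4"},
		map[string]string{"c1": "b1", "c2": "b1", "c3": "b2"},
		map[string]string{"b1": "h1"},
	)
	ids := make([]string, 0, len(blobs))
	for id := range blobs {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		fmt.Println(id, blobs[id])
	}
}
```

Deduplicating before the write stage means each blob needs at most one `AddBlob` transaction per file, no matter how many of the file's chunks it contains.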
@@ -478,7 +489,7 @@ func (s *Scanner) associateExistingChunks(ctx context.Context, tx *sql.Tx, path
func (s *Scanner) handleBlobReady(blobWithReader *blob.BlobWithReader) error {
log.Debug("Blob handler called", "blob_hash", blobWithReader.Hash[:8]+"...")
startTime := time.Now()
startTime := time.Now().UTC()
finishedBlob := blobWithReader.FinishedBlob
// Report upload start
@@ -492,7 +503,40 @@ func (s *Scanner) handleBlobReady(blobWithReader *blob.BlobWithReader) error {
if ctx == nil {
ctx = context.Background()
}
if err := s.s3Client.PutObject(ctx, "blobs/"+finishedBlob.Hash, blobWithReader.Reader); err != nil {
// Track bytes uploaded for accurate speed calculation
lastProgressTime := time.Now()
lastProgressBytes := int64(0)
progressCallback := func(uploaded int64) error {
// Calculate instantaneous speed
now := time.Now()
elapsed := now.Sub(lastProgressTime).Seconds()
if elapsed > 0.5 { // Update speed every 0.5 seconds
bytesSinceLastUpdate := uploaded - lastProgressBytes
speed := float64(bytesSinceLastUpdate) / elapsed
if s.progress != nil {
s.progress.ReportUploadProgress(finishedBlob.Hash, uploaded, finishedBlob.Compressed, speed)
}
lastProgressTime = now
lastProgressBytes = uploaded
}
// Check for cancellation
select {
case <-ctx.Done():
return ctx.Err()
default:
return nil
}
}
// Create sharded path: blobs/ca/fe/cafebabe...
blobPath := fmt.Sprintf("blobs/%s/%s/%s", finishedBlob.Hash[:2], finishedBlob.Hash[2:4], finishedBlob.Hash)
if err := s.s3Client.PutObjectWithProgress(ctx, blobPath, blobWithReader.Reader, finishedBlob.Compressed, progressCallback); err != nil {
return fmt.Errorf("uploading blob %s to S3: %w", finishedBlob.Hash, err)
}
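
Two ideas from the hunk above can be sketched in isolation: the sharded object key (`blobs/ca/fe/cafebabe...`) and the throttled instantaneous-speed calculation inside the progress callback. Both helpers are hypothetical. `shardedBlobPath` adds a length check that the diff does not have, and `speedMeter` takes the current time as seconds in a `float64` so the example stays deterministic, where the diff uses `time.Now()`.

```go
package main

import "fmt"

// shardedBlobPath builds a key of the form blobs/<h[0:2]>/<h[2:4]>/<h>.
// Assumption: hashes shorter than 4 characters are rejected rather than
// sliced, which would panic.
func shardedBlobPath(hash string) (string, error) {
	if len(hash) < 4 {
		return "", fmt.Errorf("hash %q is too short to shard", hash)
	}
	return fmt.Sprintf("blobs/%s/%s/%s", hash[:2], hash[2:4], hash), nil
}

// speedMeter reports instantaneous upload speed at most once per interval.
type speedMeter struct {
	interval  float64 // minimum seconds between reports
	lastTime  float64 // time of the last report, in seconds
	lastBytes int64   // cumulative bytes at the last report
}

// sample takes the current time and cumulative bytes uploaded. It returns
// the speed in bytes/second and true when a report is due, else 0 and false.
func (m *speedMeter) sample(now float64, uploaded int64) (float64, bool) {
	elapsed := now - m.lastTime
	if elapsed <= m.interval {
		return 0, false
	}
	speed := float64(uploaded-m.lastBytes) / elapsed
	m.lastTime = now
	m.lastBytes = uploaded
	return speed, true
}

func main() {
	path, err := shardedBlobPath("cafebabe")
	fmt.Println(path, err)

	m := &speedMeter{interval: 0.5}
	for _, s := range []struct {
		now      float64
		uploaded int64
	}{{0.25, 100}, {1.0, 1000}, {1.2, 2000}, {2.0, 3000}} {
		if speed, ok := m.sample(s.now, s.uploaded); ok {
			fmt.Printf("t=%.2fs speed=%.0f B/s\n", s.now, speed)
		}
	}
}
```

Measuring bytes since the last report, not since the start of the upload, gives a current-rate figure instead of a running average.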
@@ -574,8 +618,8 @@ func (s *Scanner) processFileStreaming(ctx context.Context, fileToProcess *FileT
var chunks []chunkInfo
chunkIndex := 0
// Process chunks in streaming fashion
err = s.chunker.ChunkReaderStreaming(file, func(chunk chunker.Chunk) error {
// Process chunks in streaming fashion and get full file hash
fileHash, err := s.chunker.ChunkReaderStreaming(file, func(chunk chunker.Chunk) error {
// Check for cancellation
select {
case <-ctx.Done():
@@ -589,17 +633,16 @@ func (s *Scanner) processFileStreaming(ctx context.Context, fileToProcess *FileT
"hash", chunk.Hash,
"size", chunk.Size)
// Check if chunk already exists
chunkExists := false
err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
existing, err := s.repos.Chunks.GetByHash(txCtx, chunk.Hash)
if err != nil {
return err
}
chunkExists = (existing != nil)
// Check if chunk already exists (outside of transaction)
existing, err := s.repos.Chunks.GetByHash(ctx, chunk.Hash)
if err != nil {
return fmt.Errorf("checking chunk existence: %w", err)
}
chunkExists := (existing != nil)
// Store chunk if new
if !chunkExists {
// Store chunk if new
if !chunkExists {
err := s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
dbChunk := &database.Chunk{
ChunkHash: chunk.Hash,
SHA256: chunk.Hash,
@@ -608,17 +651,17 @@ func (s *Scanner) processFileStreaming(ctx context.Context, fileToProcess *FileT
if err := s.repos.Chunks.Create(txCtx, tx, dbChunk); err != nil {
return fmt.Errorf("creating chunk: %w", err)
}
return nil
})
if err != nil {
return fmt.Errorf("storing chunk: %w", err)
}
return nil
})
if err != nil {
return fmt.Errorf("checking/storing chunk: %w", err)
}
// Track file chunk association for later storage
chunks = append(chunks, chunkInfo{
fileChunk: database.FileChunk{
Path: fileToProcess.Path,
FileID: fileToProcess.File.ID,
Idx: chunkIndex,
ChunkHash: chunk.Hash,
},
@@ -683,6 +726,11 @@ func (s *Scanner) processFileStreaming(ctx context.Context, fileToProcess *FileT
return fmt.Errorf("chunking file: %w", err)
}
log.Debug("Completed chunking file",
"path", fileToProcess.Path,
"file_hash", fileHash,
"chunks", len(chunks))
// Store file-chunk associations and chunk-file mappings in database
err = s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
for _, ci := range chunks {
@@ -694,7 +742,7 @@ func (s *Scanner) processFileStreaming(ctx context.Context, fileToProcess *FileT
// Create chunk-file mapping
chunkFile := &database.ChunkFile{
ChunkHash: ci.fileChunk.ChunkHash,
FilePath: fileToProcess.Path,
FileID: fileToProcess.File.ID,
FileOffset: ci.offset,
Length: ci.size,
}
@@ -704,7 +752,7 @@ func (s *Scanner) processFileStreaming(ctx context.Context, fileToProcess *FileT
}
// Add file to snapshot
if err := s.repos.Snapshots.AddFile(txCtx, tx, s.snapshotID, fileToProcess.Path); err != nil {
if err := s.repos.Snapshots.AddFileByID(txCtx, tx, s.snapshotID, fileToProcess.File.ID); err != nil {
return fmt.Errorf("adding file to snapshot: %w", err)
}
@@ -713,3 +761,8 @@ func (s *Scanner) processFileStreaming(ctx context.Context, fileToProcess *FileT
return err
}
// GetProgress returns the progress reporter for this scanner
func (s *Scanner) GetProgress() *ProgressReporter {
return s.progress
}


@@ -213,7 +213,7 @@ func TestScannerWithSymlinks(t *testing.T) {
Repositories: repos,
MaxBlobSize: int64(1024 * 1024),
CompressionLevel: 3,
AgeRecipients: []string{},
AgeRecipients: []string{"age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"}, // Test public key
})
// Create a snapshot record for testing
@@ -314,7 +314,7 @@ func TestScannerLargeFile(t *testing.T) {
Repositories: repos,
MaxBlobSize: int64(1024 * 1024),
CompressionLevel: 3,
AgeRecipients: []string{},
AgeRecipients: []string{"age1ezrjmfpwsc95svdg0y54mums3zevgzu0x0ecq2f7tp8a05gl0sjq9q9wjg"}, // Test public key
})
// Create a snapshot record for testing


@@ -78,21 +78,22 @@ func NewSnapshotManager(repos *database.Repositories, s3Client S3Client, encrypt
}
// CreateSnapshot creates a new snapshot record in the database at the start of a backup
func (sm *SnapshotManager) CreateSnapshot(ctx context.Context, hostname, version string) (string, error) {
snapshotID := fmt.Sprintf("%s-%s", hostname, time.Now().Format("20060102-150405"))
func (sm *SnapshotManager) CreateSnapshot(ctx context.Context, hostname, version, gitRevision string) (string, error) {
snapshotID := fmt.Sprintf("%s-%s", hostname, time.Now().UTC().Format("20060102-150405Z"))
snapshot := &database.Snapshot{
ID: snapshotID,
Hostname: hostname,
VaultikVersion: version,
StartedAt: time.Now(),
CompletedAt: nil, // Not completed yet
FileCount: 0,
ChunkCount: 0,
BlobCount: 0,
TotalSize: 0,
BlobSize: 0,
CompressionRatio: 1.0,
ID: snapshotID,
Hostname: hostname,
VaultikVersion: version,
VaultikGitRevision: gitRevision,
StartedAt: time.Now().UTC(),
CompletedAt: nil, // Not completed yet
FileCount: 0,
ChunkCount: 0,
BlobCount: 0,
TotalSize: 0,
BlobSize: 0,
CompressionRatio: 1.0,
}
err := sm.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
@@ -126,6 +127,30 @@ func (sm *SnapshotManager) UpdateSnapshotStats(ctx context.Context, snapshotID s
return nil
}
// UpdateSnapshotStatsExtended updates snapshot statistics with extended metrics.
// This includes compression level, uncompressed blob size, and upload duration.
func (sm *SnapshotManager) UpdateSnapshotStatsExtended(ctx context.Context, snapshotID string, stats ExtendedBackupStats) error {
return sm.repos.WithTx(ctx, func(ctx context.Context, tx *sql.Tx) error {
// First update basic stats
if err := sm.repos.Snapshots.UpdateCounts(ctx, tx, snapshotID,
int64(stats.FilesScanned),
int64(stats.ChunksCreated),
int64(stats.BlobsCreated),
stats.BytesScanned,
stats.BytesUploaded,
); err != nil {
return err
}
// Then update extended stats
return sm.repos.Snapshots.UpdateExtendedStats(ctx, tx, snapshotID,
stats.BlobUncompressedSize,
stats.CompressionLevel,
stats.UploadDurationMs,
)
})
}
// CompleteSnapshot marks a snapshot as completed and exports its metadata
func (sm *SnapshotManager) CompleteSnapshot(ctx context.Context, snapshotID string) error {
// Mark the snapshot as completed
@@ -158,14 +183,16 @@ func (sm *SnapshotManager) CompleteSnapshot(ctx context.Context, snapshotID stri
//
// This ensures database consistency during the copy operation.
func (sm *SnapshotManager) ExportSnapshotMetadata(ctx context.Context, dbPath string, snapshotID string) error {
log.Info("Exporting snapshot metadata", "snapshot_id", snapshotID)
log.Info("Phase 3/3: Exporting snapshot metadata", "snapshot_id", snapshotID, "source_db", dbPath)
// Create temp directory for all temporary files
tempDir, err := os.MkdirTemp("", "vaultik-snapshot-*")
if err != nil {
return fmt.Errorf("creating temp dir: %w", err)
}
log.Debug("Created temporary directory", "path", tempDir)
defer func() {
log.Debug("Cleaning up temporary directory", "path", tempDir)
if err := os.RemoveAll(tempDir); err != nil {
log.Debug("Failed to remove temp dir", "path", tempDir, "error", err)
}
@@ -174,28 +201,37 @@ func (sm *SnapshotManager) ExportSnapshotMetadata(ctx context.Context, dbPath st
// Step 1: Copy database to temp file
// The main database should be closed at this point
tempDBPath := filepath.Join(tempDir, "snapshot.db")
log.Debug("Copying database to temporary location", "source", dbPath, "destination", tempDBPath)
if err := copyFile(dbPath, tempDBPath); err != nil {
return fmt.Errorf("copying database: %w", err)
}
log.Debug("Database copy complete", "size", getFileSize(tempDBPath))
// Step 2: Clean the temp database to only contain current snapshot data
log.Debug("Cleaning snapshot database to contain only current snapshot", "snapshot_id", snapshotID)
if err := sm.cleanSnapshotDB(ctx, tempDBPath, snapshotID); err != nil {
return fmt.Errorf("cleaning snapshot database: %w", err)
}
log.Debug("Database cleaning complete", "size_after_clean", getFileSize(tempDBPath))
// Step 3: Dump the cleaned database to SQL
dumpPath := filepath.Join(tempDir, "snapshot.sql")
log.Debug("Dumping database to SQL", "source", tempDBPath, "destination", dumpPath)
if err := sm.dumpDatabase(tempDBPath, dumpPath); err != nil {
return fmt.Errorf("dumping database: %w", err)
}
log.Debug("SQL dump complete", "size", getFileSize(dumpPath))
// Step 4: Compress the SQL dump
compressedPath := filepath.Join(tempDir, "snapshot.sql.zst")
log.Debug("Compressing SQL dump", "source", dumpPath, "destination", compressedPath)
if err := sm.compressDump(dumpPath, compressedPath); err != nil {
return fmt.Errorf("compressing dump: %w", err)
}
log.Debug("Compression complete", "original_size", getFileSize(dumpPath), "compressed_size", getFileSize(compressedPath))
// Step 5: Read compressed data for encryption/upload
log.Debug("Reading compressed data for upload", "path", compressedPath)
compressedData, err := os.ReadFile(compressedPath)
if err != nil {
return fmt.Errorf("reading compressed dump: %w", err)
@@ -204,14 +240,19 @@ func (sm *SnapshotManager) ExportSnapshotMetadata(ctx context.Context, dbPath st
// Step 6: Encrypt if encryptor is available
finalData := compressedData
if sm.encryptor != nil {
log.Debug("Encrypting snapshot data", "size_before", len(compressedData))
encrypted, err := sm.encryptor.Encrypt(compressedData)
if err != nil {
return fmt.Errorf("encrypting snapshot: %w", err)
}
finalData = encrypted
log.Debug("Encryption complete", "size_after", len(encrypted))
} else {
log.Debug("No encryption configured, using compressed data as-is")
}
// Step 7: Generate blob manifest (before closing temp DB)
log.Debug("Generating blob manifest from temporary database", "db_path", tempDBPath)
blobManifest, err := sm.generateBlobManifest(ctx, tempDBPath, snapshotID)
if err != nil {
return fmt.Errorf("generating blob manifest: %w", err)
@@ -224,15 +265,19 @@ func (sm *SnapshotManager) ExportSnapshotMetadata(ctx context.Context, dbPath st
dbKey += ".age"
}
log.Debug("Uploading snapshot database to S3", "key", dbKey, "size", len(finalData))
if err := sm.s3Client.PutObject(ctx, dbKey, bytes.NewReader(finalData)); err != nil {
return fmt.Errorf("uploading snapshot database: %w", err)
}
log.Debug("Database upload complete", "key", dbKey)
// Upload blob manifest (unencrypted, compressed)
manifestKey := fmt.Sprintf("metadata/%s/manifest.json.zst", snapshotID)
log.Debug("Uploading blob manifest to S3", "key", manifestKey, "size", len(blobManifest))
if err := sm.s3Client.PutObject(ctx, manifestKey, bytes.NewReader(blobManifest)); err != nil {
return fmt.Errorf("uploading blob manifest: %w", err)
}
log.Debug("Manifest upload complete", "key", manifestKey)
log.Info("Uploaded snapshot metadata",
"snapshot_id", snapshotID,
@@ -260,14 +305,18 @@ func (sm *SnapshotManager) ExportSnapshotMetadata(ctx context.Context, dbPath st
// Future implementation when we have snapshot_files table:
//
// DELETE FROM snapshots WHERE id != ?;
// DELETE FROM files WHERE path NOT IN (
// SELECT file_path FROM snapshot_files WHERE snapshot_id = ?
// DELETE FROM files WHERE NOT EXISTS (
// SELECT 1 FROM snapshot_files
// WHERE snapshot_files.file_id = files.id
// AND snapshot_files.snapshot_id = ?
// );
// DELETE FROM chunks WHERE chunk_hash NOT IN (
// SELECT DISTINCT chunk_hash FROM file_chunks
// DELETE FROM chunks WHERE NOT EXISTS (
// SELECT 1 FROM file_chunks
// WHERE file_chunks.chunk_hash = chunks.chunk_hash
// );
// DELETE FROM blobs WHERE blob_hash NOT IN (
// SELECT DISTINCT blob_hash FROM blob_chunks
// DELETE FROM blobs WHERE NOT EXISTS (
// SELECT 1 FROM blob_chunks
// WHERE blob_chunks.blob_hash = blobs.blob_hash
// );
func (sm *SnapshotManager) cleanSnapshotDB(ctx context.Context, dbPath string, snapshotID string) error {
// Open the temp database
@@ -293,84 +342,127 @@ func (sm *SnapshotManager) cleanSnapshotDB(ctx context.Context, dbPath string, s
}()
// Step 1: Delete all other snapshots
_, err = tx.ExecContext(ctx, "DELETE FROM snapshots WHERE id != ?", snapshotID)
log.Debug("Deleting other snapshots", "keeping", snapshotID)
database.LogSQL("Execute", "DELETE FROM snapshots WHERE id != ?", snapshotID)
result, err := tx.ExecContext(ctx, "DELETE FROM snapshots WHERE id != ?", snapshotID)
if err != nil {
return fmt.Errorf("deleting other snapshots: %w", err)
}
rowsAffected, _ := result.RowsAffected()
log.Debug("Deleted snapshots", "count", rowsAffected)
// Step 2: Delete files not in this snapshot
_, err = tx.ExecContext(ctx, `
log.Debug("Deleting files not in current snapshot")
database.LogSQL("Execute", `DELETE FROM files WHERE NOT EXISTS (SELECT 1 FROM snapshot_files WHERE snapshot_files.file_id = files.id AND snapshot_files.snapshot_id = ?)`, snapshotID)
result, err = tx.ExecContext(ctx, `
DELETE FROM files
WHERE path NOT IN (
SELECT file_path FROM snapshot_files WHERE snapshot_id = ?
WHERE NOT EXISTS (
SELECT 1 FROM snapshot_files
WHERE snapshot_files.file_id = files.id
AND snapshot_files.snapshot_id = ?
)`, snapshotID)
if err != nil {
return fmt.Errorf("deleting orphaned files: %w", err)
}
rowsAffected, _ = result.RowsAffected()
log.Debug("Deleted files", "count", rowsAffected)
// Step 3: file_chunks will be deleted via CASCADE from files
log.Debug("file_chunks will be deleted via CASCADE")
// Step 4: Delete chunk_files for deleted files
_, err = tx.ExecContext(ctx, `
log.Debug("Deleting orphaned chunk_files")
database.LogSQL("Execute", `DELETE FROM chunk_files WHERE NOT EXISTS (SELECT 1 FROM files WHERE files.id = chunk_files.file_id)`)
result, err = tx.ExecContext(ctx, `
DELETE FROM chunk_files
WHERE file_path NOT IN (
SELECT path FROM files
WHERE NOT EXISTS (
SELECT 1 FROM files
WHERE files.id = chunk_files.file_id
)`)
if err != nil {
return fmt.Errorf("deleting orphaned chunk_files: %w", err)
}
rowsAffected, _ = result.RowsAffected()
log.Debug("Deleted chunk_files", "count", rowsAffected)
// Step 5: Delete chunks with no remaining file references
_, err = tx.ExecContext(ctx, `
log.Debug("Deleting orphaned chunks")
database.LogSQL("Execute", `DELETE FROM chunks WHERE NOT EXISTS (SELECT 1 FROM file_chunks WHERE file_chunks.chunk_hash = chunks.chunk_hash)`)
result, err = tx.ExecContext(ctx, `
DELETE FROM chunks
WHERE chunk_hash NOT IN (
SELECT DISTINCT chunk_hash FROM file_chunks
WHERE NOT EXISTS (
SELECT 1 FROM file_chunks
WHERE file_chunks.chunk_hash = chunks.chunk_hash
)`)
if err != nil {
return fmt.Errorf("deleting orphaned chunks: %w", err)
}
rowsAffected, _ = result.RowsAffected()
log.Debug("Deleted chunks", "count", rowsAffected)
// Step 6: Delete blob_chunks for deleted chunks
_, err = tx.ExecContext(ctx, `
log.Debug("Deleting orphaned blob_chunks")
database.LogSQL("Execute", `DELETE FROM blob_chunks WHERE NOT EXISTS (SELECT 1 FROM chunks WHERE chunks.chunk_hash = blob_chunks.chunk_hash)`)
result, err = tx.ExecContext(ctx, `
DELETE FROM blob_chunks
WHERE chunk_hash NOT IN (
SELECT chunk_hash FROM chunks
WHERE NOT EXISTS (
SELECT 1 FROM chunks
WHERE chunks.chunk_hash = blob_chunks.chunk_hash
)`)
if err != nil {
return fmt.Errorf("deleting orphaned blob_chunks: %w", err)
}
rowsAffected, _ = result.RowsAffected()
log.Debug("Deleted blob_chunks", "count", rowsAffected)
// Step 7: Delete blobs not in this snapshot
_, err = tx.ExecContext(ctx, `
log.Debug("Deleting blobs not in current snapshot")
database.LogSQL("Execute", `DELETE FROM blobs WHERE NOT EXISTS (SELECT 1 FROM snapshot_blobs WHERE snapshot_blobs.blob_hash = blobs.blob_hash AND snapshot_blobs.snapshot_id = ?)`, snapshotID)
result, err = tx.ExecContext(ctx, `
DELETE FROM blobs
WHERE blob_hash NOT IN (
SELECT blob_hash FROM snapshot_blobs WHERE snapshot_id = ?
WHERE NOT EXISTS (
SELECT 1 FROM snapshot_blobs
WHERE snapshot_blobs.blob_hash = blobs.blob_hash
AND snapshot_blobs.snapshot_id = ?
)`, snapshotID)
if err != nil {
return fmt.Errorf("deleting orphaned blobs: %w", err)
}
rowsAffected, _ = result.RowsAffected()
log.Debug("Deleted blobs not in snapshot", "count", rowsAffected)
// Step 8: Delete orphaned snapshot_files and snapshot_blobs
_, err = tx.ExecContext(ctx, "DELETE FROM snapshot_files WHERE snapshot_id != ?", snapshotID)
log.Debug("Deleting orphaned snapshot_files")
database.LogSQL("Execute", "DELETE FROM snapshot_files WHERE snapshot_id != ?", snapshotID)
result, err = tx.ExecContext(ctx, "DELETE FROM snapshot_files WHERE snapshot_id != ?", snapshotID)
if err != nil {
return fmt.Errorf("deleting orphaned snapshot_files: %w", err)
}
rowsAffected, _ = result.RowsAffected()
log.Debug("Deleted snapshot_files", "count", rowsAffected)
_, err = tx.ExecContext(ctx, "DELETE FROM snapshot_blobs WHERE snapshot_id != ?", snapshotID)
log.Debug("Deleting orphaned snapshot_blobs")
database.LogSQL("Execute", "DELETE FROM snapshot_blobs WHERE snapshot_id != ?", snapshotID)
result, err = tx.ExecContext(ctx, "DELETE FROM snapshot_blobs WHERE snapshot_id != ?", snapshotID)
if err != nil {
return fmt.Errorf("deleting orphaned snapshot_blobs: %w", err)
}
rowsAffected, _ = result.RowsAffected()
log.Debug("Deleted snapshot_blobs", "count", rowsAffected)
// Commit transaction
log.Debug("Committing cleanup transaction")
if err := tx.Commit(); err != nil {
return fmt.Errorf("committing transaction: %w", err)
}
log.Debug("Database cleanup complete")
return nil
}
// dumpDatabase creates a SQL dump of the database
func (sm *SnapshotManager) dumpDatabase(dbPath, dumpPath string) error {
log.Debug("Running sqlite3 dump command", "source", dbPath, "destination", dumpPath)
cmd := exec.Command("sqlite3", dbPath, ".dump")
output, err := cmd.Output()
@@ -378,6 +470,7 @@ func (sm *SnapshotManager) dumpDatabase(dbPath, dumpPath string) error {
return fmt.Errorf("running sqlite3 dump: %w", err)
}
log.Debug("SQL dump generated", "size", len(output))
if err := os.WriteFile(dumpPath, output, 0644); err != nil {
return fmt.Errorf("writing dump file: %w", err)
}
@@ -387,27 +480,32 @@ func (sm *SnapshotManager) dumpDatabase(dbPath, dumpPath string) error {
// compressDump compresses the SQL dump using zstd
func (sm *SnapshotManager) compressDump(inputPath, outputPath string) error {
log.Debug("Opening SQL dump for compression", "path", inputPath)
input, err := os.Open(inputPath)
if err != nil {
return fmt.Errorf("opening input file: %w", err)
}
defer func() {
log.Debug("Closing input file", "path", inputPath)
if err := input.Close(); err != nil {
log.Debug("Failed to close input file", "error", err)
log.Debug("Failed to close input file", "path", inputPath, "error", err)
}
}()
log.Debug("Creating output file for compressed data", "path", outputPath)
output, err := os.Create(outputPath)
if err != nil {
return fmt.Errorf("creating output file: %w", err)
}
defer func() {
log.Debug("Closing output file", "path", outputPath)
if err := output.Close(); err != nil {
log.Debug("Failed to close output file", "error", err)
log.Debug("Failed to close output file", "path", outputPath, "error", err)
}
}()
// Create zstd encoder with good compression and multithreading
log.Debug("Creating zstd compressor", "level", "SpeedBetterCompression", "concurrency", runtime.NumCPU())
zstdWriter, err := zstd.NewWriter(output,
zstd.WithEncoderLevel(zstd.SpeedBetterCompression),
zstd.WithEncoderConcurrency(runtime.NumCPU()),
@@ -422,6 +520,7 @@ func (sm *SnapshotManager) compressDump(inputPath, outputPath string) error {
}
}()
log.Debug("Compressing data")
if _, err := io.Copy(zstdWriter, input); err != nil {
return fmt.Errorf("compressing data: %w", err)
}
@@ -431,35 +530,44 @@ func (sm *SnapshotManager) compressDump(inputPath, outputPath string) error {
// copyFile copies a file from src to dst
func copyFile(src, dst string) error {
log.Debug("Opening source file for copy", "path", src)
sourceFile, err := os.Open(src)
if err != nil {
return err
}
defer func() {
log.Debug("Closing source file", "path", src)
if err := sourceFile.Close(); err != nil {
log.Debug("Failed to close source file", "error", err)
log.Debug("Failed to close source file", "path", src, "error", err)
}
}()
log.Debug("Creating destination file", "path", dst)
destFile, err := os.Create(dst)
if err != nil {
return err
}
defer func() {
log.Debug("Closing destination file", "path", dst)
if err := destFile.Close(); err != nil {
log.Debug("Failed to close destination file", "error", err)
log.Debug("Failed to close destination file", "path", dst, "error", err)
}
}()
if _, err := io.Copy(destFile, sourceFile); err != nil {
log.Debug("Copying file data")
n, err := io.Copy(destFile, sourceFile)
if err != nil {
return err
}
log.Debug("File copy complete", "bytes_copied", n)
return nil
}
// generateBlobManifest creates a compressed JSON list of all blobs in the snapshot
func (sm *SnapshotManager) generateBlobManifest(ctx context.Context, dbPath string, snapshotID string) ([]byte, error) {
log.Debug("Generating blob manifest", "db_path", dbPath, "snapshot_id", snapshotID)
// Open the cleaned database using the database package
db, err := database.New(ctx, dbPath)
if err != nil {
@@ -471,10 +579,12 @@ func (sm *SnapshotManager) generateBlobManifest(ctx context.Context, dbPath stri
repos := database.NewRepositories(db)
// Get all blobs for this snapshot
log.Debug("Querying blobs for snapshot", "snapshot_id", snapshotID)
blobs, err := repos.Snapshots.GetBlobHashes(ctx, snapshotID)
if err != nil {
return nil, fmt.Errorf("getting snapshot blobs: %w", err)
}
log.Debug("Found blobs", "count", len(blobs))
// Create manifest structure
manifest := struct {
@@ -490,16 +600,20 @@ func (sm *SnapshotManager) generateBlobManifest(ctx context.Context, dbPath stri
}
// Marshal to JSON
log.Debug("Marshaling manifest to JSON")
jsonData, err := json.MarshalIndent(manifest, "", " ")
if err != nil {
return nil, fmt.Errorf("marshaling manifest: %w", err)
}
log.Debug("JSON manifest created", "size", len(jsonData))
// Compress with zstd
log.Debug("Compressing manifest with zstd")
compressed, err := compressData(jsonData)
if err != nil {
return nil, fmt.Errorf("compressing manifest: %w", err)
}
log.Debug("Manifest compressed", "original_size", len(jsonData), "compressed_size", len(compressed))
log.Info("Generated blob manifest",
"snapshot_id", snapshotID,
@@ -532,6 +646,15 @@ func compressData(data []byte) ([]byte, error) {
return buf.Bytes(), nil
}
// getFileSize returns the size of a file in bytes, or -1 if error
func getFileSize(path string) int64 {
info, err := os.Stat(path)
if err != nil {
return -1
}
return info.Size()
}
// BackupStats contains statistics from a backup operation
type BackupStats struct {
FilesScanned int
@@ -540,3 +663,108 @@ type BackupStats struct {
BlobsCreated int
BytesUploaded int64
}
// ExtendedBackupStats contains additional statistics for comprehensive tracking
type ExtendedBackupStats struct {
BackupStats
BlobUncompressedSize int64 // Total uncompressed size of all referenced blobs
CompressionLevel int // Compression level used for this snapshot
UploadDurationMs int64 // Total milliseconds spent uploading to S3
}
// CleanupIncompleteSnapshots removes incomplete snapshots that don't have metadata in S3.
// This is critical for data safety: incomplete snapshots can cause deduplication to skip
// files that were never successfully backed up, resulting in data loss.
func (sm *SnapshotManager) CleanupIncompleteSnapshots(ctx context.Context, hostname string) error {
log.Info("Checking for incomplete snapshots", "hostname", hostname)
// Get all incomplete snapshots for this hostname
incompleteSnapshots, err := sm.repos.Snapshots.GetIncompleteByHostname(ctx, hostname)
if err != nil {
return fmt.Errorf("getting incomplete snapshots: %w", err)
}
if len(incompleteSnapshots) == 0 {
log.Debug("No incomplete snapshots found")
return nil
}
log.Info("Found incomplete snapshots", "count", len(incompleteSnapshots))
// Check each incomplete snapshot for metadata in S3
for _, snapshot := range incompleteSnapshots {
// Check if metadata exists in S3
metadataKey := fmt.Sprintf("metadata/%s/db.zst", snapshot.ID)
_, err := sm.s3Client.StatObject(ctx, metadataKey)
if err != nil {
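// StatObject returned an error, which is treated as missing metadata, so the
// snapshot is considered incomplete. Note that any error takes this path, not only not-found.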
log.Info("Cleaning up incomplete snapshot", "snapshot_id", snapshot.ID, "started_at", snapshot.StartedAt)
// Delete the snapshot and all its associations
if err := sm.deleteSnapshot(ctx, snapshot.ID); err != nil {
return fmt.Errorf("deleting incomplete snapshot %s: %w", snapshot.ID, err)
}
log.Info("Deleted incomplete snapshot", "snapshot_id", snapshot.ID)
} else {
// Metadata exists - this snapshot was completed but database wasn't updated
// This shouldn't happen in normal operation, but mark it complete
log.Warn("Found snapshot with metadata but incomplete in DB", "snapshot_id", snapshot.ID)
if err := sm.repos.Snapshots.MarkComplete(ctx, nil, snapshot.ID); err != nil {
log.Error("Failed to mark snapshot complete", "snapshot_id", snapshot.ID, "error", err)
}
}
}
return nil
}
// deleteSnapshot removes a snapshot and all its associations from the database
func (sm *SnapshotManager) deleteSnapshot(ctx context.Context, snapshotID string) error {
// Delete snapshot_files entries
if err := sm.repos.Snapshots.DeleteSnapshotFiles(ctx, snapshotID); err != nil {
return fmt.Errorf("deleting snapshot files: %w", err)
}
// Delete snapshot_blobs entries
if err := sm.repos.Snapshots.DeleteSnapshotBlobs(ctx, snapshotID); err != nil {
return fmt.Errorf("deleting snapshot blobs: %w", err)
}
// Delete the snapshot itself
if err := sm.repos.Snapshots.Delete(ctx, snapshotID); err != nil {
return fmt.Errorf("deleting snapshot: %w", err)
}
// Clean up orphaned data
log.Debug("Cleaning up orphaned data")
if err := sm.cleanupOrphanedData(ctx); err != nil {
return fmt.Errorf("cleaning up orphaned data: %w", err)
}
return nil
}
// cleanupOrphanedData removes files, chunks, and blobs that are no longer referenced by any snapshot
func (sm *SnapshotManager) cleanupOrphanedData(ctx context.Context) error {
// Delete orphaned files (files not in any snapshot)
log.Debug("Deleting orphaned files")
if err := sm.repos.Files.DeleteOrphaned(ctx); err != nil {
return fmt.Errorf("deleting orphaned files: %w", err)
}
// Delete orphaned chunks (chunks not referenced by any file)
log.Debug("Deleting orphaned chunks")
if err := sm.repos.Chunks.DeleteOrphaned(ctx); err != nil {
return fmt.Errorf("deleting orphaned chunks: %w", err)
}
// Delete orphaned blobs (blobs not in any snapshot)
log.Debug("Deleting orphaned blobs")
if err := sm.repos.Blobs.DeleteOrphaned(ctx); err != nil {
return fmt.Errorf("deleting orphaned blobs: %w", err)
}
return nil
}


@@ -1,3 +1,17 @@
// Package blob handles the creation of blobs - the final storage units for Vaultik.
// A blob is a large file (up to 10GB) containing many compressed and encrypted chunks
// from multiple source files. Blobs are content-addressed, meaning their filename
// is derived from the SHA256 hash of their compressed and encrypted content.
//
// The blob creation process:
// 1. Chunks are accumulated from multiple files
// 2. The collection is compressed using zstd
// 3. The compressed data is encrypted using age
// 4. The encrypted blob is hashed to create its content-addressed name
// 5. The blob is uploaded to S3 using the hash as the filename
//
// This design optimizes storage efficiency by batching many small chunks into
// larger blobs, reducing the number of S3 operations and associated costs.
package blob
import (
@@ -20,19 +34,25 @@ import (
"github.com/klauspost/compress/zstd"
)
// BlobHandler is called when a blob is finalized
// BlobHandler is a callback function invoked when a blob is finalized and ready for upload.
// The handler receives a BlobWithReader containing the blob metadata and a reader for
// the compressed and encrypted blob content. The handler is responsible for uploading
// the blob to storage and cleaning up any temporary files.
type BlobHandler func(blob *BlobWithReader) error
// PackerConfig holds configuration for creating a Packer
// PackerConfig holds configuration for creating a Packer.
// All fields except BlobHandler are required.
type PackerConfig struct {
MaxBlobSize int64
CompressionLevel int
Encryptor Encryptor // Required - blobs are always encrypted
Repositories *database.Repositories // For creating blob records
BlobHandler BlobHandler // Optional - called when blob is ready
MaxBlobSize int64 // Maximum size of a blob before forcing finalization
CompressionLevel int // Zstd compression level (1-19, higher = better compression)
Encryptor Encryptor // Age encryptor for blob encryption (required)
Repositories *database.Repositories // Database repositories for tracking blob metadata
BlobHandler BlobHandler // Optional callback when blob is ready for upload
}
// Packer combines chunks into blobs with compression and encryption
// Packer accumulates chunks and packs them into blobs.
// It handles compression, encryption, and coordination with the database
// to track blob metadata. Packer is thread-safe.
type Packer struct {
maxBlobSize int64
compressionLevel int
@@ -69,10 +89,13 @@ type blobInProgress struct {
compressedSize int64 // Current compressed size (estimated)
}
// ChunkRef represents a chunk to be added to a blob
// ChunkRef represents a chunk to be added to a blob.
// The Hash is the content-addressed identifier (SHA256) of the chunk,
// and Data contains the raw chunk bytes. After adding to a blob,
// the Data can be safely discarded as it's written to the blob immediately.
type ChunkRef struct {
Hash string
Data []byte
Hash string // SHA256 hash of the chunk data
Data []byte // Raw chunk content
}
// chunkInfo tracks chunk metadata in a blob
@@ -107,7 +130,9 @@ type BlobWithReader struct {
TempFile *os.File // Optional, only set for disk-based blobs
}
// NewPacker creates a new blob packer
// NewPacker creates a new blob packer that accumulates chunks into blobs.
// The packer will automatically finalize blobs when they reach MaxBlobSize.
// Returns an error if required configuration fields are missing or invalid.
func NewPacker(cfg PackerConfig) (*Packer, error) {
if cfg.Encryptor == nil {
return nil, fmt.Errorf("encryptor is required - blobs must be encrypted")
@@ -125,15 +150,21 @@ func NewPacker(cfg PackerConfig) (*Packer, error) {
}, nil
}
// SetBlobHandler sets the handler to be called when a blob is finalized
// SetBlobHandler sets the handler to be called when a blob is finalized.
// The handler is responsible for uploading the blob to storage.
// If no handler is set, finalized blobs are stored in memory and can be
// retrieved with GetFinishedBlobs().
func (p *Packer) SetBlobHandler(handler BlobHandler) {
p.mu.Lock()
defer p.mu.Unlock()
p.blobHandler = handler
}
// AddChunk adds a chunk to the current blob
// Returns ErrBlobSizeLimitExceeded if adding the chunk would exceed the size limit
// AddChunk adds a chunk to the current blob being packed.
// If adding the chunk would exceed MaxBlobSize, returns ErrBlobSizeLimitExceeded.
// In this case, the caller should finalize the current blob and retry.
// The chunk data is written immediately and can be garbage collected after this call.
// Thread-safe.
func (p *Packer) AddChunk(chunk *ChunkRef) error {
p.mu.Lock()
defer p.mu.Unlock()
@@ -166,7 +197,10 @@ func (p *Packer) AddChunk(chunk *ChunkRef) error {
return nil
}
// Flush finalizes any pending blob
// Flush finalizes any in-progress blob, compressing, encrypting, and hashing it.
// This should be called after all chunks have been added to ensure no data is lost.
// If a BlobHandler is set, it will be called with the finalized blob.
// Thread-safe.
func (p *Packer) Flush() error {
p.mu.Lock()
defer p.mu.Unlock()
@@ -180,8 +214,12 @@ func (p *Packer) Flush() error {
return nil
}
// FinalizeBlob finalizes the current blob being assembled
// Caller must handle retrying the chunk that triggered size limit
// FinalizeBlob finalizes the current blob being assembled.
// This compresses the accumulated chunks, encrypts the result, and computes
// the content-addressed hash. The finalized blob is either passed to the
// BlobHandler (if set) or stored internally.
// Caller must handle retrying any chunk that triggered size limit exceeded.
// Thread-safe: acquires the packer's lock internally, so callers must not hold it.
func (p *Packer) FinalizeBlob() error {
p.mu.Lock()
defer p.mu.Unlock()
@@ -193,7 +231,10 @@ func (p *Packer) FinalizeBlob() error {
return p.finalizeCurrentBlob()
}
// GetFinishedBlobs returns all completed blobs and clears the list
// GetFinishedBlobs returns all completed blobs and clears the internal list.
// This is only used when no BlobHandler is set. After calling this method,
// the caller is responsible for uploading the blobs to storage.
// Thread-safe.
func (p *Packer) GetFinishedBlobs() []*FinishedBlob {
p.mu.Lock()
defer p.mu.Unlock()
@@ -212,8 +253,8 @@ func (p *Packer) startNewBlob() error {
if p.repos != nil {
blob := &database.Blob{
ID: blobID,
Hash: "", // Will be set when finalized
CreatedTS: time.Now(),
Hash: "temp-placeholder-" + blobID, // Temporary placeholder until finalized
CreatedTS: time.Now().UTC(),
FinishedTS: nil,
UncompressedSize: 0,
CompressedSize: 0,
@@ -237,7 +278,7 @@ func (p *Packer) startNewBlob() error {
id: blobID,
chunks: make([]*chunkInfo, 0),
chunkSet: make(map[string]bool),
startTime: time.Now(),
startTime: time.Now().UTC(),
tempFile: tempFile,
hasher: sha256.New(),
size: 0,


@@ -10,7 +10,9 @@ import (
"github.com/jotfs/fastcdc-go"
)
// Chunk represents a single chunk of data
// Chunk represents a single chunk of data produced by the content-defined chunking algorithm.
// Each chunk is identified by its SHA256 hash and contains the raw data along with
// its position and size information from the original file.
type Chunk struct {
Hash string // Content hash of the chunk
Data []byte // Chunk data
@@ -18,14 +20,20 @@ type Chunk struct {
Size int64 // Size of the chunk
}
// Chunker provides content-defined chunking using FastCDC
// Chunker provides content-defined chunking using the FastCDC algorithm.
// It splits data into variable-sized chunks based on content patterns, ensuring
// that identical data sequences produce identical chunks regardless of their
// position in the file. This enables efficient deduplication.
type Chunker struct {
avgChunkSize int
minChunkSize int
maxChunkSize int
}
// NewChunker creates a new chunker with the specified average chunk size
// NewChunker creates a new chunker with the specified average chunk size.
// The actual chunk sizes will vary between avgChunkSize/4 and avgChunkSize*4
// as recommended by the FastCDC algorithm. Typical values for avgChunkSize
// are 64KB (65536), 256KB (262144), or 1MB (1048576).
func NewChunker(avgChunkSize int64) *Chunker {
// FastCDC recommends min = avg/4 and max = avg*4
return &Chunker{
@@ -35,7 +43,10 @@ func NewChunker(avgChunkSize int64) *Chunker {
}
}
// ChunkReader splits the reader into content-defined chunks
// ChunkReader splits the reader into content-defined chunks and returns all chunks at once.
// This method loads all chunk data into memory, so it should only be used for
// reasonably sized inputs. For large files or streams, use ChunkReaderStreaming instead.
// Returns an error if chunking fails or if reading from the input fails.
func (c *Chunker) ChunkReader(r io.Reader) ([]Chunk, error) {
opts := fastcdc.Options{
MinSize: c.minChunkSize,
@@ -80,20 +91,31 @@ func (c *Chunker) ChunkReader(r io.Reader) ([]Chunk, error) {
return chunks, nil
}
// ChunkCallback is called for each chunk as it's processed
// ChunkCallback is a function called for each chunk as it's processed.
// The callback receives a Chunk containing the hash, data, offset, and size.
// If the callback returns an error, chunk processing stops and the error is propagated.
type ChunkCallback func(chunk Chunk) error
// ChunkReaderStreaming splits the reader into chunks and calls the callback for each
func (c *Chunker) ChunkReaderStreaming(r io.Reader, callback ChunkCallback) error {
// ChunkReaderStreaming splits the reader into chunks and calls the callback for each chunk.
// This is the preferred method for processing large files or streams as it doesn't
// accumulate all chunks in memory. The callback is invoked for each chunk as it's
// produced, allowing for streaming processing and immediate storage or transmission.
// Returns the hex-encoded SHA256 hash of the entire input. An error is returned if
// chunking fails, reading fails, or the callback returns an error.
func (c *Chunker) ChunkReaderStreaming(r io.Reader, callback ChunkCallback) (string, error) {
// Create a tee reader to calculate full file hash while chunking
fileHasher := sha256.New()
teeReader := io.TeeReader(r, fileHasher)
opts := fastcdc.Options{
MinSize: c.minChunkSize,
AverageSize: c.avgChunkSize,
MaxSize: c.maxChunkSize,
}
chunker, err := fastcdc.NewChunker(r, opts)
chunker, err := fastcdc.NewChunker(teeReader, opts)
if err != nil {
return fmt.Errorf("creating chunker: %w", err)
return "", fmt.Errorf("creating chunker: %w", err)
}
offset := int64(0)
@@ -104,10 +126,10 @@ func (c *Chunker) ChunkReaderStreaming(r io.Reader, callback ChunkCallback) erro
break
}
if err != nil {
return fmt.Errorf("reading chunk: %w", err)
return "", fmt.Errorf("reading chunk: %w", err)
}
// Calculate hash
// Calculate chunk hash
hash := sha256.Sum256(chunk.Data)
// Make a copy of the data since FastCDC reuses the buffer
@@ -120,16 +142,20 @@ func (c *Chunker) ChunkReaderStreaming(r io.Reader, callback ChunkCallback) erro
Offset: offset,
Size: int64(len(chunk.Data)),
}); err != nil {
return fmt.Errorf("callback error: %w", err)
return "", fmt.Errorf("callback error: %w", err)
}
offset += int64(len(chunk.Data))
}
return nil
// Return the full file hash
return hex.EncodeToString(fileHasher.Sum(nil)), nil
}
// ChunkFile splits a file into content-defined chunks
// ChunkFile splits a file into content-defined chunks by reading the entire file.
// This is a convenience method that opens the file and passes it to ChunkReader.
// For large files, consider using ChunkReaderStreaming with a file handle instead.
// Returns an error if the file cannot be opened or if chunking fails.
func (c *Chunker) ChunkFile(path string) ([]Chunk, error) {
file, err := os.Open(path)
if err != nil {


@@ -15,7 +15,9 @@ import (
"go.uber.org/fx"
)
// AppOptions contains common options for creating the fx application
// AppOptions contains common options for creating the fx application.
// It includes the configuration file path, logging options, and additional
// fx modules and invocations that should be included in the application.
type AppOptions struct {
ConfigPath string
LogOptions log.LogOptions
@@ -27,13 +29,16 @@ type AppOptions struct {
func setupGlobals(lc fx.Lifecycle, g *globals.Globals) {
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
g.StartTime = time.Now()
g.StartTime = time.Now().UTC()
return nil
},
})
}
// NewApp creates a new fx application with common modules
// NewApp creates a new fx application with common modules.
// It sets up the base modules (config, database, logging, globals) and
// combines them with any additional modules specified in the options.
// The returned fx.App is ready to be started with RunApp.
func NewApp(opts AppOptions) *fx.App {
baseModules := []fx.Option{
fx.Supply(config.ConfigPath(opts.ConfigPath)),
@@ -53,7 +58,10 @@ func NewApp(opts AppOptions) *fx.App {
return fx.New(allOptions...)
}
// RunApp starts and stops the fx application within the given context
// RunApp starts and stops the fx application within the given context.
// It handles graceful shutdown on interrupt signals (SIGINT, SIGTERM) and
// ensures the application stops cleanly. The function blocks until the
// application completes or is interrupted. Returns an error if startup fails.
func RunApp(ctx context.Context, app *fx.App) error {
// Set up signal handling for graceful shutdown
sigChan := make(chan os.Signal, 1)
@@ -101,7 +109,9 @@ func RunApp(ctx context.Context, app *fx.App) error {
}
}
// RunWithApp is a helper that creates and runs an fx app with the given options
// RunWithApp is a helper that creates and runs an fx app with the given options.
// It combines NewApp and RunApp into a single convenient function. This is the
// preferred way to run CLI commands that need the full application context.
func RunWithApp(ctx context.Context, opts AppOptions) error {
app := NewApp(opts)
return RunApp(ctx, app)


@@ -1,287 +0,0 @@
package cli
import (
"context"
"fmt"
"os"
"path/filepath"
"git.eeqj.de/sneak/vaultik/internal/backup"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/crypto"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/globals"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/s3"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// BackupOptions contains options for the backup command
type BackupOptions struct {
ConfigPath string
Daemon bool
Cron bool
Prune bool
}
// BackupApp contains all dependencies needed for running backups
type BackupApp struct {
Globals *globals.Globals
Config *config.Config
Repositories *database.Repositories
ScannerFactory backup.ScannerFactory
S3Client *s3.Client
DB *database.DB
Lifecycle fx.Lifecycle
Shutdowner fx.Shutdowner
}
// NewBackupCommand creates the backup command
func NewBackupCommand() *cobra.Command {
opts := &BackupOptions{}
cmd := &cobra.Command{
Use: "backup",
Short: "Perform incremental backup",
Long: `Backup configured directories using incremental deduplication and encryption.
Config is located at /etc/vaultik/config.yml, but can be overridden by specifying
a path using --config or by setting VAULTIK_CONFIG to a path.`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// If --config not specified, check environment variable
if opts.ConfigPath == "" {
opts.ConfigPath = os.Getenv("VAULTIK_CONFIG")
}
// If still not specified, use default
if opts.ConfigPath == "" {
defaultConfig := "/etc/vaultik/config.yml"
if _, err := os.Stat(defaultConfig); err == nil {
opts.ConfigPath = defaultConfig
} else {
return fmt.Errorf("no config file specified, VAULTIK_CONFIG not set, and %s not found", defaultConfig)
}
}
return runBackup(cmd.Context(), opts)
},
}
cmd.Flags().StringVar(&opts.ConfigPath, "config", "", "Path to config file")
cmd.Flags().BoolVar(&opts.Daemon, "daemon", false, "Run in daemon mode with inotify monitoring")
cmd.Flags().BoolVar(&opts.Cron, "cron", false, "Run in cron mode (silent unless error)")
cmd.Flags().BoolVar(&opts.Prune, "prune", false, "Delete all previous snapshots and unreferenced blobs after backup")
return cmd
}
func runBackup(ctx context.Context, opts *BackupOptions) error {
rootFlags := GetRootFlags()
return RunWithApp(ctx, AppOptions{
ConfigPath: opts.ConfigPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Cron: opts.Cron,
},
Modules: []fx.Option{
backup.Module,
s3.Module,
fx.Provide(fx.Annotate(
func(g *globals.Globals, cfg *config.Config, repos *database.Repositories,
scannerFactory backup.ScannerFactory, s3Client *s3.Client, db *database.DB,
lc fx.Lifecycle, shutdowner fx.Shutdowner) *BackupApp {
return &BackupApp{
Globals: g,
Config: cfg,
Repositories: repos,
ScannerFactory: scannerFactory,
S3Client: s3Client,
DB: db,
Lifecycle: lc,
Shutdowner: shutdowner,
}
},
)),
},
Invokes: []fx.Option{
fx.Invoke(func(app *BackupApp, lc fx.Lifecycle) {
// Create a cancellable context for the backup
backupCtx, backupCancel := context.WithCancel(context.Background())
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
// Start the backup in a goroutine
go func() {
// Run the backup
if err := app.runBackup(backupCtx, opts); err != nil {
if err != context.Canceled {
log.Error("Backup failed", "error", err)
}
}
// Shutdown the app when backup completes
if err := app.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
log.Debug("Stopping backup")
// Cancel the backup context
backupCancel()
return nil
},
})
}),
},
})
}
// runBackup executes the backup operation
func (app *BackupApp) runBackup(ctx context.Context, opts *BackupOptions) error {
log.Info("Starting backup",
"config", opts.ConfigPath,
"version", app.Globals.Version,
"commit", app.Globals.Commit,
"index_path", app.Config.IndexPath,
)
if opts.Daemon {
log.Info("Running in daemon mode")
// TODO: Implement daemon mode with inotify
return fmt.Errorf("daemon mode not yet implemented")
}
// Resolve source directories to absolute paths
resolvedDirs := make([]string, 0, len(app.Config.SourceDirs))
for _, dir := range app.Config.SourceDirs {
absPath, err := filepath.Abs(dir)
if err != nil {
return fmt.Errorf("failed to resolve absolute path for %s: %w", dir, err)
}
// Resolve symlinks
resolvedPath, err := filepath.EvalSymlinks(absPath)
if err != nil {
// If the path doesn't exist yet, use the absolute path
if os.IsNotExist(err) {
resolvedPath = absPath
} else {
return fmt.Errorf("failed to resolve symlinks for %s: %w", absPath, err)
}
}
resolvedDirs = append(resolvedDirs, resolvedPath)
}
// Create scanner with progress enabled (unless in cron mode)
scanner := app.ScannerFactory(backup.ScannerParams{
EnableProgress: !opts.Cron,
})
// Perform a single backup run
log.Notice("Starting backup", "source_dirs", len(resolvedDirs))
for i, dir := range resolvedDirs {
log.Info("Source directory", "index", i+1, "path", dir)
}
totalFiles := 0
totalBytes := int64(0)
totalChunks := 0
totalBlobs := 0
// Create a new snapshot at the beginning of backup
hostname := app.Config.Hostname
if hostname == "" {
hostname, _ = os.Hostname()
}
// Create encryptor if age recipients are configured
var encryptor backup.Encryptor
if len(app.Config.AgeRecipients) > 0 {
cryptoEncryptor, err := crypto.NewEncryptor(app.Config.AgeRecipients)
if err != nil {
return fmt.Errorf("creating encryptor: %w", err)
}
encryptor = cryptoEncryptor
}
snapshotManager := backup.NewSnapshotManager(app.Repositories, app.S3Client, encryptor)
snapshotID, err := snapshotManager.CreateSnapshot(ctx, hostname, app.Globals.Version)
if err != nil {
return fmt.Errorf("creating snapshot: %w", err)
}
log.Info("Created snapshot", "snapshot_id", snapshotID)
for _, dir := range resolvedDirs {
// Check if context is cancelled
select {
case <-ctx.Done():
log.Info("Backup cancelled")
return ctx.Err()
default:
}
log.Info("Scanning directory", "path", dir)
result, err := scanner.Scan(ctx, dir, snapshotID)
if err != nil {
return fmt.Errorf("failed to scan %s: %w", dir, err)
}
totalFiles += result.FilesScanned
totalBytes += result.BytesScanned
totalChunks += result.ChunksCreated
totalBlobs += result.BlobsCreated
log.Info("Directory scan complete",
"path", dir,
"files", result.FilesScanned,
"files_skipped", result.FilesSkipped,
"bytes", result.BytesScanned,
"bytes_skipped", result.BytesSkipped,
"chunks", result.ChunksCreated,
"blobs", result.BlobsCreated,
"duration", result.EndTime.Sub(result.StartTime))
}
// Update snapshot statistics
stats := backup.BackupStats{
FilesScanned: totalFiles,
BytesScanned: totalBytes,
ChunksCreated: totalChunks,
BlobsCreated: totalBlobs,
BytesUploaded: totalBytes, // TODO: Track actual uploaded bytes
}
if err := snapshotManager.UpdateSnapshotStats(ctx, snapshotID, stats); err != nil {
return fmt.Errorf("updating snapshot stats: %w", err)
}
// Mark snapshot as complete
if err := snapshotManager.CompleteSnapshot(ctx, snapshotID); err != nil {
return fmt.Errorf("completing snapshot: %w", err)
}
// Export snapshot metadata
// Export snapshot metadata without closing the database
// The export function should handle its own database connection
if err := snapshotManager.ExportSnapshotMetadata(ctx, app.Config.IndexPath, snapshotID); err != nil {
return fmt.Errorf("exporting snapshot metadata: %w", err)
}
log.Notice("Backup complete",
"snapshot_id", snapshotID,
"total_files", totalFiles,
"total_bytes", totalBytes,
"total_chunks", totalChunks,
"total_blobs", totalBlobs)
if opts.Prune {
log.Info("Pruning enabled - will delete old snapshots after backup")
// TODO: Implement pruning
}
return nil
}

internal/cli/duration.go (new file, 94 lines)

@@ -0,0 +1,94 @@
package cli
import (
"fmt"
"regexp"
"strconv"
"strings"
"time"
)
// parseDuration parses duration strings. Supports standard Go duration format
// (e.g., "3h30m", "1h45m30s") as well as extended units:
// - d: days (e.g., "30d", "7d")
// - w: weeks (e.g., "2w", "4w")
// - mo: months (30 days) (e.g., "6mo", "1mo")
// - y: years (365 days) (e.g., "1y", "2y")
//
// Can combine units: "1y6mo", "2w3d", "1d12h30m"
func parseDuration(s string) (time.Duration, error) {
// First try standard Go duration parsing
if d, err := time.ParseDuration(s); err == nil {
return d, nil
}
// Extended duration parsing
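// Reject negative values in the extended syntax. Standard Go durations
// such as "-5h" were already accepted by time.ParseDuration above.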
if strings.HasPrefix(strings.TrimSpace(s), "-") {
return 0, fmt.Errorf("negative durations are not supported")
}
// Pattern matches: number + unit, repeated
re := regexp.MustCompile(`(\d+(?:\.\d+)?)\s*([a-zA-Z]+)`)
matches := re.FindAllStringSubmatch(s, -1)
if len(matches) == 0 {
return 0, fmt.Errorf("invalid duration format: %q", s)
}
var total time.Duration
for _, match := range matches {
valueStr := match[1]
unit := strings.ToLower(match[2])
value, err := strconv.ParseFloat(valueStr, 64)
if err != nil {
return 0, fmt.Errorf("invalid number %q: %w", valueStr, err)
}
var d time.Duration
switch unit {
// Standard time units
case "ns", "nanosecond", "nanoseconds":
d = time.Duration(value)
case "us", "µs", "microsecond", "microseconds":
d = time.Duration(value * float64(time.Microsecond))
case "ms", "millisecond", "milliseconds":
d = time.Duration(value * float64(time.Millisecond))
case "s", "sec", "second", "seconds":
d = time.Duration(value * float64(time.Second))
case "m", "min", "minute", "minutes":
d = time.Duration(value * float64(time.Minute))
case "h", "hr", "hour", "hours":
d = time.Duration(value * float64(time.Hour))
// Extended units
case "d", "day", "days":
d = time.Duration(value * float64(24*time.Hour))
case "w", "week", "weeks":
d = time.Duration(value * float64(7*24*time.Hour))
case "mo", "month", "months":
// Using 30 days as approximation
d = time.Duration(value * float64(30*24*time.Hour))
case "y", "year", "years":
// Using 365 days as approximation
d = time.Duration(value * float64(365*24*time.Hour))
default:
// Try parsing as standard Go duration unit
testStr := fmt.Sprintf("1%s", unit)
if _, err := time.ParseDuration(testStr); err == nil {
// It's a valid Go duration unit, parse the full value
fullStr := fmt.Sprintf("%g%s", value, unit)
if d, err = time.ParseDuration(fullStr); err != nil {
return 0, fmt.Errorf("invalid duration %q: %w", fullStr, err)
}
} else {
return 0, fmt.Errorf("unknown time unit %q", unit)
}
}
total += d
}
return total, nil
}


@@ -0,0 +1,263 @@
package cli
import (
"testing"
"time"
"github.com/stretchr/testify/assert"
)
func TestParseDuration(t *testing.T) {
tests := []struct {
name string
input string
expected time.Duration
wantErr bool
}{
// Standard Go durations
{
name: "standard seconds",
input: "30s",
expected: 30 * time.Second,
},
{
name: "standard minutes",
input: "45m",
expected: 45 * time.Minute,
},
{
name: "standard hours",
input: "2h",
expected: 2 * time.Hour,
},
{
name: "standard combined",
input: "3h30m",
expected: 3*time.Hour + 30*time.Minute,
},
{
name: "standard complex",
input: "1h45m30s",
expected: 1*time.Hour + 45*time.Minute + 30*time.Second,
},
{
name: "standard with milliseconds",
input: "1s500ms",
expected: 1*time.Second + 500*time.Millisecond,
},
// Extended units - days
{
name: "single day",
input: "1d",
expected: 24 * time.Hour,
},
{
name: "multiple days",
input: "7d",
expected: 7 * 24 * time.Hour,
},
{
name: "fractional days",
input: "1.5d",
expected: 36 * time.Hour,
},
{
name: "days spelled out",
input: "3days",
expected: 3 * 24 * time.Hour,
},
// Extended units - weeks
{
name: "single week",
input: "1w",
expected: 7 * 24 * time.Hour,
},
{
name: "multiple weeks",
input: "4w",
expected: 4 * 7 * 24 * time.Hour,
},
{
name: "weeks spelled out",
input: "2weeks",
expected: 2 * 7 * 24 * time.Hour,
},
// Extended units - months
{
name: "single month",
input: "1mo",
expected: 30 * 24 * time.Hour,
},
{
name: "multiple months",
input: "6mo",
expected: 6 * 30 * 24 * time.Hour,
},
{
name: "months spelled out",
input: "3months",
expected: 3 * 30 * 24 * time.Hour,
},
// Extended units - years
{
name: "single year",
input: "1y",
expected: 365 * 24 * time.Hour,
},
{
name: "multiple years",
input: "2y",
expected: 2 * 365 * 24 * time.Hour,
},
{
name: "years spelled out",
input: "1year",
expected: 365 * 24 * time.Hour,
},
// Combined extended units
{
name: "weeks and days",
input: "2w3d",
expected: 2*7*24*time.Hour + 3*24*time.Hour,
},
{
name: "years and months",
input: "1y6mo",
expected: 365*24*time.Hour + 6*30*24*time.Hour,
},
{
name: "days and hours",
input: "1d12h",
expected: 24*time.Hour + 12*time.Hour,
},
{
name: "complex combination",
input: "1y2mo3w4d5h6m7s",
expected: 365*24*time.Hour + 2*30*24*time.Hour + 3*7*24*time.Hour + 4*24*time.Hour + 5*time.Hour + 6*time.Minute + 7*time.Second,
},
{
name: "with spaces",
input: "1d 12h 30m",
expected: 24*time.Hour + 12*time.Hour + 30*time.Minute,
},
// Edge cases
{
name: "zero duration",
input: "0s",
expected: 0,
},
{
name: "large duration",
input: "10y",
expected: 10 * 365 * 24 * time.Hour,
},
// Error cases
{
name: "empty string",
input: "",
wantErr: true,
},
{
name: "invalid format",
input: "abc",
wantErr: true,
},
{
name: "unknown unit",
input: "5x",
wantErr: true,
},
{
name: "invalid number",
input: "xyzd",
wantErr: true,
},
{
name: "negative not supported",
input: "-5d",
wantErr: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got, err := parseDuration(tt.input)
if tt.wantErr {
assert.Error(t, err, "expected error for input %q", tt.input)
return
}
assert.NoError(t, err, "unexpected error for input %q", tt.input)
assert.Equal(t, tt.expected, got, "duration mismatch for input %q", tt.input)
})
}
}
func TestParseDurationSpecialCases(t *testing.T) {
// Test that standard Go durations work exactly as expected
standardDurations := []string{
"300ms",
"1.5h",
"2h45m",
"72h",
"1us",
"1µs",
"1ns",
}
for _, d := range standardDurations {
expected, err := time.ParseDuration(d)
assert.NoError(t, err)
got, err := parseDuration(d)
assert.NoError(t, err)
assert.Equal(t, expected, got, "standard duration %q should parse identically", d)
}
}
func TestParseDurationRealWorldExamples(t *testing.T) {
// Test real-world snapshot purge scenarios
tests := []struct {
description string
input string
olderThan time.Duration
}{
{
description: "keep snapshots from last 30 days",
input: "30d",
olderThan: 30 * 24 * time.Hour,
},
{
description: "keep snapshots from last 6 months",
input: "6mo",
olderThan: 6 * 30 * 24 * time.Hour,
},
{
description: "keep snapshots from last year",
input: "1y",
olderThan: 365 * 24 * time.Hour,
},
{
description: "keep snapshots from last week and three days",
input: "1w3d",
olderThan: 10 * 24 * time.Hour,
},
{
description: "keep snapshots from last 90 days",
input: "90d",
olderThan: 90 * 24 * time.Hour,
},
}
for _, tt := range tests {
t.Run(tt.description, func(t *testing.T) {
got, err := parseDuration(tt.input)
assert.NoError(t, err)
assert.Equal(t, tt.olderThan, got)
// Verify the duration makes sense for snapshot purging
assert.Greater(t, got, time.Hour, "snapshot purge duration should be longer than an hour")
})
}
}


@@ -4,7 +4,9 @@ import (
"os"
)
// CLIEntry is the main entry point for the CLI application
// CLIEntry is the main entry point for the CLI application.
// It creates the root command, executes it, and exits with status 1
// if an error occurs. This function should be called from main().
func CLIEntry() {
rootCmd := NewRootCommand()
if err := rootCmd.Execute(); err != nil {


@@ -18,7 +18,7 @@ func TestCLIEntry(t *testing.T) {
}
// Verify all subcommands are registered
expectedCommands := []string{"backup", "restore", "prune", "verify", "fetch"}
expectedCommands := []string{"snapshot", "store", "restore", "prune", "verify", "fetch"}
for _, expected := range expectedCommands {
found := false
for _, cmd := range cmd.Commands() {
@@ -32,19 +32,24 @@ func TestCLIEntry(t *testing.T) {
}
}
// Verify backup command has proper flags
backupCmd, _, err := cmd.Find([]string{"backup"})
// Verify snapshot command has subcommands
snapshotCmd, _, err := cmd.Find([]string{"snapshot"})
if err != nil {
t.Errorf("Failed to find backup command: %v", err)
t.Errorf("Failed to find snapshot command: %v", err)
} else {
if backupCmd.Flag("config") == nil {
t.Error("Backup command missing --config flag")
}
if backupCmd.Flag("daemon") == nil {
t.Error("Backup command missing --daemon flag")
}
if backupCmd.Flag("cron") == nil {
t.Error("Backup command missing --cron flag")
// Check snapshot subcommands
expectedSubCommands := []string{"create", "list", "purge", "verify"}
for _, expected := range expectedSubCommands {
found := false
for _, subcmd := range snapshotCmd.Commands() {
if subcmd.Use == expected || subcmd.Name() == expected {
found = true
break
}
}
if !found {
t.Errorf("Expected snapshot subcommand '%s' not found", expected)
}
}
}
}


@@ -1,18 +1,25 @@
package cli
import (
"fmt"
"os"
"github.com/spf13/cobra"
)
// RootFlags holds global flags
// RootFlags holds global flags that apply to all commands.
// These flags are defined on the root command and inherited by all subcommands.
type RootFlags struct {
Verbose bool
Debug bool
ConfigPath string
Verbose bool
Debug bool
}
var rootFlags RootFlags
// NewRootCommand creates the root cobra command
// NewRootCommand creates the root cobra command for the vaultik CLI.
// It sets up the command structure, global flags, and adds all subcommands.
// This is the main entry point for the CLI command hierarchy.
func NewRootCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "vaultik",
@@ -24,23 +31,49 @@ on the source system.`,
}
// Add global flags
cmd.PersistentFlags().StringVar(&rootFlags.ConfigPath, "config", "", "Path to config file (default: $VAULTIK_CONFIG or /etc/vaultik/config.yml)")
cmd.PersistentFlags().BoolVarP(&rootFlags.Verbose, "verbose", "v", false, "Enable verbose output")
cmd.PersistentFlags().BoolVar(&rootFlags.Debug, "debug", false, "Enable debug output")
// Add subcommands
cmd.AddCommand(
NewBackupCommand(),
NewRestoreCommand(),
NewPruneCommand(),
NewVerifyCommand(),
NewFetchCommand(),
SnapshotCmd(),
NewStoreCommand(),
NewSnapshotCommand(),
)
return cmd
}
// GetRootFlags returns the global flags
// GetRootFlags returns the global flags that were parsed from the command line.
// This allows subcommands to access global flag values like verbosity and config path.
func GetRootFlags() RootFlags {
return rootFlags
}
// ResolveConfigPath resolves the config file path from flags, environment, or default.
// It checks in order: 1) --config flag, 2) VAULTIK_CONFIG environment variable,
// 3) default location /etc/vaultik/config.yml. Paths from the flag or environment are
// returned as given; only the default location is checked for existence. Returns an
// error if none of the three yields a path.
func ResolveConfigPath() (string, error) {
// First check global flag
if rootFlags.ConfigPath != "" {
return rootFlags.ConfigPath, nil
}
// Then check environment variable
if envPath := os.Getenv("VAULTIK_CONFIG"); envPath != "" {
return envPath, nil
}
// Finally check default location
defaultPath := "/etc/vaultik/config.yml"
if _, err := os.Stat(defaultPath); err == nil {
return defaultPath, nil
}
return "", fmt.Errorf("no config file specified, VAULTIK_CONFIG not set, and %s not found", defaultPath)
}
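The precedence implemented by ResolveConfigPath above (flag, then environment, then default location) is easier to unit-test when the environment and filesystem lookups are passed in. The following is a sketch of that idea, not the code in this commit: `resolveConfigPath` with injected `getenv` and `exists` functions is a hypothetical variant.

```go
package main

import "fmt"

// resolveConfigPath is a hypothetical, dependency-injected variant of the
// function above. getenv and exists replace os.Getenv and os.Stat so the
// precedence can be exercised without touching the real environment.
func resolveConfigPath(flag string, getenv func(string) string, exists func(string) bool) (string, error) {
	const defaultPath = "/etc/vaultik/config.yml"

	// 1) An explicit --config value wins and is returned as given.
	if flag != "" {
		return flag, nil
	}
	// 2) Next, the VAULTIK_CONFIG environment variable.
	if p := getenv("VAULTIK_CONFIG"); p != "" {
		return p, nil
	}
	// 3) Finally, the default location, but only if it exists.
	if exists(defaultPath) {
		return defaultPath, nil
	}
	return "", fmt.Errorf("no config file specified, VAULTIK_CONFIG not set, and %s not found", defaultPath)
}

func main() {
	env := map[string]string{"VAULTIK_CONFIG": "/tmp/env.yml"}
	getenv := func(k string) string { return env[k] }
	missing := func(string) bool { return false }

	p, _ := resolveConfigPath("/tmp/flag.yml", getenv, missing)
	fmt.Println(p) // /tmp/flag.yml

	p, _ = resolveConfigPath("", getenv, missing)
	fmt.Println(p) // /tmp/env.yml

	_, err := resolveConfigPath("", func(string) string { return "" }, missing)
	fmt.Println(err != nil) // true
}
```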


@@ -1,90 +1,892 @@
package cli
import (
"context"
"encoding/json"
"fmt"
"os"
"path/filepath"
"sort"
"strings"
"text/tabwriter"
"time"
"git.eeqj.de/sneak/vaultik/internal/backup"
"git.eeqj.de/sneak/vaultik/internal/config"
"git.eeqj.de/sneak/vaultik/internal/crypto"
"git.eeqj.de/sneak/vaultik/internal/database"
"git.eeqj.de/sneak/vaultik/internal/globals"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/s3"
"github.com/dustin/go-humanize"
"github.com/klauspost/compress/zstd"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
func SnapshotCmd() *cobra.Command {
// SnapshotCreateOptions contains options for the snapshot create command
type SnapshotCreateOptions struct {
Daemon bool
Cron bool
Prune bool
}
// SnapshotCreateApp contains all dependencies needed for creating snapshots
type SnapshotCreateApp struct {
Globals *globals.Globals
Config *config.Config
Repositories *database.Repositories
ScannerFactory backup.ScannerFactory
S3Client *s3.Client
DB *database.DB
Lifecycle fx.Lifecycle
Shutdowner fx.Shutdowner
}
// SnapshotApp contains dependencies for snapshot commands
type SnapshotApp struct {
*SnapshotCreateApp // Reuse snapshot creation functionality
S3Client *s3.Client
}
// SnapshotInfo represents snapshot information for listing
type SnapshotInfo struct {
ID string `json:"id"`
Timestamp time.Time `json:"timestamp"`
CompressedSize int64 `json:"compressed_size"`
}
// NewSnapshotCommand creates the snapshot command and subcommands
func NewSnapshotCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "snapshot",
Short: "Manage snapshots",
Long: "Commands for listing, removing, and querying snapshots",
Short: "Snapshot management commands",
Long: "Commands for creating, listing, and managing snapshots",
}
cmd.AddCommand(snapshotListCmd())
cmd.AddCommand(snapshotRmCmd())
cmd.AddCommand(snapshotLatestCmd())
// Add subcommands
cmd.AddCommand(newSnapshotCreateCommand())
cmd.AddCommand(newSnapshotListCommand())
cmd.AddCommand(newSnapshotPurgeCommand())
cmd.AddCommand(newSnapshotVerifyCommand())
return cmd
}
func snapshotListCmd() *cobra.Command {
var (
bucket string
prefix string
limit int
// newSnapshotCreateCommand creates the 'snapshot create' subcommand
func newSnapshotCreateCommand() *cobra.Command {
opts := &SnapshotCreateOptions{}
cmd := &cobra.Command{
Use: "create",
Short: "Create a new snapshot",
Long: `Creates a new snapshot of the configured directories.
Config is located at /etc/vaultik/config.yml by default, but can be overridden by
specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,
Args: cobra.NoArgs,
RunE: func(cmd *cobra.Command, args []string) error {
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
// Use the backup functionality from cli package
rootFlags := GetRootFlags()
return RunWithApp(cmd.Context(), AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
Cron: opts.Cron,
},
Modules: []fx.Option{
backup.Module,
s3.Module,
fx.Provide(fx.Annotate(
func(g *globals.Globals, cfg *config.Config, repos *database.Repositories,
scannerFactory backup.ScannerFactory, s3Client *s3.Client, db *database.DB,
lc fx.Lifecycle, shutdowner fx.Shutdowner) *SnapshotCreateApp {
return &SnapshotCreateApp{
Globals: g,
Config: cfg,
Repositories: repos,
ScannerFactory: scannerFactory,
S3Client: s3Client,
DB: db,
Lifecycle: lc,
Shutdowner: shutdowner,
}
},
)),
},
Invokes: []fx.Option{
fx.Invoke(func(app *SnapshotCreateApp, lc fx.Lifecycle) {
// Create a cancellable context for the snapshot
snapshotCtx, snapshotCancel := context.WithCancel(context.Background())
lc.Append(fx.Hook{
OnStart: func(ctx context.Context) error {
// Start the snapshot creation in a goroutine
go func() {
// Run the snapshot creation
if err := app.runSnapshot(snapshotCtx, opts); err != nil {
if err != context.Canceled {
log.Error("Snapshot creation failed", "error", err)
}
}
// Shutdown the app when snapshot completes
if err := app.Shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
return nil
},
OnStop: func(ctx context.Context) error {
log.Debug("Stopping snapshot creation")
// Cancel the snapshot context
snapshotCancel()
return nil
},
})
}),
},
})
},
}
cmd.Flags().BoolVar(&opts.Daemon, "daemon", false, "Run in daemon mode with inotify monitoring")
cmd.Flags().BoolVar(&opts.Cron, "cron", false, "Run in cron mode (silent unless error)")
cmd.Flags().BoolVar(&opts.Prune, "prune", false, "Delete all previous snapshots and unreferenced blobs after backup")
return cmd
}
// runSnapshot executes the snapshot creation operation
func (app *SnapshotCreateApp) runSnapshot(ctx context.Context, opts *SnapshotCreateOptions) error {
snapshotStartTime := time.Now()
log.Info("Starting snapshot creation",
"version", app.Globals.Version,
"commit", app.Globals.Commit,
"index_path", app.Config.IndexPath,
)
// Clean up incomplete snapshots FIRST, before any scanning
// This is critical for data safety - see CleanupIncompleteSnapshots for details
hostname := app.Config.Hostname
if hostname == "" {
hostname, _ = os.Hostname()
}
// Create encryptor if needed for snapshot manager
var encryptor backup.Encryptor
if len(app.Config.AgeRecipients) > 0 {
cryptoEncryptor, err := crypto.NewEncryptor(app.Config.AgeRecipients)
if err != nil {
return fmt.Errorf("creating encryptor: %w", err)
}
encryptor = cryptoEncryptor
}
snapshotManager := backup.NewSnapshotManager(app.Repositories, app.S3Client, encryptor)
// CRITICAL: This MUST succeed. If we fail to clean up incomplete snapshots,
// the deduplication logic will think files from the incomplete snapshot were
// already backed up and skip them, resulting in data loss.
if err := snapshotManager.CleanupIncompleteSnapshots(ctx, hostname); err != nil {
return fmt.Errorf("cleanup incomplete snapshots: %w", err)
}
if opts.Daemon {
log.Info("Running in daemon mode")
// TODO: Implement daemon mode with inotify
return fmt.Errorf("daemon mode not yet implemented")
}
// Resolve source directories to absolute paths
resolvedDirs := make([]string, 0, len(app.Config.SourceDirs))
for _, dir := range app.Config.SourceDirs {
absPath, err := filepath.Abs(dir)
if err != nil {
return fmt.Errorf("failed to resolve absolute path for %s: %w", dir, err)
}
// Resolve symlinks
resolvedPath, err := filepath.EvalSymlinks(absPath)
if err != nil {
// If the path doesn't exist yet, use the absolute path
if os.IsNotExist(err) {
resolvedPath = absPath
} else {
return fmt.Errorf("failed to resolve symlinks for %s: %w", absPath, err)
}
}
resolvedDirs = append(resolvedDirs, resolvedPath)
}
// Create scanner with progress enabled (unless in cron mode)
scanner := app.ScannerFactory(backup.ScannerParams{
EnableProgress: !opts.Cron,
})
// Perform a single snapshot run
log.Notice("Starting snapshot", "source_dirs", len(resolvedDirs))
for i, dir := range resolvedDirs {
log.Info("Source directory", "index", i+1, "path", dir)
}
// Statistics tracking
totalFiles := 0
totalBytes := int64(0)
totalChunks := 0
totalBlobs := 0
totalBytesSkipped := int64(0)
totalFilesSkipped := 0
totalBytesUploaded := int64(0)
totalBlobsUploaded := 0
uploadDuration := time.Duration(0)
// Create a new snapshot at the beginning
// (hostname, encryptor, and snapshotManager already created above for cleanup)
snapshotID, err := snapshotManager.CreateSnapshot(ctx, hostname, app.Globals.Version, app.Globals.Commit)
if err != nil {
return fmt.Errorf("creating snapshot: %w", err)
}
log.Info("Created snapshot", "snapshot_id", snapshotID)
for _, dir := range resolvedDirs {
// Check if context is cancelled
select {
case <-ctx.Done():
log.Info("Snapshot creation cancelled")
return ctx.Err()
default:
}
log.Info("Scanning directory", "path", dir)
result, err := scanner.Scan(ctx, dir, snapshotID)
if err != nil {
return fmt.Errorf("failed to scan %s: %w", dir, err)
}
totalFiles += result.FilesScanned
totalBytes += result.BytesScanned
totalChunks += result.ChunksCreated
totalBlobs += result.BlobsCreated
totalFilesSkipped += result.FilesSkipped
totalBytesSkipped += result.BytesSkipped
log.Info("Directory scan complete",
"path", dir,
"files", result.FilesScanned,
"files_skipped", result.FilesSkipped,
"bytes", result.BytesScanned,
"bytes_skipped", result.BytesSkipped,
"chunks", result.ChunksCreated,
"blobs", result.BlobsCreated,
"duration", result.EndTime.Sub(result.StartTime))
}
// Get upload statistics from scanner progress if available
if s := scanner.GetProgress(); s != nil {
stats := s.GetStats()
totalBytesUploaded = stats.BytesUploaded.Load()
totalBlobsUploaded = int(stats.BlobsUploaded.Load())
uploadDuration = time.Duration(stats.UploadDurationMs.Load()) * time.Millisecond
}
// Update snapshot statistics with extended fields
extStats := backup.ExtendedBackupStats{
BackupStats: backup.BackupStats{
FilesScanned: totalFiles,
BytesScanned: totalBytes,
ChunksCreated: totalChunks,
BlobsCreated: totalBlobs,
BytesUploaded: totalBytesUploaded,
},
BlobUncompressedSize: 0, // Not populated here; blob sizes are only summed for the log summary below
CompressionLevel: app.Config.CompressionLevel,
UploadDurationMs: uploadDuration.Milliseconds(),
}
if err := snapshotManager.UpdateSnapshotStatsExtended(ctx, snapshotID, extStats); err != nil {
return fmt.Errorf("updating snapshot stats: %w", err)
}
// Mark snapshot as complete
if err := snapshotManager.CompleteSnapshot(ctx, snapshotID); err != nil {
return fmt.Errorf("completing snapshot: %w", err)
}
// Export snapshot metadata without closing the database;
// the export function is expected to manage its own database connection
if err := snapshotManager.ExportSnapshotMetadata(ctx, app.Config.IndexPath, snapshotID); err != nil {
return fmt.Errorf("exporting snapshot metadata: %w", err)
}
// Calculate final statistics
snapshotDuration := time.Since(snapshotStartTime)
totalFilesChanged := totalFiles - totalFilesSkipped
totalBytesChanged := totalBytes
totalBytesAll := totalBytes + totalBytesSkipped
// Calculate upload speed
var avgUploadSpeed string
if totalBytesUploaded > 0 && uploadDuration > 0 {
bytesPerSec := float64(totalBytesUploaded) / uploadDuration.Seconds()
bitsPerSec := bytesPerSec * 8
if bitsPerSec >= 1e9 {
avgUploadSpeed = fmt.Sprintf("%.1f Gbit/s", bitsPerSec/1e9)
} else if bitsPerSec >= 1e6 {
avgUploadSpeed = fmt.Sprintf("%.0f Mbit/s", bitsPerSec/1e6)
} else if bitsPerSec >= 1e3 {
avgUploadSpeed = fmt.Sprintf("%.0f Kbit/s", bitsPerSec/1e3)
} else {
avgUploadSpeed = fmt.Sprintf("%.0f bit/s", bitsPerSec)
}
} else {
avgUploadSpeed = "N/A"
}
// Get total blob sizes from database
totalBlobSizeCompressed := int64(0)
totalBlobSizeUncompressed := int64(0)
if blobHashes, err := app.Repositories.Snapshots.GetBlobHashes(ctx, snapshotID); err == nil {
for _, hash := range blobHashes {
if blob, err := app.Repositories.Blobs.GetByHash(ctx, hash); err == nil && blob != nil {
totalBlobSizeCompressed += blob.CompressedSize
totalBlobSizeUncompressed += blob.UncompressedSize
}
}
}
// Calculate compression ratio
var compressionRatio float64
if totalBlobSizeUncompressed > 0 {
compressionRatio = float64(totalBlobSizeCompressed) / float64(totalBlobSizeUncompressed)
} else {
compressionRatio = 1.0
}
// Print comprehensive summary
log.Notice("=== Snapshot Summary ===")
log.Info("Snapshot ID", "id", snapshotID)
log.Info("Source files",
"total_count", formatNumber(totalFiles),
"total_size", humanize.Bytes(uint64(totalBytesAll)))
log.Info("Changed files",
"count", formatNumber(totalFilesChanged),
"size", humanize.Bytes(uint64(totalBytesChanged)))
log.Info("Unchanged files",
"count", formatNumber(totalFilesSkipped),
"size", humanize.Bytes(uint64(totalBytesSkipped)))
log.Info("Blob storage",
"total_uncompressed", humanize.Bytes(uint64(totalBlobSizeUncompressed)),
"total_compressed", humanize.Bytes(uint64(totalBlobSizeCompressed)),
"compression_ratio", fmt.Sprintf("%.2fx", compressionRatio),
"compression_level", app.Config.CompressionLevel)
log.Info("Upload activity",
"bytes_uploaded", humanize.Bytes(uint64(totalBytesUploaded)),
"blobs_uploaded", totalBlobsUploaded,
"upload_time", formatDuration(uploadDuration),
"avg_speed", avgUploadSpeed)
log.Info("Total time", "duration", formatDuration(snapshotDuration))
log.Notice("==========================")
if opts.Prune {
// TODO: Implement pruning
log.Warn("Pruning requested but not yet implemented; no snapshots were deleted")
}
return nil
}
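The upload-speed calculation inside runSnapshot converts bytes per second to bits per second and then picks a unit by threshold. Pulled out into a standalone function it might look like the sketch below; `formatBitRate` is a hypothetical name, and the thresholds and format strings are taken from the code above.

```go
package main

import (
	"fmt"
	"time"
)

// formatBitRate converts a byte count and an elapsed time into a bit-rate
// string, using the same thresholds as the snapshot summary code.
func formatBitRate(bytes int64, elapsed time.Duration) string {
	if bytes <= 0 || elapsed <= 0 {
		return "N/A"
	}
	// bytes/second * 8 = bits/second
	bitsPerSec := float64(bytes) / elapsed.Seconds() * 8
	switch {
	case bitsPerSec >= 1e9:
		return fmt.Sprintf("%.1f Gbit/s", bitsPerSec/1e9)
	case bitsPerSec >= 1e6:
		return fmt.Sprintf("%.0f Mbit/s", bitsPerSec/1e6)
	case bitsPerSec >= 1e3:
		return fmt.Sprintf("%.0f Kbit/s", bitsPerSec/1e3)
	default:
		return fmt.Sprintf("%.0f bit/s", bitsPerSec)
	}
}

func main() {
	fmt.Println(formatBitRate(125_000_000, 10*time.Second))
	fmt.Println(formatBitRate(1_250_000_000, 4*time.Second))
	fmt.Println(formatBitRate(0, time.Second))
}
```

For example, 125,000,000 bytes in 10 seconds is 12,500,000 bytes/s, which is 100,000,000 bit/s and falls in the Mbit/s branch.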
// newSnapshotListCommand creates the 'snapshot list' subcommand
func newSnapshotListCommand() *cobra.Command {
var jsonOutput bool
cmd := &cobra.Command{
Use: "list",
Short: "List snapshots",
Long: "List all snapshots in the bucket, sorted by timestamp",
Short: "List all snapshots",
Long: "Lists all snapshots with their ID, timestamp, and compressed size",
RunE: func(cmd *cobra.Command, args []string) error {
panic("unimplemented")
return runSnapshotCommand(cmd.Context(), func(app *SnapshotApp) error {
return app.List(cmd.Context(), jsonOutput)
})
},
}
cmd.Flags().StringVar(&bucket, "bucket", "", "S3 bucket name")
cmd.Flags().StringVar(&prefix, "prefix", "", "S3 prefix")
cmd.Flags().IntVar(&limit, "limit", 10, "Maximum number of snapshots to list")
_ = cmd.MarkFlagRequired("bucket")
cmd.Flags().BoolVar(&jsonOutput, "json", false, "Output in JSON format")
return cmd
}
func snapshotRmCmd() *cobra.Command {
var (
bucket string
prefix string
snapshot string
)
// newSnapshotPurgeCommand creates the 'snapshot purge' subcommand
func newSnapshotPurgeCommand() *cobra.Command {
var keepLatest bool
var olderThan string
var force bool
cmd := &cobra.Command{
Use: "rm",
Short: "Remove a snapshot",
Long: "Remove a snapshot and optionally its associated blobs",
Use: "purge",
Short: "Purge old snapshots",
Long: "Removes snapshots based on age or count criteria",
RunE: func(cmd *cobra.Command, args []string) error {
panic("unimplemented")
// Validate flags
if !keepLatest && olderThan == "" {
return fmt.Errorf("must specify either --keep-latest or --older-than")
}
if keepLatest && olderThan != "" {
return fmt.Errorf("cannot specify both --keep-latest and --older-than")
}
return runSnapshotCommand(cmd.Context(), func(app *SnapshotApp) error {
return app.Purge(cmd.Context(), keepLatest, olderThan, force)
})
},
}
cmd.Flags().StringVar(&bucket, "bucket", "", "S3 bucket name")
cmd.Flags().StringVar(&prefix, "prefix", "", "S3 prefix")
cmd.Flags().StringVar(&snapshot, "snapshot", "", "Snapshot ID to remove")
_ = cmd.MarkFlagRequired("bucket")
_ = cmd.MarkFlagRequired("snapshot")
cmd.Flags().BoolVar(&keepLatest, "keep-latest", false, "Keep only the latest snapshot")
cmd.Flags().StringVar(&olderThan, "older-than", "", "Remove snapshots older than duration (e.g., 30d, 6m, 1y)")
cmd.Flags().BoolVar(&force, "force", false, "Skip confirmation prompt")
return cmd
}
func snapshotLatestCmd() *cobra.Command {
var (
bucket string
prefix string
)
// newSnapshotVerifyCommand creates the 'snapshot verify' subcommand
func newSnapshotVerifyCommand() *cobra.Command {
var deep bool
cmd := &cobra.Command{
Use: "latest",
Short: "Get the latest snapshot ID",
Long: "Display the ID of the most recent snapshot",
Use: "verify <snapshot-id>",
Short: "Verify snapshot integrity",
Long: "Verifies that all blobs referenced in a snapshot exist",
Args: cobra.ExactArgs(1),
RunE: func(cmd *cobra.Command, args []string) error {
panic("unimplemented")
return runSnapshotCommand(cmd.Context(), func(app *SnapshotApp) error {
return app.Verify(cmd.Context(), args[0], deep)
})
},
}
cmd.Flags().StringVar(&bucket, "bucket", "", "S3 bucket name")
cmd.Flags().StringVar(&prefix, "prefix", "", "S3 prefix")
_ = cmd.MarkFlagRequired("bucket")
cmd.Flags().BoolVar(&deep, "deep", false, "Download and verify blob hashes")
return cmd
}
// List lists all snapshots
func (app *SnapshotApp) List(ctx context.Context, jsonOutput bool) error {
snapshots, err := app.getSnapshots(ctx)
if err != nil {
return err
}
// Sort by timestamp (newest first)
sort.Slice(snapshots, func(i, j int) bool {
return snapshots[i].Timestamp.After(snapshots[j].Timestamp)
})
if jsonOutput {
// JSON output
encoder := json.NewEncoder(os.Stdout)
encoder.SetIndent("", " ")
return encoder.Encode(snapshots)
}
// Table output
w := tabwriter.NewWriter(os.Stdout, 0, 0, 3, ' ', 0)
if _, err := fmt.Fprintln(w, "SNAPSHOT ID\tTIMESTAMP\tCOMPRESSED SIZE"); err != nil {
return err
}
if _, err := fmt.Fprintln(w, "───────────\t─────────\t───────────────"); err != nil {
return err
}
for _, snap := range snapshots {
if _, err := fmt.Fprintf(w, "%s\t%s\t%s\n",
snap.ID,
snap.Timestamp.Format("2006-01-02 15:04:05"),
formatBytes(snap.CompressedSize)); err != nil {
return err
}
}
return w.Flush()
}
// Purge removes old snapshots based on criteria
func (app *SnapshotApp) Purge(ctx context.Context, keepLatest bool, olderThan string, force bool) error {
snapshots, err := app.getSnapshots(ctx)
if err != nil {
return err
}
// Sort by timestamp (newest first)
sort.Slice(snapshots, func(i, j int) bool {
return snapshots[i].Timestamp.After(snapshots[j].Timestamp)
})
var toDelete []SnapshotInfo
if keepLatest {
// Keep only the most recent snapshot
if len(snapshots) > 1 {
toDelete = snapshots[1:]
}
} else if olderThan != "" {
// Parse duration
duration, err := parseDuration(olderThan)
if err != nil {
return fmt.Errorf("invalid duration: %w", err)
}
cutoff := time.Now().UTC().Add(-duration)
for _, snap := range snapshots {
if snap.Timestamp.Before(cutoff) {
toDelete = append(toDelete, snap)
}
}
}
if len(toDelete) == 0 {
fmt.Println("No snapshots to delete")
return nil
}
// Show what will be deleted
fmt.Printf("The following snapshots will be deleted:\n\n")
for _, snap := range toDelete {
fmt.Printf(" %s (%s, %s)\n",
snap.ID,
snap.Timestamp.Format("2006-01-02 15:04:05"),
formatBytes(snap.CompressedSize))
}
// Confirm unless --force is used
if !force {
fmt.Printf("\nDelete %d snapshot(s)? [y/N] ", len(toDelete))
var confirm string
if _, err := fmt.Scanln(&confirm); err != nil {
// Treat EOF or error as "no"
fmt.Println("Cancelled")
return nil
}
if strings.ToLower(confirm) != "y" {
fmt.Println("Cancelled")
return nil
}
} else {
fmt.Printf("\nDeleting %d snapshot(s) (--force specified)\n", len(toDelete))
}
// Delete snapshots
for _, snap := range toDelete {
log.Info("Deleting snapshot", "id", snap.ID)
if err := app.deleteSnapshot(ctx, snap.ID); err != nil {
return fmt.Errorf("deleting snapshot %s: %w", snap.ID, err)
}
}
fmt.Printf("Deleted %d snapshot(s)\n", len(toDelete))
// TODO: Run blob pruning to clean up unreferenced blobs
return nil
}
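The selection step in Purge can be exercised on its own if it is separated from the S3 listing and the confirmation prompt. The following is a sketch under that assumption; `snapshot`, `selectForPurge`, `sampleSnapshots`, and `fixedNow` are hypothetical names, and the current time is passed in so the result is deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// snapshot is a hypothetical, trimmed-down stand-in for SnapshotInfo.
type snapshot struct {
	ID        string
	Timestamp time.Time
}

const thirtyDays = 30 * 24 * time.Hour

// selectForPurge returns the snapshots that would be deleted. With keepLatest
// it returns everything except the newest snapshot; otherwise it returns the
// snapshots whose timestamp is before now minus olderThan.
func selectForPurge(snaps []snapshot, keepLatest bool, olderThan time.Duration, now time.Time) []snapshot {
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]snapshot(nil), snaps...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Timestamp.After(sorted[j].Timestamp) // newest first
	})
	if keepLatest {
		if len(sorted) > 1 {
			return sorted[1:]
		}
		return nil
	}
	cutoff := now.Add(-olderThan)
	var out []snapshot
	for _, s := range sorted {
		if s.Timestamp.Before(cutoff) {
			out = append(out, s)
		}
	}
	return out
}

// fixedNow and sampleSnapshots provide deterministic example data.
func fixedNow() time.Time { return time.Date(2025, 7, 22, 12, 0, 0, 0, time.UTC) }

func sampleSnapshots() []snapshot {
	now := fixedNow()
	return []snapshot{
		{ID: "a", Timestamp: now.Add(-1 * time.Hour)},
		{ID: "b", Timestamp: now.Add(-40 * 24 * time.Hour)},
		{ID: "c", Timestamp: now.Add(-10 * 24 * time.Hour)},
	}
}

func main() {
	for _, s := range selectForPurge(sampleSnapshots(), true, 0, fixedNow()) {
		fmt.Println("keep-latest would delete:", s.ID)
	}
	for _, s := range selectForPurge(sampleSnapshots(), false, thirtyDays, fixedNow()) {
		fmt.Println("older-than 30d would delete:", s.ID)
	}
}
```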
// Verify checks snapshot integrity
func (app *SnapshotApp) Verify(ctx context.Context, snapshotID string, deep bool) error {
fmt.Printf("Verifying snapshot %s...\n", snapshotID)
// Download and parse manifest
manifest, err := app.downloadManifest(ctx, snapshotID)
if err != nil {
return fmt.Errorf("downloading manifest: %w", err)
}
fmt.Printf("Manifest contains %d blobs\n", len(manifest))
// Check each blob exists
missing := 0
verified := 0
for _, blobHash := range manifest {
blobPath := fmt.Sprintf("blobs/%s/%s/%s", blobHash[:2], blobHash[2:4], blobHash)
if deep {
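// Deep verification (download each blob and check its hash) is not yet
// implemented; fail rather than report an unverified snapshot as valid.
// TODO: Implement deep verification
return fmt.Errorf("deep verification not yet implemented")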
} else {
// Just check existence
_, err := app.S3Client.StatObject(ctx, blobPath)
if err != nil {
fmt.Printf(" Missing: %s\n", blobHash)
missing++
} else {
verified++
}
}
}
fmt.Printf("\nVerification complete:\n")
fmt.Printf(" Verified: %d\n", verified)
fmt.Printf(" Missing: %d\n", missing)
if missing > 0 {
return fmt.Errorf("%d blobs are missing", missing)
}
return nil
}
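Both Verify and getSnapshots build blob object keys by slicing the first four characters of the hash, which panics if a manifest ever contains a shorter string. A guarded helper could look like the sketch below; `blobKey` is a hypothetical name, and only the `blobs/<2>/<2>/<hash>` layout comes from the code above.

```go
package main

import "fmt"

// blobKey builds the sharded object key "blobs/<first 2>/<next 2>/<hash>".
// It returns an error instead of panicking when the hash is too short.
func blobKey(hash string) (string, error) {
	if len(hash) < 4 {
		return "", fmt.Errorf("blob hash %q is too short to shard", hash)
	}
	return fmt.Sprintf("blobs/%s/%s/%s", hash[:2], hash[2:4], hash), nil
}

func main() {
	key, err := blobKey("abcdef0123")
	fmt.Println(key, err)

	_, err = blobKey("abc")
	fmt.Println(err)
}
```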
// getSnapshots retrieves all snapshots from S3
func (app *SnapshotApp) getSnapshots(ctx context.Context) ([]SnapshotInfo, error) {
var snapshots []SnapshotInfo
// List all objects under metadata/
objectCh := app.S3Client.ListObjectsStream(ctx, "metadata/", true)
// Track unique snapshots
snapshotMap := make(map[string]*SnapshotInfo)
for object := range objectCh {
if object.Err != nil {
return nil, fmt.Errorf("listing objects: %w", object.Err)
}
// Extract snapshot ID from paths like metadata/hostname-20240115-143052Z/manifest.json.zst
parts := strings.Split(object.Key, "/")
if len(parts) < 3 || parts[0] != "metadata" {
continue
}
snapshotID := parts[1]
if snapshotID == "" {
continue
}
// Initialize snapshot info if not seen
if _, exists := snapshotMap[snapshotID]; !exists {
timestamp, err := parseSnapshotTimestamp(snapshotID)
if err != nil {
log.Warn("Failed to parse snapshot timestamp", "id", snapshotID, "error", err)
continue
}
snapshotMap[snapshotID] = &SnapshotInfo{
ID: snapshotID,
Timestamp: timestamp,
CompressedSize: 0,
}
}
}
// For each snapshot, download manifest and calculate total blob size
for _, snap := range snapshotMap {
manifest, err := app.downloadManifest(ctx, snap.ID)
if err != nil {
log.Warn("Failed to download manifest", "id", snap.ID, "error", err)
continue
}
// Calculate total size of referenced blobs
for _, blobHash := range manifest {
blobPath := fmt.Sprintf("blobs/%s/%s/%s", blobHash[:2], blobHash[2:4], blobHash)
info, err := app.S3Client.StatObject(ctx, blobPath)
if err != nil {
log.Warn("Failed to stat blob", "blob", blobHash, "error", err)
continue
}
snap.CompressedSize += info.Size
}
snapshots = append(snapshots, *snap)
}
return snapshots, nil
}
// downloadManifest downloads and parses a snapshot manifest
func (app *SnapshotApp) downloadManifest(ctx context.Context, snapshotID string) ([]string, error) {
manifestPath := fmt.Sprintf("metadata/%s/manifest.json.zst", snapshotID)
reader, err := app.S3Client.GetObject(ctx, manifestPath)
if err != nil {
return nil, err
}
defer func() { _ = reader.Close() }()
// Decompress
zr, err := zstd.NewReader(reader)
if err != nil {
return nil, fmt.Errorf("creating zstd reader: %w", err)
}
defer zr.Close()
// Decode JSON
var manifest []string
if err := json.NewDecoder(zr).Decode(&manifest); err != nil {
return nil, fmt.Errorf("decoding manifest: %w", err)
}
return manifest, nil
}
// deleteSnapshot removes a snapshot and its metadata
func (app *SnapshotApp) deleteSnapshot(ctx context.Context, snapshotID string) error {
// List all objects under metadata/{snapshotID}/
prefix := fmt.Sprintf("metadata/%s/", snapshotID)
objectCh := app.S3Client.ListObjectsStream(ctx, prefix, true)
var objectsToDelete []string
for object := range objectCh {
if object.Err != nil {
return fmt.Errorf("listing objects: %w", object.Err)
}
objectsToDelete = append(objectsToDelete, object.Key)
}
// Delete all objects
for _, key := range objectsToDelete {
if err := app.S3Client.RemoveObject(ctx, key); err != nil {
return fmt.Errorf("removing %s: %w", key, err)
}
}
return nil
}
// parseSnapshotTimestamp extracts timestamp from snapshot ID
// Format: hostname-20240115-143052Z
func parseSnapshotTimestamp(snapshotID string) (time.Time, error) {
// The ID ends in "-YYYYMMDD-HHMMSSZ" and the hostname may itself contain
// hyphens, so locate the timestamp by scanning from the right.
lastHyphen := strings.LastIndex(snapshotID, "-")
if lastHyphen == -1 {
return time.Time{}, fmt.Errorf("invalid snapshot ID format")
}
// Everything after the last hyphen is the time component (HHMMSSZ, 7 characters)
timePart := snapshotID[lastHyphen+1:]
if len(timePart) < 7 {
return time.Time{}, fmt.Errorf("invalid snapshot ID format: timestamp too short")
}
// The second-to-last hyphen separates the hostname from the date (YYYYMMDD)
hostnameEnd := strings.LastIndex(snapshotID[:lastHyphen], "-")
if hostnameEnd == -1 {
return time.Time{}, fmt.Errorf("invalid snapshot ID format: missing date separator")
}
// The full timestamp is the date, a hyphen, and the time
fullTimestamp := snapshotID[hostnameEnd+1:]
// Parse the timestamp with Z suffix
return time.Parse("20060102-150405Z", fullTimestamp)
}
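Because the timestamp suffix has a fixed width, an alternative to counting hyphens is to slice a fixed number of characters off the end of the ID. The sketch below takes that approach and also returns the hostname; `parseSnapshotID` and `tsLayout` are hypothetical names, and this is a different technique from the hyphen search used in the function above. Only the `hostname-YYYYMMDD-HHMMSSZ` format is taken from the source.

```go
package main

import (
	"fmt"
	"time"
)

// tsLayout is the Go reference layout for the "YYYYMMDD-HHMMSSZ" suffix.
const tsLayout = "20060102-150405Z"

// parseSnapshotID splits "<hostname>-YYYYMMDD-HHMMSSZ" into its hostname and
// timestamp. The hostname may contain hyphens because the suffix is located
// by its fixed length rather than by searching for separators.
func parseSnapshotID(id string) (string, time.Time, error) {
	n := len(tsLayout)
	// Need at least one hostname character, a hyphen, and the suffix.
	if len(id) < n+2 || id[len(id)-n-1] != '-' {
		return "", time.Time{}, fmt.Errorf("invalid snapshot ID %q", id)
	}
	ts, err := time.Parse(tsLayout, id[len(id)-n:])
	if err != nil {
		return "", time.Time{}, fmt.Errorf("invalid snapshot ID %q: %w", id, err)
	}
	return id[:len(id)-n-1], ts, nil
}

func main() {
	host, ts, err := parseSnapshotID("web-01-20240115-143052Z")
	fmt.Println(host, ts, err)

	_, _, err = parseSnapshotID("bogus")
	fmt.Println(err)
}
```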
// parseDuration is now in duration.go
// runSnapshotCommand creates the FX app and runs the given function
func runSnapshotCommand(ctx context.Context, fn func(*SnapshotApp) error) error {
var result error
rootFlags := GetRootFlags()
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
err = RunWithApp(ctx, AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
},
Modules: []fx.Option{
s3.Module,
fx.Provide(func(
g *globals.Globals,
cfg *config.Config,
db *database.DB,
repos *database.Repositories,
s3Client *s3.Client,
lc fx.Lifecycle,
shutdowner fx.Shutdowner,
) *SnapshotApp {
snapshotCreateApp := &SnapshotCreateApp{
Globals: g,
Config: cfg,
Repositories: repos,
ScannerFactory: nil, // Not needed for snapshot commands
S3Client: s3Client,
DB: db,
Lifecycle: lc,
Shutdowner: shutdowner,
}
return &SnapshotApp{
SnapshotCreateApp: snapshotCreateApp,
S3Client: s3Client,
}
}),
},
Invokes: []fx.Option{
fx.Invoke(func(app *SnapshotApp, shutdowner fx.Shutdowner) {
result = fn(app)
// Shutdown after command completes
go func() {
time.Sleep(100 * time.Millisecond) // Brief delay to ensure clean shutdown
if err := shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
}),
},
})
if err != nil {
return err
}
return result
}
// formatNumber formats a number with comma separators
func formatNumber(n int) string {
if n < 1000 {
return fmt.Sprintf("%d", n)
}
return humanize.Comma(int64(n))
}
// formatDuration formats a duration in a human-readable way
func formatDuration(d time.Duration) string {
if d < time.Second {
return fmt.Sprintf("%dms", d.Milliseconds())
}
if d < time.Minute {
return fmt.Sprintf("%.1fs", d.Seconds())
}
if d < time.Hour {
mins := int(d.Minutes())
secs := int(d.Seconds()) % 60
if secs > 0 {
return fmt.Sprintf("%dm%ds", mins, secs)
}
return fmt.Sprintf("%dm", mins)
}
hours := int(d.Hours())
mins := int(d.Minutes()) % 60
if mins > 0 {
return fmt.Sprintf("%dh%dm", hours, mins)
}
return fmt.Sprintf("%dh", hours)
}

internal/cli/store.go (new file, 159 lines)

@@ -0,0 +1,159 @@
package cli
import (
"context"
"fmt"
"strings"
"time"
"git.eeqj.de/sneak/vaultik/internal/log"
"git.eeqj.de/sneak/vaultik/internal/s3"
"github.com/spf13/cobra"
"go.uber.org/fx"
)
// StoreApp contains dependencies for store commands
type StoreApp struct {
S3Client *s3.Client
Shutdowner fx.Shutdowner
}
// NewStoreCommand creates the store command and subcommands
func NewStoreCommand() *cobra.Command {
cmd := &cobra.Command{
Use: "store",
Short: "Storage information commands",
Long: "Commands for viewing information about the S3 storage backend",
}
// Add subcommands
cmd.AddCommand(newStoreInfoCommand())
return cmd
}
// newStoreInfoCommand creates the 'store info' subcommand
func newStoreInfoCommand() *cobra.Command {
return &cobra.Command{
Use: "info",
Short: "Display storage information",
Long: "Shows S3 bucket configuration and storage statistics including snapshots and blobs",
RunE: func(cmd *cobra.Command, args []string) error {
return runWithApp(cmd.Context(), func(app *StoreApp) error {
return app.Info(cmd.Context())
})
},
}
}
// Info displays storage information
func (app *StoreApp) Info(ctx context.Context) error {
// Get bucket info
bucketName := app.S3Client.BucketName()
endpoint := app.S3Client.Endpoint()
fmt.Printf("Storage Information\n")
fmt.Printf("==================\n\n")
fmt.Printf("S3 Configuration:\n")
fmt.Printf(" Endpoint: %s\n", endpoint)
fmt.Printf(" Bucket: %s\n\n", bucketName)
// Count snapshots by listing metadata/ prefix
snapshotCount := 0
snapshotCh := app.S3Client.ListObjectsStream(ctx, "metadata/", true)
snapshotDirs := make(map[string]bool)
for object := range snapshotCh {
if object.Err != nil {
return fmt.Errorf("listing snapshots: %w", object.Err)
}
// Extract snapshot ID from path like metadata/hostname-20240115-143052Z/
parts := strings.Split(object.Key, "/")
if len(parts) >= 2 && parts[0] == "metadata" && parts[1] != "" {
snapshotDirs[parts[1]] = true
}
}
snapshotCount = len(snapshotDirs)
// Count blobs and calculate total size by listing blobs/ prefix
blobCount := 0
var totalSize int64
blobCh := app.S3Client.ListObjectsStream(ctx, "blobs/", false)
for object := range blobCh {
if object.Err != nil {
return fmt.Errorf("listing blobs: %w", object.Err)
}
if !strings.HasSuffix(object.Key, "/") { // Skip directories
blobCount++
totalSize += object.Size
}
}
fmt.Printf("Storage Statistics:\n")
fmt.Printf(" Snapshots: %d\n", snapshotCount)
fmt.Printf(" Blobs: %d\n", blobCount)
fmt.Printf(" Total Size: %s\n", formatBytes(totalSize))
return nil
}
// formatBytes formats a byte count in human-readable form using
// 1024-based steps, e.g. 1536 is rendered as "1.5 KB"
func formatBytes(bytes int64) string {
const unit = 1024
if bytes < unit {
return fmt.Sprintf("%d B", bytes)
}
div, exp := int64(unit), 0
for n := bytes / unit; n >= unit; n /= unit {
div *= unit
exp++
}
return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
}
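To see how formatBytes behaves at unit boundaries, the function can be run against a few sample values. The function body below is copied from the code above; only `main` is new. Note that it steps in powers of 1024 while labelling the result KB, MB, and so on, whereas go-humanize's `Bytes` (used for the snapshot summary elsewhere in this commit) steps in powers of 1000, so the two helpers can render the same byte count differently.

```go
package main

import "fmt"

// formatBytes is reproduced from store.go for demonstration.
func formatBytes(bytes int64) string {
	const unit = 1024
	if bytes < unit {
		return fmt.Sprintf("%d B", bytes)
	}
	div, exp := int64(unit), 0
	for n := bytes / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
}

func main() {
	for _, n := range []int64{512, 1536, 1048576, 5 << 30} {
		fmt.Printf("%d -> %s\n", n, formatBytes(n))
	}
}
```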
// runWithApp creates the FX app and runs the given function
func runWithApp(ctx context.Context, fn func(*StoreApp) error) error {
var result error
rootFlags := GetRootFlags()
// Use unified config resolution
configPath, err := ResolveConfigPath()
if err != nil {
return err
}
err = RunWithApp(ctx, AppOptions{
ConfigPath: configPath,
LogOptions: log.LogOptions{
Verbose: rootFlags.Verbose,
Debug: rootFlags.Debug,
},
Modules: []fx.Option{
s3.Module,
fx.Provide(func(s3Client *s3.Client, shutdowner fx.Shutdowner) *StoreApp {
return &StoreApp{
S3Client: s3Client,
Shutdowner: shutdowner,
}
}),
},
Invokes: []fx.Option{
fx.Invoke(func(app *StoreApp, shutdowner fx.Shutdowner) {
result = fn(app)
// Shutdown after command completes
go func() {
time.Sleep(100 * time.Millisecond) // Brief delay to ensure clean shutdown
if err := shutdowner.Shutdown(); err != nil {
log.Error("Failed to shutdown", "error", err)
}
}()
}),
},
})
if err != nil {
return err
}
return result
}


@@ -9,7 +9,10 @@ import (
"gopkg.in/yaml.v3"
)
// Config represents the application configuration
// Config represents the application configuration for Vaultik.
// It defines all settings for backup operations, including source directories,
// encryption recipients, S3 storage configuration, and performance tuning parameters.
// Configuration is typically loaded from a YAML file.
type Config struct {
AgeRecipients []string `yaml:"age_recipients"`
BackupInterval time.Duration `yaml:"backup_interval"`
@@ -19,14 +22,15 @@ type Config struct {
FullScanInterval time.Duration `yaml:"full_scan_interval"`
Hostname string `yaml:"hostname"`
IndexPath string `yaml:"index_path"`
IndexPrefix string `yaml:"index_prefix"`
MinTimeBetweenRun time.Duration `yaml:"min_time_between_run"`
S3 S3Config `yaml:"s3"`
SourceDirs []string `yaml:"source_dirs"`
CompressionLevel int `yaml:"compression_level"`
}
// S3Config represents S3 storage configuration
// S3Config represents S3 storage configuration for backup storage.
// It supports both AWS S3 and S3-compatible storage services.
// All fields except UseSSL and PartSize are required.
type S3Config struct {
Endpoint string `yaml:"endpoint"`
Bucket string `yaml:"bucket"`
@@ -38,10 +42,14 @@ type S3Config struct {
PartSize Size `yaml:"part_size"`
}
// ConfigPath wraps the config file path for fx injection
// ConfigPath wraps the config file path for fx dependency injection.
// This type allows the config file path to be injected as a distinct type
// rather than a plain string, avoiding conflicts with other string dependencies.
type ConfigPath string
// New creates a new Config instance
// New creates a new Config instance by loading from the specified path.
// This function is used by the fx dependency injection framework.
// Returns an error if the path is empty or if loading fails.
func New(path ConfigPath) (*Config, error) {
if path == "" {
return nil, fmt.Errorf("config path not provided")
@@ -55,7 +63,11 @@ func New(path ConfigPath) (*Config, error) {
return cfg, nil
}
// Load reads and parses the configuration file
// Load reads and parses the configuration file from the specified path.
// It applies default values for optional fields, performs environment variable
// substitution for certain fields (like IndexPath), and validates the configuration.
// The configuration file should be in YAML format. Returns an error if the file
// cannot be read, parsed, or if validation fails.
func Load(path string) (*Config, error) {
data, err := os.ReadFile(path)
if err != nil {
@@ -70,7 +82,6 @@ func Load(path string) (*Config, error) {
FullScanInterval: 24 * time.Hour,
MinTimeBetweenRun: 15 * time.Minute,
IndexPath: "/var/lib/vaultik/index.sqlite",
IndexPrefix: "index/",
CompressionLevel: 3,
}
@@ -107,7 +118,15 @@ func Load(path string) (*Config, error) {
return cfg, nil
}
// Validate checks if the configuration is valid
// Validate checks if the configuration is valid and complete.
// It ensures all required fields are present and have valid values:
// - At least one age recipient must be specified
// - At least one source directory must be configured
// - S3 credentials and endpoint must be provided
// - Chunk size must be at least 1MB
// - Blob size limit must be at least the chunk size
// - Compression level must be between 1 and 19
// Returns an error describing the first validation failure encountered.
func (c *Config) Validate() error {
if len(c.AgeRecipients) == 0 {
return fmt.Errorf("at least one age_recipient is required")
@@ -148,7 +167,8 @@ func (c *Config) Validate() error {
return nil
}
// Module exports the config module for fx
// Module exports the config module for fx dependency injection.
// It provides the Config type to other modules in the application.
var Module = fx.Module("config",
fx.Provide(New),
)
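The rules listed in the Validate doc comment can be illustrated with a small first-failure validator. This is a sketch only: `backupConfig`, `sampleConfig`, the field names `ChunkSize` and `BlobSizeLimit`, and the reading of "1MB" as 1 MiB are assumptions, since the hunk does not show the real fields or thresholds. The S3 credential check is left out for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// backupConfig is a hypothetical, trimmed-down stand-in for the Config type.
type backupConfig struct {
	AgeRecipients    []string
	SourceDirs       []string
	ChunkSize        int64
	BlobSizeLimit    int64
	CompressionLevel int
}

// minChunkSize assumes "1MB" means 1 MiB; the real threshold is not shown.
const minChunkSize = 1 << 20

// validate returns the first rule violation it finds, or nil.
func (c backupConfig) validate() error {
	switch {
	case len(c.AgeRecipients) == 0:
		return errors.New("at least one age_recipient is required")
	case len(c.SourceDirs) == 0:
		return errors.New("at least one source directory is required")
	case c.ChunkSize < minChunkSize:
		return fmt.Errorf("chunk size %d is below the minimum of %d bytes", c.ChunkSize, minChunkSize)
	case c.BlobSizeLimit < c.ChunkSize:
		return fmt.Errorf("blob size limit %d is smaller than chunk size %d", c.BlobSizeLimit, c.ChunkSize)
	case c.CompressionLevel < 1 || c.CompressionLevel > 19:
		return fmt.Errorf("compression level %d is outside 1..19", c.CompressionLevel)
	}
	return nil
}

// sampleConfig returns a configuration that passes every check.
// The recipient string is a placeholder, not a real age key.
func sampleConfig() backupConfig {
	return backupConfig{
		AgeRecipients:    []string{"age1example"},
		SourceDirs:       []string{"/home"},
		ChunkSize:        10 << 20,
		BlobSizeLimit:    1 << 30,
		CompressionLevel: 3,
	}
}

func main() {
	cfg := sampleConfig()
	fmt.Println("valid config:", cfg.validate())

	cfg.CompressionLevel = 0
	fmt.Println("bad level:", cfg.validate())
}
```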


@@ -6,10 +6,14 @@ import (
"github.com/dustin/go-humanize"
)
// Size is a custom type that can unmarshal from both int64 and string
// Size represents a byte size that can be specified in configuration files.
// It can unmarshal from both numeric values (interpreted as bytes) and
// human-readable strings like "10MB", "2.5GB", or "1TB".
type Size int64
// UnmarshalYAML implements yaml.Unmarshaler for Size
// UnmarshalYAML implements yaml.Unmarshaler for Size, allowing it to be
// parsed from YAML configuration files. It accepts both numeric values
// (interpreted as bytes) and string values with units (e.g., "10MB").
func (s *Size) UnmarshalYAML(unmarshal func(interface{}) error) error {
// Try to unmarshal as int64 first
var intVal int64
@@ -34,12 +38,16 @@ func (s *Size) UnmarshalYAML(unmarshal func(interface{}) error) error {
return nil
}
// Int64 returns the size as int64
// Int64 returns the size as int64 bytes.
// This is useful when the size needs to be passed to APIs that expect
// a numeric byte count.
func (s Size) Int64() int64 {
return int64(s)
}
// String returns the size as a human-readable string
// String returns the size as a human-readable string.
// For example, 1048576 bytes would be formatted as "1.0 MB".
// This implements the fmt.Stringer interface.
func (s Size) String() string {
return humanize.Bytes(uint64(s))
}


@@ -9,13 +9,19 @@ import (
"filippo.io/age"
)
// Encryptor provides thread-safe encryption using age
// Encryptor provides thread-safe encryption using the age encryption library.
// It supports encrypting data for multiple recipients simultaneously, allowing
// any of the corresponding private keys to decrypt the data. This is useful
// for backup scenarios where multiple parties should be able to decrypt the data.
type Encryptor struct {
recipients []age.Recipient
mu sync.RWMutex
}
// NewEncryptor creates a new encryptor with the given age public keys
// NewEncryptor creates a new encryptor with the given age public keys.
// Each public key should be a valid age X25519 recipient string (e.g., "age1...").
// At least one recipient must be provided. Returns an error if any of the
// public keys are invalid or if no recipients are specified.
func NewEncryptor(publicKeys []string) (*Encryptor, error) {
if len(publicKeys) == 0 {
return nil, fmt.Errorf("at least one recipient is required")
@@ -35,7 +41,10 @@ func NewEncryptor(publicKeys []string) (*Encryptor, error) {
}, nil
}
// Encrypt encrypts data using age encryption
// Encrypt encrypts data using age encryption for all configured recipients.
// The encrypted data can be decrypted by any of the corresponding private keys.
// This method is suitable for small to medium amounts of data that fit in memory.
// For large data streams, use EncryptStream or EncryptWriter instead.
func (e *Encryptor) Encrypt(data []byte) ([]byte, error) {
e.mu.RLock()
recipients := e.recipients
@@ -62,7 +71,10 @@ func (e *Encryptor) Encrypt(data []byte) ([]byte, error) {
return buf.Bytes(), nil
}
// EncryptStream encrypts data from reader to writer
// EncryptStream encrypts data from reader to writer using age encryption.
// This method is suitable for encrypting large files or streams as it processes
// data in a streaming fashion without loading everything into memory.
// The encrypted data is written directly to the destination writer.
func (e *Encryptor) EncryptStream(dst io.Writer, src io.Reader) error {
e.mu.RLock()
recipients := e.recipients
@@ -87,7 +99,11 @@ func (e *Encryptor) EncryptStream(dst io.Writer, src io.Reader) error {
return nil
}
// EncryptWriter creates a writer that encrypts data written to it
// EncryptWriter creates a writer that encrypts data written to it.
// All data written to the returned WriteCloser will be encrypted and written
// to the destination writer. The caller must call Close() on the returned
// writer to ensure all encrypted data is properly flushed and finalized.
// This is useful for integrating encryption into existing writer-based pipelines.
func (e *Encryptor) EncryptWriter(dst io.Writer) (io.WriteCloser, error) {
e.mu.RLock()
recipients := e.recipients
@@ -102,7 +118,11 @@ func (e *Encryptor) EncryptWriter(dst io.Writer) (io.WriteCloser, error) {
return w, nil
}
// UpdateRecipients updates the recipients (thread-safe)
// UpdateRecipients updates the recipients for future encryption operations.
// This method is thread-safe and can be called while other encryption operations
// are in progress. Existing encryption operations will continue with the old
// recipients. At least one recipient must be provided. Returns an error if any
// of the public keys are invalid or if no recipients are specified.
func (e *Encryptor) UpdateRecipients(publicKeys []string) error {
if len(publicKeys) == 0 {
return fmt.Errorf("at least one recipient is required")
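
The UpdateRecipients comment promises that in-flight operations keep using the old recipients. One way that guarantee can hold is sketched below, under stated assumptions: each operation takes the current slice under a read lock, and an update swaps in a fresh slice instead of mutating the old one. The `recipientSet` type and its `snapshot` and `update` methods are hypothetical stand-ins, and plain strings take the place of parsed age recipients; this is not the Encryptor implementation itself.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// recipientSet is a hypothetical stand-in for Encryptor; it stores plain
// strings where the real type would hold parsed age recipients.
type recipientSet struct {
	mu         sync.RWMutex
	recipients []string
}

// snapshot returns the slice that is current at call time. The caller keeps
// using it even if update swaps in a new slice afterwards.
func (s *recipientSet) snapshot() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.recipients
}

// update replaces the slice wholesale instead of mutating it in place, so
// snapshots taken earlier are never modified.
func (s *recipientSet) update(keys []string) error {
	if len(keys) == 0 {
		return errors.New("at least one recipient is required")
	}
	next := make([]string, len(keys))
	copy(next, keys)
	s.mu.Lock()
	s.recipients = next
	s.mu.Unlock()
	return nil
}

func main() {
	s := &recipientSet{recipients: []string{"age1old"}}
	inFlight := s.snapshot() // an operation that started before the update
	if err := s.update([]string{"age1new"}); err != nil {
		fmt.Println("update failed:", err)
		return
	}
	fmt.Println("in-flight sees:", inFlight)
	fmt.Println("new calls see:", s.snapshot())
}
```

Copying the input in `update` also protects against the caller modifying its own slice after the call returns.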

@@ -24,7 +24,7 @@ func (r *BlobChunkRepository) Create(ctx context.Context, tx *sql.Tx, bc *BlobCh
if tx != nil {
_, err = tx.ExecContext(ctx, query, bc.BlobID, bc.ChunkHash, bc.Offset, bc.Length)
} else {
_, err = r.db.ExecWithLock(ctx, query, bc.BlobID, bc.ChunkHash, bc.Offset, bc.Length)
_, err = r.db.ExecWithLog(ctx, query, bc.BlobID, bc.ChunkHash, bc.Offset, bc.Length)
}
if err != nil {

@@ -2,7 +2,9 @@ package database
import (
"context"
"strings"
"testing"
"time"
)
func TestBlobChunkRepository(t *testing.T) {
@@ -10,78 +12,112 @@ func TestBlobChunkRepository(t *testing.T) {
defer cleanup()
ctx := context.Background()
repo := NewBlobChunkRepository(db)
repos := NewRepositories(db)
// Create blob first
blob := &Blob{
ID: "blob1-uuid",
Hash: "blob1-hash",
CreatedTS: time.Now(),
}
err := repos.Blobs.Create(ctx, nil, blob)
if err != nil {
t.Fatalf("failed to create blob: %v", err)
}
// Create chunks
chunks := []string{"chunk1", "chunk2", "chunk3"}
for _, chunkHash := range chunks {
chunk := &Chunk{
ChunkHash: chunkHash,
SHA256: chunkHash + "-sha",
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)
if err != nil {
t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
}
}
// Test Create
bc1 := &BlobChunk{
BlobID: "blob1-uuid",
BlobID: blob.ID,
ChunkHash: "chunk1",
Offset: 0,
Length: 1024,
}
err := repo.Create(ctx, nil, bc1)
err = repos.BlobChunks.Create(ctx, nil, bc1)
if err != nil {
t.Fatalf("failed to create blob chunk: %v", err)
}
// Add more chunks to the same blob
bc2 := &BlobChunk{
BlobID: "blob1-uuid",
BlobID: blob.ID,
ChunkHash: "chunk2",
Offset: 1024,
Length: 2048,
}
err = repo.Create(ctx, nil, bc2)
err = repos.BlobChunks.Create(ctx, nil, bc2)
if err != nil {
t.Fatalf("failed to create second blob chunk: %v", err)
}
bc3 := &BlobChunk{
BlobID: "blob1-uuid",
BlobID: blob.ID,
ChunkHash: "chunk3",
Offset: 3072,
Length: 512,
}
err = repo.Create(ctx, nil, bc3)
err = repos.BlobChunks.Create(ctx, nil, bc3)
if err != nil {
t.Fatalf("failed to create third blob chunk: %v", err)
}
// Test GetByBlobID
chunks, err := repo.GetByBlobID(ctx, "blob1-uuid")
blobChunks, err := repos.BlobChunks.GetByBlobID(ctx, blob.ID)
if err != nil {
t.Fatalf("failed to get blob chunks: %v", err)
}
if len(chunks) != 3 {
t.Errorf("expected 3 chunks, got %d", len(chunks))
if len(blobChunks) != 3 {
t.Errorf("expected 3 chunks, got %d", len(blobChunks))
}
// Verify order by offset
expectedOffsets := []int64{0, 1024, 3072}
for i, chunk := range chunks {
if chunk.Offset != expectedOffsets[i] {
t.Errorf("wrong chunk order: expected offset %d, got %d", expectedOffsets[i], chunk.Offset)
for i, bc := range blobChunks {
if bc.Offset != expectedOffsets[i] {
t.Errorf("wrong chunk order: expected offset %d, got %d", expectedOffsets[i], bc.Offset)
}
}
// Test GetByChunkHash
bc, err := repo.GetByChunkHash(ctx, "chunk2")
bc, err := repos.BlobChunks.GetByChunkHash(ctx, "chunk2")
if err != nil {
t.Fatalf("failed to get blob chunk by chunk hash: %v", err)
}
if bc == nil {
t.Fatal("expected blob chunk, got nil")
}
if bc.BlobID != "blob1-uuid" {
t.Errorf("wrong blob ID: expected blob1-uuid, got %s", bc.BlobID)
if bc.BlobID != blob.ID {
t.Errorf("wrong blob ID: expected %s, got %s", blob.ID, bc.BlobID)
}
if bc.Offset != 1024 {
t.Errorf("wrong offset: expected 1024, got %d", bc.Offset)
}
// Test duplicate insert (should fail due to primary key constraint)
err = repos.BlobChunks.Create(ctx, nil, bc1)
if err == nil {
t.Fatal("duplicate blob_chunk insert should fail due to primary key constraint")
}
if !strings.Contains(err.Error(), "UNIQUE") && !strings.Contains(err.Error(), "constraint") {
t.Fatalf("expected constraint error, got: %v", err)
}
// Test non-existent chunk
bc, err = repo.GetByChunkHash(ctx, "nonexistent")
bc, err = repos.BlobChunks.GetByChunkHash(ctx, "nonexistent")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
@@ -95,26 +131,61 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
defer cleanup()
ctx := context.Background()
repo := NewBlobChunkRepository(db)
repos := NewRepositories(db)
// Create blobs
blob1 := &Blob{
ID: "blob1-uuid",
Hash: "blob1-hash",
CreatedTS: time.Now(),
}
blob2 := &Blob{
ID: "blob2-uuid",
Hash: "blob2-hash",
CreatedTS: time.Now(),
}
err := repos.Blobs.Create(ctx, nil, blob1)
if err != nil {
t.Fatalf("failed to create blob1: %v", err)
}
err = repos.Blobs.Create(ctx, nil, blob2)
if err != nil {
t.Fatalf("failed to create blob2: %v", err)
}
// Create chunks
chunkHashes := []string{"chunk1", "chunk2", "chunk3"}
for _, chunkHash := range chunkHashes {
chunk := &Chunk{
ChunkHash: chunkHash,
SHA256: chunkHash + "-sha",
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)
if err != nil {
t.Fatalf("failed to create chunk %s: %v", chunkHash, err)
}
}
// Create chunks across multiple blobs
// Some chunks are shared between blobs (deduplication scenario)
blobChunks := []BlobChunk{
{BlobID: "blob1-uuid", ChunkHash: "chunk1", Offset: 0, Length: 1024},
{BlobID: "blob1-uuid", ChunkHash: "chunk2", Offset: 1024, Length: 1024},
{BlobID: "blob2-uuid", ChunkHash: "chunk2", Offset: 0, Length: 1024}, // chunk2 is shared
{BlobID: "blob2-uuid", ChunkHash: "chunk3", Offset: 1024, Length: 1024},
{BlobID: blob1.ID, ChunkHash: "chunk1", Offset: 0, Length: 1024},
{BlobID: blob1.ID, ChunkHash: "chunk2", Offset: 1024, Length: 1024},
{BlobID: blob2.ID, ChunkHash: "chunk2", Offset: 0, Length: 1024}, // chunk2 is shared
{BlobID: blob2.ID, ChunkHash: "chunk3", Offset: 1024, Length: 1024},
}
for _, bc := range blobChunks {
err := repo.Create(ctx, nil, &bc)
err := repos.BlobChunks.Create(ctx, nil, &bc)
if err != nil {
t.Fatalf("failed to create blob chunk: %v", err)
}
}
// Verify blob1 chunks
chunks, err := repo.GetByBlobID(ctx, "blob1-uuid")
chunks, err := repos.BlobChunks.GetByBlobID(ctx, blob1.ID)
if err != nil {
t.Fatalf("failed to get blob1 chunks: %v", err)
}
@@ -123,7 +194,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
}
// Verify blob2 chunks
chunks, err = repo.GetByBlobID(ctx, "blob2-uuid")
chunks, err = repos.BlobChunks.GetByBlobID(ctx, blob2.ID)
if err != nil {
t.Fatalf("failed to get blob2 chunks: %v", err)
}
@@ -132,7 +203,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
}
// Verify shared chunk
bc, err := repo.GetByChunkHash(ctx, "chunk2")
bc, err := repos.BlobChunks.GetByChunkHash(ctx, "chunk2")
if err != nil {
t.Fatalf("failed to get shared chunk: %v", err)
}
@@ -140,7 +211,7 @@ func TestBlobChunkRepositoryMultipleBlobs(t *testing.T) {
t.Fatal("expected shared chunk, got nil")
}
// GetByChunkHash returns first match, should be blob1
if bc.BlobID != "blob1-uuid" {
t.Errorf("expected blob1-uuid for shared chunk, got %s", bc.BlobID)
if bc.BlobID != blob1.ID {
t.Errorf("expected %s for shared chunk, got %s", blob1.ID, bc.BlobID)
}
}

@@ -5,6 +5,8 @@ import (
"database/sql"
"fmt"
"time"
"git.eeqj.de/sneak/vaultik/internal/log"
)
type BlobRepository struct {
@@ -36,7 +38,7 @@ func (r *BlobRepository) Create(ctx context.Context, tx *sql.Tx, blob *Blob) err
_, err = tx.ExecContext(ctx, query, blob.ID, blob.Hash, blob.CreatedTS.Unix(),
finishedTS, blob.UncompressedSize, blob.CompressedSize, uploadedTS)
} else {
_, err = r.db.ExecWithLock(ctx, query, blob.ID, blob.Hash, blob.CreatedTS.Unix(),
_, err = r.db.ExecWithLog(ctx, query, blob.ID, blob.Hash, blob.CreatedTS.Unix(),
finishedTS, blob.UncompressedSize, blob.CompressedSize, uploadedTS)
}
@@ -75,13 +77,13 @@ func (r *BlobRepository) GetByHash(ctx context.Context, hash string) (*Blob, err
return nil, fmt.Errorf("querying blob: %w", err)
}
blob.CreatedTS = time.Unix(createdTSUnix, 0)
blob.CreatedTS = time.Unix(createdTSUnix, 0).UTC()
if finishedTSUnix.Valid {
ts := time.Unix(finishedTSUnix.Int64, 0)
ts := time.Unix(finishedTSUnix.Int64, 0).UTC()
blob.FinishedTS = &ts
}
if uploadedTSUnix.Valid {
ts := time.Unix(uploadedTSUnix.Int64, 0)
ts := time.Unix(uploadedTSUnix.Int64, 0).UTC()
blob.UploadedTS = &ts
}
return &blob, nil
@@ -116,13 +118,13 @@ func (r *BlobRepository) GetByID(ctx context.Context, id string) (*Blob, error)
return nil, fmt.Errorf("querying blob: %w", err)
}
blob.CreatedTS = time.Unix(createdTSUnix, 0)
blob.CreatedTS = time.Unix(createdTSUnix, 0).UTC()
if finishedTSUnix.Valid {
ts := time.Unix(finishedTSUnix.Int64, 0)
ts := time.Unix(finishedTSUnix.Int64, 0).UTC()
blob.FinishedTS = &ts
}
if uploadedTSUnix.Valid {
ts := time.Unix(uploadedTSUnix.Int64, 0)
ts := time.Unix(uploadedTSUnix.Int64, 0).UTC()
blob.UploadedTS = &ts
}
return &blob, nil
@@ -136,12 +138,12 @@ func (r *BlobRepository) UpdateFinished(ctx context.Context, tx *sql.Tx, id stri
WHERE id = ?
`
now := time.Now().Unix()
now := time.Now().UTC().Unix()
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, hash, now, uncompressedSize, compressedSize, id)
} else {
_, err = r.db.ExecWithLock(ctx, query, hash, now, uncompressedSize, compressedSize, id)
_, err = r.db.ExecWithLog(ctx, query, hash, now, uncompressedSize, compressedSize, id)
}
if err != nil {
@@ -159,12 +161,12 @@ func (r *BlobRepository) UpdateUploaded(ctx context.Context, tx *sql.Tx, id stri
WHERE id = ?
`
now := time.Now().Unix()
now := time.Now().UTC().Unix()
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, now, id)
} else {
_, err = r.db.ExecWithLock(ctx, query, now, id)
_, err = r.db.ExecWithLog(ctx, query, now, id)
}
if err != nil {
@@ -173,3 +175,26 @@ func (r *BlobRepository) UpdateUploaded(ctx context.Context, tx *sql.Tx, id stri
return nil
}
// DeleteOrphaned deletes blobs that are not referenced by any snapshot
func (r *BlobRepository) DeleteOrphaned(ctx context.Context) error {
query := `
DELETE FROM blobs
WHERE NOT EXISTS (
SELECT 1 FROM snapshot_blobs
WHERE snapshot_blobs.blob_id = blobs.id
)
`
result, err := r.db.ExecWithLog(ctx, query)
if err != nil {
return fmt.Errorf("deleting orphaned blobs: %w", err)
}
rowsAffected, _ := result.RowsAffected()
if rowsAffected > 0 {
log.Debug("Deleted orphaned blobs", "count", rowsAffected)
}
return nil
}

View File

@@ -0,0 +1,124 @@
package database
import (
"context"
"fmt"
"testing"
"time"
)
// TestCascadeDeleteDebug tests cascade delete with debug output
func TestCascadeDeleteDebug(t *testing.T) {
db, cleanup := setupTestDB(t)
defer cleanup()
ctx := context.Background()
repos := NewRepositories(db)
// Check if foreign keys are enabled
var fkEnabled int
err := db.conn.QueryRow("PRAGMA foreign_keys").Scan(&fkEnabled)
if err != nil {
t.Fatal(err)
}
t.Logf("Foreign keys enabled: %d", fkEnabled)
// Create a file
file := &File{
Path: "/cascade-test.txt",
MTime: time.Now().Truncate(time.Second),
CTime: time.Now().Truncate(time.Second),
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
}
err = repos.Files.Create(ctx, nil, file)
if err != nil {
t.Fatalf("failed to create file: %v", err)
}
t.Logf("Created file with ID: %s", file.ID)
// Create chunks and file-chunk mappings
for i := 0; i < 3; i++ {
chunk := &Chunk{
ChunkHash: fmt.Sprintf("cascade-chunk-%d", i),
SHA256: fmt.Sprintf("cascade-sha-%d", i),
Size: 1024,
}
err = repos.Chunks.Create(ctx, nil, chunk)
if err != nil {
t.Fatalf("failed to create chunk: %v", err)
}
fc := &FileChunk{
FileID: file.ID,
Idx: i,
ChunkHash: chunk.ChunkHash,
}
err = repos.FileChunks.Create(ctx, nil, fc)
if err != nil {
t.Fatalf("failed to create file chunk: %v", err)
}
t.Logf("Created file chunk mapping: file_id=%s, idx=%d, chunk=%s", fc.FileID, fc.Idx, fc.ChunkHash)
}
// Verify file chunks exist
fileChunks, err := repos.FileChunks.GetByFileID(ctx, file.ID)
if err != nil {
t.Fatal(err)
}
t.Logf("File chunks before delete: %d", len(fileChunks))
// Check the foreign key constraint
var fkInfo string
err = db.conn.QueryRow(`
SELECT sql FROM sqlite_master
WHERE type='table' AND name='file_chunks'
`).Scan(&fkInfo)
if err != nil {
t.Fatal(err)
}
t.Logf("file_chunks table definition:\n%s", fkInfo)
// Delete the file
t.Log("Deleting file...")
err = repos.Files.DeleteByID(ctx, nil, file.ID)
if err != nil {
t.Fatalf("failed to delete file: %v", err)
}
// Verify file is gone
deletedFile, err := repos.Files.GetByID(ctx, file.ID)
if err != nil {
t.Fatal(err)
}
if deletedFile != nil {
t.Error("file should have been deleted")
} else {
t.Log("File was successfully deleted")
}
// Check file chunks after delete
fileChunks, err = repos.FileChunks.GetByFileID(ctx, file.ID)
if err != nil {
t.Fatal(err)
}
t.Logf("File chunks after delete: %d", len(fileChunks))
// Manually check the database
var count int
err = db.conn.QueryRow("SELECT COUNT(*) FROM file_chunks WHERE file_id = ?", file.ID).Scan(&count)
if err != nil {
t.Fatal(err)
}
t.Logf("Manual count of file_chunks for deleted file: %d", count)
if len(fileChunks) != 0 {
t.Errorf("expected 0 file chunks after cascade delete, got %d", len(fileChunks))
// List the remaining chunks
for _, fc := range fileChunks {
t.Logf("Remaining chunk: file_id=%s, idx=%d, chunk=%s", fc.FileID, fc.Idx, fc.ChunkHash)
}
}
}

@@ -16,16 +16,16 @@ func NewChunkFileRepository(db *DB) *ChunkFileRepository {
func (r *ChunkFileRepository) Create(ctx context.Context, tx *sql.Tx, cf *ChunkFile) error {
query := `
INSERT INTO chunk_files (chunk_hash, file_path, file_offset, length)
INSERT INTO chunk_files (chunk_hash, file_id, file_offset, length)
VALUES (?, ?, ?, ?)
ON CONFLICT(chunk_hash, file_path) DO NOTHING
ON CONFLICT(chunk_hash, file_id) DO NOTHING
`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, cf.ChunkHash, cf.FilePath, cf.FileOffset, cf.Length)
_, err = tx.ExecContext(ctx, query, cf.ChunkHash, cf.FileID, cf.FileOffset, cf.Length)
} else {
_, err = r.db.ExecWithLock(ctx, query, cf.ChunkHash, cf.FilePath, cf.FileOffset, cf.Length)
_, err = r.db.ExecWithLog(ctx, query, cf.ChunkHash, cf.FileID, cf.FileOffset, cf.Length)
}
if err != nil {
@@ -37,7 +37,7 @@ func (r *ChunkFileRepository) Create(ctx context.Context, tx *sql.Tx, cf *ChunkF
func (r *ChunkFileRepository) GetByChunkHash(ctx context.Context, chunkHash string) ([]*ChunkFile, error) {
query := `
SELECT chunk_hash, file_path, file_offset, length
SELECT chunk_hash, file_id, file_offset, length
FROM chunk_files
WHERE chunk_hash = ?
`
@@ -51,7 +51,7 @@ func (r *ChunkFileRepository) GetByChunkHash(ctx context.Context, chunkHash stri
var chunkFiles []*ChunkFile
for rows.Next() {
var cf ChunkFile
err := rows.Scan(&cf.ChunkHash, &cf.FilePath, &cf.FileOffset, &cf.Length)
err := rows.Scan(&cf.ChunkHash, &cf.FileID, &cf.FileOffset, &cf.Length)
if err != nil {
return nil, fmt.Errorf("scanning chunk file: %w", err)
}
@@ -63,9 +63,10 @@ func (r *ChunkFileRepository) GetByChunkHash(ctx context.Context, chunkHash stri
func (r *ChunkFileRepository) GetByFilePath(ctx context.Context, filePath string) ([]*ChunkFile, error) {
query := `
SELECT chunk_hash, file_path, file_offset, length
FROM chunk_files
WHERE file_path = ?
SELECT cf.chunk_hash, cf.file_id, cf.file_offset, cf.length
FROM chunk_files cf
JOIN files f ON cf.file_id = f.id
WHERE f.path = ?
`
rows, err := r.db.conn.QueryContext(ctx, query, filePath)
@@ -77,7 +78,34 @@ func (r *ChunkFileRepository) GetByFilePath(ctx context.Context, filePath string
var chunkFiles []*ChunkFile
for rows.Next() {
var cf ChunkFile
err := rows.Scan(&cf.ChunkHash, &cf.FilePath, &cf.FileOffset, &cf.Length)
err := rows.Scan(&cf.ChunkHash, &cf.FileID, &cf.FileOffset, &cf.Length)
if err != nil {
return nil, fmt.Errorf("scanning chunk file: %w", err)
}
chunkFiles = append(chunkFiles, &cf)
}
return chunkFiles, rows.Err()
}
// GetByFileID retrieves chunk files by file ID
func (r *ChunkFileRepository) GetByFileID(ctx context.Context, fileID string) ([]*ChunkFile, error) {
query := `
SELECT chunk_hash, file_id, file_offset, length
FROM chunk_files
WHERE file_id = ?
`
rows, err := r.db.conn.QueryContext(ctx, query, fileID)
if err != nil {
return nil, fmt.Errorf("querying chunk files: %w", err)
}
defer CloseRows(rows)
var chunkFiles []*ChunkFile
for rows.Next() {
var cf ChunkFile
err := rows.Scan(&cf.ChunkHash, &cf.FileID, &cf.FileOffset, &cf.Length)
if err != nil {
return nil, fmt.Errorf("scanning chunk file: %w", err)
}

@@ -3,6 +3,7 @@ package database
import (
"context"
"testing"
"time"
)
func TestChunkFileRepository(t *testing.T) {
@@ -11,16 +12,49 @@ func TestChunkFileRepository(t *testing.T) {
ctx := context.Background()
repo := NewChunkFileRepository(db)
fileRepo := NewFileRepository(db)
// Create test files first
testTime := time.Now().Truncate(time.Second)
file1 := &File{
Path: "/file1.txt",
MTime: testTime,
CTime: testTime,
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
LinkTarget: "",
}
err := fileRepo.Create(ctx, nil, file1)
if err != nil {
t.Fatalf("failed to create file1: %v", err)
}
file2 := &File{
Path: "/file2.txt",
MTime: testTime,
CTime: testTime,
Size: 1024,
Mode: 0644,
UID: 1000,
GID: 1000,
LinkTarget: "",
}
err = fileRepo.Create(ctx, nil, file2)
if err != nil {
t.Fatalf("failed to create file2: %v", err)
}
// Test Create
cf1 := &ChunkFile{
ChunkHash: "chunk1",
FilePath: "/file1.txt",
FileID: file1.ID,
FileOffset: 0,
Length: 1024,
}
err := repo.Create(ctx, nil, cf1)
err = repo.Create(ctx, nil, cf1)
if err != nil {
t.Fatalf("failed to create chunk file: %v", err)
}
@@ -28,7 +62,7 @@ func TestChunkFileRepository(t *testing.T) {
// Add same chunk in different file (deduplication scenario)
cf2 := &ChunkFile{
ChunkHash: "chunk1",
FilePath: "/file2.txt",
FileID: file2.ID,
FileOffset: 2048,
Length: 1024,
}
@@ -50,10 +84,10 @@ func TestChunkFileRepository(t *testing.T) {
foundFile1 := false
foundFile2 := false
for _, cf := range chunkFiles {
if cf.FilePath == "/file1.txt" && cf.FileOffset == 0 {
if cf.FileID == file1.ID && cf.FileOffset == 0 {
foundFile1 = true
}
if cf.FilePath == "/file2.txt" && cf.FileOffset == 2048 {
if cf.FileID == file2.ID && cf.FileOffset == 2048 {
foundFile2 = true
}
}
@@ -61,10 +95,10 @@ func TestChunkFileRepository(t *testing.T) {
t.Error("not all expected files found")
}
// Test GetByFilePath
chunkFiles, err = repo.GetByFilePath(ctx, "/file1.txt")
// Test GetByFileID
chunkFiles, err = repo.GetByFileID(ctx, file1.ID)
if err != nil {
t.Fatalf("failed to get chunks by file path: %v", err)
t.Fatalf("failed to get chunks by file ID: %v", err)
}
if len(chunkFiles) != 1 {
t.Errorf("expected 1 chunk for file, got %d", len(chunkFiles))
@@ -86,6 +120,23 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {
ctx := context.Background()
repo := NewChunkFileRepository(db)
fileRepo := NewFileRepository(db)
// Create test files
testTime := time.Now().Truncate(time.Second)
file1 := &File{Path: "/file1.txt", MTime: testTime, CTime: testTime, Size: 3072, Mode: 0644, UID: 1000, GID: 1000}
file2 := &File{Path: "/file2.txt", MTime: testTime, CTime: testTime, Size: 3072, Mode: 0644, UID: 1000, GID: 1000}
file3 := &File{Path: "/file3.txt", MTime: testTime, CTime: testTime, Size: 2048, Mode: 0644, UID: 1000, GID: 1000}
if err := fileRepo.Create(ctx, nil, file1); err != nil {
t.Fatalf("failed to create file1: %v", err)
}
if err := fileRepo.Create(ctx, nil, file2); err != nil {
t.Fatalf("failed to create file2: %v", err)
}
if err := fileRepo.Create(ctx, nil, file3); err != nil {
t.Fatalf("failed to create file3: %v", err)
}
// Simulate a scenario where multiple files share chunks
// File1: chunk1, chunk2, chunk3
@@ -94,16 +145,16 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {
chunkFiles := []ChunkFile{
// File1
{ChunkHash: "chunk1", FilePath: "/file1.txt", FileOffset: 0, Length: 1024},
{ChunkHash: "chunk2", FilePath: "/file1.txt", FileOffset: 1024, Length: 1024},
{ChunkHash: "chunk3", FilePath: "/file1.txt", FileOffset: 2048, Length: 1024},
{ChunkHash: "chunk1", FileID: file1.ID, FileOffset: 0, Length: 1024},
{ChunkHash: "chunk2", FileID: file1.ID, FileOffset: 1024, Length: 1024},
{ChunkHash: "chunk3", FileID: file1.ID, FileOffset: 2048, Length: 1024},
// File2
{ChunkHash: "chunk2", FilePath: "/file2.txt", FileOffset: 0, Length: 1024},
{ChunkHash: "chunk3", FilePath: "/file2.txt", FileOffset: 1024, Length: 1024},
{ChunkHash: "chunk4", FilePath: "/file2.txt", FileOffset: 2048, Length: 1024},
{ChunkHash: "chunk2", FileID: file2.ID, FileOffset: 0, Length: 1024},
{ChunkHash: "chunk3", FileID: file2.ID, FileOffset: 1024, Length: 1024},
{ChunkHash: "chunk4", FileID: file2.ID, FileOffset: 2048, Length: 1024},
// File3
{ChunkHash: "chunk1", FilePath: "/file3.txt", FileOffset: 0, Length: 1024},
{ChunkHash: "chunk4", FilePath: "/file3.txt", FileOffset: 1024, Length: 1024},
{ChunkHash: "chunk1", FileID: file3.ID, FileOffset: 0, Length: 1024},
{ChunkHash: "chunk4", FileID: file3.ID, FileOffset: 1024, Length: 1024},
}
for _, cf := range chunkFiles {
@@ -132,7 +183,7 @@ func TestChunkFileRepositoryComplexDeduplication(t *testing.T) {
}
// Test file2 chunks
chunks, err := repo.GetByFilePath(ctx, "/file2.txt")
chunks, err := repo.GetByFileID(ctx, file2.ID)
if err != nil {
t.Fatalf("failed to get chunks for file2: %v", err)
}

@@ -4,6 +4,8 @@ import (
"context"
"database/sql"
"fmt"
"git.eeqj.de/sneak/vaultik/internal/log"
)
type ChunkRepository struct {
@@ -25,7 +27,7 @@ func (r *ChunkRepository) Create(ctx context.Context, tx *sql.Tx, chunk *Chunk)
if tx != nil {
_, err = tx.ExecContext(ctx, query, chunk.ChunkHash, chunk.SHA256, chunk.Size)
} else {
_, err = r.db.ExecWithLock(ctx, query, chunk.ChunkHash, chunk.SHA256, chunk.Size)
_, err = r.db.ExecWithLog(ctx, query, chunk.ChunkHash, chunk.SHA256, chunk.Size)
}
if err != nil {
@@ -139,3 +141,26 @@ func (r *ChunkRepository) ListUnpacked(ctx context.Context, limit int) ([]*Chunk
return chunks, rows.Err()
}
// DeleteOrphaned deletes chunks that are not referenced by any file
func (r *ChunkRepository) DeleteOrphaned(ctx context.Context) error {
query := `
DELETE FROM chunks
WHERE NOT EXISTS (
SELECT 1 FROM file_chunks
WHERE file_chunks.chunk_hash = chunks.chunk_hash
)
`
result, err := r.db.ExecWithLog(ctx, query)
if err != nil {
return fmt.Errorf("deleting orphaned chunks: %w", err)
}
rowsAffected, _ := result.RowsAffected()
if rowsAffected > 0 {
log.Debug("Deleted orphaned chunks", "count", rowsAffected)
}
return nil
}
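
The `DELETE ... WHERE NOT EXISTS` statement above is an anti-join: a chunk is removed only when no `file_chunks` row references it. The following in-memory sketch mirrors that selection rule for illustration. `orphanedChunks` and its map-based input are hypothetical and do not touch a database.

```go
package main

import (
	"fmt"
	"sort"
)

// orphanedChunks returns the chunk hashes that no file references, which is
// the set the DELETE ... WHERE NOT EXISTS statement would remove.
// fileChunks maps a file ID to the chunk hashes that file uses.
func orphanedChunks(chunks []string, fileChunks map[string][]string) []string {
	referenced := make(map[string]bool)
	for _, hashes := range fileChunks {
		for _, h := range hashes {
			referenced[h] = true
		}
	}
	var orphans []string
	for _, h := range chunks {
		if !referenced[h] {
			orphans = append(orphans, h)
		}
	}
	sort.Strings(orphans) // deterministic order for display
	return orphans
}

func main() {
	chunks := []string{"chunk1", "chunk2", "chunk3", "chunk4"}
	fileChunks := map[string][]string{
		"file-a": {"chunk1", "chunk2"},
		"file-b": {"chunk2"},
	}
	fmt.Println(orphanedChunks(chunks, fileChunks))
}
```

A chunk shared by several files, such as `chunk2` here, stays referenced until every file that uses it is gone.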

@@ -1,84 +1,158 @@
// Package database provides the local SQLite index for Vaultik backup operations.
// The database tracks files, chunks, and their associations with blobs.
//
// Blobs in Vaultik are the final storage units uploaded to S3. Each blob is a
// large (up to 10GB) file containing many compressed and encrypted chunks from
// multiple source files. Blobs are content-addressed, meaning their filename
// is derived from their SHA256 hash after compression and encryption.
//
// The database does not support migrations. If the schema changes, delete
// the local database and perform a full backup to recreate it.
package database
import (
"context"
"database/sql"
_ "embed"
"fmt"
"os"
"strings"
"sync"
"git.eeqj.de/sneak/vaultik/internal/log"
_ "modernc.org/sqlite"
)
//go:embed schema.sql
var schemaSQL string
// DB represents the Vaultik local index database connection.
// It uses SQLite to track file metadata, content-defined chunks, and blob associations.
// The database enables incremental backups by detecting changed files and
// supports deduplication by tracking which chunks are already stored in blobs.
// All access goes through a single pooled connection (MaxOpenConns is 1), which
// serializes writes; SQLite's own locking replaces the former write mutex.
type DB struct {
conn *sql.DB
writeLock sync.Mutex
conn *sql.DB
path string
}
// New creates a new database connection at the specified path.
// It automatically handles database recovery, creates the schema if needed,
// and configures SQLite with appropriate settings for performance and reliability.
// The database uses WAL mode for better concurrency and sets a busy timeout
// to handle concurrent access gracefully.
//
// If the database appears locked, it will attempt recovery by removing stale
// lock files and switching temporarily to TRUNCATE journal mode.
//
// The path parameter can be a file path for persistent storage or ":memory:"
// for an in-memory database (useful for testing).
func New(ctx context.Context, path string) (*DB, error) {
log.Debug("Opening database connection", "path", path)
// First, try to recover from any stale locks
if err := recoverDatabase(ctx, path); err != nil {
log.Warn("Failed to recover database", "error", err)
}
// First attempt with standard WAL mode
conn, err := sql.Open("sqlite", path+"?_journal_mode=WAL&_synchronous=NORMAL&_busy_timeout=10000&_locking_mode=NORMAL")
log.Debug("Attempting to open database with WAL mode", "path", path)
conn, err := sql.Open(
"sqlite",
path+"?_journal_mode=WAL&_synchronous=NORMAL&_busy_timeout=10000&_locking_mode=NORMAL&_foreign_keys=ON",
)
if err == nil {
// Set connection pool settings to ensure proper cleanup
conn.SetMaxOpenConns(1) // SQLite only supports one writer
// Set connection pool settings
// SQLite can handle multiple readers but only one writer at a time.
// Setting MaxOpenConns to 1 ensures all writes are serialized through
// a single connection, preventing SQLITE_BUSY errors.
conn.SetMaxOpenConns(1)
conn.SetMaxIdleConns(1)
if err := conn.PingContext(ctx); err == nil {
// Success on first try
db := &DB{conn: conn}
log.Debug("Database opened successfully with WAL mode", "path", path)
// Enable foreign keys explicitly
if _, err := conn.ExecContext(ctx, "PRAGMA foreign_keys = ON"); err != nil {
log.Warn("Failed to enable foreign keys", "error", err)
}
db := &DB{conn: conn, path: path}
if err := db.createSchema(ctx); err != nil {
_ = conn.Close()
return nil, fmt.Errorf("creating schema: %w", err)
}
return db, nil
}
log.Debug("Failed to ping database, closing connection", "path", path, "error", err)
_ = conn.Close()
}
// If first attempt failed, try with TRUNCATE mode to clear any locks
log.Info("Database appears locked, attempting recovery with TRUNCATE mode")
conn, err = sql.Open("sqlite", path+"?_journal_mode=TRUNCATE&_synchronous=NORMAL&_busy_timeout=10000")
log.Info(
"Database appears locked, attempting recovery with TRUNCATE mode",
"path", path,
)
conn, err = sql.Open(
"sqlite",
path+"?_journal_mode=TRUNCATE&_synchronous=NORMAL&_busy_timeout=10000&_foreign_keys=ON",
)
if err != nil {
return nil, fmt.Errorf("opening database in recovery mode: %w", err)
}
// Set connection pool settings
// SQLite can handle multiple readers but only one writer at a time.
// Setting MaxOpenConns to 1 ensures all writes are serialized through
// a single connection, preventing SQLITE_BUSY errors.
conn.SetMaxOpenConns(1)
conn.SetMaxIdleConns(1)
if err := conn.PingContext(ctx); err != nil {
log.Debug("Failed to ping database in recovery mode, closing", "path", path, "error", err)
_ = conn.Close()
return nil, fmt.Errorf("database still locked after recovery attempt: %w", err)
return nil, fmt.Errorf(
"database still locked after recovery attempt: %w",
err,
)
}
log.Debug("Database opened in TRUNCATE mode", "path", path)
// Switch back to WAL mode
log.Debug("Switching database back to WAL mode", "path", path)
if _, err := conn.ExecContext(ctx, "PRAGMA journal_mode=WAL"); err != nil {
log.Warn("Failed to switch back to WAL mode", "error", err)
log.Warn("Failed to switch back to WAL mode", "path", path, "error", err)
}
db := &DB{conn: conn}
// Ensure foreign keys are enabled
if _, err := conn.ExecContext(ctx, "PRAGMA foreign_keys=ON"); err != nil {
log.Warn("Failed to enable foreign keys", "path", path, "error", err)
}
db := &DB{conn: conn, path: path}
if err := db.createSchema(ctx); err != nil {
_ = conn.Close()
return nil, fmt.Errorf("creating schema: %w", err)
}
log.Debug("Database connection established successfully", "path", path)
return db, nil
}
// Close closes the database connection.
// It ensures all pending operations are completed before closing.
// Returns an error if the database connection cannot be closed properly.
func (db *DB) Close() error {
log.Debug("Closing database connection")
log.Debug("Closing database connection", "path", db.path)
if err := db.conn.Close(); err != nil {
log.Error("Failed to close database", "error", err)
log.Error("Failed to close database", "path", db.path, "error", err)
return fmt.Errorf("failed to close database: %w", err)
}
log.Debug("Database connection closed successfully")
log.Debug("Database connection closed successfully", "path", db.path)
return nil
}
@@ -138,148 +212,79 @@ func recoverDatabase(ctx context.Context, path string) error {
return nil
}
// Conn returns the underlying *sql.DB connection.
// This should be used sparingly and primarily for read operations.
// For write operations, prefer using the ExecWithLog method.
func (db *DB) Conn() *sql.DB {
return db.conn
}
func (db *DB) BeginTx(ctx context.Context, opts *sql.TxOptions) (*sql.Tx, error) {
// BeginTx starts a new database transaction with the given options.
// The caller is responsible for committing or rolling back the transaction.
// For write transactions, consider using the Repositories.WithTx method instead,
// which handles locking and rollback automatically.
func (db *DB) BeginTx(
ctx context.Context,
opts *sql.TxOptions,
) (*sql.Tx, error) {
return db.conn.BeginTx(ctx, opts)
}
// LockForWrite acquires the write lock
func (db *DB) LockForWrite() {
log.Debug("Attempting to acquire write lock")
db.writeLock.Lock()
log.Debug("Write lock acquired")
}
// UnlockWrite releases the write lock
func (db *DB) UnlockWrite() {
log.Debug("Releasing write lock")
db.writeLock.Unlock()
log.Debug("Write lock released")
}
// ExecWithLock executes a write query with the write lock held
func (db *DB) ExecWithLock(ctx context.Context, query string, args ...interface{}) (sql.Result, error) {
db.writeLock.Lock()
defer db.writeLock.Unlock()
// Note: LockForWrite and UnlockWrite methods have been removed.
// SQLite handles its own locking internally, so explicit locking is not needed.
// ExecWithLog executes a write query with SQL logging.
// SQLite handles its own locking internally, so we just pass through to ExecContext.
// The query and args parameters follow the same format as sql.DB.ExecContext.
func (db *DB) ExecWithLog(
ctx context.Context,
query string,
args ...interface{},
) (sql.Result, error) {
LogSQL("Execute", query, args...)
return db.conn.ExecContext(ctx, query, args...)
}
// QueryRowWithLock executes a write query that returns a row with the write lock held
func (db *DB) QueryRowWithLock(ctx context.Context, query string, args ...interface{}) *sql.Row {
db.writeLock.Lock()
defer db.writeLock.Unlock()
// QueryRowWithLog executes a query that returns at most one row with SQL logging.
// This is useful for queries that modify data and return values (e.g., INSERT ... RETURNING).
// SQLite handles its own locking internally.
// The query and args parameters follow the same format as sql.DB.QueryRowContext.
func (db *DB) QueryRowWithLog(
ctx context.Context,
query string,
args ...interface{},
) *sql.Row {
LogSQL("QueryRow", query, args...)
return db.conn.QueryRowContext(ctx, query, args...)
}
func (db *DB) createSchema(ctx context.Context) error {
schema := `
CREATE TABLE IF NOT EXISTS files (
path TEXT PRIMARY KEY,
mtime INTEGER NOT NULL,
ctime INTEGER NOT NULL,
size INTEGER NOT NULL,
mode INTEGER NOT NULL,
uid INTEGER NOT NULL,
gid INTEGER NOT NULL,
link_target TEXT
);
CREATE TABLE IF NOT EXISTS file_chunks (
path TEXT NOT NULL,
idx INTEGER NOT NULL,
chunk_hash TEXT NOT NULL,
PRIMARY KEY (path, idx)
);
CREATE TABLE IF NOT EXISTS chunks (
chunk_hash TEXT PRIMARY KEY,
sha256 TEXT NOT NULL,
size INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS blobs (
id TEXT PRIMARY KEY,
blob_hash TEXT UNIQUE,
created_ts INTEGER NOT NULL,
finished_ts INTEGER,
uncompressed_size INTEGER NOT NULL DEFAULT 0,
compressed_size INTEGER NOT NULL DEFAULT 0,
uploaded_ts INTEGER
);
CREATE TABLE IF NOT EXISTS blob_chunks (
blob_id TEXT NOT NULL,
chunk_hash TEXT NOT NULL,
offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (blob_id, chunk_hash),
FOREIGN KEY (blob_id) REFERENCES blobs(id)
);
CREATE TABLE IF NOT EXISTS chunk_files (
chunk_hash TEXT NOT NULL,
file_path TEXT NOT NULL,
file_offset INTEGER NOT NULL,
length INTEGER NOT NULL,
PRIMARY KEY (chunk_hash, file_path)
);
CREATE TABLE IF NOT EXISTS snapshots (
id TEXT PRIMARY KEY,
hostname TEXT NOT NULL,
vaultik_version TEXT NOT NULL,
started_at INTEGER NOT NULL,
completed_at INTEGER,
file_count INTEGER NOT NULL DEFAULT 0,
chunk_count INTEGER NOT NULL DEFAULT 0,
blob_count INTEGER NOT NULL DEFAULT 0,
total_size INTEGER NOT NULL DEFAULT 0,
blob_size INTEGER NOT NULL DEFAULT 0,
compression_ratio REAL NOT NULL DEFAULT 1.0
);
CREATE TABLE IF NOT EXISTS snapshot_files (
snapshot_id TEXT NOT NULL,
file_path TEXT NOT NULL,
PRIMARY KEY (snapshot_id, file_path),
FOREIGN KEY (snapshot_id) REFERENCES snapshots(id) ON DELETE CASCADE,
FOREIGN KEY (file_path) REFERENCES files(path) ON DELETE CASCADE
);
CREATE TABLE IF NOT EXISTS snapshot_blobs (
snapshot_id TEXT NOT NULL,
blob_id TEXT NOT NULL,
blob_hash TEXT NOT NULL,
PRIMARY KEY (snapshot_id, blob_id),
FOREIGN KEY (snapshot_id) REFERENCES snapshots(id) ON DELETE CASCADE,
FOREIGN KEY (blob_id) REFERENCES blobs(id) ON DELETE CASCADE
);
CREATE TABLE IF NOT EXISTS uploads (
blob_hash TEXT PRIMARY KEY,
uploaded_at INTEGER NOT NULL,
size INTEGER NOT NULL,
duration_ms INTEGER NOT NULL
);
`
_, err := db.conn.ExecContext(ctx, schema)
_, err := db.conn.ExecContext(ctx, schemaSQL)
return err
}
// NewTestDB creates an in-memory SQLite database for testing
// NewTestDB creates an in-memory SQLite database for testing purposes.
// The database is automatically initialized with the schema and is ready for use.
// Each call creates a new independent database instance.
func NewTestDB() (*DB, error) {
return New(context.Background(), ":memory:")
}
// LogSQL logs SQL queries if debug mode is enabled
// LogSQL logs SQL queries and their arguments when debug mode is enabled.
// Debug mode is activated by setting the GODEBUG environment variable to include "vaultik".
// This is useful for troubleshooting database operations and understanding query patterns.
//
// The operation parameter describes the type of SQL operation (e.g., "Execute", "Query").
// The query parameter is the SQL statement being executed.
// The args parameter contains the query arguments that will be interpolated.
func LogSQL(operation, query string, args ...interface{}) {
if strings.Contains(os.Getenv("GODEBUG"), "vaultik") {
log.Debug("SQL "+operation, "query", strings.TrimSpace(query), "args", fmt.Sprintf("%v", args))
log.Debug(
"SQL "+operation,
"query",
strings.TrimSpace(query),
"args",
fmt.Sprintf("%v", args),
)
}
}
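
The gate in LogSQL is a plain substring test on the GODEBUG value. A minimal sketch of that check in isolation follows. The helper name `sqlDebugEnabled` is hypothetical and not part of vaultik; it takes the variable's value as a parameter so it can be exercised without touching the process environment.

```go
package main

import (
	"fmt"
	"strings"
)

// sqlDebugEnabled is a hypothetical helper (not in vaultik) that mirrors the
// condition used by LogSQL: it reports whether the GODEBUG value contains the
// substring "vaultik". Because this is a substring match, any value that
// contains those letters enables logging, not only an exact list entry.
func sqlDebugEnabled(godebug string) bool {
	return strings.Contains(godebug, "vaultik")
}

func main() {
	for _, v := range []string{"", "gctrace=1", "vaultik", "gctrace=1,vaultik"} {
		fmt.Printf("GODEBUG=%q -> %v\n", v, sqlDebugEnabled(v))
	}
}
```

In practice this means SQL logging could be switched on with something like `GODEBUG=vaultik` in the environment, provided the logger is also emitting debug-level messages.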


@@ -67,21 +67,26 @@ func TestDatabaseConcurrentAccess(t *testing.T) {
}()
// Test concurrent writes
done := make(chan bool, 10)
type result struct {
index int
err error
}
results := make(chan result, 10)
for i := 0; i < 10; i++ {
go func(i int) {
_, err := db.ExecWithLock(ctx, "INSERT INTO chunks (chunk_hash, sha256, size) VALUES (?, ?, ?)",
_, err := db.ExecWithLog(ctx, "INSERT INTO chunks (chunk_hash, sha256, size) VALUES (?, ?, ?)",
fmt.Sprintf("hash%d", i), fmt.Sprintf("sha%d", i), i*1024)
if err != nil {
t.Errorf("concurrent insert failed: %v", err)
}
done <- true
results <- result{index: i, err: err}
}(i)
}
// Wait for all goroutines
// Wait for all goroutines and check results
for i := 0; i < 10; i++ {
<-done
r := <-results
if r.err != nil {
t.Fatalf("concurrent insert %d failed: %v", r.index, r.err)
}
}
// Verify all inserts succeeded
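
The rewritten test sends each goroutine's error back over a buffered channel and fails from the test goroutine, which matters because `t.Fatalf` must be called from the goroutine running the test. The sketch below shows the same collection pattern with only the standard library. `runConcurrent` and `errSimulated` are hypothetical names used for illustration, and the failing worker stands in for a database insert.

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated stands in for a database error in this sketch.
var errSimulated = errors.New("simulated insert failure")

// result pairs a worker index with the error it returned, as in the test.
type result struct {
	index int
	err   error
}

// runConcurrent is a hypothetical helper: it starts n goroutines, each calling
// work(i), and returns the failures, collected on the calling goroutine.
func runConcurrent(n int, work func(i int) error) []result {
	results := make(chan result, n) // buffered so no worker blocks on send
	for i := 0; i < n; i++ {
		go func(i int) {
			results <- result{index: i, err: work(i)}
		}(i)
	}

	var failed []result
	for i := 0; i < n; i++ {
		if r := <-results; r.err != nil {
			failed = append(failed, r)
		}
	}
	return failed
}

func main() {
	failed := runConcurrent(10, func(i int) error {
		if i == 3 {
			return errSimulated
		}
		return nil
	})
	for _, r := range failed {
		fmt.Printf("worker %d failed: %v\n", r.index, r.err)
	}
}
```

Carrying the index alongside the error is what lets the failure message name the specific insert, since results arrive in completion order rather than launch order.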


@@ -16,16 +16,16 @@ func NewFileChunkRepository(db *DB) *FileChunkRepository {
func (r *FileChunkRepository) Create(ctx context.Context, tx *sql.Tx, fc *FileChunk) error {
query := `
INSERT INTO file_chunks (path, idx, chunk_hash)
INSERT INTO file_chunks (file_id, idx, chunk_hash)
VALUES (?, ?, ?)
ON CONFLICT(path, idx) DO NOTHING
ON CONFLICT(file_id, idx) DO NOTHING
`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, fc.Path, fc.Idx, fc.ChunkHash)
_, err = tx.ExecContext(ctx, query, fc.FileID, fc.Idx, fc.ChunkHash)
} else {
_, err = r.db.ExecWithLock(ctx, query, fc.Path, fc.Idx, fc.ChunkHash)
_, err = r.db.ExecWithLog(ctx, query, fc.FileID, fc.Idx, fc.ChunkHash)
}
if err != nil {
@@ -37,10 +37,11 @@ func (r *FileChunkRepository) Create(ctx context.Context, tx *sql.Tx, fc *FileCh
func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*FileChunk, error) {
query := `
SELECT path, idx, chunk_hash
FROM file_chunks
WHERE path = ?
ORDER BY idx
SELECT fc.file_id, fc.idx, fc.chunk_hash
FROM file_chunks fc
JOIN files f ON fc.file_id = f.id
WHERE f.path = ?
ORDER BY fc.idx
`
rows, err := r.db.conn.QueryContext(ctx, query, path)
@@ -52,7 +53,35 @@ func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*Fi
var fileChunks []*FileChunk
for rows.Next() {
var fc FileChunk
err := rows.Scan(&fc.Path, &fc.Idx, &fc.ChunkHash)
err := rows.Scan(&fc.FileID, &fc.Idx, &fc.ChunkHash)
if err != nil {
return nil, fmt.Errorf("scanning file chunk: %w", err)
}
fileChunks = append(fileChunks, &fc)
}
return fileChunks, rows.Err()
}
// GetByFileID retrieves file chunks by file ID
func (r *FileChunkRepository) GetByFileID(ctx context.Context, fileID string) ([]*FileChunk, error) {
query := `
SELECT file_id, idx, chunk_hash
FROM file_chunks
WHERE file_id = ?
ORDER BY idx
`
rows, err := r.db.conn.QueryContext(ctx, query, fileID)
if err != nil {
return nil, fmt.Errorf("querying file chunks: %w", err)
}
defer CloseRows(rows)
var fileChunks []*FileChunk
for rows.Next() {
var fc FileChunk
err := rows.Scan(&fc.FileID, &fc.Idx, &fc.ChunkHash)
if err != nil {
return nil, fmt.Errorf("scanning file chunk: %w", err)
}
@@ -65,10 +94,11 @@ func (r *FileChunkRepository) GetByPath(ctx context.Context, path string) ([]*Fi
// GetByPathTx retrieves file chunks within a transaction
func (r *FileChunkRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path string) ([]*FileChunk, error) {
query := `
SELECT path, idx, chunk_hash
FROM file_chunks
WHERE path = ?
ORDER BY idx
SELECT fc.file_id, fc.idx, fc.chunk_hash
FROM file_chunks fc
JOIN files f ON fc.file_id = f.id
WHERE f.path = ?
ORDER BY fc.idx
`
LogSQL("GetByPathTx", query, path)
@@ -81,7 +111,7 @@ func (r *FileChunkRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path
var fileChunks []*FileChunk
for rows.Next() {
var fc FileChunk
err := rows.Scan(&fc.Path, &fc.Idx, &fc.ChunkHash)
err := rows.Scan(&fc.FileID, &fc.Idx, &fc.ChunkHash)
if err != nil {
return nil, fmt.Errorf("scanning file chunk: %w", err)
}
@@ -93,13 +123,31 @@ func (r *FileChunkRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path
}
func (r *FileChunkRepository) DeleteByPath(ctx context.Context, tx *sql.Tx, path string) error {
query := `DELETE FROM file_chunks WHERE path = ?`
query := `DELETE FROM file_chunks WHERE file_id = (SELECT id FROM files WHERE path = ?)`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, path)
} else {
_, err = r.db.ExecWithLock(ctx, query, path)
_, err = r.db.ExecWithLog(ctx, query, path)
}
if err != nil {
return fmt.Errorf("deleting file chunks: %w", err)
}
return nil
}
// DeleteByFileID deletes all chunks for a file by its UUID
func (r *FileChunkRepository) DeleteByFileID(ctx context.Context, tx *sql.Tx, fileID string) error {
query := `DELETE FROM file_chunks WHERE file_id = ?`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, fileID)
} else {
_, err = r.db.ExecWithLog(ctx, query, fileID)
}
if err != nil {


@@ -4,6 +4,7 @@ import (
"context"
"fmt"
"testing"
"time"
)
func TestFileChunkRepository(t *testing.T) {
@@ -12,22 +13,40 @@ func TestFileChunkRepository(t *testing.T) {
ctx := context.Background()
repo := NewFileChunkRepository(db)
fileRepo := NewFileRepository(db)
// Create test file first
testTime := time.Now().Truncate(time.Second)
file := &File{
Path: "/test/file.txt",
MTime: testTime,
CTime: testTime,
Size: 3072,
Mode: 0644,
UID: 1000,
GID: 1000,
LinkTarget: "",
}
err := fileRepo.Create(ctx, nil, file)
if err != nil {
t.Fatalf("failed to create file: %v", err)
}
// Test Create
fc1 := &FileChunk{
Path: "/test/file.txt",
FileID: file.ID,
Idx: 0,
ChunkHash: "chunk1",
}
err := repo.Create(ctx, nil, fc1)
err = repo.Create(ctx, nil, fc1)
if err != nil {
t.Fatalf("failed to create file chunk: %v", err)
}
// Add more chunks for the same file
fc2 := &FileChunk{
Path: "/test/file.txt",
FileID: file.ID,
Idx: 1,
ChunkHash: "chunk2",
}
@@ -37,7 +56,7 @@ func TestFileChunkRepository(t *testing.T) {
}
fc3 := &FileChunk{
Path: "/test/file.txt",
FileID: file.ID,
Idx: 2,
ChunkHash: "chunk3",
}
@@ -46,8 +65,8 @@ func TestFileChunkRepository(t *testing.T) {
t.Fatalf("failed to create third file chunk: %v", err)
}
// Test GetByPath
chunks, err := repo.GetByPath(ctx, "/test/file.txt")
// Test GetByFile
chunks, err := repo.GetByFile(ctx, "/test/file.txt")
if err != nil {
t.Fatalf("failed to get file chunks: %v", err)
}
@@ -68,13 +87,13 @@ func TestFileChunkRepository(t *testing.T) {
t.Fatalf("failed to create duplicate file chunk: %v", err)
}
// Test DeleteByPath
err = repo.DeleteByPath(ctx, nil, "/test/file.txt")
// Test DeleteByFileID
err = repo.DeleteByFileID(ctx, nil, file.ID)
if err != nil {
t.Fatalf("failed to delete file chunks: %v", err)
}
chunks, err = repo.GetByPath(ctx, "/test/file.txt")
chunks, err = repo.GetByFileID(ctx, file.ID)
if err != nil {
t.Fatalf("failed to get deleted file chunks: %v", err)
}
@@ -89,15 +108,38 @@ func TestFileChunkRepositoryMultipleFiles(t *testing.T) {
ctx := context.Background()
repo := NewFileChunkRepository(db)
fileRepo := NewFileRepository(db)
// Create test files
testTime := time.Now().Truncate(time.Second)
filePaths := []string{"/file1.txt", "/file2.txt", "/file3.txt"}
files := make([]*File, len(filePaths))
for i, path := range filePaths {
file := &File{
Path: path,
MTime: testTime,
CTime: testTime,
Size: 2048,
Mode: 0644,
UID: 1000,
GID: 1000,
LinkTarget: "",
}
err := fileRepo.Create(ctx, nil, file)
if err != nil {
t.Fatalf("failed to create file %s: %v", path, err)
}
files[i] = file
}
// Create chunks for multiple files
files := []string{"/file1.txt", "/file2.txt", "/file3.txt"}
for _, path := range files {
for i := 0; i < 2; i++ {
for i, file := range files {
for j := 0; j < 2; j++ {
fc := &FileChunk{
Path: path,
Idx: i,
ChunkHash: fmt.Sprintf("%s_chunk%d", path, i),
FileID: file.ID,
Idx: j,
ChunkHash: fmt.Sprintf("file%d_chunk%d", i, j),
}
err := repo.Create(ctx, nil, fc)
if err != nil {
@@ -107,13 +149,13 @@ func TestFileChunkRepositoryMultipleFiles(t *testing.T) {
}
// Verify each file has correct chunks
for _, path := range files {
chunks, err := repo.GetByPath(ctx, path)
for i, file := range files {
chunks, err := repo.GetByFileID(ctx, file.ID)
if err != nil {
t.Fatalf("failed to get chunks for %s: %v", path, err)
t.Fatalf("failed to get chunks for file %d: %v", i, err)
}
if len(chunks) != 2 {
t.Errorf("expected 2 chunks for %s, got %d", path, len(chunks))
t.Errorf("expected 2 chunks for file %d, got %d", i, len(chunks))
}
}
}
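
These tests build their fixtures with `time.Now().Truncate(time.Second)`. A likely reason is that `mtime` and `ctime` are stored as whole Unix seconds and read back with `time.Unix(sec, 0).UTC()`, so any sub-second component is lost on the way through the database. The sketch below reproduces that round trip without a database. `roundTrip`, `sample`, and `sampleTruncated` are hypothetical names, and the fixed timestamp is arbitrary.

```go
package main

import (
	"fmt"
	"time"
)

// sample is an arbitrary fixed timestamp with a sub-second component.
var sample = time.Date(2025, 7, 22, 14, 54, 37, 123456789, time.UTC)

// sampleTruncated is the same instant cut down to whole seconds, the way the
// tests prepare their fixtures.
var sampleTruncated = sample.Truncate(time.Second)

// roundTrip mimics storing a timestamp as INTEGER Unix seconds and reading it
// back the way the repository code in this commit does.
func roundTrip(t time.Time) time.Time {
	return time.Unix(t.Unix(), 0).UTC()
}

func main() {
	fmt.Println("truncated value survives:", roundTrip(sampleTruncated).Equal(sampleTruncated))
	fmt.Println("raw value survives:      ", roundTrip(sample).Equal(sample))
	fmt.Println("location after read:     ", roundTrip(sample).Location())
}
```

Comparing with `Equal` rather than `==` avoids false mismatches caused by differing location pointers or monotonic clock readings.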


@@ -5,6 +5,9 @@ import (
"database/sql"
"fmt"
"time"
"git.eeqj.de/sneak/vaultik/internal/log"
"github.com/google/uuid"
)
type FileRepository struct {
@@ -16,10 +19,16 @@ func NewFileRepository(db *DB) *FileRepository {
}
func (r *FileRepository) Create(ctx context.Context, tx *sql.Tx, file *File) error {
// Generate UUID if not provided
if file.ID == "" {
file.ID = uuid.New().String()
}
query := `
INSERT INTO files (path, mtime, ctime, size, mode, uid, gid, link_target)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
INSERT INTO files (id, path, mtime, ctime, size, mode, uid, gid, link_target)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(path) DO UPDATE SET
id = excluded.id,
mtime = excluded.mtime,
ctime = excluded.ctime,
size = excluded.size,
@@ -27,14 +36,15 @@ func (r *FileRepository) Create(ctx context.Context, tx *sql.Tx, file *File) err
uid = excluded.uid,
gid = excluded.gid,
link_target = excluded.link_target
RETURNING id
`
var err error
if tx != nil {
LogSQL("Execute", query, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget)
_, err = tx.ExecContext(ctx, query, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget)
LogSQL("Execute", query, file.ID, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget)
err = tx.QueryRowContext(ctx, query, file.ID, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget).Scan(&file.ID)
} else {
_, err = r.db.ExecWithLock(ctx, query, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget)
err = r.db.QueryRowWithLog(ctx, query, file.ID, file.Path, file.MTime.Unix(), file.CTime.Unix(), file.Size, file.Mode, file.UID, file.GID, file.LinkTarget).Scan(&file.ID)
}
if err != nil {
@@ -46,7 +56,7 @@ func (r *FileRepository) Create(ctx context.Context, tx *sql.Tx, file *File) err
func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, error) {
query := `
SELECT path, mtime, ctime, size, mode, uid, gid, link_target
SELECT id, path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
WHERE path = ?
`
@@ -56,6 +66,7 @@ func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, err
var linkTarget sql.NullString
err := r.db.conn.QueryRowContext(ctx, query, path).Scan(
&file.ID,
&file.Path,
&mtimeUnix,
&ctimeUnix,
@@ -73,8 +84,48 @@ func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, err
return nil, fmt.Errorf("querying file: %w", err)
}
file.MTime = time.Unix(mtimeUnix, 0)
file.CTime = time.Unix(ctimeUnix, 0)
file.MTime = time.Unix(mtimeUnix, 0).UTC()
file.CTime = time.Unix(ctimeUnix, 0).UTC()
if linkTarget.Valid {
file.LinkTarget = linkTarget.String
}
return &file, nil
}
// GetByID retrieves a file by its UUID
func (r *FileRepository) GetByID(ctx context.Context, id string) (*File, error) {
query := `
SELECT id, path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
WHERE id = ?
`
var file File
var mtimeUnix, ctimeUnix int64
var linkTarget sql.NullString
err := r.db.conn.QueryRowContext(ctx, query, id).Scan(
&file.ID,
&file.Path,
&mtimeUnix,
&ctimeUnix,
&file.Size,
&file.Mode,
&file.UID,
&file.GID,
&linkTarget,
)
if err == sql.ErrNoRows {
return nil, nil
}
if err != nil {
return nil, fmt.Errorf("querying file: %w", err)
}
file.MTime = time.Unix(mtimeUnix, 0).UTC()
file.CTime = time.Unix(ctimeUnix, 0).UTC()
if linkTarget.Valid {
file.LinkTarget = linkTarget.String
}
@@ -84,7 +135,7 @@ func (r *FileRepository) GetByPath(ctx context.Context, path string) (*File, err
func (r *FileRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path string) (*File, error) {
query := `
SELECT path, mtime, ctime, size, mode, uid, gid, link_target
SELECT id, path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
WHERE path = ?
`
@@ -95,6 +146,7 @@ func (r *FileRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path strin
LogSQL("GetByPathTx QueryRowContext", query, path)
err := tx.QueryRowContext(ctx, query, path).Scan(
&file.ID,
&file.Path,
&mtimeUnix,
&ctimeUnix,
@@ -113,8 +165,8 @@ func (r *FileRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path strin
return nil, fmt.Errorf("querying file: %w", err)
}
file.MTime = time.Unix(mtimeUnix, 0)
file.CTime = time.Unix(ctimeUnix, 0)
file.MTime = time.Unix(mtimeUnix, 0).UTC()
file.CTime = time.Unix(ctimeUnix, 0).UTC()
if linkTarget.Valid {
file.LinkTarget = linkTarget.String
}
@@ -124,7 +176,7 @@ func (r *FileRepository) GetByPathTx(ctx context.Context, tx *sql.Tx, path strin
func (r *FileRepository) ListModifiedSince(ctx context.Context, since time.Time) ([]*File, error) {
query := `
SELECT path, mtime, ctime, size, mode, uid, gid, link_target
SELECT id, path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
WHERE mtime >= ?
ORDER BY path
@@ -143,6 +195,7 @@ func (r *FileRepository) ListModifiedSince(ctx context.Context, since time.Time)
var linkTarget sql.NullString
err := rows.Scan(
&file.ID,
&file.Path,
&mtimeUnix,
&ctimeUnix,
@@ -175,7 +228,25 @@ func (r *FileRepository) Delete(ctx context.Context, tx *sql.Tx, path string) er
if tx != nil {
_, err = tx.ExecContext(ctx, query, path)
} else {
_, err = r.db.ExecWithLock(ctx, query, path)
_, err = r.db.ExecWithLog(ctx, query, path)
}
if err != nil {
return fmt.Errorf("deleting file: %w", err)
}
return nil
}
// DeleteByID deletes a file by its UUID
func (r *FileRepository) DeleteByID(ctx context.Context, tx *sql.Tx, id string) error {
query := `DELETE FROM files WHERE id = ?`
var err error
if tx != nil {
_, err = tx.ExecContext(ctx, query, id)
} else {
_, err = r.db.ExecWithLog(ctx, query, id)
}
if err != nil {
@@ -187,7 +258,7 @@ func (r *FileRepository) Delete(ctx context.Context, tx *sql.Tx, path string) er
func (r *FileRepository) ListByPrefix(ctx context.Context, prefix string) ([]*File, error) {
query := `
SELECT path, mtime, ctime, size, mode, uid, gid, link_target
SELECT id, path, mtime, ctime, size, mode, uid, gid, link_target
FROM files
WHERE path LIKE ? || '%'
ORDER BY path
@@ -206,6 +277,7 @@ func (r *FileRepository) ListByPrefix(ctx context.Context, prefix string) ([]*Fi
var linkTarget sql.NullString
err := rows.Scan(
&file.ID,
&file.Path,
&mtimeUnix,
&ctimeUnix,
@@ -230,3 +302,26 @@ func (r *FileRepository) ListByPrefix(ctx context.Context, prefix string) ([]*Fi
return files, rows.Err()
}
// DeleteOrphaned deletes files that are not referenced by any snapshot
func (r *FileRepository) DeleteOrphaned(ctx context.Context) error {
query := `
DELETE FROM files
WHERE NOT EXISTS (
SELECT 1 FROM snapshot_files
WHERE snapshot_files.file_id = files.id
)
`
result, err := r.db.ExecWithLog(ctx, query)
if err != nil {
return fmt.Errorf("deleting orphaned files: %w", err)
}
rowsAffected, _ := result.RowsAffected()
if rowsAffected > 0 {
log.Debug("Deleted orphaned files", "count", rowsAffected)
}