Merge docs/limitations-section

Add limitations section to README
Merge feature/keep-newer-than
2026-06-09 13:38:32 -04:00 · 2026-06-09 13:22:24 -04:00 · 2026-06-09 12:57:33 -04:00
10 changed files with 457 additions and 330 deletions
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -53,8 +53,8 @@ The database tracks five primary entities and their relationships:
 ### Entity Descriptions

 #### File (`database.File`)
-Represents a file or directory in the backup system. Stores metadata needed for restoration:
- Path, mtime
+Represents a file, directory, or symlink in the backup system. Stores metadata needed for restoration:
+- Path, source_path (for restore path stripping), mtime
 - Size, mode, ownership (uid, gid)
 - Symlink target (if applicable)

@@ -95,7 +95,7 @@ Maps chunks to their position within blobs:

 #### Snapshot (`database.Snapshot`)
 Represents a point-in-time backup:
- `ID`: Format is `{hostname}-{YYYYMMDD}-{HHMMSS}Z`
+- `ID`: Format is `{hostname}_{snapshot-name}_{RFC3339}` (e.g. `server1_home_2025-06-01T12:00:00Z`)
 - Tracks file count, chunk count, blob count, sizes, compression ratio
 - `CompletedAt`: Null until snapshot finishes successfully

@@ -127,7 +127,7 @@ fx.New(
    config.Module,                                   // 5. Config
    database.Module,                                 // 6. Database + Repositories
    log.Module,                                      // 7. Logger initialization
-    s3.Module,                                       // 8. S3 client
+    storage.Module,                                  // 8. Storage backend (S3/file/rclone)
    snapshot.Module,                                 // 9. SnapshotManager + ScannerFactory
    fx.Provide(vaultik.New),                         // 10. Vaultik orchestrator
 )
@@ -161,7 +161,7 @@ type Vaultik struct {
    Config          *config.Config
    DB              *database.DB
    Repositories    *database.Repositories
-    S3Client        *s3.Client
+    Storage         storage.Storer
    ScannerFactory  snapshot.ScannerFactory
    SnapshotManager *snapshot.SnapshotManager
    Shutdowner      fx.Shutdowner
@@ -341,12 +341,11 @@ CreateSnapshot(opts)
    └─► SnapshotManager.ExportSnapshotMetadata()
            │
            ├─► Copy database to temp file
-            ├─► Clean to only current snapshot data
-            ├─► Dump to SQL
-            ├─► Compress with zstd
+            ├─► Clean to only current snapshot data (VACUUM)
+            ├─► Compress binary SQLite with zstd
            ├─► Encrypt with age
-            ├─► Upload db.zst.age to S3
-            └─► Upload manifest.json.zst to S3
+            ├─► Upload db.zst.age to storage
+            └─► Upload manifest.json.zst to storage
 ```

 ## Deduplication Strategy
@@ -368,8 +367,8 @@ bucket/
 │
 └── metadata/
    └── {snapshot-id}/
-        ├── db.zst.age               # Encrypted database dump
-        └── manifest.json.zst        # Blob list (for verification)
+        ├── db.zst.age               # Encrypted binary SQLite database
+        └── manifest.json.zst        # Blob list (for pruning/verification)
 ```

 ## Thread Safety
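Both the bucket layout here and the README's `blobs/<aa>/<bb>/<full_blob_hash>` layout shard blobs by the first four hex characters of their hash, which gives 256 × 256 = 65,536 leaf directories. A minimal Go sketch of that key derivation follows; `blobKey` is a hypothetical helper, and its `prefix` argument stands in for whatever storage prefix is configured.

```go
package main

import (
	"fmt"
	"path"
)

// blobKey maps a hex blob hash to its storage key, sharded two levels deep
// by the first four hex characters: <prefix>/blobs/<aa>/<bb>/<full_hash>.
func blobKey(prefix, hash string) (string, error) {
	if len(hash) < 4 {
		return "", fmt.Errorf("blob hash %q too short to shard", hash)
	}
	// path.Join always uses "/" and drops empty elements and trailing slashes.
	return path.Join(prefix, "blobs", hash[0:2], hash[2:4], hash), nil
}
```

Using `path.Join` rather than `filepath.Join` keeps `/` as the separator on every OS, which is what object-store keys expect.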
--- a/README.md
+++ b/README.md
@@ -1,43 +1,35 @@
 # vaultik (ваултик)

-WIP: pre-1.0, some functions may not be fully implemented yet
-
 `vaultik` is an incremental backup tool written in Go. It encrypts data
 using an `age` public key and uploads each encrypted blob directly to a
 remote S3-compatible object store. It requires no private keys, secrets, or
 credentials (other than those required to PUT to encrypted object storage,
 such as S3 API keys) stored on the backed-up system.

-It includes table-stakes features such as:
+Features:

-* modern encryption (the excellent `age`)
-* deduplication
-* incremental backups
-* modern multithreaded zstd compression with configurable levels
+* modern encryption ([age](https://age-encryption.org/), X25519 + ChaCha20-Poly1305)
+* content-defined chunking with deduplication (FastCDC)
+* incremental backups (only changed files are re-chunked)
+* multithreaded zstd compression at configurable levels
 * content-addressed immutable storage
-* local state tracking in standard SQLite database, enables write-only
-  incremental backups to destination
+* local state tracking in SQLite (enables write-only incremental backups)
 * no mutable remote metadata
-* no plaintext file paths or metadata stored in remote
-* does not create huge numbers of small files (to keep S3 operation counts
-  down) even if the source system has many small files
+* no plaintext file paths or metadata in remote storage
+* packs small files into large blobs (keeps S3 operation counts down)
+* backs up regular files, symlinks, empty directories, and file permissions
+* pluggable storage backends: S3, local filesystem, rclone (70+ providers)
+* pure Go (no CGO), cross-compiles to linux/darwin × amd64/arm64

 ## why

-Existing backup software fails under one or more of these conditions:
-
-* Requires secrets (passwords, private keys) on the source system, which
-  compromises encrypted backups in the case of host system compromise
-* Depends on symmetric encryption unsuitable for zero-trust environments
-* Creates one-blob-per-file, which results in excessive S3 operation counts
-* is slow
-
 Other backup tools like `restic`, `borg`, and `duplicity` are designed for
 environments where the source host can store secrets and has access to
-decryption keys. I don't want to store backup decryption keys on my hosts,
-only public keys for encryption.
+decryption keys. `vaultik` is for environments where hosts should hold
+only the public keys needed for encryption, never the backup decryption
+keys.

-My requirements are:
+No existing tool meets all of these requirements:

 * open source
 * no passphrases or private keys on the source host
@@ -46,40 +38,13 @@ My requirements are:
 * encrypted
 * s3 compatible without an intermediate step or tool

-Surprisingly, no existing tool meets these requirements, so I wrote `vaultik`.
+## install

-## design goals
+```sh
+go install git.eeqj.de/sneak/vaultik@latest
+```

-1. Backups must require only a public key on the source host.
-1. No secrets or private keys may exist on the source system.
-1. Restore must be possible using **only** the backup bucket and a private key.
-1. Prune must be possible (requires private key, done on different hosts).
-1. All encryption uses [`age`](https://age-encryption.org/) (X25519, XChaCha20-Poly1305).
-1. Compression uses `zstd` at a configurable level.
-1. Files are chunked, and multiple chunks are packed into encrypted blobs
-   to reduce object count for filesystems with many small files.
-1. All metadata (snapshots) is stored remotely as encrypted SQLite DBs.
-
-## what
-
-`vaultik` walks a set of configured directories and builds a
-content-addressable chunk map of changed files using deterministic chunking.
-Each chunk is streamed into a blob packer. Blobs are compressed with `zstd`,
-encrypted with `age`, and uploaded directly to remote storage under a
-content-addressed S3 path. At the end, a pruned snapshot-specific sqlite
-database of metadata is created, encrypted, and uploaded alongside the
-blobs.
-
-No plaintext file contents ever hit disk. No private key or secret
-passphrase is needed or stored locally.
-
-## how
-
-1. **install**
-
-   ```sh
-   go install git.eeqj.de/sneak/vaultik@latest
-   ```
+## quick start

 1. **generate keypair**

@@ -88,23 +53,21 @@ passphrase is needed or stored locally.
   grep 'public key:' agekey.txt
   ```

-1. **write config**
+2. **write config** (see `config.example.yml` for all options)

   ```yaml
-   # Named snapshots - each snapshot can contain multiple paths
   snapshots:
     system:
       paths:
         - /etc
         - /var/lib
       exclude:
-         - '*.cache'  # Snapshot-specific exclusions
+         - '*.cache'
     home:
       paths:
         - /home/user/documents
         - /home/user/photos

-   # Global exclusions (apply to all snapshots)
   exclude:
     - '*.log'
     - '*.tmp'
@@ -112,29 +75,36 @@ passphrase is needed or stored locally.
     - 'node_modules'

   age_recipients:
-     - age1278m9q7dp3chsh2dcy82qk27v047zywyvtxwnj4cvt0z65jw6a7q5dqhfj
+     - age1YOUR_PUBLIC_KEY_HERE
+
+   # Storage backend (pick one):
+   storage_url: "s3://mybucket/backups?endpoint=s3.example.com&region=us-east-1"
+   # storage_url: "file:///mnt/backups"
+   # storage_url: "rclone://myremote/path/to/backups"
+
+   # For s3:// URLs, credentials are still required:
   s3:
-     endpoint: https://s3.example.com
-     bucket: vaultik-data
-     prefix: host1/
     access_key_id: ...
     secret_access_key: ...
-     region: us-east-1
-   chunk_size: 10MB
-   blob_size_limit: 1GB
   ```

-1. **run**
+3. **run**

   ```sh
-   # Create all configured snapshots
-   vaultik --config /etc/vaultik.yaml snapshot create
+   # Back up all configured snapshots
+   vaultik --config /etc/vaultik.yml snapshot create

-   # Create specific snapshots by name
-   vaultik --config /etc/vaultik.yaml snapshot create home system
+   # Back up specific snapshots by name
+   vaultik --config /etc/vaultik.yml snapshot create home system

   # Silent mode for cron
-   vaultik --config /etc/vaultik.yaml snapshot create --cron
+   vaultik --config /etc/vaultik.yml snapshot create --cron
+
+   # Back up and clean up old snapshots + orphan blobs in one shot
+   vaultik --config /etc/vaultik.yml snapshot create --prune
+
+   # Daily cron: back up, keep last 4 weeks of snapshots
+   vaultik --config /etc/vaultik.yml snapshot create --cron --prune --keep-newer-than 4w
   ```

 ---
@@ -144,7 +114,7 @@ passphrase is needed or stored locally.
 ### commands

 ```sh
-vaultik [--config <path>] snapshot create [snapshot-names...] [--cron] [--prune] [--skip-errors]
+vaultik [--config <path>] snapshot create [snapshot-names...] [--cron] [--prune] [--keep-newer-than <duration>] [--skip-errors]
 vaultik [--config <path>] snapshot list [--json]
 vaultik [--config <path>] snapshot verify <snapshot-id> [--deep] [--json]
 vaultik [--config <path>] snapshot purge [--keep-latest | --older-than <duration>] [--snapshot <name>...] [--force]
@@ -159,245 +129,244 @@ vaultik [--config <path>] database purge [--force]
 vaultik version
 ```

-### environment
+### global flags

-* `VAULTIK_AGE_SECRET_KEY`: Required for `restore` and deep `verify`. Contains the age private key for decryption.
-* `VAULTIK_CONFIG`: Optional path to config file.
+* `--config <path>`: Path to config file (default: `$VAULTIK_CONFIG` or `/etc/vaultik/config.yml`)
+* `--verbose`, `-v`: Enable verbose output
+* `--debug`: Enable debug output
+* `--quiet`, `-q`: Suppress non-error output
+
+### environment variables
+
+* `VAULTIK_AGE_SECRET_KEY`: Age private key for decryption (required for `restore` and `verify --deep`)
+* `VAULTIK_CONFIG`: Path to config file (overridden by `--config`)
+* `VAULTIK_INDEX_PATH`: Override local SQLite index path

 ### command details

-**snapshot create**: Perform incremental backup of configured snapshots
-* Config is located at `/etc/vaultik/config.yml` by default
+**snapshot create**: Perform incremental backup of configured snapshots.
 * Optional snapshot names argument to create specific snapshots (default: all)
 * `--cron`: Silent unless error (for crontab)
-* `--prune`: After backup, drop older snapshots of each backed-up name (keeping
-  only the latest) and remove orphaned blobs from remote storage
+* `--prune`: After backup, drop older snapshots of each backed-up name and
+  remove orphaned blobs from remote storage. By default keeps only the latest
+  snapshot per name; use `--keep-newer-than` for a rolling window.
+* `--keep-newer-than <duration>`: With `--prune`, keep snapshots newer than
+  this duration instead of only the latest (e.g. `4w`, `30d`, `6mo`, `1y`)
 * `--skip-errors`: Skip file read errors (log them loudly but continue)

-**snapshot list**: List all snapshots with their timestamps and sizes
+**snapshot list**: List all snapshots with their timestamps and sizes.
 * `--json`: Output in JSON format

-**snapshot verify**: Verify snapshot integrity
-* `--deep`: Download and verify blob contents (not just existence)
+**snapshot verify**: Verify snapshot integrity.
+* Default (shallow): checks that all blobs referenced in the manifest exist in storage
+* `--deep`: Downloads and decrypts each blob, verifies chunk hashes against the
+  encrypted metadata database
+* `--json`: Output results as JSON

 **snapshot purge**: Remove old snapshots based on criteria. Retention is
-applied per-snapshot-name (e.g. `--keep-latest` keeps the latest of each
-configured name, not the latest globally).
+per-snapshot-name (`--keep-latest` keeps the latest of each name, not the
+latest globally).
 * `--keep-latest`: Keep only the most recent snapshot of each name
-* `--older-than`: Remove snapshots older than duration (e.g., 30d, 6mo, 1y)
+* `--older-than <duration>`: Remove snapshots older than duration (e.g. `30d`, `6m`, `1y`)
 * `--snapshot <name>`: Restrict to specific snapshot names (repeat for multiple)
 * `--force`: Skip confirmation prompt

-**snapshot remove**: Remove a specific snapshot
+**snapshot remove**: Remove a specific snapshot from the local database.
+* `--remote`: Also remove snapshot metadata from remote storage
+* `--all`: Remove all snapshots (requires `--force`)
 * `--dry-run`: Show what would be deleted without deleting
 * `--force`: Skip confirmation prompt
+* `--json`: Output result as JSON

-**snapshot prune**: Clean orphaned data from local database
+**snapshot prune**: Clean orphaned data from the local database (files,
+chunks, blobs not referenced by any snapshot).

-**restore**: Restore snapshot to target directory
-* Requires `VAULTIK_AGE_SECRET_KEY` environment variable with age private key
+**restore**: Restore files from a backup snapshot.
+* Requires `VAULTIK_AGE_SECRET_KEY` environment variable
 * Optional path arguments to restore specific files/directories (default: all)
-* Downloads and decrypts metadata, fetches required blobs, reconstructs files
-* Preserves file permissions, timestamps, and ownership (ownership requires root)
-* Handles symlinks and directories
+* Preserves file permissions, timestamps, ownership (ownership requires root),
+  symlinks, and empty directories
+* `--verify`: After restoring, verify every file's chunk hashes match

-**prune**: Remove unreferenced blobs from remote storage
-* Scans all snapshots for referenced blobs
-* Deletes orphaned blobs
+**prune**: Remove unreferenced blobs from remote storage.
+* Scans all snapshot manifests for referenced blobs, deletes any blob not referenced
+* `--force`: Skip confirmation prompt
+* `--json`: Output stats as JSON

-**info**: Display system and configuration information
+**info**: Display system configuration, storage settings, encryption
+recipients, and local database statistics.

-**store info**: Display S3 bucket configuration and storage statistics
+**remote info**: Show detailed remote storage information including per-snapshot
+metadata sizes, blob counts, and orphaned blob detection.
+* `--json`: Output as JSON
+
+**store info**: Display storage backend type and statistics.
+
+**database purge**: Delete the local SQLite state database entirely. Remote
+storage is unaffected; the next backup will do a full scan and re-deduplicate
+against existing remote blobs.
+* `--force`: Skip confirmation prompt
+
+---
+
+## storage backends
+
+vaultik supports three storage backends, selected via the `storage_url` config field:
+
+**S3** (`s3://bucket/prefix?endpoint=host&region=us-east-1`): Any S3-compatible
+object store. Credentials are read from `s3.access_key_id` and
+`s3.secret_access_key` in the config file.
+
+**Local filesystem** (`file:///path/to/backup`): Stores blobs and metadata on
+a local or mounted filesystem. Useful for testing or backing up to a NAS.
+
+**Rclone** (`rclone://remote/path`): Uses rclone's 70+ supported cloud
+providers. Requires rclone to be configured separately (`rclone config`).
+
+Legacy S3 configuration via `s3.*` fields (endpoint, bucket, prefix, etc.) is
+still supported for backward compatibility. `storage_url` takes precedence if
+both are set.

 ---

 ## architecture

-### s3 bucket layout
+### remote storage layout

 ```
-s3://<bucket>/<prefix>/
+<bucket>/<prefix>/
 ├── blobs/
 │   └── <aa>/<bb>/<full_blob_hash>
 └── metadata/
-    ├── <snapshot_id>/
-    │   ├── db.zst.age
-    │   └── manifest.json.zst
+    └── <snapshot_id>/
+        ├── db.zst.age          # Encrypted binary SQLite database
+        └── manifest.json.zst   # Unencrypted blob list (for pruning)
 ```

-* `blobs/<aa>/<bb>/...`: Two-level directory sharding using first 4 hex chars of blob hash
-* `metadata/<snapshot_id>/db.zst.age`: Encrypted, compressed SQLite database
-* `metadata/<snapshot_id>/manifest.json.zst`: Unencrypted blob list for pruning
+* Blobs are sharded into two directory levels (`<aa>/<bb>`) taken from the first 4 hex chars of the blob hash
+* `db.zst.age` is a binary SQLite database (zstd compressed, age encrypted)
+  containing all file metadata, chunk mappings, and relationships for the snapshot
+* `manifest.json.zst` is an unencrypted compressed JSON blob list, enabling
+  pruning without the private key

-### blob manifest format
-
-The `manifest.json.zst` file is unencrypted (compressed JSON) to enable pruning without decryption:
-
-```json
-{
-  "snapshot_id": "hostname_snapshotname_2025-01-01T12:00:00Z",
-  "blob_hashes": [
-    "aa1234567890abcdef...",
-    "bb2345678901bcdef0..."
-  ]
-}
-```
-
-Snapshot IDs follow the format `<hostname>_<snapshot-name>_<timestamp>` (e.g., `server1_home_2025-01-01T12:00:00Z`).
-
-### local sqlite schema
-
-```sql
-CREATE TABLE files (
-  id TEXT PRIMARY KEY,
-  path TEXT NOT NULL UNIQUE,
-  mtime INTEGER NOT NULL,
-  size INTEGER NOT NULL,
-  mode INTEGER NOT NULL,
-  uid INTEGER NOT NULL,
-  gid INTEGER NOT NULL
-);
-
-CREATE TABLE file_chunks (
-  file_id TEXT NOT NULL,
-  idx INTEGER NOT NULL,
-  chunk_hash TEXT NOT NULL,
-  PRIMARY KEY (file_id, idx),
-  FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE
-);
-
-CREATE TABLE chunks (
-  chunk_hash TEXT PRIMARY KEY,
-  size INTEGER NOT NULL
-);
-
-CREATE TABLE blobs (
-  id TEXT PRIMARY KEY,
-  blob_hash TEXT NOT NULL UNIQUE,
-  uncompressed INTEGER NOT NULL,
-  compressed INTEGER NOT NULL,
-  uploaded_at INTEGER
-);
-
-CREATE TABLE blob_chunks (
-  blob_hash TEXT NOT NULL,
-  chunk_hash TEXT NOT NULL,
-  offset INTEGER NOT NULL,
-  length INTEGER NOT NULL,
-  PRIMARY KEY (blob_hash, chunk_hash)
-);
-
-CREATE TABLE chunk_files (
-  chunk_hash TEXT NOT NULL,
-  file_id TEXT NOT NULL,
-  file_offset INTEGER NOT NULL,
-  length INTEGER NOT NULL,
-  PRIMARY KEY (chunk_hash, file_id)
-);
-
-CREATE TABLE snapshots (
-  id TEXT PRIMARY KEY,
-  hostname TEXT NOT NULL,
-  vaultik_version TEXT NOT NULL,
-  started_at INTEGER NOT NULL,
-  completed_at INTEGER,
-  file_count INTEGER NOT NULL,
-  chunk_count INTEGER NOT NULL,
-  blob_count INTEGER NOT NULL,
-  total_size INTEGER NOT NULL,
-  blob_size INTEGER NOT NULL,
-  compression_ratio REAL NOT NULL
-);
-
-CREATE TABLE snapshot_files (
-  snapshot_id TEXT NOT NULL,
-  file_id TEXT NOT NULL,
-  PRIMARY KEY (snapshot_id, file_id)
-);
-
-CREATE TABLE snapshot_blobs (
-  snapshot_id TEXT NOT NULL,
-  blob_id TEXT NOT NULL,
-  blob_hash TEXT NOT NULL,
-  PRIMARY KEY (snapshot_id, blob_id)
-);
-```
+Snapshot IDs follow the format `<hostname>_<snapshot-name>_<RFC3339-timestamp>`
+(e.g. `server1_home_2025-06-01T12:00:00Z`).

 ### data flow

-#### backup
+**backup:**

-1. Load config, open local SQLite index
-1. Walk source directories, check mtime/size against index
-1. For changed/new files: chunk using content-defined chunking
-1. For each chunk: hash, check if already uploaded, add to blob packer
-1. When blob reaches threshold: compress, encrypt, upload to S3
-1. Build snapshot metadata, compress, encrypt, upload
-1. Create blob manifest (unencrypted) for pruning support
+1. Open local SQLite index, load known files and chunks into memory
+2. Walk source directories, compare mtime/size/mode against index
+3. For changed/new files: chunk using content-defined chunking (FastCDC)
+4. For symlinks and directories: record metadata (no chunking)
+5. For each chunk: hash, check dedup, add to blob packer
+6. When blob reaches size threshold: compress (zstd), encrypt (age), upload
+7. Build snapshot metadata database, compress, encrypt, upload
+8. Create unencrypted blob manifest for pruning support

-#### restore
+**restore:**

-1. Download `metadata/<snapshot_id>/db.zst.age`
-1. Decrypt and decompress SQLite database
-1. Query files table (optionally filtered by paths)
-1. For each file, get ordered chunk list from file_chunks
-1. Download required blobs, decrypt, decompress
-1. Extract chunks and reconstruct files
-1. Restore permissions, mtime, uid/gid
+1. Download and decrypt `metadata/<snapshot_id>/db.zst.age`
+2. Open the binary SQLite database
+3. Query files (optionally filtered by paths)
+4. Download and decrypt required blobs
+5. Extract chunks, reconstruct files
+6. Restore permissions, timestamps, ownership, symlinks

-#### prune
+**prune:**

 1. List all snapshot manifests
-1. Build set of all referenced blob hashes
-1. List all blobs in storage
-1. Delete any blob not in referenced set
+2. Build set of all referenced blob hashes
+3. List all blobs in storage
+4. Delete any blob not in the referenced set

-### chunking
+### chunking and deduplication

-* Content-defined chunking using FastCDC algorithm
+* Content-defined chunking using the FastCDC algorithm
 * Average chunk size: configurable (default 10MB)
-* Deduplication at chunk level
-* Multiple chunks packed into blobs for efficiency
+* Deduplication at file level (unchanged files skipped) and chunk level
+  (identical chunks across files stored once)
+* Multiple chunks packed into blobs to reduce object count

 ### encryption

 * Asymmetric encryption using age (X25519 + ChaCha20-Poly1305)
-* Only public key needed on source host
-* Each blob encrypted independently
-* Metadata databases also encrypted
+* Only the public key is needed on the source host
+* Each blob and each metadata database is encrypted independently
+* Multiple recipients supported (encrypt to multiple keys)

 ### compression

-* zstd compression at configurable level
-* Applied before encryption
-* Blob-level compression for efficiency
+* zstd compression at configurable level (1-19, default 3)
+* Applied before encryption at the blob level

 ---

-## does not
+## configuration reference

-* Store any secrets on the backed-up machine
-* Require mutable remote metadata
-* Use tarballs, restic, rsync, or ssh
-* Require a symmetric passphrase or password
-* Trust the source system with anything
+See `config.example.yml` for a complete annotated example. Key fields:

-## does
+| Field | Default | Description |
+|-------|---------|-------------|
+| `age_recipients` | (required) | Age public keys for encryption |
+| `snapshots` | (required) | Named snapshot definitions with paths and excludes |
+| `storage_url` | | Storage backend URL (`s3://`, `file://`, `rclone://`) |
+| `s3.*` | | Legacy S3 configuration (endpoint, bucket, credentials) |
+| `exclude` | | Global exclude patterns (applied to all snapshots) |
+| `chunk_size` | `10MB` | Average chunk size for content-defined chunking |
+| `blob_size_limit` | `10GB` | Maximum packed blob size; a new blob is started once it is reached |
+| `compression_level` | `3` | zstd compression level (1-19) |
+| `hostname` | system hostname | Hostname used in snapshot IDs |
+| `index_path` | `~/.local/share/.../index.sqlite` | Local SQLite index path |

-* Incremental deduplicated backup
-* Blob-packed chunk encryption
-* Content-addressed immutable blobs
-* Public-key encryption only
-* SQLite-based local and snapshot metadata
-* Fully stream-processed storage
+---
+
+## limitations
+
+* **No extended attributes (xattrs).** ACLs, macOS Finder metadata,
+  quarantine flags, SELinux labels, and other extended attributes are not
+  backed up or restored.
+* **No hard link detection.** Two hard links to the same inode are backed
+  up as independent files. Content deduplication means the data is stored
+  once, but the hard link relationship is lost on restore.
+* **No sparse file support.** Sparse files are fully materialized during
+  backup. A 100 GB sparse VM disk that is mostly zeros will consume the
+  full (compressed) size in storage.
+* **No bandwidth limiting.** Uploads and downloads use whatever bandwidth
+  is available. There is no `--bwlimit` flag yet.
+* **No parallel blob downloads during restore.** Blobs are fetched
+  sequentially. Restore speed is bound by single-stream throughput.
+* **Device nodes, named pipes, and sockets are silently skipped.** Only
+  regular files, directories, and symlinks are backed up.
+* **No database migrations.** If the local SQLite schema changes between
+  versions, delete the local database (`vaultik database purge`) and run
+  a full backup. Remote storage is unaffected.
+* **Files that change during backup may be inconsistent.** There is no
+  filesystem snapshot or freeze. If a file is modified between the scan
+  and chunk phases, the backed-up copy may reflect a partial write.
+* **Ownership restoration requires root.** File uid/gid are recorded
+  and restored, but `chown` requires elevated privileges. Without root,
+  files are restored with the current user's ownership.
+
+---
+
+## roadmap
+
+Items for future releases:
+
+* Error-condition tests (network failures, disk full, corrupted/missing blobs)
+* Parallel blob downloads during restore
+* Bandwidth limiting (`--bwlimit`)
+* Security audit of encryption implementation
+* Man pages and richer `--help` examples

 ---

 ## requirements

 * Go 1.26 or later
-* S3-compatible object storage
-* Sufficient disk space for local index (typically <1GB)
+* S3-compatible object storage (or local filesystem, or rclone remote)

 ## development workflow

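The **prune** steps in the README's data-flow section reduce to a set difference between the blobs present in storage and the blobs named in any manifest. Because `manifest.json.zst` is unencrypted, this needs no private key. A minimal sketch over already-decoded manifests follows; `orphanBlobs` is a hypothetical helper, and listing storage and decompressing the manifests are left out.

```go
package main

import "sort"

// orphanBlobs returns the blob hashes present in storage that no snapshot
// manifest references (stored minus referenced), sorted for stable output.
// manifests maps snapshot ID to the blob hashes listed in its manifest.
func orphanBlobs(manifests map[string][]string, stored []string) []string {
	referenced := make(map[string]struct{})
	for _, hashes := range manifests {
		for _, h := range hashes {
			referenced[h] = struct{}{}
		}
	}
	var orphans []string
	for _, h := range stored {
		if _, ok := referenced[h]; !ok {
			orphans = append(orphans, h)
		}
	}
	sort.Strings(orphans)
	return orphans
}
```

One consequence of the backup flow as described (the manifest is uploaded last, in step 8): blobs uploaded by a backup that is still running are not yet referenced by any manifest, so this logic would classify them as orphans if prune ran concurrently.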
--- a/TODO.md
+++ b/TODO.md
@@ -1,44 +0,0 @@
-# Vaultik 1.0 TODO
-
-Remaining tasks before 1.0 release.
-
-## Must-fix
-
-1. Scanner uses bare `fmt.Printf` (bypasses `--cron` silence)
-   - Route all user-facing output through a writer gated by progress/cron flags
-   - Affects `internal/snapshot/scanner.go` (~24 bare print calls)
-
-1. S3 client error type checking
-   - `internal/s3/client.go:207` has a TODO for proper error type checking
-
-1. Error message polish
-   - Add actionable suggestions for common failures (missing config, bad
-     storage URL, failed S3 auth, missing age key on restore/verify)
-   - Only `restore.go` currently has the "did you set VAULTIK_AGE_SECRET_KEY?" hint
-
-## Done
-
- [x] Rclone storage backend
- [x] Release process (goreleaser, CGO-free cross-compile, checksums)
- [x] End-to-end integration test (backup → restore → verify → byte-compare)
- [x] Restore integration tests
- [x] `--prune` flag on `snapshot create` (per-name retention + orphan blob cleanup)
- [x] Per-name purge retention (`--keep-latest` per snapshot name, `--snapshot` filter)
- [x] CLI surface dedup (removed top-level `purge` and `verify` duplicates)
- [x] Exit codes (create/restore now exit non-zero on failure)
- [x] Deep verify implemented and wired up
- [x] Shallow verify timestamp parsing fixed
- [x] Daemon mode removed
- [x] Makefile targets separated (`lint`/`test`/`fmt`/`check`)
- [x] CGO eliminated (pure-Go SQLite via modernc.org/sqlite)
- [x] Version set correctly in releases via goreleaser ldflags
-
-## Post-1.0
-
-1. Edge-case tests (empty dirs, symlinks, special chars, multi-GB files, 100k+ small files)
-1. Error-condition tests (network failures, disk full, corrupted/missing blobs)
-1. Parallel blob downloads during restore
-1. Bandwidth limiting (`--bwlimit`)
-1. Security audit of encryption (verify no plaintext leaks, correct hash computation)
-1. Man pages / richer `--help` examples
-1. Tag and release v1.0.0
--- a/internal/cli/snapshot.go
+++ b/internal/cli/snapshot.go
@@ -101,6 +101,7 @@ specifying a path using --config or by setting VAULTIK_CONFIG to a path.`,

 	cmd.Flags().BoolVar(&opts.Cron, "cron", false, "Run in cron mode (silent unless error)")
 	cmd.Flags().BoolVar(&opts.Prune, "prune", false, "After backup, drop older snapshots of the same name and remove orphaned blobs")
+	cmd.Flags().StringVar(&opts.KeepNewerThan, "keep-newer-than", "", "With --prune: keep snapshots newer than this duration (e.g. 4w, 30d, 6mo) instead of only the latest")
 	cmd.Flags().BoolVar(&opts.SkipErrors, "skip-errors", false, "Skip file read errors (log them loudly but continue)")

 	return cmd
--- a/internal/snapshot/scanner.go
+++ b/internal/snapshot/scanner.go
@@ -649,7 +649,40 @@ func (s *Scanner) scanPhase(ctx context.Context, path string, result *ScanResult
 			return nil
 		}

-		// Skip non-regular files for processing (but still count them)
+		// Handle symlinks
+		if info.Mode()&os.ModeSymlink != 0 {
+			file := s.buildSymlinkEntry(filePath, info)
+			if file != nil {
+				existingFiles[filePath] = struct{}{}
+				mu.Lock()
+				filesToProcess = append(filesToProcess, &FileToProcess{
+					Path:     filePath,
+					FileInfo: info,
+					File:     file,
+				})
+				filesScanned++
+				mu.Unlock()
+				s.updateScanEntryStats(result, true, info)
+			}
+			return nil
+		}
+
+		// Handle directories (record for permission/ownership preservation and empty-dir support)
+		if info.IsDir() {
+			file := s.buildDirectoryEntry(filePath, info)
+			existingFiles[filePath] = struct{}{}
+			mu.Lock()
+			filesToProcess = append(filesToProcess, &FileToProcess{
+				Path:     filePath,
+				FileInfo: info,
+				File:     file,
+			})
+			filesScanned++
+			mu.Unlock()
+			return nil
+		}
+
+		// Skip other non-regular files (devices, sockets, etc.)
 		if !info.Mode().IsRegular() {
 			return nil
 		}
@@ -760,6 +793,71 @@ func (s *Scanner) printScanProgressLine(filesScanned int64, changedCount int, es
 	}
 }

+// buildSymlinkEntry creates a File record for a symlink.
+// Returns nil if the link target cannot be read.
+func (s *Scanner) buildSymlinkEntry(path string, info os.FileInfo) *database.File {
+	target, err := os.Readlink(path)
+	if err != nil {
+		log.Debug("Cannot read symlink target", "path", path, "error", err)
+		return nil
+	}
+
+	var uid, gid uint32
+	if stat, ok := info.Sys().(interface {
+		Uid() uint32
+		Gid() uint32
+	}); ok {
+		uid = stat.Uid()
+		gid = stat.Gid()
+	}
+
+	return &database.File{
+		ID:         types.NewFileID(),
+		Path:       types.FilePath(path),
+		SourcePath: types.SourcePath(s.currentSourcePath),
+		MTime:      info.ModTime(),
+		Size:       0,
+		Mode:       uint32(info.Mode()),
+		UID:        uid,
+		GID:        gid,
+		LinkTarget: types.FilePath(target),
+	}
+}
+
+// buildDirectoryEntry creates a File record for a directory.
+func (s *Scanner) buildDirectoryEntry(path string, info os.FileInfo) *database.File {
+	var uid, gid uint32
+	if stat, ok := info.Sys().(interface {
+		Uid() uint32
+		Gid() uint32
+	}); ok {
+		uid = stat.Uid()
+		gid = stat.Gid()
+	}
+
+	return &database.File{
+		ID:         types.NewFileID(),
+		Path:       types.FilePath(path),
+		SourcePath: types.SourcePath(s.currentSourcePath),
+		MTime:      info.ModTime(),
+		Size:       0,
+		Mode:       uint32(info.Mode()),
+		UID:        uid,
+		GID:        gid,
+	}
+}
+
+// recordNonRegularFile writes a symlink or directory entry to the database
+// and associates it with the current snapshot. No chunking is performed.
+func (s *Scanner) recordNonRegularFile(ctx context.Context, ftp *FileToProcess) error {
+	return s.repos.WithTx(ctx, func(txCtx context.Context, tx *sql.Tx) error {
+		if err := s.repos.Files.Create(txCtx, tx, ftp.File); err != nil {
+			return fmt.Errorf("creating non-regular file record: %w", err)
+		}
+		return s.repos.Snapshots.AddFileByID(txCtx, tx, s.snapshotID, ftp.File.ID)
+	})
+}
+
 // checkFileInMemory checks if a file needs processing using the in-memory map
 // No database access is performed - this is purely CPU/memory work
 func (s *Scanner) checkFileInMemory(path string, info os.FileInfo, knownFiles map[string]*database.File) (*database.File, bool) {
@@ -1184,6 +1282,12 @@ type streamingChunkInfo struct {

 // processFileStreaming processes a file by streaming chunks directly to the packer
 func (s *Scanner) processFileStreaming(ctx context.Context, fileToProcess *FileToProcess, result *ScanResult) error {
+	// Symlinks and directories have no data to chunk — just record them in the DB.
+	mode := os.FileMode(fileToProcess.File.Mode)
+	if mode&os.ModeSymlink != 0 || mode.IsDir() {
+		return s.recordNonRegularFile(ctx, fileToProcess)
+	}
+
 	file, err := s.fs.Open(fileToProcess.Path)
 	if err != nil {
 		return fmt.Errorf("opening file: %w", err)
--- a/internal/snapshot/scanner_test.go
+++ b/internal/snapshot/scanner_test.go
@@ -110,15 +110,15 @@ func TestScannerSimpleDirectory(t *testing.T) {
 		t.Errorf("expected at least 97 bytes scanned, got %d", result.BytesScanned)
 	}

-	// Verify files in database - only regular files are stored
+	// Verify files in database - includes regular files and directories
 	files, err := repos.Files.ListByPrefix(ctx, "/source")
 	if err != nil {
 		t.Fatalf("failed to list files: %v", err)
 	}

-	// We should have 6 files (directories are not stored)
-	if len(files) != 6 {
-		t.Errorf("expected 6 files in database, got %d", len(files))
+	// 6 regular files + 3 directories (/source, /source/subdir, /source/subdir2)
+	if len(files) != 9 {
+		t.Errorf("expected 9 entries in database (6 files + 3 dirs), got %d", len(files))
 	}

 	// Verify specific file
--- a/internal/vaultik/helpers.go
+++ b/internal/vaultik/helpers.go
@@ -2,6 +2,7 @@ package vaultik

 import (
 	"fmt"
+	"regexp"
 	"strconv"
 	"strings"
 	"time"
@@ -95,18 +96,39 @@ func parseSnapshotName(snapshotID string) string {
 	return strings.Join(parts[1:len(parts)-1], "_")
 }

-// parseDuration parses a duration string with support for days
+// parseDuration accepts anything time.ParseDuration does (e.g. "90m", "1h30m"),
+// or one or more <n><unit> terms using d/day/days, w/week/weeks, mo/month/months
+// and y/year/years (e.g. "2w3d"); a month counts as 30 days and a year as 365.
 func parseDuration(s string) (time.Duration, error) {
-	// Check for days suffix
-	if strings.HasSuffix(s, "d") {
-		daysStr := strings.TrimSuffix(s, "d")
-		days, err := strconv.Atoi(daysStr)
-		if err != nil {
-			return 0, fmt.Errorf("invalid days value: %w", err)
-		}
-		return time.Duration(days) * 24 * time.Hour, nil
+	if d, err := time.ParseDuration(s); err == nil {
+		return d, nil
 	}

-	// Otherwise use standard Go duration parsing
-	return time.ParseDuration(s)
+	re := regexp.MustCompile(`(\d+)\s*([a-zA-Z]+)`)
+	matches := re.FindAllStringSubmatch(s, -1)
+	if len(matches) == 0 || strings.TrimSpace(re.ReplaceAllString(s, "")) != "" {
+		return 0, fmt.Errorf("invalid duration: %q", s)
+	}
+
+	var total time.Duration
+	for _, match := range matches {
+		n, err := strconv.Atoi(match[1])
+		if err != nil {
+			return 0, fmt.Errorf("invalid number %q: %w", match[1], err)
+		}
+		unit := strings.ToLower(match[2])
+		switch unit {
+		case "d", "day", "days":
+			total += time.Duration(n) * 24 * time.Hour
+		case "w", "week", "weeks":
+			total += time.Duration(n) * 7 * 24 * time.Hour
+		case "mo", "month", "months":
+			total += time.Duration(n) * 30 * 24 * time.Hour
+		case "y", "year", "years":
+			total += time.Duration(n) * 365 * 24 * time.Hour
+		default:
+			return 0, fmt.Errorf("unknown time unit %q", unit)
+		}
+	}
+	return total, nil
 }
--- a/internal/vaultik/helpers_test.go
+++ b/internal/vaultik/helpers_test.go
@@ -2,6 +2,7 @@ package vaultik

 import (
 	"testing"
+	"time"
 )

 func TestParseSnapshotName(t *testing.T) {
@@ -37,6 +38,41 @@ func TestParseSnapshotName(t *testing.T) {
 	}
 }

+func TestParseDuration(t *testing.T) {
+	tests := []struct {
+		input string
+		want  time.Duration
+		err   bool
+	}{
+		{"30d", 30 * 24 * time.Hour, false},
+		{"4w", 4 * 7 * 24 * time.Hour, false},
+		{"6mo", 6 * 30 * 24 * time.Hour, false},
+		{"1y", 365 * 24 * time.Hour, false},
+		{"2w3d", 2*7*24*time.Hour + 3*24*time.Hour, false},
+		{"1h", time.Hour, false},
+		{"30s", 30 * time.Second, false},
+		{"garbage", 0, true},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.input, func(t *testing.T) {
+			got, err := parseDuration(tt.input)
+			if tt.err {
+				if err == nil {
+					t.Fatalf("expected error for %q, got %v", tt.input, got)
+				}
+				return
+			}
+			if err != nil {
+				t.Fatalf("unexpected error for %q: %v", tt.input, err)
+			}
+			if got != tt.want {
+				t.Errorf("parseDuration(%q) = %v, want %v", tt.input, got, tt.want)
+			}
+		})
+	}
+}
+
 func TestParseSnapshotTimestamp(t *testing.T) {
 	tests := []struct {
 		name       string
--- a/internal/vaultik/integration_test.go
+++ b/internal/vaultik/integration_test.go
@@ -585,6 +585,19 @@ func TestEndToEndFileStorage(t *testing.T) {
 		require.NoError(t, afero.WriteFile(fs, path, content, 0o644))
 	}

+	// Create a file with non-default permissions.
+	restrictedPath := filepath.Join(dataDir, "restricted.txt")
+	require.NoError(t, afero.WriteFile(fs, restrictedPath, []byte("secret"), 0o600))
+	testFiles[restrictedPath] = []byte("secret")
+
+	// Create an empty directory (should survive round-trip).
+	emptyDir := filepath.Join(dataDir, "emptydir")
+	require.NoError(t, fs.MkdirAll(emptyDir, 0o755))
+
+	// Create a symlink.
+	symlinkPath := filepath.Join(dataDir, "link-to-small")
+	require.NoError(t, os.Symlink("small.txt", symlinkPath))
+
 	// FileStorer is the real-world local-disk backend.
 	storer, err := storage.NewFileStorer(storeDir)
 	require.NoError(t, err)
@@ -669,6 +682,25 @@ func TestEndToEndFileStorage(t *testing.T) {
 		require.NoError(t, err, "restored file missing: %s", restoredPath)
 		require.Equalf(t, expected, got, "byte-equality failed for %s", origPath)
 	}
+
+	// Verify the restricted file kept its permissions.
+	restoredRestricted := filepath.Join(restoreDir, restrictedPath)
+	rInfo, err := os.Stat(restoredRestricted)
+	require.NoError(t, err)
+	assert.Equal(t, os.FileMode(0o600), rInfo.Mode().Perm(),
+		"restricted file should preserve 0600 permissions")
+
+	// Verify the empty directory was restored.
+	restoredEmptyDir := filepath.Join(restoreDir, emptyDir)
+	dInfo, err := os.Stat(restoredEmptyDir)
+	require.NoError(t, err, "empty directory should be restored")
+	assert.True(t, dInfo.IsDir(), "emptydir should be a directory")
+
+	// Verify the symlink was restored with the correct target.
+	restoredSymlink := filepath.Join(restoreDir, symlinkPath)
+	target, err := os.Readlink(restoredSymlink)
+	require.NoError(t, err, "symlink should be restored")
+	assert.Equal(t, "small.txt", target, "symlink target should be preserved")
 }

 // bytesPattern returns a deterministic byte slice of length n with a tag prefix,
--- a/internal/vaultik/snapshot.go
+++ b/internal/vaultik/snapshot.go
@@ -22,10 +22,11 @@ import (

 // SnapshotCreateOptions contains options for the snapshot create command
 type SnapshotCreateOptions struct {
-	Cron       bool
-	Prune      bool
-	SkipErrors bool     // Skip file read errors (log them loudly but continue)
-	Snapshots  []string // Optional list of snapshot names to process (empty = all)
+	Cron          bool
+	Prune         bool
+	KeepNewerThan string   // With Prune: keep snapshots newer than this duration (e.g. "4w"); empty = keep only the latest
+	SkipErrors    bool     // Skip file read errors (log them loudly but continue)
+	Snapshots     []string // Optional list of snapshot names to process (empty = all)
 }

 // CreateSnapshot executes the snapshot creation operation
@@ -86,7 +87,7 @@ func (v *Vaultik) CreateSnapshot(opts *SnapshotCreateOptions) error {
 	}

 	if opts.Prune {
-		if err := v.runPostBackupPrune(snapshotNames); err != nil {
+		if err := v.runPostBackupPrune(snapshotNames, opts.KeepNewerThan); err != nil {
 			return fmt.Errorf("post-backup prune: %w", err)
 		}
 	}
@@ -94,19 +95,26 @@ func (v *Vaultik) CreateSnapshot(opts *SnapshotCreateOptions) error {
 	return nil
 }

-// runPostBackupPrune drops older snapshots of the given names (keeping only
-// the latest of each) and removes orphan blobs from remote storage. Invoked
-// when `snapshot create --prune` is used.
-func (v *Vaultik) runPostBackupPrune(snapshotNames []string) error {
-	log.Info("Running post-backup prune", "snapshots", snapshotNames)
+// runPostBackupPrune drops older snapshots of the given names and removes
+// orphan blobs from remote storage. If keepNewerThan is set (e.g. "4w"),
+// snapshots created within that window are kept and older ones are purged.
+// Otherwise only the latest snapshot of each name is kept.
+func (v *Vaultik) runPostBackupPrune(snapshotNames []string, keepNewerThan string) error {
+	log.Info("Running post-backup prune", "snapshots", snapshotNames, "keep_newer_than", keepNewerThan)
 	v.printlnStdout("\n=== Post-backup prune ===")

 	purgeOpts := &SnapshotPurgeOptions{
-		KeepLatest: true,
-		Force:      true,
-		Names:      snapshotNames,
-		Quiet:      true,
+		Force: true,
+		Names: snapshotNames,
+		Quiet: true,
 	}
+
+	if keepNewerThan != "" {
+		purgeOpts.OlderThan = keepNewerThan
+	} else {
+		purgeOpts.KeepLatest = true
+	}
+
 	if err := v.PurgeSnapshotsWithOptions(purgeOpts); err != nil {
 		return fmt.Errorf("purging old snapshots: %w", err)
 	}
Author	SHA1	Message	Date
sneak	4a3e61f8e1	Merge docs/limitations-section	2026-06-09 13:38:32 -04:00
sneak	6fbcac0cd8	Add limitations section to README	2026-06-09 13:38:32 -04:00
sneak	34f73f72d8	Merge feature/keep-newer-than	2026-06-09 13:22:24 -04:00
sneak	ee240faa32	Add --keep-newer-than flag for rolling retention window snapshot create --prune now accepts --keep-newer-than <duration> (e.g. 4w, 30d, 6mo) to keep a rolling window of snapshots instead of only the latest. Supports d/w/mo/y units and combinations (2w3d). Without --keep-newer-than, --prune still defaults to keep-latest-only.	2026-06-09 13:22:24 -04:00
sneak	f719ab3adc	Merge docs/consolidate-readme	2026-06-09 12:57:33 -04:00
sneak	1a8baf7491	Consolidate docs: rewrite README as primary reference, remove TODO.md README now covers: storage backends (s3/file/rclone), all CLI commands with full flag docs, configuration reference table, architecture overview, roadmap (post-1.0 only), and development workflow. TODO.md removed — completed items dropped, remaining roadmap items merged into README. ARCHITECTURE.md updated: correct snapshot ID format, storage.Storer instead of s3.Client, binary SQLite export instead of SQL dump.	2026-06-09 12:57:33 -04:00
sneak	7d5d3fa598	Merge test/e2e-symlinks-dirs-perms: backup symlinks, empty dirs, permissions	2026-06-09 12:47:22 -04:00
sneak	ac5d2f4a0d	Back up symlinks, empty directories, and file permissions Scanner now records symlinks (with their target) and directories during the walk phase instead of skipping them. processFileStreaming detects non-regular entries and writes the DB record without chunking. The e2e test (TestEndToEndFileStorage) now verifies: - Symlink target preserved through backup→restore - Empty directory survives round-trip - File permissions (0600) restored correctly	2026-06-09 12:47:18 -04:00