fix: populate ctime from actual filesystem stats instead of mtime fallback #48
Branch: `fix/ctime-scanner-population`
Summary
The scanner was setting `CTime` to `info.ModTime()` as a placeholder, since afero's `FileInfo` interface doesn't expose ctime directly. This change extracts the actual ctime from the underlying `syscall.Stat_t` via platform-specific build files.
Changes
Platform-specific ctime extraction (`fileCTime()`)
- `ctime_darwin.go`: uses `syscall.Stat_t.Birthtimespec`, the file creation (birth) time on macOS.
- `ctime_linux.go`: uses `syscall.Stat_t.Ctim`, the inode change time on Linux. Note: Linux doesn't expose birth time through Go's `os.FileInfo.Sys()`.
- Fallback: `info.ModTime()` when `info.Sys()` doesn't return `*syscall.Stat_t` (e.g. afero `MemMapFs` in tests).
Scanner update
`checkFileInMemory()` now calls `fileCTime(info)` instead of `info.ModTime()`.
Documentation
- README: updated the `files` table schema to match the actual schema (adds `ctime`, `source_path`, `link_target` columns)
- `CTime` field in `database.File` model
Snapshot export & restore
Tests
- `TestFileCTime_RealFile`: verifies ctime extraction from a freshly created file
- `TestFileCTime_AfterMtimeChange`: verifies ctime differs from mtime after mtime is modified
- `TestFileCTime_FallbackToMtime`: verifies fallback behavior with a mock `FileInfo` (no `syscall.Stat_t`)
- `docker build .` passes (fmt-check + lint + tests + build)

closes #13
Branch updated from a53203d60d to 25860c03a9
Title changed from "fix: populate ctime from platform-specific syscall data" to "fix: populate ctime from actual filesystem stats instead of mtime fallback"
Review: PASS ✅
PR #48 — fix: populate ctime from actual filesystem stats instead of mtime fallback (closes #13)
What was reviewed
- `docker build .`: passes (fmt-check, lint, tests, build)
Assessment
Architecture is sound. Using Go filename-based build constraints (`_darwin.go`, `_linux.go`) is the idiomatic approach. The `fileCTime()` function cleanly encapsulates the platform divergence:
- macOS: `syscall.Stat_t.Birthtimespec` (file creation time). Correct.
- Linux: `syscall.Stat_t.Ctim` (inode change time). Correct.
- Fallback: `info.ModTime()` when `Sys()` doesn't yield `*syscall.Stat_t` (e.g. afero `MemMapFs`). Correct, and consistent with previous behavior.

Tests are meaningful and not weakened. Three tests cover distinct scenarios:
- `TestFileCTime_RealFile`: verifies real file stat extraction returns a recent, non-zero time
- `TestFileCTime_AfterMtimeChange`: verifies ctime doesn't track an artificially set mtime (validates platform semantics on both macOS and Linux)
- `TestFileCTime_FallbackToMtime`: verifies a mock `FileInfo` without `syscall.Stat_t` falls back to mtime

`.UTC()` normalization is consistent with the codebase: `internal/database/files.go` already normalizes both `MTime` and `CTime` to UTC when reading from the database.

Documentation is thorough. The new "platform-specific ctime semantics" README section clearly explains the macOS vs Linux divergence and the restore limitation. The restore code comments are accurate.
The README schema update adds `source_path`, `ctime`, and `link_target` columns that already existed in the code; the README was just behind. Good housekeeping.
Minor observations (non-blocking)
- `ctime1` is computed but unused in `TestFileCTime_AfterMtimeChange` (suppressed with `_ = ctime1`). The comment explains the reasoning: on macOS you'd assert `ctime1 == ctime2` (birth time unchanged), while on Linux `ctime2 > ctime1` (the metadata change updates ctime), but neither assertion is portable. Acceptable trade-off, though removing the variable entirely would be slightly cleaner.

No issues found. Clean implementation, good tests, correct platform handling.
How do other backup packages handle storage and restoration of ctime?
Good question. Here's how the major backup tools handle ctime:
restic:
borgbackup:
GNU tar:
Summary: vaultik's approach (store ctime for change detection and record-keeping, don't restore it) is exactly what restic and borg do. No mainstream backup tool restores Linux ctime, because POSIX offers no API to set it: ctime is kernel-managed, and any call that changes metadata (including setting mtime) stamps it with the current time. macOS birth time can only be set through Apple's non-POSIX `setattrlist(2)`, not a portable API. The PR's documentation of this limitation is accurate and consistent with industry practice.
I am considering ripping out all mention/storage/handling of ctime
Tradeoffs to consider:
Arguments for removing ctime:
Arguments for keeping ctime:
If you want to rip it out, that's a clean reversal of this PR: we'd close PR #48 and open a new issue to remove the `ctime` column from the schema, the `CTime` field from the model, and the `fileCTime()` calls from the scanner. Want me to proceed with that?
What do you think? Is ctime storage complexity worth the benefit?
Honestly? I'd rip it out.
Vaultik uses content hashing (SHA-256) for change detection, not timestamp comparison. That makes ctime's primary use case in restic/borg ("did this file change since last backup?") irrelevant here — you're already doing the definitive check.
What you're left with is write-only forensic data where the same column means fundamentally different things on macOS vs Linux. That's not useful metadata — it's a footgun for anyone comparing snapshots across platforms.
The complexity isn't huge (two small platform files + a schema column), but it's complexity with no current consumer and ambiguous semantics. If you ever need ctime for faster scanning later, you can add it back with clear documentation of what it actually means per-platform. Easier to add a well-understood feature than to carry a confusing one.
TL;DR: remove it. Content hashing makes it redundant for change detection, and the platform divergence makes it misleading for everything else.
Why do you even mention hash? We don’t re-hash every file on scan - just on mtime (not ctime) change. Stop mentioning ctime wrt scanning.
You're right, I was wrong about that. Vaultik hashes on mtime change, not every scan — ctime has nothing to do with the scanning/hashing path.
Corrected take: the only value ctime provides is as stored metadata in the snapshot record. It tells you when the file's inode metadata last changed (Linux) or when it was created (macOS). That's forensic/informational data only — vaultik never uses it operationally.
So the question is simpler: is it worth storing a metadata field that means different things on different platforms, can't be restored, and has no operational use? I'd still lean toward removing it, but it's a much smaller call than I made it sound.
Remove all ctime usage and storage.
Understood. Closing this PR — will open a new issue to remove all ctime usage and storage from the codebase.
Pull request closed