.mf File Format Specification
Version 1.0
Overview
An .mf file is a binary manifest that describes a directory tree of files,
including their paths, sizes, and cryptographic checksums. It supports
optional GPG signatures for integrity verification and optional timestamps
for metadata preservation.
File Structure
An .mf file consists of two parts, concatenated:
- Magic bytes (8 bytes): the ASCII string ZNAVSRFG
- Outer message: a Protocol Buffers serialized MFFileOuter message
There is no length prefix or version byte between the magic and the protobuf message. The protobuf message extends to the end of the file.
See mfer/mf.proto for exact field numbers and types.
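Because there is no length prefix, a reader can locate the outer message with a simple prefix check. The Go sketch below shows this. Go is used for all examples here. The function name splitManifest is hypothetical, and the bytes after the magic in main are placeholders, not a real MFFileOuter.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// mfMagic is the 8-byte ASCII prefix of every .mf file.
var mfMagic = []byte("ZNAVSRFG")

// splitManifest (hypothetical helper) verifies the magic bytes and returns
// everything after them. There is no length prefix or version byte, so the
// remainder is the whole serialized MFFileOuter message.
func splitManifest(data []byte) ([]byte, error) {
	if !bytes.HasPrefix(data, mfMagic) {
		return nil, errors.New("not an .mf file: missing ZNAVSRFG magic")
	}
	return data[len(mfMagic):], nil
}

func main() {
	// The two bytes after the magic are placeholders, not a real MFFileOuter.
	file := append([]byte("ZNAVSRFG"), 0x08, 0x01)
	outer, err := splitManifest(file)
	fmt.Println(len(outer), err) // 2 <nil>
}
```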
Outer Message (MFFileOuter)
The outer message contains:
| Field | Number | Type | Description |
|---|---|---|---|
| version | 101 | enum | Must be VERSION_ONE (1) |
| compressionType | 102 | enum | Compression of innerMessage; must be COMPRESSION_ZSTD (1) |
| size | 103 | int64 | Uncompressed size of innerMessage (corruption detection) |
| sha256 | 104 | bytes | SHA-256 hash of the compressed innerMessage (corruption detection) |
| uuid | 105 | bytes | Random v4 UUID; must match the inner message UUID |
| innerMessage | 199 | bytes | Zstd-compressed serialized MFFile message |
| signature | 201 | bytes (optional) | GPG signature (ASCII-armored or binary) |
| signer | 202 | bytes (optional) | Full GPG key ID of the signer |
| signingPubKey | 203 | bytes (optional) | Full GPG signing public key |
SHA-256 Hash
The sha256 field (104) covers the compressed innerMessage bytes.
This allows verifying data integrity before decompression.
Compression
The innerMessage field is compressed with Zstandard (zstd).
Implementations must enforce a decompression size limit to prevent
decompression bombs. The reference implementation limits decompressed size to
256 MB.
Inner Message (MFFile)
After decompressing innerMessage, the result is a serialized MFFile
(referred to as the manifest):
| Field | Number | Type | Description |
|---|---|---|---|
| version | 100 | enum | Must be VERSION_ONE (1) |
| files | 101 | repeated MFFilePath | List of files in the manifest |
| uuid | 102 | bytes | Random v4 UUID; must match outer UUID |
| createdAt | 201 | Timestamp (optional) | When the manifest was created |
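Both messages carry the same random v4 UUID, so a reader can check that they agree after decompression. The sketch below generates a version 4 UUID by setting the RFC 4122 version and variant bits on 16 random bytes. It then compares the inner and outer values. It assumes the uuid fields hold the raw 16-byte form, which the spec does not state explicitly. The helper names are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"errors"
	"fmt"
)

// newUUIDv4 (hypothetical) returns 16 random bytes with the version nibble
// set to 4 and the RFC 4122 variant bits set to 10.
func newUUIDv4() ([]byte, error) {
	u := make([]byte, 16)
	if _, err := rand.Read(u); err != nil {
		return nil, err
	}
	u[6] = (u[6] & 0x0f) | 0x40 // version 4
	u[8] = (u[8] & 0x3f) | 0x80 // RFC 4122 variant
	return u, nil
}

// checkUUIDs (hypothetical) requires the outer uuid (105) and inner uuid
// (102) to be identical, assuming both are raw 16-byte values.
func checkUUIDs(outer, inner []byte) error {
	if len(outer) != 16 || !bytes.Equal(outer, inner) {
		return errors.New("uuid mismatch between outer and inner message")
	}
	return nil
}

func main() {
	u, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x version=%d\n", u, u[6]>>4)
	fmt.Println(checkUUIDs(u, u))
}
```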
File Entries (MFFilePath)
Each file entry contains:
| Field | Number | Type | Description |
|---|---|---|---|
| path | 1 | string | Relative file path (see Path Rules) |
| size | 2 | int64 | File size in bytes |
| hashes | 3 | repeated MFFileChecksum | At least one hash required |
| mimeType | 301 | string (optional) | MIME type |
| mtime | 302 | Timestamp (optional) | Modification time |
| ctime | 303 | Timestamp (optional) | Change time (inode metadata change) |
Field 304 (atime) has been removed from the specification. Access time is
volatile and non-deterministic; it is not useful for integrity verification.
Path Rules
All path values must satisfy these invariants:
- UTF-8: paths must be valid UTF-8
- Forward slashes: use `/` as the path separator (never `\`)
- Relative only: no leading `/`
- No parent traversal: no `..` path segments
- No empty segments: no `//` sequences
- No trailing slash: paths refer to files, not directories
Implementations must validate these invariants when reading and writing manifests. Paths that violate these rules must be rejected.
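The rules above translate directly into a validator. This sketch treats any backslash as a violation, which is the strictest reading of "never \". It does not reject single "." segments, because the rules above do not mention them. The name validatePath is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// validatePath (hypothetical) checks a manifest path against the Path Rules.
func validatePath(p string) error {
	switch {
	case !utf8.ValidString(p):
		return errors.New("path is not valid UTF-8")
	case strings.Contains(p, `\`):
		return errors.New("path contains a backslash")
	case strings.HasPrefix(p, "/"):
		return errors.New("path is absolute (leading /)")
	case strings.HasSuffix(p, "/"):
		return errors.New("path has a trailing slash")
	}
	// Splitting on "/" exposes empty segments ("a//b", and also the empty
	// path "") as well as parent-traversal segments.
	for _, seg := range strings.Split(p, "/") {
		switch seg {
		case "":
			return errors.New("path has an empty segment")
		case "..":
			return errors.New("path contains a .. segment")
		}
	}
	return nil
}

func main() {
	for _, p := range []string{"docs/readme.txt", "/etc/passwd", "a/../b", "a//b", "dir/", `a\b`} {
		fmt.Printf("%-18q %v\n", p, validatePath(p))
	}
}
```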
Hash Format (MFFileChecksum)
Each checksum message has a single bytes field, multiHash, containing a
multihash-encoded value. Multihash is self-describing: the encoded bytes
consist of a varint algorithm identifier, then a varint digest length, then
the digest itself.
The 1.0 implementation writes SHA-256 multihashes (0x12 algorithm code).
Implementations must be able to verify SHA-256 multihashes at minimum.
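For SHA-256, the algorithm code 0x12 and the digest length 32 (0x20) are each below 0x80. Each therefore fits in a single varint byte, which makes every SHA-256 multihash 34 bytes long. The sketch below encodes and verifies such values with Go's encoding/binary varint helpers. The function names are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

// sha256Code is the multihash algorithm identifier for SHA-256.
const sha256Code = 0x12

// encodeSHA256Multihash (hypothetical) returns varint(code) || varint(len) || digest.
func encodeSHA256Multihash(data []byte) []byte {
	sum := sha256.Sum256(data)
	mh := binary.AppendUvarint(nil, sha256Code)
	mh = binary.AppendUvarint(mh, uint64(len(sum)))
	return append(mh, sum[:]...)
}

// decodeMultihash (hypothetical) splits a multihash into code and digest.
func decodeMultihash(mh []byte) (uint64, []byte, error) {
	code, n := binary.Uvarint(mh)
	if n <= 0 {
		return 0, nil, errors.New("malformed algorithm varint")
	}
	length, m := binary.Uvarint(mh[n:])
	if m <= 0 {
		return 0, nil, errors.New("malformed length varint")
	}
	digest := mh[n+m:]
	if uint64(len(digest)) != length {
		return 0, nil, errors.New("digest length does not match header")
	}
	return code, digest, nil
}

// verifySHA256Multihash (hypothetical) checks file contents against a stored multihash.
func verifySHA256Multihash(data, mh []byte) error {
	code, digest, err := decodeMultihash(mh)
	if err != nil {
		return err
	}
	if code != sha256Code {
		return fmt.Errorf("unsupported multihash code 0x%x", code)
	}
	sum := sha256.Sum256(data)
	if !bytes.Equal(sum[:], digest) {
		return errors.New("checksum mismatch")
	}
	return nil
}

func main() {
	mh := encodeSHA256Multihash([]byte("hello"))
	fmt.Printf("% x\n", mh[:2])                                        // 12 20
	fmt.Println(len(mh), verifySHA256Multihash([]byte("hello"), mh)) // 34 <nil>
}
```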
Signature Scheme
Signing is optional. When present, the signature covers a canonical string constructed as:
`ZNAVSRFG-<UUID>-<SHA256>`
Where:
- `ZNAVSRFG` is the magic bytes string (literal ASCII)
- `<UUID>` is the hex-encoded UUID from the outer message
- `<SHA256>` is the hex-encoded SHA-256 hash from the outer message (covering the compressed data)
Components are separated by hyphens. The signature is produced by GPG over
this canonical string and stored in the signature field of the outer
message.
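Building the canonical string is simple concatenation. The spec does not say whether the hex is lowercase or whether the UUID is written with dashes. This sketch assumes lowercase hex of the raw bytes with no dashes, and that assumption should be checked against the reference implementation. The resulting string is what would be handed to GPG. Invoking GPG is outside the scope of this sketch. The name signingString is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// signingString (hypothetical) builds ZNAVSRFG-<UUID>-<SHA256> from the outer
// message's uuid (105) and sha256 (104) fields. Lowercase hex with no UUID
// dashes is an assumption of this sketch, not confirmed by the spec.
func signingString(uuid, innerSHA256 []byte) string {
	return "ZNAVSRFG-" + hex.EncodeToString(uuid) + "-" + hex.EncodeToString(innerSHA256)
}

func main() {
	uuid := make([]byte, 16) // all-zero placeholder, not a real v4 UUID
	sum := sha256.Sum256([]byte("compressed inner message"))
	fmt.Println(signingString(uuid, sum[:]))
}
```

Since both inputs come from the outer message, the signature binds the manifest's identity and its compressed content without requiring the verifier to decompress first.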
Deterministic Serialization
By default, manifests are generated deterministically:
- File entries are sorted by path in lexicographic byte order
- createdAt is omitted unless explicitly requested
- atime is never included (field removed from schema)
This ensures that two independent runs over the same directory tree produce
byte-identical .mf files (assuming file contents and metadata have not
changed).
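"Lexicographic byte order" means comparing the raw UTF-8 bytes, with no locale collation and no per-component comparison. In Go, the built-in < on strings already compares bytes. The entry type below is a hypothetical, trimmed-down stand-in for MFFilePath.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical stand-in for MFFilePath with only two fields.
type entry struct {
	Path string
	Size int64
}

// sortEntries orders entries by path using byte-wise string comparison.
func sortEntries(es []entry) {
	sort.Slice(es, func(i, j int) bool { return es[i].Path < es[j].Path })
}

func main() {
	es := []entry{{"b.txt", 1}, {"a/b", 2}, {"B.txt", 3}, {"a.b", 4}}
	sortEntries(es)
	for _, e := range es {
		fmt.Println(e.Path)
	}
}
```

With byte ordering, "B.txt" precedes "a.b" because uppercase letters come before lowercase in ASCII. In the same way, "a.b" precedes "a/b" because '.' (0x2E) is smaller than '/' (0x2F). A locale-aware or directory-first sort would produce a different order and break byte-identical output.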
MIME Type
The recommended MIME type for .mf files is application/octet-stream.
The .mf file extension is the canonical identifier.
Reference
- Proto definition: mfer/mf.proto
- Reference implementation: git.eeqj.de/sneak/mfer