# .mf File Format Specification

Version 1.0

## Overview

An `.mf` file is a binary manifest that describes a directory tree of files, including their paths, sizes, and cryptographic checksums. It supports optional GPG signatures for integrity verification and optional timestamps for metadata preservation.

## File Structure

An `.mf` file consists of two parts, concatenated:

1. **Magic bytes** (8 bytes): the ASCII string `ZNAVSRFG`
2. **Outer message**: a Protocol Buffers serialized `MFFileOuter` message

There is no length prefix or version byte between the magic and the protobuf message. The protobuf message extends to the end of the file.

See [`mfer/mf.proto`](mfer/mf.proto) for exact field numbers and types.

## Outer Message (`MFFileOuter`)

The outer message contains:

| Field             | Number | Type             | Description                                                              |
|-------------------|--------|------------------|--------------------------------------------------------------------------|
| `version`         | 101    | enum             | Must be `VERSION_ONE` (1)                                                |
| `compressionType` | 102    | enum             | Compression of `innerMessage`; must be `COMPRESSION_ZSTD` (1)            |
| `size`            | 103    | int64            | Uncompressed size of `innerMessage` (corruption detection)               |
| `sha256`          | 104    | bytes            | SHA-256 hash of the **compressed** `innerMessage` (corruption detection) |
| `uuid`            | 105    | bytes            | Random v4 UUID; must match the inner message UUID                        |
| `innerMessage`    | 199    | bytes            | Zstd-compressed serialized `MFFile` message                              |
| `signature`       | 201    | bytes (optional) | GPG signature (ASCII-armored or binary)                                  |
| `signer`          | 202    | bytes (optional) | Full GPG key ID of the signer                                            |
| `signingPubKey`   | 203    | bytes (optional) | Full GPG signing public key                                              |

### SHA-256 Hash

The `sha256` field (104) covers the **compressed** `innerMessage` bytes. This allows verifying data integrity before decompression.

## Compression

The `innerMessage` field is compressed with [Zstandard (zstd)](https://facebook.github.io/zstd/). Implementations must enforce a decompression size limit to prevent decompression bombs.
The reference implementation limits decompressed size to 256 MB.

## Inner Message (`MFFile`)

After decompressing `innerMessage`, the result is a serialized `MFFile` (referred to as the manifest):

| Field       | Number | Type                  | Description                           |
|-------------|--------|-----------------------|---------------------------------------|
| `version`   | 100    | enum                  | Must be `VERSION_ONE` (1)             |
| `files`     | 101    | repeated `MFFilePath` | List of files in the manifest         |
| `uuid`      | 102    | bytes                 | Random v4 UUID; must match outer UUID |
| `createdAt` | 201    | Timestamp (optional)  | When the manifest was created         |

## File Entries (`MFFilePath`)

Each file entry contains:

| Field      | Number | Type                      | Description                         |
|------------|--------|---------------------------|-------------------------------------|
| `path`     | 1      | string                    | Relative file path (see Path Rules) |
| `size`     | 2      | int64                     | File size in bytes                  |
| `hashes`   | 3      | repeated `MFFileChecksum` | At least one hash required          |
| `mimeType` | 301    | string (optional)         | MIME type                           |
| `mtime`    | 302    | Timestamp (optional)      | Modification time                   |
| `ctime`    | 303    | Timestamp (optional)      | Change time (inode metadata change) |

Field 304 (`atime`) has been removed from the specification. Access time is volatile and non-deterministic; it is not useful for integrity verification.

## Path Rules

All `path` values must satisfy these invariants:

- **UTF-8**: paths must be valid UTF-8
- **Forward slashes**: use `/` as the path separator (never `\`)
- **Relative only**: no leading `/`
- **No parent traversal**: no `..` path segments
- **No empty segments**: no `//` sequences
- **No trailing slash**: paths refer to files, not directories

Implementations must validate these invariants when reading and writing manifests. Paths that violate these rules must be rejected.

## Hash Format (`MFFileChecksum`)

Each checksum is a single `bytes multiHash` field containing a [multihash](https://multiformats.io/multihash/)-encoded value.
Multihash is self-describing: the encoded bytes include a varint algorithm identifier followed by a varint digest length followed by the digest itself.

The 1.0 implementation writes SHA-256 multihashes (`0x12` algorithm code). Implementations must be able to verify SHA-256 multihashes at minimum.

## Signature Scheme

Signing is optional. When present, the signature covers a canonical string constructed as:

```
ZNAVSRFG-<UUID>-<SHA256>
```

Where:

- `ZNAVSRFG` is the magic bytes string (literal ASCII)
- `<UUID>` is the hex-encoded UUID from the outer message
- `<SHA256>` is the hex-encoded SHA-256 hash from the outer message (covering compressed data)

Components are separated by hyphens. The signature is produced by GPG over this canonical string and stored in the `signature` field of the outer message.

## Deterministic Serialization

By default, manifests are generated deterministically:

- File entries are sorted by `path` in **lexicographic byte order**
- `createdAt` is omitted unless explicitly requested
- `atime` is never included (field removed from schema)

This ensures that two independent runs over the same directory tree produce byte-identical `.mf` files (assuming file contents and metadata have not changed).

## MIME Type

The recommended MIME type for `.mf` files is `application/octet-stream`. The `.mf` file extension is the canonical identifier.

## Reference

- Proto definition: [`mfer/mf.proto`](mfer/mf.proto)
- Reference implementation: [git.eeqj.de/sneak/mfer](https://git.eeqj.de/sneak/mfer)
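
As an illustration of the rules above (magic check, Path Rules, multihash layout, canonical signing string, and path ordering), the following is a minimal Python sketch using only the standard library. It is not taken from the reference implementation. All function names are hypothetical, protobuf parsing and zstd decompression are left out, and the lowercase, undashed hex rendering in `signing_string` is an assumption, since the text only says "hex-encoded".

```python
import hashlib

MAGIC = b"ZNAVSRFG"      # 8-byte magic at the start of every .mf file
SHA2_256_CODE = 0x12     # multihash algorithm code for SHA-256


def split_magic(data: bytes) -> bytes:
    """Check the magic and return the remaining bytes (the outer message)."""
    if data[:8] != MAGIC:
        raise ValueError("bad magic")
    return data[8:]


def validate_path(raw: bytes) -> str:
    """Return the decoded path, or raise ValueError if a Path Rule is broken."""
    path = raw.decode("utf-8")  # UnicodeDecodeError is a ValueError subclass
    if "\\" in path:
        raise ValueError("backslash used as separator")
    if path.startswith("/"):
        raise ValueError("absolute path")
    if path.endswith("/"):
        raise ValueError("trailing slash")
    segments = path.split("/")
    if "" in segments:  # also rejects the empty path
        raise ValueError("empty segment")
    if ".." in segments:
        raise ValueError("parent traversal")
    return path


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned varint at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear marks the last byte
            return value, pos
        shift += 7


def decode_multihash(mh: bytes) -> tuple[int, bytes]:
    """Split a multihash into (algorithm code, digest)."""
    code, pos = read_varint(mh, 0)
    length, pos = read_varint(mh, pos)
    digest = mh[pos:]
    if len(digest) != length:
        raise ValueError("digest length mismatch")
    return code, digest


def sha256_multihash(content: bytes) -> bytes:
    """Encode the SHA-256 of content as a multihash."""
    digest = hashlib.sha256(content).digest()
    # Both 0x12 and 32 are below 0x80, so each varint is a single byte.
    return bytes([SHA2_256_CODE, len(digest)]) + digest


def signing_string(uuid_bytes: bytes, sha256_bytes: bytes) -> str:
    """Build the canonical string (assumes lowercase hex without dashes)."""
    return f"ZNAVSRFG-{uuid_bytes.hex()}-{sha256_bytes.hex()}"


def sort_paths(paths: list[str]) -> list[str]:
    """Sort paths in lexicographic byte order of their UTF-8 encoding."""
    return sorted(paths, key=lambda p: p.encode("utf-8"))
```

Sorting on the UTF-8 encoding rather than on the string itself keeps the ordering independent of locale and of any language-specific string comparison, which matters for the byte-identical output goal.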