mfer/FORMAT.md
clawbot ca3e29e802 docs: add FORMAT.md, answer design questions, bump version to 1.0.0
- Write complete .mf format specification (FORMAT.md)
- Fill in all design question answers in TODO.md
- Mark completed implementation items in TODO.md
- Bump VERSION from 0.1.0 to 1.0.0 in Makefile
- Update README to reference FORMAT.md and reflect 1.0 status
2026-02-20 03:45:19 -08:00

.mf File Format Specification

Version 1.0

Overview

An .mf file is a binary manifest that describes a directory tree of files, including their paths, sizes, and cryptographic checksums. It supports optional GPG signatures for verifying the authenticity and integrity of the manifest, and optional timestamps for metadata preservation.

File Structure

An .mf file consists of two parts, concatenated:

  1. Magic bytes (8 bytes): the ASCII string ZNAVSRFG
  2. Outer message: a Protocol Buffers serialized MFFileOuter message

There is no length prefix or version byte between the magic and the protobuf message. The protobuf message extends to the end of the file.

See mfer/mf.proto for exact field numbers and types.
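The layout above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the function name `split_mf` is hypothetical, and parsing the returned bytes as `MFFileOuter` is left to code generated from mfer/mf.proto.

```python
MAGIC = b"ZNAVSRFG"  # the 8 ASCII magic bytes given in the spec


def split_mf(data: bytes) -> bytes:
    """Check the magic and return the serialized MFFileOuter bytes.

    Hypothetical helper for illustration only.
    """
    if data[:len(MAGIC)] != MAGIC:
        raise ValueError("not an .mf file: bad magic bytes")
    # There is no length prefix or version byte, so everything after
    # the magic is the protobuf message, up to the end of the file.
    return data[len(MAGIC):]
```

A reader would pass the full file contents to `split_mf` and hand the result to the protobuf decoder.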

Outer Message (MFFileOuter)

The outer message contains:

| Field | Number | Type | Description |
|---|---|---|---|
| version | 101 | enum | Must be VERSION_ONE (1) |
| compressionType | 102 | enum | Compression of innerMessage; must be COMPRESSION_ZSTD (1) |
| size | 103 | int64 | Uncompressed size of innerMessage (corruption detection) |
| sha256 | 104 | bytes | SHA-256 hash of the compressed innerMessage (corruption detection) |
| uuid | 105 | bytes | Random v4 UUID; must match the inner message UUID |
| innerMessage | 199 | bytes | Zstd-compressed serialized MFFile message |
| signature | 201 | bytes | (optional) GPG signature (ASCII-armored or binary) |
| signer | 202 | bytes | (optional) Full GPG key ID of the signer |
| signingPubKey | 203 | bytes | (optional) Full GPG signing public key |

SHA-256 Hash

The sha256 field (104) covers the compressed innerMessage bytes. This allows verifying data integrity before decompression.
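As a rough sketch of that check, the following Python compares the stored digest against a freshly computed one before any decompression happens. The function name `check_outer_hash` and its parameter names are assumptions made for this example; the two arguments stand for the raw `innerMessage` and `sha256` field values.

```python
import hashlib
import hmac


def check_outer_hash(inner_message: bytes, expected_sha256: bytes) -> bool:
    """Return True if the compressed innerMessage matches the sha256 field.

    Hypothetical helper: both arguments are the raw bytes of the
    corresponding MFFileOuter fields.
    """
    actual = hashlib.sha256(inner_message).digest()
    # compare_digest is a constant-time comparison; a plain == would
    # also be correct for pure corruption detection.
    return hmac.compare_digest(actual, expected_sha256)
```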

Compression

The innerMessage field is compressed with Zstandard (zstd). Implementations must enforce a decompression size limit to prevent decompression bombs. The reference implementation limits decompressed size to 256 MB.
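The size-limit idea can be illustrated with a bounded streaming decompress. Python's standard library has no Zstandard codec in most versions, so this sketch deliberately substitutes zlib to stay self-contained; it shows the limiting technique only and is not how .mf data is actually decompressed. The name `bounded_decompress` is hypothetical, and the limit is a parameter because the spec does not say whether 256 MB means 256 × 10^6 or 256 × 2^20 bytes.

```python
import zlib


def bounded_decompress(data: bytes, limit: int) -> bytes:
    """Decompress a zlib stream, refusing output larger than `limit` bytes.

    zlib stands in for zstd here (an assumption for illustration).
    """
    d = zlib.decompressobj()
    # Ask for at most limit + 1 bytes: one extra byte is enough to
    # detect that the stream would exceed the limit.
    out = d.decompress(data, limit + 1)
    if len(out) > limit:
        raise ValueError("decompressed size exceeds limit")
    if not d.eof:
        raise ValueError("compressed stream is incomplete")
    return out
```

The same pattern applies to a zstd binding that supports streaming or a maximum output size: cap the output first, then compare the result against the `size` field.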

Inner Message (MFFile)

After decompressing innerMessage, the result is a serialized MFFile (referred to as the manifest):

| Field | Number | Type | Description |
|---|---|---|---|
| version | 100 | enum | Must be VERSION_ONE (1) |
| files | 101 | repeated MFFilePath | List of files in the manifest |
| uuid | 102 | bytes | Random v4 UUID; must match outer UUID |
| createdAt | 201 | Timestamp | (optional) When the manifest was created |

File Entries (MFFilePath)

Each file entry contains:

| Field | Number | Type | Description |
|---|---|---|---|
| path | 1 | string | Relative file path (see Path Rules) |
| size | 2 | int64 | File size in bytes |
| hashes | 3 | repeated MFFileChecksum | At least one hash required |
| mimeType | 301 | string | (optional) MIME type |
| mtime | 302 | Timestamp | (optional) Modification time |
| ctime | 303 | Timestamp | (optional) Change time (inode metadata change) |

Field 304 (atime) has been removed from the specification. Access time is volatile and non-deterministic; it is not useful for integrity verification.

Path Rules

All path values must satisfy these invariants:

  • UTF-8: paths must be valid UTF-8
  • Forward slashes: use / as the path separator (never \)
  • Relative only: no leading /
  • No parent traversal: no .. path segments
  • No empty segments: no // sequences
  • No trailing slash: paths refer to files, not directories

Implementations must validate these invariants when reading and writing manifests. Paths that violate these rules must be rejected.
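The invariants above translate fairly directly into a validator. The following Python is a minimal sketch under stated assumptions: `validate_path` is a hypothetical name, the path is taken as an already-decoded `str`, and rejecting the empty string is an interpretation that falls out of the empty-segment rule rather than something the list states explicitly.

```python
def validate_path(path: str) -> None:
    """Raise ValueError if `path` breaks any of the listed path rules."""
    try:
        # A str containing lone surrogates cannot be encoded as UTF-8.
        path.encode("utf-8")
    except UnicodeEncodeError:
        raise ValueError("path is not valid UTF-8")
    if "\\" in path:
        raise ValueError("backslash is not allowed as a separator")
    if path.startswith("/"):
        raise ValueError("path must be relative")
    if path.endswith("/"):
        raise ValueError("path must not end with a slash")
    segments = path.split("/")
    if "" in segments:
        # Covers "//" sequences; the empty path is rejected here too
        # (an assumption beyond the listed rules).
        raise ValueError("path contains an empty segment")
    if ".." in segments:
        raise ValueError("path contains a parent traversal segment")
```

Only a segment that is exactly `..` is rejected, so a file named `..notes` or `a..b` still passes.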

Hash Format (MFFileChecksum)

Each checksum is a single bytes multiHash field containing a multihash-encoded value. Multihash is self-describing: the encoded bytes include a varint algorithm identifier followed by a varint digest length followed by the digest itself.

The 1.0 implementation writes SHA-256 multihashes (0x12 algorithm code). Implementations must be able to verify SHA-256 multihashes at minimum.
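For SHA-256 the multihash layout is short enough to build by hand, which the following Python sketch does. The function names are hypothetical. Because the algorithm code (0x12) and the digest length (32) are both below 0x80, each varint occupies a single byte, so no general varint codec is needed for this one case.

```python
import hashlib
import hmac

SHA2_256_CODE = 0x12  # multihash algorithm code for SHA-256
SHA2_256_LENGTH = 32  # digest length in bytes


def sha256_multihash(content: bytes) -> bytes:
    """Encode the SHA-256 digest of `content` as a multihash."""
    digest = hashlib.sha256(content).digest()
    # <varint code><varint length><digest>; both varints are one byte here.
    return bytes([SHA2_256_CODE, SHA2_256_LENGTH]) + digest


def verify_sha256_multihash(multihash: bytes, content: bytes) -> bool:
    """Check `content` against a SHA-256 multihash; reject other algorithms."""
    if (len(multihash) != 2 + SHA2_256_LENGTH
            or multihash[0] != SHA2_256_CODE
            or multihash[1] != SHA2_256_LENGTH):
        raise ValueError("not a SHA-256 multihash")
    return hmac.compare_digest(multihash[2:], hashlib.sha256(content).digest())
```

A verifier that wants to support further algorithms would need a real varint decoder and a table of codes; this sketch covers only the minimum the spec requires.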

Signature Scheme

Signing is optional. When present, the signature covers a canonical string constructed as:

ZNAVSRFG-<UUID>-<SHA256>

Where:

  • ZNAVSRFG is the magic bytes string (literal ASCII)
  • <UUID> is the hex-encoded UUID from the outer message
  • <SHA256> is the hex-encoded SHA-256 hash from the outer message (covering compressed data)

Components are separated by hyphens. The signature is produced by GPG over this canonical string and stored in the signature field of the outer message.
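Building the canonical string is a simple join, sketched below in Python. Two details are assumptions rather than statements from the spec: hex digits are emitted in lowercase, and the UUID is rendered as 32 plain hex characters without the usual internal dashes. The function name `canonical_signing_string` is hypothetical, and invoking GPG on the result is out of scope here.

```python
def canonical_signing_string(uuid_bytes: bytes, sha256_digest: bytes) -> str:
    """Build the ZNAVSRFG-<UUID>-<SHA256> string that gets signed.

    Assumes lowercase hex and an undashed UUID (not confirmed by the spec).
    """
    return "-".join(["ZNAVSRFG", uuid_bytes.hex(), sha256_digest.hex()])
```

With a 16-byte UUID and a 32-byte digest, the result is always 8 + 1 + 32 + 1 + 64 = 106 characters long.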

Deterministic Serialization

By default, manifests are generated deterministically:

  • File entries are sorted by path in lexicographic byte order
  • createdAt is omitted unless explicitly requested
  • atime is never included (field removed from schema)

This ensures that two independent runs over the same directory tree produce byte-identical .mf files (assuming file contents and metadata have not changed).
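The ordering rule is easy to get subtly wrong with locale-aware or case-insensitive sorts, so here is a small Python sketch of it. The function name `sort_paths` is hypothetical and it operates on bare path strings rather than full MFFilePath entries.

```python
def sort_paths(paths: list[str]) -> list[str]:
    """Sort paths in lexicographic order of their UTF-8 bytes."""
    # Comparing the encoded bytes makes the ordering independent of
    # locale and of any case folding.
    return sorted(paths, key=lambda p: p.encode("utf-8"))
```

For example, `sort_paths(["b.txt", "a/z.txt", "a.txt", "B.txt"])` returns `["B.txt", "a.txt", "a/z.txt", "b.txt"]`: uppercase `B` (0x42) sorts before lowercase letters, and `.` (0x2E) sorts before `/` (0x2F).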

MIME Type

The recommended MIME type for .mf files is application/octet-stream. The .mf file extension is the canonical identifier.

Reference