# mfer

Manifest file generator and checker.
## Problem Statement

Given a plain URL, there is no standard way to safely and programmatically download everything "under" that URL path. `wget -r` can traverse directory listings if they are enabled, but every server formats its listings differently, and this does not verify the cryptographic integrity of the files.

The current workaround is a pair of sidecar files: a `SHASUMS` checksum file plus a `SHASUMS.asc` PGP detached signature. This approach is not checksum-algorithm-agnostic, and the sidecar files are not consistently named.
## Proposed Solution

A standard, a manifest file format, and a tool for generating same.

The manifest file would be called `index.mf`, and the tool for generating it would be called `mfer`.
The manifest file would do several important things (a sketch of possible Go types follows this list):

- have a standard filename, so that given `https://example.com/downloadpackage/` one could fetch `https://example.com/downloadpackage/index.mf` to enumerate the full directory listing
- contain a version field for extensibility
- contain structured data (protobuf, JSON, or CBOR)
- provide an inner signed container, so that the manifest file itself can embed a signature and a public key alongside it in a single file
- contain a list of files, each with a path relative to the manifest
- contain a manifest timestamp
- contain mtime information for files so that file metadata can be preserved
- contain cryptographic checksums in several different formats for each file
  - probably encoded with multihash to indicate algorithm + hash
  - sha256 at the minimum
  - it would be nice to include an IPFS/IPLD CIDv1 root hash for each file, which likely involves doing IPFS file object chunking
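To make the shape of this concrete, here is a minimal sketch of the manifest as Go types, shown with JSON tags for readability. The field names, tags, and layout are illustrative guesses, not the actual mfer schema.

```go
// Illustrative Go types for the manifest layout described above.
// Field names and structure are assumptions, not the actual mfer schema.
package mfsketch

import "time"

// Manifest is the signed inner payload of an index.mf file.
type Manifest struct {
	Version   int         `json:"version"`   // format version, for extensibility
	CreatedAt time.Time   `json:"createdAt"` // manifest timestamp
	Files     []FileEntry `json:"files"`
}

// FileEntry describes one file, addressed relative to the manifest's directory.
type FileEntry struct {
	Path   string    `json:"path"`  // relative path, e.g. "images/cat.jpg"
	Size   int64     `json:"size"`
	MTime  time.Time `json:"mtime"` // preserved modification time
	Hashes []string  `json:"hashes"` // multihash-encoded digests (sha2-256 at minimum)
	CID    string    `json:"cid,omitempty"` // optional IPFS/IPLD CIDv1 root hash
}

// SignedManifest is the outer container: the serialized Manifest bytes plus a
// detached signature and the signer's public key, all in a single file.
type SignedManifest struct {
	Payload   []byte `json:"payload"`   // encoded Manifest (protobuf, JSON, or CBOR)
	PublicKey []byte `json:"publicKey"`
	Signature []byte `json:"signature"` // signature over Payload
}
```

Whichever encoding wins (protobuf, JSON, or CBOR), the important property is the outer signed container: one file carrying the payload, the public key, and the signature over the payload.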
## Design Goals

- replace `SHASUMS`/`SHASUMS.asc` files
- be easy to download and resume
- be easy to use across protocols: given an HTTPS URL, fetch the manifest, then download the file contents via BitTorrent or IPFS (see the sketch below)
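As an illustration of the cross-protocol goal, here is a minimal sketch of resolving the well-known manifest name from a directory URL over HTTPS. Once the manifest is parsed, each listed file could be fetched over HTTP, BitTorrent, or IPFS using its checksums or CID. The function name is hypothetical.

```go
// Sketch of the cross-protocol design goal: the manifest is fetched over HTTPS
// from the well-known name, but file contents can come from any transport.
package mfsketch

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// fetchManifest resolves the standard index.mf name under a directory URL
// and returns the raw manifest bytes. Hypothetical helper, not the mfer API.
func fetchManifest(dirURL string) ([]byte, error) {
	u, err := url.JoinPath(dirURL, "index.mf") // standard filename convention
	if err != nil {
		return nil, err
	}
	resp, err := http.Get(u)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fetching %s: %s", u, resp.Status)
	}
	return io.ReadAll(resp.Body)
}
```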
## Non-Goals

- Manifest generation speed
- Small manifest file size (within reason)
## Open Questions

- Should the manifest file include checksums of individual file chunks, or just of the whole assembled file?
- If so, should the chunk size be fixed or dynamic?
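If chunk checksums were included with a fixed chunk size, the hashing could work roughly like the sketch below. The 1 MiB chunk size is an arbitrary placeholder, not a decided answer to the question.

```go
// Sketch of hashing both fixed-size chunks and the whole file in one pass.
// The 1 MiB chunk size is an arbitrary placeholder, not a decided value.
package mfsketch

import (
	"crypto/sha256"
	"io"
	"os"
)

const chunkSize = 1 << 20 // 1 MiB, placeholder

// hashFile returns the whole-file sha256 plus a sha256 per fixed-size chunk.
func hashFile(path string) (whole []byte, chunks [][]byte, err error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, nil, err
	}
	defer f.Close()

	fileHash := sha256.New()
	buf := make([]byte, chunkSize)
	for {
		n, readErr := io.ReadFull(f, buf)
		if n > 0 {
			fileHash.Write(buf[:n])
			chunkHash := sha256.Sum256(buf[:n])
			chunks = append(chunks, chunkHash[:])
		}
		if readErr == io.EOF || readErr == io.ErrUnexpectedEOF {
			break
		}
		if readErr != nil {
			return nil, nil, readErr
		}
	}
	return fileHash.Sum(nil), chunks, nil
}
```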
## Tool Examples

- `mfer gen` / `mfer gen .`: recurses under the current directory and writes out an `index.mf`
- `mfer check` / `mfer check .`: verifies the checksums of all files in the manifest, displaying an error and exiting nonzero if any files are missing or corrupted
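For illustration, here is roughly what the check behavior described above implies, assuming the hypothetical `Manifest`/`FileEntry` types from the earlier sketch and, for simplicity, a plain hex-encoded sha256 as the first entry in `Hashes`. This is a sketch of the described semantics, not mfer's actual implementation.

```go
// Illustration of the described check behavior: verify every file listed in
// the manifest and report missing or corrupted entries to stderr.
// Manifest/FileEntry are the hypothetical types sketched earlier; Hashes[0]
// is assumed here to be a plain hex sha256 rather than a multihash.
package mfsketch

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

func checkManifest(root string, m Manifest) error {
	var failed bool
	for _, entry := range m.Files {
		sum, err := sha256File(filepath.Join(root, entry.Path))
		switch {
		case err != nil:
			fmt.Fprintf(os.Stderr, "MISSING  %s: %v\n", entry.Path, err)
			failed = true
		case len(entry.Hashes) == 0 || hex.EncodeToString(sum) != entry.Hashes[0]:
			fmt.Fprintf(os.Stderr, "CORRUPT  %s\n", entry.Path)
			failed = true
		}
	}
	if failed {
		return fmt.Errorf("manifest verification failed") // caller exits nonzero
	}
	return nil
}

func sha256File(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return nil, err
	}
	return h.Sum(nil), nil
}
```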
## Implementation Plan

Phase One:

- golang module for reusability/embedding
- golang module client providing the `mfer` CLI

Phase Two:

- ES5 or TypeScript module for reusability/embedding
- ES5/TypeScript module client providing the `mfjs` CLI
## Hopes And Dreams

- `aria2c https://example.com/manifestdirectory/` (fetches `https://example.com/manifestdirectory/index.mf`, downloads and checksums all files, resumes any that exist locally already)
- `mfer fetch https://example.com/manifestdirectory/` (fetches `https://example.com/manifestdirectory/index.mf`, downloads and checksums all files, resumes any that exist locally already)
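The resume behavior mentioned above could be built on HTTP Range requests. This is a minimal sketch under that assumption (the server honors `Range`); the function name is hypothetical.

```go
// Sketch of resuming a partially downloaded file with an HTTP Range request,
// as the fetch examples above imply. Assumes the server honors Range.
package mfsketch

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// resumeDownload appends the remainder of srcURL to dest, starting at
// whatever length dest already has on disk.
func resumeDownload(srcURL, dest string) error {
	f, err := os.OpenFile(dest, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return err
	}

	req, err := http.NewRequest(http.MethodGet, srcURL, nil)
	if err != nil {
		return err
	}
	if info.Size() > 0 {
		req.Header.Set("Range", fmt.Sprintf("bytes=%d-", info.Size()))
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent && resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	// If the server ignored Range and returned 200 for a partially written file,
	// appending would duplicate data; a real tool would truncate and restart.
	_, err = io.Copy(f, resp.Body)
	return err
}
```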
## Use Cases

### Web Images

I'd like to be able to put a bunch of images into a directory, generate a manifest, and then point a slideshow client (such as an ambient display, or a React app with the target directory in a query-string argument) at that statically hosted directory, and have it discover the full list of images available at that URL.
### Software Distribution

I'd like to be able to resumably download a whole tree of files published over HTTP, using either HTTP or IPFS/BitTorrent, without needing a `.torrent` file.
### Filesystem Archive Integrity

I use filesystems that don't include data checksums, and I would like a cryptographically signed checksum file so that I can later verify that a set of archive files has not been modified, that none are missing, and that the checksums themselves have not been altered in storage by a second party.