# mfer

Manifest file generator and checker.

# Problem Statement

Given a plain URL, there is no standard way to safely and programmatically download everything "under" that URL path. `wget -r` can traverse directory listings if they're enabled, but every server formats its listings differently, and this approach neither verifies the cryptographic integrity of the files nor allows them to be fetched over a protocol other than HTTP/HTTPS.

Currently, people work around this with sidecar files: `SHASUMS` checksum files, along with a `SHASUMS.asc` PGP detached signature. This approach is not checksum-algorithm-agnostic, and the sidecar file is not consistently named.

# Proposed Solution

A standard, a manifest file format, and a tool for generating it.

The manifest file would be called `index.mf`, and the tool for generating it would be called `mfer`.

The manifest file would do several important things:

* have a standard filename, so if given `https://example.com/downloadpackage/` one could fetch `https://example.com/downloadpackage/index.mf` to enumerate the full directory listing
* contain a version field for extensibility
* contain structured data (protobuf, json, or cbor)
* provide an inner signed container, so that the manifest file itself can embed a signature and a public key alongside in a single file
* contain a list of files, each with a path relative to the manifest
* contain a manifest timestamp
* contain ctime/mtime information for files so that file metadata can be preserved
* contain cryptographic checksums in several different algorithms for each file
  * probably encoded with multihash to indicate algo + hash
  * sha256 at the minimum
  * would be nice to include an IPFS/IPLD CIDv1 root hash for each file, which likely involves doing IPFS file object chunking
  * maybe a bittorrent chunklist for torrent client compatibility? perhaps a top-level infohash for the whole manifest?
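
To make the list above more concrete, here is a minimal sketch of what such a manifest might look like in Go, using JSON (one of the three candidate encodings). The type names, field names, and JSON keys are all hypothetical; nothing here is a settled format. The signed container is left out. The only fixed detail is the multihash prefix for sha2-256, which is the function code `0x12` followed by the digest length `0x20`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"time"
)

// Multihash function code and digest length (in bytes) for sha2-256.
const (
	mhSHA256    = 0x12
	mhSHA256Len = 0x20
)

// FileEntry is a hypothetical per-file record; the field names are
// illustrative only and not part of any agreed format.
type FileEntry struct {
	Path   string    `json:"path"`   // relative to the manifest
	Mtime  time.Time `json:"mtime"`  // preserved file metadata
	Ctime  time.Time `json:"ctime"`  // preserved file metadata
	Hashes []string  `json:"hashes"` // hex-encoded multihashes
}

// Manifest is a hypothetical top-level container with a version field,
// a manifest timestamp, and the file list.
type Manifest struct {
	Version   int         `json:"version"`
	CreatedAt time.Time   `json:"createdAt"`
	Files     []FileEntry `json:"files"`
}

// MultihashSHA256 returns the hex-encoded multihash of data:
// <function code><digest length><digest>.
func MultihashSHA256(data []byte) string {
	sum := sha256.Sum256(data)
	mh := append([]byte{mhSHA256, mhSHA256Len}, sum[:]...)
	return hex.EncodeToString(mh)
}

func main() {
	content := []byte("hello\n")
	stamp := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)

	m := Manifest{
		Version:   1,
		CreatedAt: stamp,
		Files: []FileEntry{{
			Path:   "docs/hello.txt",
			Mtime:  stamp,
			Ctime:  stamp,
			Hashes: []string{MultihashSHA256(content)},
		}},
	}

	out, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Storing each checksum as a self-describing multihash means additional algorithms could be appended to `Hashes` later without changing the schema.
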
# Design Goals

* Replace SHASUMS/SHASUMS.asc files
* be easy to download/resume
* be easy to use across protocols (given an HTTPS url, fetch manifest, then download file contents via bittorrent or ipfs)

# Non-Goals

* Manifest generation speed
  * likely involves IPFS chunking, bittorrent chunking, and several different cryptographic hash functions over the entirety of each and every file
* Small manifest file size (within reason)
  * 10MiB files are "small" these days, given modern storage/bandwidth
  * metadata size should not be used as an excuse to sacrifice utility (such as providing checksums over each chunk of a large file)

# Open Questions

* Should the manifest file include checksums of individual file chunks, or just for the whole assembled file?
  * If so, should the chunksize be fixed or dynamic?

# Tool Examples

* `mfer gen` / `mfer gen .`
  * recurses under the current directory and writes out an `index.mf`
* `mfer check` / `mfer check .`
  * verifies checksums of all files in the manifest, displaying an error and exiting nonzero if any files are missing or corrupted
* `mfer fetch https://example.com/stuff/`
  * fetches `/stuff/index.mf` and downloads all files listed in the manifest, optionally resuming any that already exist locally, and verifies the cryptographic integrity of the downloaded files
# Implementation Plan

## Phase One:

* golang module for reusability/embedding
* golang module client providing `mfer` CLI

## Phase Two:

* ES5 or TypeScript module for reusability/embedding
* ES5/TypeScript module client providing `mfjs` CLI

# Hopes And Dreams

* `aria2c https://example.com/manifestdirectory/`
  * (fetches `https://example.com/manifestdirectory/index.mf`, downloads and checksums all files, resumes any that exist locally already)
* `mfer fetch https://example.com/manifestdirectory/`

# Use Cases

## Web Images

I'd like to be able to put a bunch of images into a directory, generate a manifest, and then point a slideshow client (such as an ambient display, or a react app with the target directory in a query string arg) at that statically hosted directory, and have it discover the full list of images available at that URL.

## Software Distribution

I'd like to be able to resumably download a whole tree of files that is published over HTTP, using either HTTP or IPFS/BitTorrent, without needing a .torrent file.

## Filesystem Archive Integrity

I use filesystems that don't include data checksums, and I would like a cryptographically signed checksum file so that I can later verify that a set of archive files has not been modified, that none are missing, and that the checksums have not been altered in storage by a second party.

# Collaboration

Please email [`sneak@sneak.berlin`](mailto:sneak@sneak.berlin) with your desired username for an account on this Gitea instance.

I am currently interested in hiring a contractor skilled with the Go standard library interfaces to specify this tool in full and develop a prototype implementation.