Jeffrey Paul 5a3f228d74
mfer
Manifest file generator and checker.
Problem Statement
Given a plain URL, there is no standard way to safely and programmatically download everything "under" that URL path. `wget -r` can traverse directory listings if they're enabled, but every server formats its listings differently, and this approach neither verifies the cryptographic integrity of the files nor allows them to be fetched over a protocol other than HTTP/S.
Currently, people rely on sidecar checksum files in the `SHASUMS` format, along with a `SHASUMS.asc` PGP detached signature. This approach is not checksum-algorithm-agnostic, and the sidecar file is not consistently named.
Proposed Solution
A standard, a manifest file format, and a tool for generating same.
The manifest file would be called `index.mf`, and the tool for generating it would be called `mfer`.
The manifest file would do several important things:
- have a standard filename, so if given `https://example.com/downloadpackage/` one could fetch `https://example.com/downloadpackage/index.mf` to enumerate the full directory listing
- contain a version field for extensibility
- contain structured data (protobuf, json, or cbor)
- provide an inner signed container, so that the manifest file itself can embed a signature and a public key alongside in a single file
- contain a list of files, each with a path relative to the manifest
- contain a manifest timestamp
- contain ctime/mtime information for files so that file metadata can be preserved
- contain cryptographic checksums in several different algorithms for each file
  - probably encoded with multihash to indicate algo + hash
  - sha256 at the minimum
  - would be nice to include an IPFS/IPLD CIDv1 root hash for each file, which likely involves doing an ipfs file object chunking
  - maybe even including the complete IPFS/IPLD directory tree objects and chunklists?
    - this is because generating an `index.mf` does not imply publishing on ipfs at that time
  - maybe a bittorrent chunklist for torrent client compatibility? perhaps a top-level infohash for the whole manifest?
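The field list above could map onto a data structure roughly like the following. This is only a sketch under stated assumptions: the type names, field names, and JSON keys are hypothetical (no schema has been fixed), JSON is picked from the three candidate encodings purely for readability, and the inner signed container is left out. The multihash helper uses the sha2-256 prefix bytes `0x12 0x20` (function code, then digest length) and hex-encodes the result.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"time"
)

// FileEntry describes one file listed in a manifest.
// All field names are hypothetical; the format is not yet specified.
type FileEntry struct {
	Path   string    `json:"path"` // relative to the manifest's directory
	Mtime  time.Time `json:"mtime"`
	Ctime  time.Time `json:"ctime"`
	Hashes []string  `json:"hashes"` // hex-encoded multihashes, sha256 at minimum
}

// Manifest is a hypothetical top-level document stored in index.mf.
type Manifest struct {
	Version   int         `json:"version"`
	CreatedAt time.Time   `json:"createdAt"`
	Files     []FileEntry `json:"files"`
}

// SHA256Multihash returns the hex-encoded multihash of data: the sha2-256
// function code (0x12), the digest length (0x20 = 32 bytes), then the digest.
func SHA256Multihash(data []byte) string {
	sum := sha256.Sum256(data)
	mh := append([]byte{0x12, 0x20}, sum[:]...)
	return hex.EncodeToString(mh)
}

// EncodeManifest serializes a manifest as indented JSON.
func EncodeManifest(m Manifest) ([]byte, error) {
	return json.MarshalIndent(m, "", "  ")
}

// DecodeManifest parses JSON produced by EncodeManifest.
func DecodeManifest(b []byte) (Manifest, error) {
	var m Manifest
	err := json.Unmarshal(b, &m)
	return m, err
}

func main() {
	content := []byte("hello\n")
	m := Manifest{
		Version:   1,
		CreatedAt: time.Now().UTC(),
		Files: []FileEntry{
			{Path: "hello.txt", Hashes: []string{SHA256Multihash(content)}},
		},
	}
	out, err := EncodeManifest(m)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping the hashes as a list of self-describing multihashes means a reader can skip algorithms it does not know, which is one way to get the checksum-algorithm agnosticism that `SHASUMS` lacks.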
Design Goals
- Replace SHASUMS/SHASUMS.asc files
- be easy to download/resume
- be easy to use across protocols (given an HTTPS url, fetch manifest, then download file contents via bittorrent or ipfs)
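For the download/resume goal over plain HTTP, one possible approach is a standard `Range` request that asks only for the bytes not yet present locally. The sketch below is illustrative: `NewResumeRequest` is a hypothetical helper name, and the file URL is made up. A server that honors the range answers with `206 Partial Content`; a `200` response means the download has to start over. Either way, the assembled file would still be verified against the manifest checksum afterwards.

```go
package main

import (
	"fmt"
	"net/http"
)

// NewResumeRequest builds a GET request for url that asks the server to
// start at byte offset, i.e. to skip what is already stored locally.
// An offset of zero produces an ordinary full download request.
func NewResumeRequest(url string, offset int64) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if offset > 0 {
		req.Header.Set("Range", fmt.Sprintf("bytes=%d-", offset))
	}
	return req, nil
}

func main() {
	// Hypothetical file URL; 1048576 bytes are assumed to exist locally.
	req, err := NewResumeRequest("https://example.com/downloadpackage/big.iso", 1048576)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("Range"))
}
```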
Non-Goals
- Manifest generation speed
  - likely involves IPFS chunking, bittorrent chunking, and several different cryptographic hash functions over the entirety of each and every file
- Small manifest file size (within reason)
  - 30MiB files are "small" these days, given modern storage/bandwidth
  - metadata size should not be used as an excuse to sacrifice utility (such as providing checksums over each chunk of a large file)
Open Questions
- Should the manifest file include checksums of individual file chunks, or just for the whole assembled file?
  - If so, should the chunk size be fixed or dynamic?
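To make the fixed-size option concrete, here is a minimal sketch of per-chunk SHA-256 checksums. It does not settle the question; the function names and the chunk size are assumptions chosen for illustration, and a dynamic (content-defined) chunker would need a different splitting step.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
)

// ChunkSums reads r in fixed-size chunks and returns the hex SHA-256 digest
// of each chunk, in order. The final chunk may be shorter than chunkSize.
func ChunkSums(r io.Reader, chunkSize int) ([]string, error) {
	if chunkSize <= 0 {
		return nil, errors.New("chunk size must be positive")
	}
	buf := make([]byte, chunkSize)
	var sums []string
	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			sum := sha256.Sum256(buf[:n])
			sums = append(sums, hex.EncodeToString(sum[:]))
		}
		// io.EOF: nothing left; io.ErrUnexpectedEOF: short final chunk.
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return sums, nil
		}
		if err != nil {
			return nil, err
		}
	}
}

// ChunkSumsBytes is a convenience wrapper for in-memory data.
func ChunkSumsBytes(data []byte, chunkSize int) ([]string, error) {
	return ChunkSums(bytes.NewReader(data), chunkSize)
}

func main() {
	// A tiny chunk size so the example produces several chunks.
	sums, err := ChunkSumsBytes([]byte("example file contents"), 8)
	if err != nil {
		panic(err)
	}
	for i, s := range sums {
		fmt.Printf("chunk %d: %s\n", i, s)
	}
}
```

With per-chunk digests in the manifest, a client could verify and re-fetch a single damaged chunk instead of the whole file, at the cost of a larger manifest.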
Tool Examples
- `mfer gen` / `mfer gen .`
  - recurses under current directory and writes out an `index.mf`
- `mfer check` / `mfer check .`
  - verifies checksums of all files in manifest, displaying error and exiting nonzero if any files are missing or corrupted
- `mfer fetch https://example.com/stuff/`
  - fetches `/stuff/index.mf` and downloads all files listed in manifest, optionally resuming any that already exist locally, and assures cryptographic integrity of downloaded files.
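The core of `gen` and `check` might look something like the sketch below. It is not the tool's implementation: the function names are hypothetical, only SHA-256 is computed, and the result is kept as an in-memory map rather than written out as an `index.mf`, since the file format is still undecided. The `writeTree` helper exists only to make the demo self-contained.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
)

const manifestName = "index.mf"

// hashFile streams one file through SHA-256 and returns the hex digest.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// Generate walks root and maps each regular file's slash-separated relative
// path to its SHA-256 digest, skipping a top-level manifest file.
func Generate(root string) (map[string]string, error) {
	sums := make(map[string]string)
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil // directories, symlinks, devices
		}
		rel, err := filepath.Rel(root, p)
		if err != nil {
			return err
		}
		rel = filepath.ToSlash(rel)
		if rel == manifestName {
			return nil
		}
		sum, err := hashFile(p)
		if err != nil {
			return err
		}
		sums[rel] = sum
		return nil
	})
	return sums, err
}

// Check re-hashes every file named in want and returns one sorted message
// per missing or corrupted file. A real tool would exit nonzero if any exist.
func Check(root string, want map[string]string) []string {
	var problems []string
	for rel, sum := range want {
		got, err := hashFile(filepath.Join(root, filepath.FromSlash(rel)))
		switch {
		case err != nil:
			problems = append(problems, rel+": "+err.Error())
		case got != sum:
			problems = append(problems, rel+": checksum mismatch")
		}
	}
	sort.Strings(problems)
	return problems
}

// writeTree creates a temporary directory holding files; demo helper only.
func writeTree(files map[string]string) (string, error) {
	root, err := os.MkdirTemp("", "mfer-sketch-")
	if err != nil {
		return "", err
	}
	for rel, body := range files {
		p := filepath.Join(root, filepath.FromSlash(rel))
		if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
			return "", err
		}
		if err := os.WriteFile(p, []byte(body), 0o644); err != nil {
			return "", err
		}
	}
	return root, nil
}

func main() {
	root, err := writeTree(map[string]string{"a.txt": "hello", "sub/b.txt": "world"})
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)
	sums, err := Generate(root)
	if err != nil {
		panic(err)
	}
	// Tamper with one file after generation so the check reports it.
	if err := os.WriteFile(filepath.Join(root, "a.txt"), []byte("tampered"), 0o644); err != nil {
		panic(err)
	}
	for _, p := range Check(root, sums) {
		fmt.Println(p)
	}
}
```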
Implementation Plan
Phase One:
- golang module for reusability/embedding
- golang module client providing `mfer` CLI
Phase Two:
- ES6 or TypeScript module for reusability/embedding
- ES6/TypeScript module client providing `mfer.js` CLI
Hopes And Dreams
- `aria2c https://example.com/manifestdirectory/`
  - (fetches `https://example.com/manifestdirectory/index.mf`, downloads and checksums all files, resumes any that exist locally already)
- `mfer fetch https://example.com/manifestdirectory/`
- a command line option to zero/omit mtime/ctime, as well as manifest timestamp, and sort all directory listings so that manifest file generation is deterministic/reproducible
- URL format `mfer fetch https://example.com/manifestdirectory/?key=5539AD00DE4C42F3AFE11575052443F4DF2A55C2` to assert in the URL which PGP signing key should be used in the manifest, so that shared URLs have a cryptographic trust root
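The `?key=` URL format could be handled along these lines. This is a sketch, not a specification: `ParseFetchURL` is a hypothetical name, the key is assumed to be a 40-hex-digit (160-bit) fingerprint like the one in the example, and a URL without a trailing slash is assumed to name a directory anyway.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// ParseFetchURL splits a fetch URL into the manifest URL (the directory URL
// plus "index.mf", with query and fragment removed) and the upper-cased
// signing key fingerprint from the "key" query parameter. The fingerprint
// is empty when the URL carries no key.
func ParseFetchURL(raw string) (manifestURL, fingerprint string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", "", errors.New("fetch URL must be absolute")
	}
	if key := u.Query().Get("key"); key != "" {
		// Assumption: 40 hex digits, i.e. a 20-byte fingerprint.
		b, err := hex.DecodeString(key)
		if err != nil || len(b) != 20 {
			return "", "", errors.New("key must be a 40-digit hex fingerprint")
		}
		fingerprint = strings.ToUpper(key)
	}
	// Assumption: the URL names a directory even without a trailing slash.
	if !strings.HasSuffix(u.Path, "/") {
		u.Path += "/"
	}
	u.Path += "index.mf"
	u.RawPath = ""
	u.RawQuery = ""
	u.ForceQuery = false
	u.Fragment = ""
	return u.String(), fingerprint, nil
}

func main() {
	m, fp, err := ParseFetchURL("https://example.com/manifestdirectory/?key=5539AD00DE4C42F3AFE11575052443F4DF2A55C2")
	if err != nil {
		panic(err)
	}
	fmt.Println("manifest:", m)
	fmt.Println("expected signer:", fp)
}
```

A client following this idea would reject the manifest unless its embedded signature verifies against a public key with exactly that fingerprint, which is what gives the shared URL its trust root.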
Use Cases
Web Images
I'd like to be able to put a bunch of images into a directory, generate a manifest, and then point a slideshow client (such as an ambient display, or a react app with the target directory in a query string arg) at that statically hosted directory, and have it discover the full list of images available at that URL.
Software Distribution
I'd like to be able to resumably download a whole tree of files that is published over HTTP, using either HTTP or IPFS/BitTorrent, without needing a .torrent file.
Filesystem Archive Integrity
I use filesystems that don't include data checksums, and I would like a cryptographically signed checksum file so that I can later verify that a set of archive files have not been modified, none are missing, and that the checksums have not been altered in storage by a second party.
Collaboration
Please email sneak@sneak.berlin with your desired username for an account on this Gitea instance.
I am currently interested in hiring a contractor skilled with the Go standard library interfaces to specify this tool in full and develop a prototype implementation.