format readme, add build status badge
Some checks failed
continuous-integration/drone/push Build is failing
parent ea96e0d786
commit 7e4f8366a7
114
README.md
@@ -2,19 +2,33 @@
|
|||||||
|
|
||||||
Manifest file generator and checker.
|
Manifest file generator and checker.
|
||||||
|
|
||||||
|
# Build Status
|
||||||
|
|
||||||
|
[![Build Status](https://drone.datavi.be/api/badges/sneak/mfer/status.svg)](https://drone.datavi.be/sneak/mfer)
|
||||||
|
|
||||||
# Problem Statement
|
# Problem Statement
|
||||||
|
|
||||||
Given a plain URL, there is no standard way to safely and programmatically download everything "under" that URL path. `wget -r` can traverse directory listings if they're enabled, but every server has a different format, and this does not verify cryptographic integrity of the files, or enable them to be fetched using a different protocol other than HTTP/s.
|
Given a plain URL, there is no standard way to safely and programmatically
|
||||||
|
download everything "under" that URL path. `wget -r` can traverse directory
|
||||||
|
listings if they're enabled, but every server has a different format, and
|
||||||
|
this does not verify cryptographic integrity of the files, or enable them to
|
||||||
|
be fetched using a different protocol other than HTTP/s.
|
||||||
|
|
||||||
Currently, the solution that people are using are sidecar files in the format of `SHASUMS` checksum files, as well as a `SHASUMS.asc` PGP detached signature. This is not checksum-algorithm-agnostic and the sidecar file is not always consistently named.
|
Currently, the solution that people are using are sidecar files in the
|
||||||
|
format of `SHASUMS` checksum files, as well as a `SHASUMS.asc` PGP detached
|
||||||
|
signature. This is not checksum-algorithm-agnostic and the sidecar file is
|
||||||
|
not always consistently named.
|
||||||
|
|
||||||
Real issues I face:
|
Real issues I face:
|
||||||
|
|
||||||
- when I plug in an ExFAT hard drive, I don't know if any files on the filesystem are corrupted or missing
|
- when I plug in an ExFAT hard drive, I don't know if any files on the
|
||||||
|
filesystem are corrupted or missing
|
||||||
- current ad-hoc solution are `SHASUMS`/`SHASUMS.asc` files
|
- current ad-hoc solution are `SHASUMS`/`SHASUMS.asc` files
|
||||||
- when I want to mirror an HTTP archive, I have to use special tools like debmirror that understand the archive format
|
- when I want to mirror an HTTP archive, I have to use special tools like
|
||||||
|
debmirror that understand the archive format
|
||||||
- the debian repository metadata structure is hot garbage
|
- the debian repository metadata structure is hot garbage
|
||||||
- when I download a large file via HTTP, I have no way of knowing if the file content is what it's supposed to be
|
- when I download a large file via HTTP, I have no way of knowing if the
|
||||||
|
file content is what it's supposed to be
|
||||||
|
|
||||||
# Proposed Solution
|
# Proposed Solution
|
||||||
|
|
||||||
@@ -24,35 +38,50 @@ The manifest file would be called `index.mf`, and the tool for generating such w
|
|||||||
|
|
||||||
The manifest file would do several important things:
|
The manifest file would do several important things:
|
||||||
|
|
||||||
- have a standard filename, so if given `https://example.com/downloadpackage/` one could fetch `https://example.com/downloadpackage/index.mf` to enumerate the full directory listing.
|
- have a standard filename, so if given
|
||||||
|
`https://example.com/downloadpackage/` one could fetch
|
||||||
|
`https://example.com/downloadpackage/index.mf` to enumerate the full
|
||||||
|
directory listing.
|
||||||
- contain a version field for extensibility
|
- contain a version field for extensibility
|
||||||
- contain structured data (protobuf, json, or cbor)
|
- contain structured data (protobuf, json, or cbor)
|
||||||
- provide an inner signed container, so that the manifest file itself can embed a signature and a public key alongside in a single file
|
- provide an inner signed container, so that the manifest file itself can
|
||||||
|
embed a signature and a public key alongside in a single file
|
||||||
- contain a list of files, each with a relative path to the manifest
|
- contain a list of files, each with a relative path to the manifest
|
||||||
- contain manifest timestamp
|
- contain manifest timestamp
|
||||||
- contain ctime/mtime information for files so that file metadata can be preserved
|
- contain ctime/mtime information for files so that file metadata can be
|
||||||
- contain cryptographic checksums in several different algorithms for each file
|
preserved
|
||||||
|
- contain cryptographic checksums in several different algorithms for each
|
||||||
|
file
|
||||||
- probably encoded with multihash to indicate algo + hash
|
- probably encoded with multihash to indicate algo + hash
|
||||||
- sha256 at the minimum
|
- sha256 at the minimum
|
||||||
- would be nice to include an IPFS/IPLD CIDv1 root hash for each file, which likely involves doing an ipfs file object chunking
|
- would be nice to include an IPFS/IPLD CIDv1 root hash for each file,
|
||||||
- maybe even including the complete IPFS/IPLD directory tree objects and chunklists?
|
which likely involves doing an ipfs file object chunking
|
||||||
- this is because generating an `index.mf` does not imply publishing on ipfs at that time
|
- maybe even including the complete IPFS/IPLD directory tree objects and
|
||||||
- maybe a bittorrent chunklist for torrent client compatibility? perhaps a top-level infohash for the whole manifest?
|
chunklists?
|
||||||
|
- this is because generating an `index.mf` does not imply publishing on
|
||||||
|
ipfs at that time
|
||||||
|
- maybe a bittorrent chunklist for torrent client compatibility? perhaps a
|
||||||
|
top-level infohash for the whole manifest?
|
||||||
|
|
||||||
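The "probably encoded with multihash" item above can be sketched as follows. This is a minimal illustration, not the project's chosen encoding: it assumes sha2-256 only, and the function names `sha256_multihash` and `multihash_hex_for_file` are hypothetical. The two prefix bytes follow the multihash convention of a hash function code (0x12 for sha2-256) followed by the digest length (0x20, i.e. 32 bytes).

```python
import hashlib

# Multihash prefix for sha2-256: function code 0x12, digest length 0x20 (32 bytes).
# Both values are below 128, so each fits in a single varint byte.
SHA2_256_CODE = 0x12
SHA2_256_LENGTH = 0x20


def sha256_multihash(data: bytes) -> bytes:
    """Return the sha2-256 digest of data with a multihash prefix (hypothetical helper)."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256_CODE, SHA2_256_LENGTH]) + digest


def multihash_hex_for_file(path: str, bufsize: int = 1 << 20) -> str:
    """Hash a file in bounded reads and return the prefixed digest as hex."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in blocks so large files do not have to fit in memory.
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return (bytes([SHA2_256_CODE, SHA2_256_LENGTH]) + h.digest()).hex()
```

Because the algorithm is named in the prefix, a checker can reject or skip digests it does not understand instead of guessing from the digest length.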
# Design Goals
|
# Design Goals
|
||||||
|
|
||||||
- Replace SHASUMS/SHASUMS.asc files
|
- Replace SHASUMS/SHASUMS.asc files
|
||||||
- be easy to download/resume a whole directory tree published via HTTP
|
- be easy to download/resume a whole directory tree published via HTTP
|
||||||
- be easy to use across protocols (given an HTTPS url, fetch manifest, then download file contents via bittorrent or ipfs)
|
- be easy to use across protocols (given an HTTPS url, fetch manifest, then
|
||||||
- not strongly coupled to HTTP use case, should not require special hosting, content types, or HTTP headers being sent
|
download file contents via bittorrent or ipfs)
|
||||||
|
- not strongly coupled to HTTP use case, should not require special hosting,
|
||||||
|
content types, or HTTP headers being sent
|
||||||
|
|
||||||
# Non-Goals
|
# Non-Goals
|
||||||
|
|
||||||
- Manifest generation speed
|
- Manifest generation speed
|
||||||
- likely involves IPFS chunking, bittorrent chunking, and several different cryptographic hash functions over the entirety of each and every file
|
- likely involves IPFS chunking, bittorrent chunking, and several
|
||||||
|
different cryptographic hash functions over the entirety of each and
|
||||||
|
every file
|
||||||
- Small manifest file size (within reason)
|
- Small manifest file size (within reason)
|
||||||
- 30MiB files are "small" these days, given modern storage/bandwidth
|
- 30MiB files are "small" these days, given modern storage/bandwidth
|
||||||
- metadata size should not be used as an excuse to sacrifice utility (such as providing checksums over each chunk of a large file)
|
- metadata size should not be used as an excuse to sacrifice utility (such
|
||||||
|
as providing checksums over each chunk of a large file)
|
||||||
|
|
||||||
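The "checksums over each chunk of a large file" idea mentioned above could look roughly like this. It is only a sketch: the README does not fix a chunk size or a per-chunk hash, so the 4 MiB `CHUNK_SIZE`, the use of sha256, and the name `chunk_checksums` are all assumptions made for illustration.

```python
import hashlib
from typing import List

# Hypothetical chunk size; the README does not specify one.
CHUNK_SIZE = 4 * 1024 * 1024


def chunk_checksums(path: str, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Return one sha256 hex digest per fixed-size chunk of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # end of file
            digests.append(hashlib.sha256(chunk).hexdigest())
    return digests
```

With per-chunk digests, a resumed or partially corrupted download only needs the failing chunks re-fetched, which is the kind of utility the extra metadata size buys.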
# Open Questions
|
# Open Questions
|
||||||
|
|
||||||
@@ -71,9 +100,12 @@ The manifest file would do several important things:
|
|||||||
- `mfer gen` / `mfer gen .`
|
- `mfer gen` / `mfer gen .`
|
||||||
- recurses under current directory and writes out an `index.mf`
|
- recurses under current directory and writes out an `index.mf`
|
||||||
- `mfer check` / `mfer check .`
|
- `mfer check` / `mfer check .`
|
||||||
- verifies checksums of all files in manifest, displaying error and exiting nonzero if any files are missing or corrupted
|
- verifies checksums of all files in manifest, displaying error and
|
||||||
|
exiting nonzero if any files are missing or corrupted
|
||||||
- `mfer fetch https://example.com/stuff/`
|
- `mfer fetch https://example.com/stuff/`
|
||||||
- fetches `/stuff/index.mf` and downloads all files listed in manifest, optionally resuming any that already exist locally, and assures cryptographic integrity of downloaded files.
|
- fetches `/stuff/index.mf` and downloads all files listed in manifest,
|
||||||
|
optionally resuming any that already exist locally, and assures
|
||||||
|
cryptographic integrity of downloaded files.
|
||||||
|
|
||||||
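The `mfer check` behaviour described above (report missing or corrupted files and exit nonzero) can be sketched like this. The manifest layout here is an assumption: a plain mapping from relative path to expected sha256 hex digest, since the real `index.mf` format is still an open question in the README. The names `sha256_file` and `check_manifest` are hypothetical.

```python
import hashlib
import os
from typing import Dict, List, Tuple


def sha256_file(path: str, bufsize: int = 1 << 20) -> str:
    """Return the sha256 hex digest of a file, read in bounded blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()


def check_manifest(root: str, entries: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare files under root against entries (relative path -> sha256 hex).

    Returns (missing, corrupted) lists of relative paths.
    """
    missing: List[str] = []
    corrupted: List[str] = []
    for rel_path, expected in sorted(entries.items()):
        full = os.path.join(root, rel_path)
        if not os.path.isfile(full):
            missing.append(rel_path)
        elif sha256_file(full) != expected:
            corrupted.append(rel_path)
    return missing, corrupted
```

A command-line wrapper would print both lists and exit nonzero if either is non-empty.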
# Implementation Plan
|
# Implementation Plan
|
||||||
|
|
||||||
@@ -90,33 +122,55 @@ The manifest file would do several important things:
|
|||||||
# Hopes And Dreams
|
# Hopes And Dreams
|
||||||
|
|
||||||
- `aria2c https://example.com/manifestdirectory/`
|
- `aria2c https://example.com/manifestdirectory/`
|
||||||
- (fetches `https://example.com/manifestdirectory/index.mf`, downloads and checksums all files, resumes any that exist locally already)
|
- (fetches `https://example.com/manifestdirectory/index.mf`, downloads and
|
||||||
|
checksums all files, resumes any that exist locally already)
|
||||||
- `mfer fetch https://example.com/manifestdirectory/`
|
- `mfer fetch https://example.com/manifestdirectory/`
|
||||||
- a command line option to zero/omit mtime/ctime, as well as manifest timestamp, and sort all directory listings so that manifest file generation is deterministic/reproducible
|
- a command line option to zero/omit mtime/ctime, as well as manifest
|
||||||
- URL format `mfer fetch https://exmaple.com/manifestdirectory/?key=5539AD00DE4C42F3AFE11575052443F4DF2A55C2` to assert in the URL which PGP signing key should be used in the manifest, so that shared URLs have a cryptographic trust root
|
timestamp, and sort all directory listings so that manifest file
|
||||||
- a "well-known" key in the manifest that maps well known keys (could reuse the http spec) to specific file paths in the manifest.
|
generation is deterministic/reproducible
|
||||||
- example: a `berlin.sneak.app.slideshow` key that maps to a json slideshow config listing what image paths to show, and for how long, and in what order
|
- URL format `mfer fetch
|
||||||
|
https://exmaple.com/manifestdirectory/?key=5539AD00DE4C42F3AFE11575052443F4DF2A55C2`
|
||||||
|
to assert in the URL which PGP signing key should be used in the manifest,
|
||||||
|
so that shared URLs have a cryptographic trust root
|
||||||
|
- a "well-known" key in the manifest that maps well known keys (could reuse
|
||||||
|
the http spec) to specific file paths in the manifest.
|
||||||
|
- example: a `berlin.sneak.app.slideshow` key that maps to a json
|
||||||
|
slideshow config listing what image paths to show, and for how long, and
|
||||||
|
in what order
|
||||||
|
|
||||||
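The deterministic/reproducible generation option in the list above comes down to two things: leave out every timestamp and fix the ordering of entries. A minimal sketch, assuming a JSON encoding (the README lists protobuf, json, and cbor as candidates) and hypothetical field names `version`, `files`, `path`, and `sha256`:

```python
import hashlib
import json
import os


def generate_manifest(root: str) -> str:
    """Return a reproducible JSON manifest for the tree under root.

    No mtime/ctime or manifest timestamp is recorded, and entries are
    sorted by path, so the same file contents always give the same output.
    """
    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort makes the traversal order stable
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            # Reads the whole file at once for brevity; a real tool would stream.
            with open(full, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            files.append({"path": rel, "sha256": digest})
    files.sort(key=lambda entry: entry["path"])
    return json.dumps({"version": 1, "files": files},
                      sort_keys=True, separators=(",", ":"))
```

The function returns the manifest text rather than writing `index.mf`, so it never has to decide whether to list its own output file.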
# Use Cases
|
# Use Cases
|
||||||
|
|
||||||
## Web Images
|
## Web Images
|
||||||
|
|
||||||
I'd like to be able to put a bunch of images into a directory, generate a manifest, and then point a slideshow client (such as an ambient display, or a react app with the target directory in a query string arg) at that statically hosted directory, and have it discover the full list of images available at that URL.
|
I'd like to be able to put a bunch of images into a directory, generate a
|
||||||
|
manifest, and then point a slideshow client (such as an ambient display, or
|
||||||
|
a react app with the target directory in a query string arg) at that
|
||||||
|
statically hosted directory, and have it discover the full list of images
|
||||||
|
available at that URL.
|
||||||
|
|
||||||
## Software Distribution
|
## Software Distribution
|
||||||
|
|
||||||
I'd like to be able to download a whole tree of files available via HTTP resumably by either HTTP or IPFS/BitTorrent without a .torrent file.
|
I'd like to be able to download a whole tree of files available via HTTP
|
||||||
|
resumably by either HTTP or IPFS/BitTorrent without a .torrent file.
|
||||||
|
|
||||||
## Filesystem Archive Integrity
|
## Filesystem Archive Integrity
|
||||||
|
|
||||||
I use filesystems that don't include data checksums, and I would like a cryptographically signed checksum file so that I can later verify that a set of archive files have not been modified, none are missing, and that the checksums have not been altered in storage by a second party.
|
I use filesystems that don't include data checksums, and I would like a
|
||||||
|
cryptographically signed checksum file so that I can later verify that a set
|
||||||
|
of archive files have not been modified, none are missing, and that the
|
||||||
|
checksums have not been altered in storage by a second party.
|
||||||
|
|
||||||
## Filesystem-Independent Checksums
|
## Filesystem-Independent Checksums
|
||||||
|
|
||||||
I would like to be able to plug in a hard drive or flash drive and, if there is an `index.mf` in the root, automatically detect missing/corrupted files, regardless of filesystem format.
|
I would like to be able to plug in a hard drive or flash drive and, if there
|
||||||
|
is an `index.mf` in the root, automatically detect missing/corrupted files,
|
||||||
|
regardless of filesystem format.
|
||||||
|
|
||||||
# Collaboration
|
# Collaboration
|
||||||
|
|
||||||
Please email [`sneak@sneak.berlin`](mailto:sneak@sneak.berlin) with your desired username for an account on this Gitea instance.
|
Please email [`sneak@sneak.berlin`](mailto:sneak@sneak.berlin) with your
|
||||||
|
desired username for an account on this Gitea instance.
|
||||||
|
|
||||||
I am currently interested in hiring a contractor skilled with the Go standard library interfaces to specify this tool in full and develop a prototype implementation.
|
I am currently interested in hiring a contractor skilled with the Go
|
||||||
|
standard library interfaces to specify this tool in full and develop a
|
||||||
|
prototype implementation.
|
||||||
|