Compare commits

No commits in common. "92b92c190d7fc9856d0bd0355b42b967732878c4" and "0f86942849f5a3f5c798e120089166823bda8681" have entirely different histories.

3 changed files with 77 additions and 84 deletions

@@ -10,6 +10,3 @@ compile: $(PROTOC_GEN_GO)
clean: clean:
rm -rfv proto/*.pb.go rm -rfv proto/*.pb.go
fmt:
prettier -w .

101
README.md

@@ -10,11 +10,11 @@ Currently, the solution that people are using are sidecar files in the format of
Real issues I face: Real issues I face:
- when I plug in an ExFAT hard drive, I don't know if any files on the filesystem are corrupted or missing * when I plug in an ExFAT hard drive, I don't know if any files on the filesystem are corrupted or missing
- current ad-hoc solutions are `SHASUMS`/`SHASUMS.asc` files * current ad-hoc solutions are `SHASUMS`/`SHASUMS.asc` files
- when I want to mirror an HTTP archive, I have to use special tools like debmirror that understand the archive format * when I want to mirror an HTTP archive, I have to use special tools like debmirror that understand the archive format
- the debian repository metadata structure is hot garbage * the debian repository metadata structure is hot garbage
- when I download a large file via HTTP, I have no way of knowing if the file content is what it's supposed to be * when I download a large file via HTTP, I have no way of knowing if the file content is what it's supposed to be
# Proposed Solution # Proposed Solution
@@ -24,78 +24,76 @@ The manifest file would be called `index.mf`, and the tool for generating such w
The manifest file would do several important things: The manifest file would do several important things:
- have a standard filename, so if given `https://example.com/downloadpackage/` one could fetch `https://example.com/downloadpackage/index.mf` to enumerate the full directory listing. * have a standard filename, so if given `https://example.com/downloadpackage/` one could fetch `https://example.com/downloadpackage/index.mf` to enumerate the full directory listing.
- contain a version field for extensibility * contain a version field for extensibility
- contain structured data (protobuf, json, or cbor) * contain structured data (protobuf, json, or cbor)
- provide an inner signed container, so that the manifest file itself can embed a signature and a public key alongside in a single file * provide an inner signed container, so that the manifest file itself can embed a signature and a public key alongside in a single file
- contain a list of files, each with a relative path to the manifest * contain a list of files, each with a relative path to the manifest
- contain manifest timestamp * contain manifest timestamp
- contain ctime/mtime information for files so that file metadata can be preserved * contain ctime/mtime information for files so that file metadata can be preserved
- contain cryptographic checksums in several different algorithms for each file * contain cryptographic checksums in several different algorithms for each file
- probably encoded with multihash to indicate algo + hash * probably encoded with multihash to indicate algo + hash
- sha256 at the minimum * sha256 at the minimum
- would be nice to include an IPFS/IPLD CIDv1 root hash for each file, which likely involves doing an ipfs file object chunking * would be nice to include an IPFS/IPLD CIDv1 root hash for each file, which likely involves doing an ipfs file object chunking
- maybe even including the complete IPFS/IPLD directory tree objects and chunklists? * maybe even including the complete IPFS/IPLD directory tree objects and chunklists?
- this is because generating an `index.mf` does not imply publishing on ipfs at that time * this is because generating an `index.mf` does not imply publishing on ipfs at that time
- maybe a bittorrent chunklist for torrent client compatibility? perhaps a top-level infohash for the whole manifest? * maybe a bittorrent chunklist for torrent client compatibility? perhaps a top-level infohash for the whole manifest?
# Design Goals # Design Goals
- Replace SHASUMS/SHASUMS.asc files * Replace SHASUMS/SHASUMS.asc files
- be easy to download/resume a whole directory tree published via HTTP * be easy to download/resume a whole directory tree published via HTTP
- be easy to use across protocols (given an HTTPS url, fetch manifest, then download file contents via bittorrent or ipfs) * be easy to use across protocols (given an HTTPS url, fetch manifest, then download file contents via bittorrent or ipfs)
- not strongly coupled to HTTP use case, should not require special hosting, content types, or HTTP headers being sent * not strongly coupled to HTTP use case, should not require special hosting, content types, or HTTP headers being sent
# Non-Goals # Non-Goals
- Manifest generation speed * Manifest generation speed
- likely involves IPFS chunking, bittorrent chunking, and several different cryptographic hash functions over the entirety of each and every file * likely involves IPFS chunking, bittorrent chunking, and several different cryptographic hash functions over the entirety of each and every file
- Small manifest file size (within reason) * Small manifest file size (within reason)
- 30MiB files are "small" these days, given modern storage/bandwidth * 30MiB files are "small" these days, given modern storage/bandwidth
- metadata size should not be used as an excuse to sacrifice utility (such as providing checksums over each chunk of a large file) * metadata size should not be used as an excuse to sacrifice utility (such as providing checksums over each chunk of a large file)
# Open Questions # Open Questions
- Should the manifest file include checksums of individual file chunks, or just for the whole assembled file? * Should the manifest file include checksums of individual file chunks, or just for the whole assembled file?
- If so, should the chunksize be fixed or dynamic? * If so, should the chunksize be fixed or dynamic?
- Should the manifest signature format be GnuPG signatures, or those from * Should the manifest signature format be GnuPG signatures, or those from
OpenBSD's signify (of which there is a good [golang OpenBSD's signify (of which there is a good [golang
implementation](https://github.com/frankbraun/gosignify))? implementation](https://github.com/frankbraun/gosignify))?
- Should the on-disk serialization format be proto3 or json? * Should the on-disk serialization format be proto3 or json?
# Tool Examples # Tool Examples
- `mfer gen` / `mfer gen .` * `mfer gen` / `mfer gen .`
- recurses under current directory and writes out an `index.mf` * recurses under current directory and writes out an `index.mf`
- `mfer check` / `mfer check .` * `mfer check` / `mfer check .`
- verifies checksums of all files in manifest, displaying error and exiting nonzero if any files are missing or corrupted * verifies checksums of all files in manifest, displaying error and exiting nonzero if any files are missing or corrupted
- `mfer fetch https://example.com/stuff/` * `mfer fetch https://example.com/stuff/`
- fetches `/stuff/index.mf` and downloads all files listed in manifest, optionally resuming any that already exist locally, and assures cryptographic integrity of downloaded files. * fetches `/stuff/index.mf` and downloads all files listed in manifest, optionally resuming any that already exist locally, and assures cryptographic integrity of downloaded files.
# Implementation Plan # Implementation Plan
## Phase One: ## Phase One:
- golang module for reusability/embedding * golang module for reusability/embedding
- golang module client providing `mfer` CLI * golang module client providing `mfer` CLI
## Phase Two: ## Phase Two:
- ES6 or TypeScript module for reusability/embedding * ES6 or TypeScript module for reusability/embedding
- ES6/TypeScript module client providing `mfer.js` CLI * ES6/TypeScript module client providing `mfer.js` CLI
# Hopes And Dreams # Hopes And Dreams
- `aria2c https://example.com/manifestdirectory/` * `aria2c https://example.com/manifestdirectory/`
- (fetches `https://example.com/manifestdirectory/index.mf`, downloads and checksums all files, resumes any that exist locally already) * (fetches `https://example.com/manifestdirectory/index.mf`, downloads and checksums all files, resumes any that exist locally already)
- `mfer fetch https://example.com/manifestdirectory/` * `mfer fetch https://example.com/manifestdirectory/`
- a command line option to zero/omit mtime/ctime, as well as manifest timestamp, and sort all directory listings so that manifest file generation is deterministic/reproducible * a command line option to zero/omit mtime/ctime, as well as manifest timestamp, and sort all directory listings so that manifest file generation is deterministic/reproducible
- URL format `mfer fetch https://example.com/manifestdirectory/?key=5539AD00DE4C42F3AFE11575052443F4DF2A55C2` to assert in the URL which PGP signing key should be used in the manifest, so that shared URLs have a cryptographic trust root * URL format `mfer fetch https://example.com/manifestdirectory/?key=5539AD00DE4C42F3AFE11575052443F4DF2A55C2` to assert in the URL which PGP signing key should be used in the manifest, so that shared URLs have a cryptographic trust root
- a "well-known" key in the manifest that maps well known keys (could reuse the http spec) to specific file paths in the manifest. * a "well-known" key in the manifest that maps well known keys (could reuse the http spec) to specific file paths in the manifest.
- example: a `berlin.sneak.app.slideshow` key that maps to a json slideshow config listing what image paths to show, and for how long, and in what order * example: a `berlin.sneak.app.slideshow` key that maps to a json slideshow config listing what image paths to show, and for how long, and in what order
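The `?key=` URL idea above, combined with the standard `index.mf` filename, can be sketched as a small Go helper. This is an assumption-laden illustration: the function name `manifestURL` is hypothetical, the fingerprint is only upper-cased and not validated, and no signature checking is shown.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// manifestURL (hypothetical helper) derives the index.mf URL from a directory
// URL and extracts the optional "key" query parameter, i.e. the fingerprint of
// the signing key the caller expects the manifest to be signed with.
func manifestURL(raw string) (manifest string, key string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	key = strings.ToUpper(u.Query().Get("key"))
	// Drop the query and fragment: they address the tool, not the server.
	u.RawQuery = ""
	u.Fragment = ""
	u.Path = strings.TrimSuffix(u.Path, "/") + "/index.mf"
	return u.String(), key, nil
}

func main() {
	m, k, err := manifestURL("https://example.com/manifestdirectory/?key=5539AD00DE4C42F3AFE11575052443F4DF2A55C2")
	if err != nil {
		panic(err)
	}
	fmt.Println("manifest:", m)
	fmt.Println("expected signing key:", k)
}
```

A fetch tool built this way would download the manifest first, compare the embedded public key's fingerprint against `key`, and refuse to proceed on a mismatch; that comparison is left out here because the signature format is still an open question.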
# Use Cases # Use Cases
@@ -116,7 +114,6 @@ I use filesystems that don't include data checksums, and I would like a cryptogr
I would like to be able to plug in a hard drive or flash drive and, if there is an `index.mf` in the root, automatically detect missing/corrupted files, regardless of filesystem format. I would like to be able to plug in a hard drive or flash drive and, if there is an `index.mf` in the root, automatically detect missing/corrupted files, regardless of filesystem format.
# Collaboration # Collaboration
Please email [`sneak@sneak.berlin`](mailto:sneak@sneak.berlin) with your desired username for an account on this Gitea instance. Please email [`sneak@sneak.berlin`](mailto:sneak@sneak.berlin) with your desired username for an account on this Gitea instance.
I am currently interested in hiring a contractor skilled with the Go standard library interfaces to specify this tool in full and develop a prototype implementation. I am currently interested in hiring a contractor skilled with the Go standard library interfaces to specify this tool in full and develop a prototype implementation.

@@ -28,9 +28,8 @@ message MFFile {
message MFFilePath { message MFFilePath {
string path = 1; string path = 1;
uint64 size = 2;
// when verifying, count(hashes) must be > 0. // when verifying, count(hashes) must be > 0.
optional repeated MFFileChecksum hashes = 201; optional repeated MFFileChecksum hashes = 2;
optional string mimeType = 101; optional string mimeType = 101;
optional string mtime = 102; optional string mtime = 102;
optional string ctime = 103; optional string ctime = 103;