Compare commits

3 commits: 0f86942849...92b92c190d

| Author | SHA1 | Date |
|---|---|---|
| | 92b92c190d | |
| | 38df94a9b2 | |
| | 4a6469b003 | |

Makefile (3 changes)

```diff
@@ -10,3 +10,6 @@ compile: $(PROTOC_GEN_GO)
 
 clean:
 	rm -rfv proto/*.pb.go
+
+fmt:
+	prettier -w .
```

README.md (105 changes)

```diff
@@ -4,17 +4,17 @@ Manifest file generator and checker.
 
 # Problem Statement
 
-Given a plain URL, there is no standard way to safely and programmatically download everything "under" that URL path.  `wget -r` can traverse directory listings if they're enabled, but every server has a different format, and this does not verify cryptographic integrity of the files, or enable them to be fetched using a different protocol other than HTTP/s.
+Given a plain URL, there is no standard way to safely and programmatically download everything "under" that URL path. `wget -r` can traverse directory listings if they're enabled, but every server has a different format, and this does not verify cryptographic integrity of the files, or enable them to be fetched using a different protocol other than HTTP/s.
 
-Currently, the solution that people are using are sidecar files in the format of `SHASUMS` checksum files, as well as a `SHASUMS.asc` PGP detached signature.  This is not checksum-algorithm-agnostic and the sidecar file is not always consistently named.
+Currently, the solution that people are using are sidecar files in the format of `SHASUMS` checksum files, as well as a `SHASUMS.asc` PGP detached signature. This is not checksum-algorithm-agnostic and the sidecar file is not always consistently named.
 
 Real issues I face:
 
-* when I plug in an ExFAT hard drive, I don't know if any files on the filesystem are corrupted or missing
-    * current ad-hoc solution are `SHASUMS`/`SHASUMS.asc` files
-* when I want to mirror an HTTP archive, I have to use special tools like debmirror that understand the archive format
-    * the debian repository metadata structure is hot garbage
-* when I download a large file via HTTP, I have no way of knowing if the file content is what it's supposed to be
+- when I plug in an ExFAT hard drive, I don't know if any files on the filesystem are corrupted or missing
+  - current ad-hoc solution are `SHASUMS`/`SHASUMS.asc` files
+- when I want to mirror an HTTP archive, I have to use special tools like debmirror that understand the archive format
+  - the debian repository metadata structure is hot garbage
+- when I download a large file via HTTP, I have no way of knowing if the file content is what it's supposed to be
 
 # Proposed Solution
 
```
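
The README answers the plain-URL problem with a fixed manifest name, `index.mf`, placed in the directory the URL points at. Under Hopes And Dreams it also proposes an optional `?key=` query parameter that names the expected signing key. Below is a minimal Go sketch of deriving the manifest URL. It assumes that a URL without a trailing slash should also be treated as a directory, although the README only shows the trailing-slash form. `manifestURL` and `signingKeyFromURL` are hypothetical names, not part of `mfer`.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// manifestName is the fixed manifest filename proposed in the README.
const manifestName = "index.mf"

// manifestURL returns the location of the manifest "under" base.
// Assumption: a base without a trailing slash is treated as a directory too.
func manifestURL(base string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("not an absolute URL: %q", base)
	}
	dir := *u
	if !strings.HasSuffix(dir.Path, "/") {
		dir.Path += "/"
		dir.RawPath = "" // keep the escaped form consistent with Path
	}
	// ResolveReference keeps scheme, user and host from dir, replaces the
	// final (empty) path segment, and does not carry over dir's query.
	return dir.ResolveReference(&url.URL{Path: manifestName}).String(), nil
}

// signingKeyFromURL extracts the optional ?key= fingerprint from the
// "Hopes And Dreams" URL format; it returns "" when none is given.
func signingKeyFromURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return u.Query().Get("key"), nil
}

func main() {
	for _, s := range []string{
		"https://example.com/stuff/",
		"https://example.com/manifestdirectory/?key=5539AD00DE4C42F3AFE11575052443F4DF2A55C2",
	} {
		m, err := manifestURL(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		key, _ := signingKeyFromURL(s)
		fmt.Printf("%s\n  manifest: %s\n  key: %q\n", s, m, key)
	}
}
```

Resolving a relative reference, instead of concatenating strings, avoids doubled or missing slashes. It also ensures the key fingerprint never leaks into the manifest request itself.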

```diff
@@ -24,76 +24,78 @@ The manifest file would be called `index.mf`, and the tool for generating such w
 
 The manifest file would do several important things:
 
-* have a standard filename, so if given `https://example.com/downloadpackage/` one could fetch `https://example.com/downloadpackage/index.mf` to enumerate the full directory listing.
-* contain a version field for extensibility
-* contain structured data (protobuf, json, or cbor)
-* provide an inner signed container, so that the manifest file itself can embed a signature and a public key alongside in a single file
-* contain a list of files, each with a relative path to the manifest
-* contain manifest timestamp
-* contain ctime/mtime information for files so that file metadata can be preserved
-* contain cryptographic checksums in several different algorithms for each file
-    * probably encoded with multihash to indicate algo + hash
-    * sha256 at the minimum
-    * would be nice to include an IPFS/IPLD CIDv1 root hash for each file, which likely involves doing an ipfs file object chunking
-    * maybe even including the complete IPFS/IPLD directory tree objects and chunklists?
-        * this is because generating an `index.mf` does not imply publishing on ipfs at that time
-    * maybe a bittorrent chunklist for torrent client compatibility? perhaps a top-level infohash for the whole manifest?
+- have a standard filename, so if given `https://example.com/downloadpackage/` one could fetch `https://example.com/downloadpackage/index.mf` to enumerate the full directory listing.
+- contain a version field for extensibility
+- contain structured data (protobuf, json, or cbor)
+- provide an inner signed container, so that the manifest file itself can embed a signature and a public key alongside in a single file
+- contain a list of files, each with a relative path to the manifest
+- contain manifest timestamp
+- contain ctime/mtime information for files so that file metadata can be preserved
+- contain cryptographic checksums in several different algorithms for each file
+  - probably encoded with multihash to indicate algo + hash
+  - sha256 at the minimum
+  - would be nice to include an IPFS/IPLD CIDv1 root hash for each file, which likely involves doing an ipfs file object chunking
+  - maybe even including the complete IPFS/IPLD directory tree objects and chunklists?
+    - this is because generating an `index.mf` does not imply publishing on ipfs at that time
+  - maybe a bittorrent chunklist for torrent client compatibility? perhaps a top-level infohash for the whole manifest?
 
 # Design Goals
 
-* Replace SHASUMS/SHASUMS.asc files
-* be easy to download/resume a whole directory tree published via HTTP
-* be easy to use across protocols (given an HTTPS url, fetch manifest, then download file contents via bittorrent or ipfs)
-* not strongly coupled to HTTP use case, should not require special hosting, content types, or HTTP headers being sent
+- Replace SHASUMS/SHASUMS.asc files
+- be easy to download/resume a whole directory tree published via HTTP
+- be easy to use across protocols (given an HTTPS url, fetch manifest, then download file contents via bittorrent or ipfs)
+- not strongly coupled to HTTP use case, should not require special hosting, content types, or HTTP headers being sent
 
 # Non-Goals
-* Manifest generation speed
-    * likely involves IPFS chunking, bittorrent chunking, and several different cryptographic hash functions over the entirety of each and every file
-* Small manifest file size (within reason)
-    * 30MiB files are "small" these days, given modern storage/bandwidth
-    * metadata size should not be used as an excuse to sacrifice utility (such as providing checksums over each chunk of a large file)
+
+- Manifest generation speed
+  - likely involves IPFS chunking, bittorrent chunking, and several different cryptographic hash functions over the entirety of each and every file
+- Small manifest file size (within reason)
+  - 30MiB files are "small" these days, given modern storage/bandwidth
+  - metadata size should not be used as an excuse to sacrifice utility (such as providing checksums over each chunk of a large file)
 
 # Open Questions
 
-* Should the manifest file include checksums of individual file chunks, or just for the whole assembled file?
-    * If so, should the chunksize be fixed or dynamic?
+- Should the manifest file include checksums of individual file chunks, or just for the whole assembled file?
 
-* Should the manifest signature format be GnuPG signatures, or those from
+  - If so, should the chunksize be fixed or dynamic?
+
+- Should the manifest signature format be GnuPG signatures, or those from
   OpenBSD's signify (of which there is a good [golang
   implementation](https://github.com/frankbraun/gosignify)?
 
-* Should the on-disk serialization format be proto3 or json?
+- Should the on-disk serialization format be proto3 or json?
 
 # Tool Examples
 
-* `mfer gen` / `mfer gen .`
-    * recurses under current directory and writes out an `index.mf`
-* `mfer check` / `mfer check .`
-    * verifies checksums of all files in manifest, displaying error and exiting nonzero if any files are missing or corrupted
-* `mfer fetch https://example.com/stuff/`
-    * fetches `/stuff/index.mf` and downloads all files listed in manifest, optionally resuming any that already exist locally, and assures cryptographic integrity of downloaded files.
+- `mfer gen` / `mfer gen .`
+  - recurses under current directory and writes out an `index.mf`
+- `mfer check` / `mfer check .`
+  - verifies checksums of all files in manifest, displaying error and exiting nonzero if any files are missing or corrupted
+- `mfer fetch https://example.com/stuff/`
+  - fetches `/stuff/index.mf` and downloads all files listed in manifest, optionally resuming any that already exist locally, and assures cryptographic integrity of downloaded files.
 
 # Implementation Plan
 
 ## Phase One:
 
-* golang module for reusability/embedding
-* golang module client providing `mfer` CLI
+- golang module for reusability/embedding
+- golang module client providing `mfer` CLI
 
 ## Phase Two:
 
-* ES6 or TypeScript module for reusability/embedding
-* ES6/TypeScript module client providing `mfer.js` CLI
+- ES6 or TypeScript module for reusability/embedding
+- ES6/TypeScript module client providing `mfer.js` CLI
 
 # Hopes And Dreams
 
-* `aria2c https://example.com/manifestdirectory/`
-    * (fetches `https://example.com/manifestdirectory/index.mf`, downloads and checksums all files, resumes any that exist locally already)
-* `mfer fetch https://example.com/manifestdirectory/`
-* a command line option to zero/omit mtime/ctime, as well as manifest timestamp, and sort all directory listings so that manifest file generation is deterministic/reproducible
-* URL format `mfer fetch https://exmaple.com/manifestdirectory/?key=5539AD00DE4C42F3AFE11575052443F4DF2A55C2` to assert in the URL which PGP signing key should be used in the manifest, so that shared URLs have a cryptographic trust root
-* a "well-known" key in the manifest that maps well known keys (could reuse the http spec) to specific file paths in the manifest.
-    * example: a `berlin.sneak.app.slideshow` key that maps to a json slideshow config listing what image paths to show, and for how long, and in what order
+- `aria2c https://example.com/manifestdirectory/`
+  - (fetches `https://example.com/manifestdirectory/index.mf`, downloads and checksums all files, resumes any that exist locally already)
+- `mfer fetch https://example.com/manifestdirectory/`
+- a command line option to zero/omit mtime/ctime, as well as manifest timestamp, and sort all directory listings so that manifest file generation is deterministic/reproducible
+- URL format `mfer fetch https://exmaple.com/manifestdirectory/?key=5539AD00DE4C42F3AFE11575052443F4DF2A55C2` to assert in the URL which PGP signing key should be used in the manifest, so that shared URLs have a cryptographic trust root
+- a "well-known" key in the manifest that maps well known keys (could reuse the http spec) to specific file paths in the manifest.
+  - example: a `berlin.sneak.app.slideshow` key that maps to a json slideshow config listing what image paths to show, and for how long, and in what order
 
 # Use Cases
 
```

```diff
@@ -114,6 +116,7 @@ I use filesystems that don't include data checksums, and I would like a cryptogr
 I would like to be able to plug in a hard drive or flash drive and, if there is an `index.mf` in the root, automatically detect missing/corrupted files, regardless of filesystem format.
 
 # Collaboration
+
 Please email [`sneak@sneak.berlin`](mailto:sneak@sneak.berlin) with your desired username for an account on this Gitea instance.
 
 I am currently interested in hiring a contractor skilled with the Go standard library interfaces to specify this tool in full and develop a prototype implementation.
```

Protobuf schema (file name not shown)

```diff
@@ -3,33 +3,34 @@ syntax = "proto3";
 option go_package = "mfer";
 
 message MFFile {
-  enum Version {
-    NONE = 0;
-    ONE = 1; // only one for now
-  }
-  Version version = 1;
-  bytes innerMessage = 2;
+    enum Version {
+        NONE = 0;
+        ONE = 1; // only one for now
+    }
+    Version version = 1;
+    bytes innerMessage = 2;
 
-  // these are used solely to detect corruption/truncation
-  // and not for cryptographic integrity.
-  uint64 size = 3;
-  bytes sha256 = 4;
+    // these are used solely to detect corruption/truncation
+    // and not for cryptographic integrity.
+    uint64 size = 3;
+    bytes sha256 = 4;
 
-  // think we might use gosignify instead of gpg:
-  // github.com/frankbraun/gosignify
+    // think we might use gosignify instead of gpg:
+    // github.com/frankbraun/gosignify
 
-  //detached signature, ascii or binary
-  optional bytes signature = 5;
-  //full GPG key id
-  optional bytes signer = 6;
-  //full GPG signing public key, ascii or binary
-  optional bytes signingPubKey = 7;
+    //detached signature, ascii or binary
+    optional bytes signature = 5;
+    //full GPG key id
+    optional bytes signer = 6;
+    //full GPG signing public key, ascii or binary
+    optional bytes signingPubKey = 7;
 }
 
 message MFFilePath {
     string path = 1;
+    uint64 size = 2;
     // when verifying, count(hashes) must be > 0.
-    optional repeated MFFileChecksum hashes = 2;
+    optional repeated MFFileChecksum hashes = 201;
     optional string mimeType = 101;
     optional string mtime = 102;
     optional string ctime = 103;
```

```diff
@@ -42,11 +43,11 @@ message MFFileChecksum {
 }
 
 message MFFileInner {
-  enum Version {
-    NONE = 0;
-    ONE = 1; // only one for now
-  }
-  Version version = 1;
-  uint64 count = 2;
-  repeated MFFilePath files = 3;
+    enum Version {
+        NONE = 0;
+        ONE = 1; // only one for now
+    }
+    Version version = 1;
+    uint64 count = 2;
+    repeated MFFilePath files = 3;
 }
```
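
`MFFileInner` carries a `count` alongside the repeated `files`, and `MFFilePath` notes that when verifying, count(hashes) must be > 0. One schema detail is worth flagging here. The proto3 grammar gives a field at most one label, so `optional repeated MFFileChecksum hashes` should be rejected by protoc as written, and plain `repeated` already represents an empty list. Below is a sketch of the structural checks, using hypothetical hand-written mirror types. The path rules (clean, relative, no `..`, no duplicates) are an added assumption. They are motivated by the README's "relative path to the manifest" and by `mfer fetch` writing files locally.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Hypothetical hand-written mirrors of MFFileChecksum, MFFilePath and
// MFFileInner; generated protobuf types would be used in practice.
type Checksum struct {
	Multihash []byte
}

type FilePath struct {
	Path   string
	Size   uint64
	Hashes []Checksum
}

type Inner struct {
	Version uint32 // 1 corresponds to ONE
	Count   uint64
	Files   []FilePath
}

// ValidateInner applies the schema's structural rules plus path rules that
// are assumptions of this sketch (clean, relative, no "..", no duplicates).
func ValidateInner(in Inner) error {
	if in.Version != 1 {
		return fmt.Errorf("unsupported inner version %d", in.Version)
	}
	if in.Count != uint64(len(in.Files)) {
		return fmt.Errorf("count is %d but %d files are listed", in.Count, len(in.Files))
	}
	seen := make(map[string]bool, len(in.Files))
	for _, f := range in.Files {
		if len(f.Hashes) == 0 {
			return fmt.Errorf("%q: count(hashes) must be > 0", f.Path)
		}
		c := path.Clean(f.Path)
		if f.Path == "" || c != f.Path || c == "." || path.IsAbs(c) ||
			c == ".." || strings.HasPrefix(c, "../") {
			return fmt.Errorf("%q: not a clean relative path", f.Path)
		}
		if seen[c] {
			return fmt.Errorf("%q: listed more than once", f.Path)
		}
		seen[c] = true
	}
	return nil
}

func main() {
	in := Inner{Version: 1, Count: 1, Files: []FilePath{{
		Path:   "docs/readme.txt",
		Size:   3,
		Hashes: []Checksum{{Multihash: []byte{0x12, 0x20}}}, // placeholder bytes
	}}}
	fmt.Println(ValidateInner(in))
	in.Files[0].Path = "../outside.txt"
	fmt.Println(ValidateInner(in))
}
```

Requiring paths to already be in cleaned form is a deliberate choice. It avoids two spellings of the same file, such as `a/b` and `a//b`, and makes a hostile `..` entry an outright error instead of something to sanitize.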