Deliverable 1: Spec document for file format #1

Open
opened 2021-10-26 07:50:16 +00:00 by sneak · 11 comments
Owner

This document is the specification of the on-disk manifest file format.

It is to be written in markdown.

This document is the specification of the on-disk manifest file format. It is to be written in markdown.
sneak changed title from Deliverable 1: Spec document for format to Deliverable 1: Spec document for file format 2021-11-03 11:03:12 +00:00
Author
Owner

I think the file format is:

<64 bit magic number for the file format> + zstd(protobuf of mf.proto MFFile) + <64 bit uint64 length entire .mf file>.

Thus the spec for the file format will describe this, and reference the .proto file.

I think the file format is: <64 bit magic number for the file format> + zstd(protobuf of mf.proto MFFile) + <64 bit uint64 length entire .mf file>. Thus the spec for the file format will describe this, and reference the .proto file.
Author
Owner

The file format spec should be committed into this repo.

The file format spec should be committed into this repo.

Is there any sample spec format I can use? @sneak

Is there any sample spec format I can use? @sneak
Author
Owner

No, this is just a markdown document that you will write from scratch that describes the file format, along with a description of steps used by implementations when creating or validating a manifest file.

No, this is just a markdown document that you will write from scratch that describes the file format, along with a description of steps used by implementations when creating or validating a manifest file.
Author
Owner

the file format description will be something like "the file begins with a magic number, a uint64 in network order, that is specified as constant X. the following bytes are zstd compressed data. the final data in the file is a uint64 in network order that specifies the size of the zstd data, the magic number, and the trailing size field. the zstd compressed data shall be...." etc

"upon verification, first read the magic number to ensure it matches x. second, you read the last 64 bits of the file and retrieve the uint64 size field, and ensure it matches the length of the entire file. then, decompress the zstd payload and ensure the zstd compression is valid. then, deserialize proto..." etc

the idea is that a programmer could take this spec document that describes the file format, and take the .proto, and with just those two things, could implement the format in any language, without asking us questions.

the file format description will be something like "the file begins with a magic number, a uint64 in network order, that is specified as constant X. the following bytes are zstd compressed data. the final data in the file is a uint64 in network order that specifies the size of the zstd data, the magic number, and the trailing size field. the zstd compressed data shall be...." etc "upon verification, first read the magic number to ensure it matches x. second, you read the last 64 bits of the file and retrieve the uint64 size field, and ensure it matches the length of the entire file. then, decompress the zstd payload and ensure the zstd compression is valid. then, deserialize proto..." etc the idea is that a programmer could take this spec document that describes the file format, and take the .proto, and with just those two things, could implement the format in any language, without asking us questions.

As we discussed, following format seems more convenient for me.

<64 bit magic number for the file format> + <64 bit uint64 length entire .mf file> + zstd(protobuf of mf.proto MFFile)

As we discussed, following format seems more convenient for me. ```<64 bit magic number for the file format> + <64 bit uint64 length entire .mf file> + zstd(protobuf of mf.proto MFFile)```
Author
Owner

Sounds good, and makes sense. I put the length at the end to detect truncation, but the length field does that just by existing, so it can go at the beginning and it will also work fine. :)

Sounds good, and makes sense. I put the length at the end to detect truncation, but the length field does that just by existing, so it can go at the beginning and it will also work fine. :)
Author
Owner

path attribute is relative to root directory where index.mf file lives.

path may not resolve to directories above/outside root folder.

so lets say there is https://sneak.berlin/test/index.mf, a path attr of hello.jpg means https://sneak.berlin/test/hello.jpg

a path of dir2/hello.jpg is https://sneak.berlin/test/dir2/hello.jpg

No path attribute may begin ../ as that would violate the constraint of referring only to files inside of the directory containing index.mf. Checking only for leading ../ is insufficient however, as validdirectory/../../.. is also not permitted. The path must be resolved and verified to not be outside of the root of the manifest directory.

`path` attribute is relative to root directory where `index.mf` file lives. `path` may not resolve to directories above/outside root folder. so lets say there is `https://sneak.berlin/test/index.mf`, a path attr of `hello.jpg` means `https://sneak.berlin/test/hello.jpg` a path of `dir2/hello.jpg` is `https://sneak.berlin/test/dir2/hello.jpg` No path attribute may begin `../` as that would violate the constraint of referring only to files inside of the directory containing `index.mf`. Checking only for leading `../` is insufficient however, as `validdirectory/../../..` is also not permitted. The path must be resolved and verified to not be outside of the root of the manifest directory.
Author
Owner

Directories are not listed in the manifest, only regular files. The directory structure is implied. No empty directories can thus be specified.

We need to decide if symlinks are permitted or not. I think the answer is no, but I'd like it to be yes if we can figure out a simple way of putting them in the MFFilePath structure.

Directories are not listed in the manifest, only regular files. The directory structure is implied. No empty directories can thus be specified. We need to decide if symlinks are permitted or not. I think the answer is no, but I'd like it to be yes if we can figure out a simple way of putting them in the `MFFilePath` structure.

understood, ideally yes but for first iteration I suppose simpler is better. What do you think @sneak ?

understood, ideally yes but for first iteration I suppose simpler is better. What do you think @sneak ?
Author
Owner

Yes, I agree - simpler is better.

Yes, I agree - simpler is better.
Sign in to join this conversation.
No Label
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sneak/mfer#1
No description provided.