From 6ba32f5b3548f2a7bcdf395b51dcc3401c376e8e Mon Sep 17 00:00:00 2001 From: clawbot Date: Tue, 17 Mar 2026 05:07:43 +0100 Subject: [PATCH] Add REPO_POLICIES.md, rename CLAUDE.md to AGENTS.md, deduplicate (#51) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes #48 ## Changes - **Added `REPO_POLICIES.md`** — copied from the standard template at [sneak/prompts](https://git.eeqj.de/sneak/prompts/src/branch/main/prompts/REPO_POLICIES.md) (last_modified: 2026-03-10). This is the authoritative cross-project policy document covering repository structure, tooling, Docker, formatting, testing, and workflow standards. - **Renamed `CLAUDE.md` → `AGENTS.md`** — deduplicated content: - Rules already covered by `REPO_POLICIES.md` (e.g. `git add -A`, Makefile targets) are no longer repeated - `AGENTS.md` retains only agent-specific workflow instructions: test-first bug fixing, no AI attribution in commits, per-change make fmt/test/lint workflow, and repo-specific notes (proto files, FORMAT.md, TODO.md) - **Updated `README.md`** — added a reference to `REPO_POLICIES.md` in the Participation section - **Formatting** — `make fmt` (prettier) applied to all markdown files ## Verification `docker build .` passes clean — lint, fmt-check, and all tests green. Co-authored-by: clawbot Reviewed-on: https://git.eeqj.de/sneak/mfer/pulls/51 Co-authored-by: clawbot Co-committed-by: clawbot --- AGENTS.md | 29 ++++++ CLAUDE.md | 20 ---- FORMAT.md | 51 +++++----- README.md | 33 +++--- REPO_POLICIES.md | 255 +++++++++++++++++++++++++++++++++++++++++++++++ TODO.md | 30 +++--- 6 files changed, 342 insertions(+), 76 deletions(-) create mode 100644 AGENTS.md delete mode 100644 CLAUDE.md create mode 100644 REPO_POLICIES.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..89f7c2c --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,29 @@ +# Agent Instructions + +Read `REPO_POLICIES.md` before making any changes. 
It is the authoritative +source for coding standards, formatting, linting, and workflow rules. + +## Workflow + +- When fixing a bug, write a failing test FIRST. Only after the test fails, + write the code to fix the bug. Then ensure the test passes. Leave the test in + place and commit it with the bugfix. Don't run shell commands to test bugfixes + or reproduce bugs. Write tests! + +- After each change, run `make fmt`, then `make test`, then `make lint`. Fix any + failures before committing. + +- After each change, commit only the files you've changed. Push after committing. + +## Attribution + +- Never mention Claude, Anthropic, or any AI/LLM tooling in commit messages. Do + not use attribution. + +## Repository-Specific Notes + +- This is a Go library + CLI tool for generating `.mf` manifest files. +- The proto definition is in `mfer/mf.proto`; generated `.pb.go` files are + committed (required for `go get` compatibility). +- The format specification is in `FORMAT.md`. +- See `TODO.md` for the 1.0 implementation plan and open design questions. diff --git a/CLAUDE.md b/CLAUDE.md deleted file mode 100644 index 913107d..0000000 --- a/CLAUDE.md +++ /dev/null @@ -1,20 +0,0 @@ -# Important Rules - -- when fixing a bug, write a failing test FIRST. only after the test fails, write - the code to fix the bug. then ensure the test passes. leave the test in - place and commit it with the bugfix. don't run shell commands to test - bugfixes or reproduce bugs. write tests! - -- never, ever mention claude or anthropic in commit messages. do not use attribution - -- after each change, run "make fmt". - -- after each change, run "make test" and ensure all tests pass. - -- after each change, run "make lint" and ensure no linting errors. fix any - you find, one by one. - -- after each change, commit the files you've changed. push after - committing. - -- NEVER use `git add -A`. always add only individual files that you've changed. 
diff --git a/FORMAT.md b/FORMAT.md index e09dfb8..ec661cb 100644 --- a/FORMAT.md +++ b/FORMAT.md @@ -25,17 +25,17 @@ See [`mfer/mf.proto`](mfer/mf.proto) for exact field numbers and types. The outer message contains: -| Field | Number | Type | Description | -|--------------------|--------|-------------------|--------------------------------------------------| -| `version` | 101 | enum | Must be `VERSION_ONE` (1) | -| `compressionType` | 102 | enum | Compression of `innerMessage`; must be `COMPRESSION_ZSTD` (1) | -| `size` | 103 | int64 | Uncompressed size of `innerMessage` (corruption detection) | -| `sha256` | 104 | bytes | SHA-256 hash of the **compressed** `innerMessage` (corruption detection) | -| `uuid` | 105 | bytes | Random v4 UUID; must match the inner message UUID | -| `innerMessage` | 199 | bytes | Zstd-compressed serialized `MFFile` message | -| `signature` | 201 | bytes (optional) | GPG signature (ASCII-armored or binary) | -| `signer` | 202 | bytes (optional) | Full GPG key ID of the signer | -| `signingPubKey` | 203 | bytes (optional) | Full GPG signing public key | +| Field | Number | Type | Description | +| ----------------- | ------ | ---------------- | ------------------------------------------------------------------------ | +| `version` | 101 | enum | Must be `VERSION_ONE` (1) | +| `compressionType` | 102 | enum | Compression of `innerMessage`; must be `COMPRESSION_ZSTD` (1) | +| `size` | 103 | int64 | Uncompressed size of `innerMessage` (corruption detection) | +| `sha256` | 104 | bytes | SHA-256 hash of the **compressed** `innerMessage` (corruption detection) | +| `uuid` | 105 | bytes | Random v4 UUID; must match the inner message UUID | +| `innerMessage` | 199 | bytes | Zstd-compressed serialized `MFFile` message | +| `signature` | 201 | bytes (optional) | GPG signature (ASCII-armored or binary) | +| `signer` | 202 | bytes (optional) | Full GPG key ID of the signer | +| `signingPubKey` | 203 | bytes (optional) | Full GPG signing public key | 
### SHA-256 Hash @@ -54,25 +54,25 @@ decompression bombs. The reference implementation limits decompressed size to After decompressing `innerMessage`, the result is a serialized `MFFile` (referred to as the manifest): -| Field | Number | Type | Description | -|-------------|--------|-----------------------|--------------------------------------------| -| `version` | 100 | enum | Must be `VERSION_ONE` (1) | -| `files` | 101 | repeated `MFFilePath` | List of files in the manifest | -| `uuid` | 102 | bytes | Random v4 UUID; must match outer UUID | -| `createdAt` | 201 | Timestamp (optional) | When the manifest was created | +| Field | Number | Type | Description | +| ----------- | ------ | --------------------- | ------------------------------------- | +| `version` | 100 | enum | Must be `VERSION_ONE` (1) | +| `files` | 101 | repeated `MFFilePath` | List of files in the manifest | +| `uuid` | 102 | bytes | Random v4 UUID; must match outer UUID | +| `createdAt` | 201 | Timestamp (optional) | When the manifest was created | ## File Entries (`MFFilePath`) Each file entry contains: -| Field | Number | Type | Description | -|------------|--------|---------------------------|--------------------------------------| -| `path` | 1 | string | Relative file path (see Path Rules) | -| `size` | 2 | int64 | File size in bytes | -| `hashes` | 3 | repeated `MFFileChecksum` | At least one hash required | -| `mimeType` | 301 | string (optional) | MIME type | -| `mtime` | 302 | Timestamp (optional) | Modification time | -| `ctime` | 303 | Timestamp (optional) | Change time (inode metadata change) | +| Field | Number | Type | Description | +| ---------- | ------ | ------------------------- | ----------------------------------- | +| `path` | 1 | string | Relative file path (see Path Rules) | +| `size` | 2 | int64 | File size in bytes | +| `hashes` | 3 | repeated `MFFileChecksum` | At least one hash required | +| `mimeType` | 301 | string (optional) | MIME type | +| `mtime` | 302 | 
Timestamp (optional) | Modification time | +| `ctime` | 303 | Timestamp (optional) | Change time (inode metadata change) | Field 304 (`atime`) has been removed from the specification. Access time is volatile and non-deterministic; it is not useful for integrity verification. @@ -111,6 +111,7 @@ ZNAVSRFG-- ``` Where: + - `ZNAVSRFG` is the magic bytes string (literal ASCII) - `` is the hex-encoded UUID from the outer message - `` is the hex-encoded SHA-256 hash from the outer message (covering compressed data) diff --git a/README.md b/README.md index 40963c6..102a68f 100644 --- a/README.md +++ b/README.md @@ -3,25 +3,25 @@ [mfer](https://git.eeqj.de/sneak/mfer) is a reference implementation library and thin wrapper command-line utility written in [Go](https://golang.org) and first published in 2022 under the [WTFPL](https://wtfpl.net) (public -domain) license. It specifies and generates `.mf` manifest files over a +domain) license. It specifies and generates `.mf` manifest files over a directory tree of files to encapsulate metadata about them (such as cryptographic checksums or signatures over same) to aid in archiving, -downloading, and streaming, or mirroring. The manifest files' data is +downloading, and streaming, or mirroring. The manifest files' data is serialized with Google's [protobuf serialization -format](https://developers.google.com/protocol-buffers). The structure of +format](https://developers.google.com/protocol-buffers). The structure of these files can be found [in the format specification](https://git.eeqj.de/sneak/mfer/src/branch/main/mfer/mf.proto) which is included in the [project repository](https://git.eeqj.de/sneak/mfer). The current version is pre-1.0 and while the repo was published in 2022, -there has not yet been any versioned release. [SemVer](https://semver.org) +there has not yet been any versioned release. [SemVer](https://semver.org) will be used for releases. 
This project was started by [@sneak](https://sneak.berlin) to scratch an itch in 2022 and is currently a one-person effort, though the goal is for this to emerge as a de-facto standard and be incorporated into other -software. A compatible javascript library is planned. +software. A compatible javascript library is planned. # Build Status @@ -30,18 +30,20 @@ software. A compatible javascript library is planned. # Participation The community is as yet nonexistent so there are no defined policies or -norms yet. Primary development happens on a privately-run Gitea instance at +norms yet. Primary development happens on a privately-run Gitea instance at [https://git.eeqj.de/sneak/mfer](https://git.eeqj.de/sneak/mfer) and issues are [tracked there](https://git.eeqj.de/sneak/mfer/issues). Changes must always be formatted with a standard `go fmt`, syntactically valid, and must pass the linting defined in the repository (presently only -the `golangci-lint` defaults), which can be run with a `make lint`. The +the `golangci-lint` defaults), which can be run with a `make lint`. The `main` branch is protected and all changes must be made via [pull requests](https://git.eeqj.de/sneak/mfer/pulls) and pass CI to be merged. Any changes submitted to this project must also be [WTFPL-licensed](https://wtfpl.net) to be considered. +See [`REPO_POLICIES.md`](REPO_POLICIES.md) for detailed coding standards, +tooling requirements, and workflow conventions. # Problem Statement @@ -123,7 +125,6 @@ The manifest file would do several important things: # Open Questions - Should the manifest file include checksums of individual file chunks, or just for the whole assembled file? - - If so, should the chunksize be fixed or dynamic? - Should the manifest signature format be GnuPG signatures, or those from @@ -211,20 +212,20 @@ desired username for an account on this Gitea instance. 
## Prior Art: Metalink -* [Metalink - Mozilla Wiki](https://wiki.mozilla.org/Metalink) -* [Metalink - Wikipedia](https://en.wikipedia.org/wiki/Metalink) -* [RFC 5854 - The Metalink Download Description Format](https://datatracker.ietf.org/doc/html/rfc5854) -* [RFC 6249 - Metalink/HTTP: Mirrors and Hashes](https://www.rfc-editor.org/rfc/rfc6249.html) +- [Metalink - Mozilla Wiki](https://wiki.mozilla.org/Metalink) +- [Metalink - Wikipedia](https://en.wikipedia.org/wiki/Metalink) +- [RFC 5854 - The Metalink Download Description Format](https://datatracker.ietf.org/doc/html/rfc5854) +- [RFC 6249 - Metalink/HTTP: Mirrors and Hashes](https://www.rfc-editor.org/rfc/rfc6249.html) ## Links -* Repo: [https://git.eeqj.de/sneak/mfer](https://git.eeqj.de/sneak/mfer) -* Issues: [https://git.eeqj.de/sneak/mfer/issues](https://git.eeqj.de/sneak/mfer/issues) +- Repo: [https://git.eeqj.de/sneak/mfer](https://git.eeqj.de/sneak/mfer) +- Issues: [https://git.eeqj.de/sneak/mfer/issues](https://git.eeqj.de/sneak/mfer/issues) # Authors -* [@sneak <sneak@sneak.berlin>](mailto:sneak@sneak.berlin) +- [@sneak <sneak@sneak.berlin>](mailto:sneak@sneak.berlin) # License -* [WTFPL](https://wtfpl.net) +- [WTFPL](https://wtfpl.net) diff --git a/REPO_POLICIES.md b/REPO_POLICIES.md new file mode 100644 index 0000000..9f8e5dd --- /dev/null +++ b/REPO_POLICIES.md @@ -0,0 +1,255 @@ +--- +title: Repository Policies +last_modified: 2026-03-10 +--- + +This document covers repository structure, tooling, and workflow standards. 
Code +style conventions are in separate documents: + +- [Code Styleguide](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/CODE_STYLEGUIDE.md) + (general, bash, Docker) +- [Go](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/CODE_STYLEGUIDE_GO.md) +- [JavaScript](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/CODE_STYLEGUIDE_JS.md) +- [Python](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/CODE_STYLEGUIDE_PYTHON.md) +- [Go HTTP Server Conventions](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/GO_HTTP_SERVER_CONVENTIONS.md) + +--- + +- Cross-project documentation (such as this file) must include + `last_modified: YYYY-MM-DD` in the YAML front matter so it can be kept in sync + with the authoritative source as policies evolve. + +- **ALL external references must be pinned by cryptographic hash.** This + includes Docker base images, Go modules, npm packages, GitHub Actions, and + anything else fetched from a remote source. Version tags (`@v4`, `@latest`, + `:3.21`, etc.) are server-mutable and therefore remote code execution + vulnerabilities. The ONLY acceptable way to reference an external dependency + is by its content hash (Docker `@sha256:...`, Go module hash in `go.sum`, npm + integrity hash in lockfile, GitHub Actions `@`). No exceptions. + This also means never `curl | bash` to install tools like pyenv, nvm, rustup, + etc. Instead, download a specific release archive from GitHub, verify its hash + (hardcoded in the Dockerfile or script), and only then install. Unverified + install scripts are arbitrary remote code execution. This is the single most + important rule in this document. Double-check every external reference in + every file before committing. There are zero exceptions to this rule. 
+ +- Every repo with software must have a root `Makefile` with these targets: + `make test`, `make lint`, `make fmt` (writes), `make fmt-check` (read-only), + `make check` (prereqs: `test`, `lint`, `fmt-check`), `make docker`, and + `make hooks` (installs pre-commit hook). A model Makefile is at + `https://git.eeqj.de/sneak/prompts/raw/branch/main/Makefile`. + +- Always use Makefile targets (`make fmt`, `make test`, `make lint`, etc.) + instead of invoking the underlying tools directly. The Makefile is the single + source of truth for how these operations are run. + +- The Makefile is authoritative documentation for how the repo is used. Beyond + the required targets above, it should have targets for every common operation: + running a local development server (`make run`, `make dev`), re-initializing + or migrating the database (`make db-reset`, `make migrate`), building + artifacts (`make build`), generating code, seeding data, or anything else a + developer would do regularly. If someone checks out the repo and types + `make`, they should see every meaningful operation available. A new + contributor should be able to understand the entire development workflow by + reading the Makefile. + +- Every repo should have a `Dockerfile`. All Dockerfiles must run `make check` + as a build step so the build fails if the branch is not green. For non-server + repos, the Dockerfile should bring up a development environment and run + `make check`. For server repos, `make check` should run as an early build + stage before the final image is assembled. + +- Every repo should have a Gitea Actions workflow (`.gitea/workflows/`) that + runs `docker build .` on push. Since the Dockerfile already runs `make check`, + a successful build implies all checks pass. + +- Use platform-standard formatters: `black` for Python, `prettier` for + JS/CSS/Markdown/HTML, `go fmt` for Go. 
Always use default configuration with + two exceptions: four-space indents (except Go), and `proseWrap: always` for + Markdown (hard-wrap at 80 columns). Documentation and writing repos (Markdown, + HTML, CSS) should also have `.prettierrc` and `.prettierignore`. + +- Pre-commit hook: `make check` if local testing is possible, otherwise + `make lint && make fmt-check`. The Makefile should provide a `make hooks` + target to install the pre-commit hook. + +- All repos with software must have tests that run via the platform-standard + test framework (`go test`, `pytest`, `jest`/`vitest`, etc.). If no meaningful + tests exist yet, add the most minimal test possible — e.g. importing the + module under test to verify it compiles/parses. There is no excuse for + `make test` to be a no-op. + +- `make test` must complete in under 20 seconds. Add a 30-second timeout in the + Makefile. + +- Docker builds must complete in under 5 minutes. + +- `make check` must not modify any files in the repo. Tests may use temporary + directories. + +- `main` must always pass `make check`, no exceptions. + +- Never commit secrets. `.env` files, credentials, API keys, and private keys + must be in `.gitignore`. No exceptions. + +- `.gitignore` should be comprehensive from the start: OS files (`.DS_Store`), + editor files (`.swp`, `*~`), language build artifacts, and `node_modules/`. + Fetch the standard `.gitignore` from + `https://git.eeqj.de/sneak/prompts/raw/branch/main/.gitignore` when setting up + a new repo. + +- **No build artifacts in version control.** Code-derived data (compiled + bundles, minified output, generated assets) must never be committed to the + repository if it can be avoided. The build process (e.g. Dockerfile, Makefile) + should generate these at build time. Notable exception: Go protobuf generated + files (`.pb.go`) ARE committed because repos need to work with `go get`, which + downloads code but does not execute code generation. 
+ +- Never use `git add -A` or `git add .`. Always stage files explicitly by name. + +- Never force-push to `main`. + +- Make all changes on a feature branch. You can do whatever you want on a + feature branch. + +- `.golangci.yml` is standardized and must _NEVER_ be modified by an agent, only + manually by the user. Fetch from + `https://git.eeqj.de/sneak/prompts/raw/branch/main/.golangci.yml`. + +- When pinning images or packages by hash, add a comment above the reference + with the version and date (YYYY-MM-DD). + +- Use `yarn`, not `npm`. + +- Write all dates as YYYY-MM-DD (ISO 8601). + +- Simple projects should be configured with environment variables. + +- Dockerized web services listen on port 8080 by default, overridable with + `PORT`. + +- **HTTP/web services must be hardened for production internet exposure before + tagging 1.0.** This means full compliance with security best practices + including, without limitation, all of the following: + - **Security headers** on every response: + - `Strict-Transport-Security` (HSTS) with `max-age` of at least one year + and `includeSubDomains`. + - `Content-Security-Policy` (CSP) with a restrictive default policy + (`default-src 'self'` as a baseline, tightened per-resource as + needed). Never use `unsafe-inline` or `unsafe-eval` unless + unavoidable, and document the reason. + - `X-Frame-Options: DENY` (or `SAMEORIGIN` if framing is required). + Prefer the `frame-ancestors` CSP directive as the primary control. + - `X-Content-Type-Options: nosniff`. + - `Referrer-Policy: strict-origin-when-cross-origin` (or stricter). + - `Permissions-Policy` restricting access to browser features the + application does not use (camera, microphone, geolocation, etc.). + - **Request and response limits:** + - Maximum request body size enforced on all endpoints (e.g. Go + `http.MaxBytesReader`). Choose a sane default per-route; never accept + unbounded input. + - Maximum response body size where applicable (e.g. paginated APIs). 
+ - `ReadTimeout` and `ReadHeaderTimeout` on the `http.Server` to defend + against slowloris attacks. + - `WriteTimeout` on the `http.Server`. + - `IdleTimeout` on the `http.Server`. + - Per-handler execution time limits via `context.WithTimeout` or + chi/stdlib `middleware.Timeout`. + - **Authentication and session security:** + - Rate limiting on password-based authentication endpoints. API keys are + high-entropy and not susceptible to brute force, so they are exempt. + - CSRF tokens on all state-mutating HTML forms. API endpoints + authenticated via `Authorization` header (Bearer token, API key) are + exempt because the browser does not attach these automatically. + - Passwords stored using bcrypt, scrypt, or argon2 — never plain-text, + MD5, or SHA. + - Session cookies set with `HttpOnly`, `Secure`, and `SameSite=Lax` (or + `Strict`) attributes. + - **Reverse proxy awareness:** + - True client IP detection when behind a reverse proxy + (`X-Forwarded-For`, `X-Real-IP`). The application must accept + forwarded headers only from a configured set of trusted proxy + addresses — never trust `X-Forwarded-For` unconditionally. + - **CORS:** + - Authenticated endpoints must restrict `Access-Control-Allow-Origin` to + an explicit allowlist of known origins. Wildcard (`*`) is acceptable + only for public, unauthenticated read-only APIs. + - **Error handling:** + - Internal errors must never leak stack traces, SQL queries, file paths, + or other implementation details to the client. Return generic error + messages in production; detailed errors only when `DEBUG` is enabled. + - **TLS:** + - Services never terminate TLS directly. They are always deployed behind + a TLS-terminating reverse proxy. The service itself listens on plain + HTTP. However, HSTS headers and `Secure` cookie flags must still be + set by the application so that the browser enforces HTTPS end-to-end. + + This list is non-exhaustive. 
Apply defense-in-depth: if a standard security + hardening measure exists for HTTP services and is not listed here, it is + still expected. When in doubt, harden. + +- `README.md` is the primary documentation. Required sections: + - **Description**: First line must include the project name, purpose, + category (web server, SPA, CLI tool, etc.), license, and author. Example: + "µPaaS is an MIT-licensed Go web application by @sneak that receives + git-frontend webhooks and deploys applications via Docker in realtime." + - **Getting Started**: Copy-pasteable install/usage code block. + - **Rationale**: Why does this exist? + - **Design**: How is the program structured? + - **TODO**: Update meticulously, even between commits. When planning, put + the todo list in the README so a new agent can pick up where the last one + left off. + - **License**: MIT, GPL, or WTFPL. Ask the user for new projects. Include a + `LICENSE` file in the repo root and a License section in the README. + - **Author**: [@sneak](https://sneak.berlin). + +- First commit of a new repo should contain only `README.md`. + +- Go module root: `sneak.berlin/go/`. Always run `go mod tidy` before + committing. + +- Use SemVer. + +- Database migrations live in `internal/db/migrations/` and must be embedded in + the binary. + - `000_migration.sql` — contains ONLY the creation of the migrations + tracking table itself. Nothing else. + - `001_schema.sql` — the full application schema. + - **Pre-1.0.0:** never add additional migration files (002, 003, etc.). + There is no installed base to migrate. Edit `001_schema.sql` directly. + - **Post-1.0.0:** add new numbered migration files for each schema change. + Never edit existing migrations after release. + +- All repos should have an `.editorconfig` enforcing the project's indentation + settings. + +- Avoid putting files in the repo root unless necessary. 
Root should contain + only project-level config files (`README.md`, `Makefile`, `Dockerfile`, + `LICENSE`, `.gitignore`, `.editorconfig`, `REPO_POLICIES.md`, and + language-specific config). Everything else goes in a subdirectory. Canonical + subdirectory names: + - `bin/` — executable scripts and tools + - `cmd/` — Go command entrypoints + - `configs/` — configuration templates and examples + - `deploy/` — deployment manifests (k8s, compose, terraform) + - `docs/` — documentation and markdown (README.md stays in root) + - `internal/` — Go internal packages + - `internal/db/migrations/` — database migrations + - `pkg/` — Go library packages + - `share/` — systemd units, data files + - `static/` — static assets (images, fonts, etc.) + - `web/` — web frontend source + +- When setting up a new repo, files from the `prompts` repo may be used as + templates. Fetch them from + `https://git.eeqj.de/sneak/prompts/raw/branch/main/`. + +- New repos must contain at minimum: + - `README.md`, `.git`, `.gitignore`, `.editorconfig` + - `LICENSE`, `REPO_POLICIES.md` (copy from the `prompts` repo) + - `Makefile` + - `Dockerfile`, `.dockerignore` + - `.gitea/workflows/check.yml` + - Go: `go.mod`, `go.sum`, `.golangci.yml` + - JS: `package.json`, `yarn.lock`, `.prettierrc`, `.prettierignore` + - Python: `pyproject.toml` diff --git a/TODO.md b/TODO.md index 6c4cd3e..266b231 100644 --- a/TODO.md +++ b/TODO.md @@ -2,83 +2,83 @@ ## Design Questions -*sneak: please answer inline below each question. These are preserved for posterity.* +_sneak: please answer inline below each question. These are preserved for posterity._ ### Format Design **1. Should `MFFileChecksum` be simplified?** Currently it's a separate message wrapping a single `bytes multiHash` field. Since multihash already self-describes the algorithm, `repeated bytes hashes` directly on `MFFilePath` would be simpler and reduce per-file protobuf overhead. Is the extra message layer intentional (e.g. 
planning to add per-hash metadata like `verified_at`)? -> *answer:* +> _answer:_ **2. Should file permissions/mode be stored?** The format stores mtime/ctime but not Unix file permissions. For archival use (ExFAT, filesystem-independent checksums) this may not matter, but for software distribution or filesystem restoration it's a gap. Should we reserve a field now (e.g. `optional uint32 mode = 305`) even if we don't populate it yet? -> *answer:* +> _answer:_ **3. Should `atime` be removed from the schema?** Access time is volatile, non-deterministic, and often disabled (`noatime`). Including it means two manifests of the same directory at different times will differ, which conflicts with the determinism goal. Remove it, or document it as "never set by default"? -> *answer:* +> _answer:_ **4. What are the path normalization rules?** The proto has `string path` with no specification about: always forward-slash? Must be relative? No `..` components allowed? UTF-8 NFC vs NFD normalization (macOS vs Linux)? Max path length? This is a security issue (path traversal) and a cross-platform compatibility issue. What rules should the spec mandate? -> *answer:* +> _answer:_ **5. Should we add a version byte after the magic?** Currently `ZNAVSRFG` is followed immediately by protobuf. Adding a version byte (`ZNAVSRFG\x01`) would allow future framing changes without requiring protobuf parsing to detect the version. `MFFileOuter.Version` serves this purpose but requires successful deserialization to read. Worth the extra byte? -> *answer:* +> _answer:_ **6. Should we add a length-prefix after the magic?** Protobuf is not self-delimiting. If we ever want to concatenate manifests or append data after the protobuf, the current framing is insufficient. Add a varint or fixed-width length-prefix? -> *answer:* +> _answer:_ ### Signature Design **7. 
What does the outer SHA-256 hash cover — compressed or uncompressed data?** The review notes it currently hashes compressed data (good for verifying before decompression), but this should be explicitly documented. Which is the intended behavior? -> *answer:* +> _answer:_ **8. Should `signatureString()` sign raw bytes instead of a hex-encoded string?** Currently the canonical string is `MAGIC-UUID-MULTIHASH` with hex encoding, which adds a transformation layer. Signing the raw `sha256` bytes (or compressed `innerMessage` directly) would be simpler. Keep the string format or switch to raw bytes? -> *answer:* +> _answer:_ **9. Should we support detached signature files (`.mf.sig`)?** Embedded signatures are better for single-file distribution. Detached `.mf.sig` files follow the familiar `SHASUMS`/`SHASUMS.asc` pattern and are simpler for HTTP serving. Support both modes? -> *answer:* +> _answer:_ **10. GPG vs pure-Go crypto for signatures?** Shelling out to `gpg` is fragile (may not be installed, version-dependent output). `github.com/ProtonMail/go-crypto` provides pure-Go OpenPGP, or we could go Ed25519/signify (simpler, no key management). Which direction? -> *answer:* +> _answer:_ ### Implementation Design **11. Should manifests be deterministic by default?** This means: sort file entries by path, omit `createdAt` timestamp (or make it opt-in), no `atime`. Should determinism be the default, with a `--include-timestamps` flag to opt in? -> *answer:* +> _answer:_ **12. Should we consolidate or keep both scanner/checker implementations?** There are two parallel implementations: `mfer/scanner.go` + `mfer/checker.go` (typed with `FileSize`, `RelFilePath`) and `internal/scanner/` + `internal/checker/` (raw `int64`, `string`). The `mfer/` versions are superior. Delete the `internal/` versions? -> *answer:* +> _answer:_ **13. Should the `manifest` type be exported?** Currently unexported with exported constructors (`New`, `NewFromPaths`, etc.). 
Consumers can't declare `var m *mfer.manifest`. Export the type, or define an interface? -> *answer:* +> _answer:_ **14. What should the Go module path be for 1.0?** Currently mixed between `sneak.berlin/go/mfer` and `git.eeqj.de/sneak/mfer`. Which is canonical? -> *answer:* +> _answer:_ ---
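Design questions 7 and 8 turn on how the canonical signature string is assembled. As a rough illustration only, the sketch below builds a string in the shape FORMAT.md describes: the literal ASCII magic `ZNAVSRFG`, the hex-encoded UUID, and the hex-encoded SHA-256 of the compressed `innerMessage`, joined by hyphens. The function name echoes the `signatureString()` mentioned in question 8, but its parameters and body here are assumptions, not the project's actual code. Question 8 refers to a multihash, while this sketch uses a bare SHA-256 digest for simplicity.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// magic is the literal ASCII magic string named in FORMAT.md.
const magic = "ZNAVSRFG"

// signatureString is a hypothetical helper (signature assumed, not taken
// from the mfer source). It hashes the compressed inner message, so a
// verifier can check the signature before decompressing anything.
func signatureString(uuid []byte, compressedInner []byte) string {
	sum := sha256.Sum256(compressedInner)
	return fmt.Sprintf("%s-%s-%s",
		magic,
		hex.EncodeToString(uuid),
		hex.EncodeToString(sum[:]),
	)
}

func main() {
	// Placeholder inputs: an all-zero 16-byte UUID and dummy payload bytes.
	uuid := make([]byte, 16)
	payload := []byte("example compressed bytes")
	fmt.Println(signatureString(uuid, payload))
}
```

Signing the raw digest bytes instead, as question 8 proposes, would drop the hex-encoding step but would also make the signed payload harder to inspect by eye. That trade-off is left open here.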