chore: restructure README with required policy sections

Jeffrey Paul 2026-02-25 19:47:34 +07:00
parent d0fe5e7334
commit 73f1073d61

376
README.md

@@ -1,324 +1,138 @@
# pixa caching image reverse proxy server # pixa
This is a web service written in go that is designed to proxy images from pixa is a GPL-3.0-licensed Go web server by
source URLs, optionally resizing or transforming them, and serving the [@sneak](https://sneak.berlin) that proxies images from upstream
results. Both the source images as well as the transformed images are sources, optionally resizing or transforming them, and serves the
cached. The images served to the client are cached a configurable interval results. Both source and transformed images are cached to disk so that
so that subsequent requests to the same path on the pixa server are served subsequent requests are served without origin fetches or additional
from disk without origin server requests or additional processing. processing.
# storage ## Getting Started
* unaltered source file straight from upstream: ```bash
* `<statedir>/cache/src-content/<ab>/<cd>/<abcdef0123... sha256 of source content>` # clone and build
* source path metadata git clone https://git.eeqj.de/sneak/pixa.git
* `<statedir>/cache/src-metadata/<hostname>/<sha256 of path component>.json` cd pixa
* fetch time make build
* all original resp headers
* original request
* sha256 hash
Note that multiple source paths may reference the same content blob. We # run with a config file
won't do refcounting here, we'll use the state database for that. ./bin/pixad --config config.example.yml
* database: # or build and run via Docker
* `<statedir>/state.sqlite3` make docker
docker run -p 8080:8080 pixad:latest
```
* output documents: ## Rationale
* `<statedir>/cache/dst-content/<ab>/<cd>/<abcd... sha256 of output content>`
While the database is the long-term authority on what we have in the output Image-heavy web applications need a fast, caching reverse proxy that
cache, we must aggressively cache in-process the mapping between requests can resize and transcode images on the fly. pixa fills that role as a
and output content hashes so as to serve as a maximally efficient caching single, self-contained binary with no external runtime dependencies
proxy for extremely popular/hot request paths. The goal is the ability to beyond libvips. It supports HMAC-SHA256 signed URLs with expiration to
easily support 1-5k r/s. prevent abuse, and whitelisted source hosts for open access.
# Routes ## Design
/img/<size>/<orig host>/<orig path>?signature=<sig>&format=<format> ### Storage
Images are only fetched from origins using TLS. Origin certificates must be - **Source content**:
valid at time of fetch. `<statedir>/cache/src-content/<ab>/<cd>/<sha256 of source content>`
- **Source metadata**:
`<statedir>/cache/src-metadata/<hostname>/<sha256 of path>.json`
(fetch time, original headers, request, content hash)
- **Database**: `<statedir>/state.sqlite3` (SQLite)
- **Output documents**:
`<statedir>/cache/dst-content/<ab>/<cd>/<sha256 of output content>`
<format> is one of 'orig', 'png', 'jpeg', 'webp' Multiple source paths may reference the same content blob; the
database tracks references rather than using filesystem refcounting.
In-process caching of request-to-output mappings targets 1-5k r/s.
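The sharded content paths listed under Storage can be sketched as follows. This is a minimal illustration, not pixa's actual code: the function name `contentPath` and its parameters are hypothetical, and it assumes that `<ab>` and `<cd>` are the first and second byte pairs of the hex-encoded SHA-256 and that the file name is the full hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"path/filepath"
)

// contentPath returns the on-disk location for a blob under the given cache
// subdirectory ("src-content" or "dst-content"). The name and signature are
// hypothetical; the layout follows <statedir>/cache/<kind>/<ab>/<cd>/<sha256>.
func contentPath(stateDir, kind string, content []byte) string {
	sum := sha256.Sum256(content)
	h := hex.EncodeToString(sum[:])
	// Assumption: <ab> and <cd> are the first four hex characters of the hash.
	return filepath.Join(stateDir, "cache", kind, h[0:2], h[2:4], h)
}
```

Because the path depends only on the content hash, two source URLs that return identical bytes map to the same blob, which is why reference tracking is left to the database.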
<size> is one of 'orig' or '<x resolution>x<y resolution>' ### Routes
# Source Hosts ```
/v1/image/<host>/<path>/<size>.<format>?sig=<signature>&exp=<expiration>
```
Source hosts may be whitelisted in the pixa configuration. If not in the Images are only fetched from origins using TLS with valid certificates.
explicit whitelist, a signature using a shared secret must be appended.
## Signature Specification - `<format>`: one of `orig`, `png`, `jpeg`, `webp`
- `<size>`: `orig` or `<width>x<height>` (e.g. `800x600`)
Signatures use HMAC-SHA256 and include an expiration timestamp to prevent replay attacks. ### Source Hosts
### Signed Data Format Source hosts may be whitelisted in the configuration. Non-whitelisted
hosts require an HMAC-SHA256 signature.
The signature is computed over a colon-separated string: #### Signature Specification
Signatures use HMAC-SHA256 and include an expiration timestamp to
prevent replay attacks.
**Signed data format** (colon-separated):
``` ```
HMAC-SHA256(secret, "host:path:query:width:height:format:expiration") HMAC-SHA256(secret, "host:path:query:width:height:format:expiration")
``` ```
Where: Where:
- `host` - Source origin hostname (e.g., `cdn.example.com`)
- `path` - Source path (e.g., `/photos/cat.jpg`)
- `query` - Source query string, empty string if none
- `width` - Requested width in pixels, `0` for original
- `height` - Requested height in pixels, `0` for original
- `format` - Output format (jpeg, png, webp, avif, gif, orig)
- `expiration` - Unix timestamp when signature expires
### URL Format with Signature - `host` — source origin hostname (e.g. `cdn.example.com`)
- `path` — source path (e.g. `/photos/cat.jpg`)
- `query` — source query string, empty string if none
- `width` — requested width in pixels, `0` for original
- `height` — requested height in pixels, `0` for original
- `format` — output format (jpeg, png, webp, avif, gif, orig)
- `expiration` — Unix timestamp when signature expires
``` **Example:** resize
/v1/image/<host>/<path>/<size>.<format>?sig=<signature>&exp=<expiration> `https://cdn.example.com/photos/cat.jpg` to 800x600 WebP with
``` expiration 1704067200:
### Example
For a request to resize `https://cdn.example.com/photos/cat.jpg` to 800x600 WebP
with expiration at Unix timestamp 1704067200:
1. Build the signature input:
```
cdn.example.com:/photos/cat.jpg::800:600:webp:1704067200
```
1. Build input:
`cdn.example.com:/photos/cat.jpg::800:600:webp:1704067200`
2. Compute HMAC-SHA256 with your secret key 2. Compute HMAC-SHA256 with your secret key
3. Base64URL-encode the result 3. Base64URL-encode the result
4. URL:
`/v1/image/cdn.example.com/photos/cat.jpg/800x600.webp?sig=<base64url>&exp=1704067200`
4. Final URL: **Whitelist patterns:**
```
/v1/image/cdn.example.com/photos/cat.jpg/800x600.webp?sig=<base64url>&exp=1704067200
```
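The signing steps above can be sketched in Go. This is an illustration under stated assumptions rather than pixa's implementation: the function names are hypothetical, and the sketch assumes unpadded Base64URL output, which the README does not specify.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// signingInput builds the colon-separated string
// "host:path:query:width:height:format:expiration".
func signingInput(host, path, query string, width, height int, format string, exp int64) string {
	return fmt.Sprintf("%s:%s:%s:%d:%d:%s:%d", host, path, query, width, height, format, exp)
}

// sign returns the Base64URL-encoded HMAC-SHA256 of input.
// Assumption: unpadded encoding (RawURLEncoding); the padded variant is
// equally plausible.
func sign(secret []byte, input string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(input))
	return base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
}

// verify rejects expired signatures, then recomputes the signature over
// input (which must have been built with the same exp) and compares it in
// constant time.
func verify(secret []byte, input, sig string, exp, now int64) bool {
	if now > exp {
		return false
	}
	return hmac.Equal([]byte(sign(secret, input)), []byte(sig))
}
```

For the example request, `signingInput("cdn.example.com", "/photos/cat.jpg", "", 800, 600, "webp", 1704067200)` yields the input string from step 1, and the result of `sign` goes into the `sig` query parameter.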
### Whitelist Patterns - **Exact match**: `cdn.example.com` — matches only that host
- **Suffix match**: `.example.com` — matches `cdn.example.com`,
`images.example.com`, and `example.com`
The whitelist supports two pattern types: ### Configuration
- **Exact match**: `cdn.example.com` - matches only that host
- **Suffix match**: `.example.com` - matches `cdn.example.com`, `images.example.com`, and `example.com`
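The two pattern types can be expressed as a small matcher. This is a sketch only: `hostAllowed` is a hypothetical name, and lowercasing both sides is an assumption based on hostnames being case-insensitive.

```go
package main

import "strings"

// hostAllowed reports whether host matches any whitelist pattern.
// A pattern with a leading dot is a suffix match that also covers the bare
// domain; any other pattern must match the host exactly.
func hostAllowed(patterns []string, host string) bool {
	host = strings.ToLower(host) // assumption: case-insensitive comparison
	for _, p := range patterns {
		p = strings.ToLower(p)
		if strings.HasPrefix(p, ".") {
			// ".example.com" matches "cdn.example.com" and "example.com",
			// but not "badexample.com".
			if strings.HasSuffix(host, p) || host == p[1:] {
				return true
			}
			continue
		}
		if host == p {
			return true
		}
	}
	return false
}
```

Keeping the leading dot in the suffix comparison is what prevents `badexample.com` from matching `.example.com`.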
# configuration Configured via YAML file (`--config`). Key settings:
* access-control-allow-origin config - `access_control_allow_origin` — CORS origin
* source host whitelist - `source_host_whitelist` — list of allowed upstream hosts
* upstream fetch timeout - `upstream_fetch_timeout` — timeout for origin requests
* upstream max response size - `upstream_max_response_size` — max origin response size
* downstream timeout - `downstream_timeout` — client response timeout
* downstream max request size - `signing_key` — HMAC secret for URL signatures
* downstream max response size
* internal processing timeout
* referer blacklist
# Design Review & Recommendations See `config.example.yml` for all options with defaults.
## Security Concerns ### Architecture
### Critical - **Dependency injection**: Uber fx
- **HMAC signature scheme is undefined** - The "FIXME" for signature - **HTTP router**: go-chi
construction is a blocker. Recommend HMAC-SHA256 over the full path: - **Image processing**: govips (CGO wrapper for libvips)
`HMAC-SHA256(secret, "/<size>/<host>/<path>?format=<format>")` - **Database**: SQLite via modernc.org/sqlite
- **No signature expiration** - Signatures should include a timestamp to - **Static assets**: embedded via `//go:embed`
prevent indefinite replay. Add `&expires=<unix_ts>` and include it in the - **Metrics**: Prometheus
HMAC input - **Logging**: stdlib slog
- **Path traversal risk** - Ensure `<orig path>` cannot contain `..`
sequences or be used to access unintended resources on origin
- **SSRF potential** - Even with TLS requirement, internal/private IPs
(10.x, 172.16.x, 192.168.x, 127.x, ::1, link-local) must be blocked to
prevent server-side request forgery
- **Open redirect via Host header** - Validate that requests cannot be
manipulated to cache content under incorrect keys
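A minimal sketch of the address check behind the SSRF item, using the standard `net/netip` package. The function name `isBlockedIP` is hypothetical, and failing closed on unparseable input is an assumption.

```go
package main

import "net/netip"

// isBlockedIP reports whether s is an address the proxy should refuse to
// fetch from: loopback, RFC 1918 / unique-local private ranges, link-local,
// or unspecified. Unparseable input is treated as blocked (fail closed).
func isBlockedIP(s string) bool {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return true
	}
	// Unmap so that IPv4-mapped IPv6 addresses such as ::ffff:10.0.0.1
	// are checked as their IPv4 form.
	addr = addr.Unmap()
	return addr.IsLoopback() ||
		addr.IsPrivate() ||
		addr.IsLinkLocalUnicast() ||
		addr.IsLinkLocalMulticast() ||
		addr.IsUnspecified()
}
```

For the check to be effective it would need to run against the addresses actually resolved and dialed for the origin host, not only against the hostname in the URL, since a public name can resolve to a private address.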
### Important ## TODO
- **No authentication for cache purge** - If cache invalidation is needed, it requires auth
- **Response header sanitization** - Strip sensitive headers from upstream before forwarding (X-Powered-By, Server, etc.)
- **Content-Type validation** - Verify upstream Content-Type matches expected image types before processing
- **Maximum image dimensions** - Limit output dimensions to prevent resource exhaustion (e.g., max 4096x4096)
## URL Route Improvements See [TODO.md](TODO.md) for the full prioritized task list.
Current: `/img/<size>/<orig host>/<orig path>?signature=<sig>&format=<format>` ## License
### Recommended Scheme GPL-3.0. See [LICENSE](LICENSE).
```
/v1/image/<host>/<path>/<width>x<height>.<format>?sig=<sig>&exp=<expires>
```
The size+format segment (e.g., `800x600.webp`) is appended to the source path and stripped when constructing the upstream request. This pattern is unambiguous (regex: `(\d+x\d+|orig)\.(webp|jpg|jpeg|png|avif)$`) and won't collide with real paths. ## Author
**Size options:** [@sneak](https://sneak.berlin)
- `800x600.<format>` - resize to 800x600
- `0x0.<format>` - original size, format conversion only
- `orig.<format>` - original size, format conversion only (human-friendly alias)
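The trailing size+format segment can be split off with a regular expression close to the one quoted above. The sketch below is illustrative: the type and function names are hypothetical, the input is assumed to be the part of the URL after `/v1/image/<host>`, and the format alternation is copied from the quoted regex.

```go
package main

import (
	"regexp"
	"strconv"
)

// sizeFormatRe captures the source path, optional width and height, and the
// output format from a path such as "/photos/cat.jpg/800x600.webp".
var sizeFormatRe = regexp.MustCompile(`^(.+)/(?:(\d+)x(\d+)|orig)\.(webp|jpg|jpeg|png|avif)$`)

// imageRequest is a hypothetical holder for the parsed components.
type imageRequest struct {
	SourcePath    string
	Width, Height int // 0 means original size
	Format        string
}

// parseImagePath strips the size+format segment and returns the remaining
// source path together with the requested dimensions and format.
func parseImagePath(p string) (imageRequest, bool) {
	m := sizeFormatRe.FindStringSubmatch(p)
	if m == nil {
		return imageRequest{}, false
	}
	req := imageRequest{SourcePath: m[1], Format: m[4]}
	if m[2] != "" { // "<width>x<height>" form; "orig" leaves both at 0
		w, errW := strconv.Atoi(m[2])
		h, errH := strconv.Atoi(m[3])
		if errW != nil || errH != nil {
			return imageRequest{}, false // e.g. digits that overflow int
		}
		req.Width, req.Height = w, h
	}
	return req, true
}
```

With this shape, `orig.<format>` and `0x0.<format>` both end up with zero width and height, matching the alias behaviour described in the size options.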
**Benefits:**
- API versioning (`/v1/`) allows breaking changes later
- Human-readable URLs that can be manually constructed for whitelisted domains
- Format as extension is intuitive and CDN-friendly
### Examples
**Basic resize and convert:**
```
/v1/image/cdn.example.com/photos/cat.jpg/800x600.webp?sig=abc123&exp=1704067200
```
Fetches `https://cdn.example.com/photos/cat.jpg`, resizes to 800x600, converts to webp.
**Source URL with query parameters:**
```
/v1/image/cdn.example.com/photos/cat.jpg%3Farg1=val1%26arg2=val2/800x600.webp?sig=abc123&exp=1704067200
```
Fetches `https://cdn.example.com/photos/cat.jpg?arg1=val1&arg2=val2`, resizes to 800x600, converts to webp.
Note: The source query string must be URL-encoded (`?` → `%3F`, `&` → `%26`) to avoid ambiguity with pixa's own query parameters.
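A sketch of building such a path on the client side. The helper name `pixaPath` is hypothetical, and the sketch escapes only the two characters named in the note; a real client would likely need to escape others (for example `%` and `#`) as well.

```go
package main

import "strings"

// sourceQueryEscaper encodes the characters that would otherwise be read as
// part of pixa's own query string.
var sourceQueryEscaper = strings.NewReplacer("?", "%3F", "&", "%26")

// pixaPath builds the unsigned request path for a source URL. sizeFormat is
// the trailing segment, e.g. "800x600.webp".
func pixaPath(host, path, query, sizeFormat string) string {
	src := path
	if query != "" {
		src += "?" + query
	}
	return "/v1/image/" + host + sourceQueryEscaper.Replace(src) + "/" + sizeFormat
}
```

The `sig` and `exp` parameters would then be appended as pixa's own query string.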
**Original size, format conversion only:**
```
/v1/image/cdn.example.com/photos/cat.jpg/orig.webp?sig=abc123&exp=1704067200
/v1/image/cdn.example.com/photos/cat.jpg/0x0.webp?sig=abc123&exp=1704067200
```
Both fetch the original image and convert to webp without resizing.
## Additional Formats
### Output Formats to Support
- `avif` - Superior compression, growing browser support
- `gif` - For animated image passthrough (with frame limit)
- `svg` - Passthrough only, no resizing (vector)
### Input Format Whitelist (MIME types to accept)
- `image/jpeg`
- `image/png`
- `image/webp`
- `image/gif`
- `image/avif`
- `image/svg+xml` (passthrough or rasterize)
- **Reject all others** - Especially `image/x-*`, `application/*`
### Input Validation
- Verify magic bytes match declared Content-Type
- Maximum input file size (e.g., 50MB)
- Maximum input dimensions (e.g., 16384x16384)
- Reject files with embedded scripts (SVG sanitization)
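The magic-byte check could be sketched with the standard library's content sniffer. This is an assumption-laden illustration: `sniffedTypeMatches` is a hypothetical name, and the sketch covers only JPEG, PNG, WebP, and GIF; AVIF and SVG would need separate detection.

```go
package main

import (
	"mime"
	"net/http"
)

// sniffable lists the image types this sketch accepts and that
// http.DetectContentType can recognize from leading bytes.
var sniffable = map[string]bool{
	"image/jpeg": true,
	"image/png":  true,
	"image/webp": true,
	"image/gif":  true,
}

// sniffedTypeMatches reports whether the upstream body's magic bytes agree
// with the Content-Type header the origin declared.
func sniffedTypeMatches(declared string, body []byte) bool {
	mediaType, _, err := mime.ParseMediaType(declared)
	if err != nil {
		return false
	}
	sniffed := http.DetectContentType(body) // inspects at most 512 bytes
	return sniffable[sniffed] && sniffed == mediaType
}
```

A mismatch would map naturally onto a `415` response.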
## Rate Limiting
### Per-IP Limits
- Requests per second (e.g., 10 req/s burst, 100 req/min sustained)
- Concurrent connections (e.g., 50 per IP)
### Global Limits
- Total concurrent upstream fetches (prevent origin overwhelm)
- Per-origin fetch rate limiting (be a good citizen)
- Cache miss rate limiting (prevent cache-busting attacks)
### Response
- Return `429 Too Many Requests` with `Retry-After` header
- Consider `X-RateLimit-*` headers for transparency
## Additional Features for 1.0
### Must Have
- **Health check endpoint** - `/health` or `/healthz` for load balancers
- **Metrics endpoint** - `/metrics` (Prometheus format) for observability
- **Graceful shutdown** - Drain connections on SIGTERM
- **Request ID/tracing** - `X-Request-ID` header propagation
- **Cache-Control headers** - Proper `Cache-Control`, `ETag`, `Last-Modified` on responses
- **Vary header** - `Vary: Accept` if doing content negotiation
### Should Have
- **Auto-format selection** - If `format=auto`, pick best format based on `Accept` header
- **Quality parameter** - `&q=85` for lossy format quality control
- **Fit modes** - `fit=cover|contain|fill|inside|outside` for resize behavior
- **Background color** - For transparent-to-JPEG conversion
- **Blur/sharpen** - Common post-resize operations
- **Watermarking** - Optional overlay support
### Nice to Have
- **Cache warming API** - Pre-populate cache for known images
- **Cache stats API** - Hit/miss rates, storage usage
- **Admin UI** - Simple dashboard for monitoring
## Configuration Additions
```yaml
server:
listen: ":8080"
read_timeout: 30s
write_timeout: 60s
max_header_bytes: 8192
cache:
directory: "/var/cache/pixa"
max_size_gb: 100
ttl: 168h # 7 days
negative_ttl: 5m # Cache 404s briefly
upstream:
timeout: 30s
max_response_size: 52428800 # 50MB
max_concurrent: 100
user_agent: "Pixa/1.0"
processing:
max_input_pixels: 268435456 # 16384x16384
max_output_dimension: 4096
default_quality: 85
strip_metadata: true # Remove EXIF etc.
security:
hmac_secret: "${PIXA_HMAC_SECRET}" # From env
signature_ttl: 3600 # 1 hour
blocked_networks:
- "10.0.0.0/8"
- "172.16.0.0/12"
- "192.168.0.0/16"
- "127.0.0.0/8"
- "::1/128"
- "fc00::/7"
rate_limit:
per_ip_rps: 10
per_ip_burst: 50
per_origin_rps: 100
cors:
allowed_origins: ["*"] # Or specific list
allowed_methods: ["GET", "HEAD", "OPTIONS"]
max_age: 86400
```
## Error Handling
### HTTP Status Codes
- `400` - Bad request (invalid parameters, malformed URL)
- `403` - Forbidden (invalid/expired signature, blocked origin)
- `404` - Origin returned 404 (cache negative response briefly)
- `413` - Payload too large (origin image exceeds limits)
- `415` - Unsupported media type (origin returned non-image)
- `422` - Unprocessable (valid image but cannot transform as requested)
- `429` - Rate limited
- `500` - Internal error
- `502` - Bad gateway (origin connection failed)
- `503` - Service unavailable (overloaded)
- `504` - Gateway timeout (origin timeout)
### Error Response Format
```json
{
"error": "invalid_signature",
"message": "Signature has expired",
"request_id": "abc123"
}
```
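One way to produce this response shape from a handler, as a sketch. The type and function names are hypothetical; only the three JSON field names come from the example above.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// errorResponse mirrors the JSON error body shown above.
type errorResponse struct {
	Error     string `json:"error"`
	Message   string `json:"message"`
	RequestID string `json:"request_id"`
}

// encodeError serializes an error body. Marshaling a struct of plain strings
// cannot fail, so the error is discarded.
func encodeError(code, message, requestID string) []byte {
	b, _ := json.Marshal(errorResponse{Error: code, Message: message, RequestID: requestID})
	return b
}

// writeError sends the error body with the given HTTP status.
func writeError(w http.ResponseWriter, status int, code, message, requestID string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_, _ = w.Write(encodeError(code, message, requestID))
}
```

An expired signature would then be reported with something like `writeError(w, http.StatusForbidden, "invalid_signature", "Signature has expired", reqID)`.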
## Quick Wins
1. **Conditional requests** - Support `If-None-Match` / `If-Modified-Since` to return `304 Not Modified`
2. **HEAD support** - Allow clients to check image metadata without downloading
3. **Canonical URLs** - Redirect non-canonical requests to prevent cache fragmentation
4. **Debug header** - `X-Pixa-Cache: HIT|MISS|STALE` for debugging
5. **Robots.txt** - Serve a robots.txt to prevent search engine crawling of proxy URLs
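The conditional-request item can be sketched as a pure `If-None-Match` comparison. Function names are hypothetical; using the output content hash as the ETag is an assumption that fits the content-addressed cache layout, and splitting on commas is a simplification that would mis-handle entity tags containing commas.

```go
package main

import "strings"

// etagFor turns a content hash into a strong ETag value.
// Assumption: the sha256 of the output content is used as the validator.
func etagFor(contentHash string) string {
	return `"` + contentHash + `"`
}

// notModified reports whether an If-None-Match header value matches etag,
// in which case the handler can answer 304 Not Modified without a body.
// It uses weak comparison, so a "W/" prefix on either side is ignored.
func notModified(ifNoneMatch, etag string) bool {
	for _, candidate := range strings.Split(ifNoneMatch, ",") {
		candidate = strings.TrimSpace(candidate)
		if candidate == "" {
			continue
		}
		if candidate == "*" {
			return true
		}
		if strings.TrimPrefix(candidate, "W/") == strings.TrimPrefix(etag, "W/") {
			return true
		}
	}
	return false
}
```

A handler would call `notModified(r.Header.Get("If-None-Match"), etag)` before reading the cached file and write `http.StatusNotModified` on a match.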