chore: restructure README with required policy sections
376
README.md
@@ -1,324 +1,138 @@

README.md after this commit:

# pixa

pixa is a GPL-3.0-licensed Go web server by
[@sneak](https://sneak.berlin) that proxies images from upstream
sources, optionally resizing or transforming them, and serves the
results. Both source and transformed images are cached to disk so that
subsequent requests are served without origin fetches or additional
processing.

## Getting Started

```bash
# clone and build
git clone https://git.eeqj.de/sneak/pixa.git
cd pixa
make build

# run with a config file
./bin/pixad --config config.example.yml

# or build and run via Docker
make docker
docker run -p 8080:8080 pixad:latest
```

## Rationale

Image-heavy web applications need a fast, caching reverse proxy that
can resize and transcode images on the fly. pixa fills that role as a
single, self-contained binary with no external runtime dependencies
beyond libvips. It supports HMAC-SHA256 signed URLs with expiration to
prevent abuse, and whitelisted source hosts for open access.

## Design

### Storage

- **Source content**:
  `<statedir>/cache/src-content/<ab>/<cd>/<sha256 of source content>`
- **Source metadata**:
  `<statedir>/cache/src-metadata/<hostname>/<sha256 of path>.json`
  (fetch time, original headers, request, content hash)
- **Database**: `<statedir>/state.sqlite3` (SQLite)
- **Output documents**:
  `<statedir>/cache/dst-content/<ab>/<cd>/<sha256 of output content>`

Multiple source paths may reference the same content blob; the
database tracks references rather than using filesystem refcounting.
In-process caching of request-to-output mappings targets 1-5k r/s.

### Routes

```
/v1/image/<host>/<path>/<size>.<format>?sig=<signature>&exp=<expiration>
```

- `<format>`: one of `orig`, `png`, `jpeg`, `webp`
- `<size>`: `orig` or `<width>x<height>` (e.g. `800x600`)

Images are only fetched from origins using TLS with valid certificates.

### Source Hosts

Source hosts may be whitelisted in the configuration. Non-whitelisted
hosts require an HMAC-SHA256 signature.

#### Signature Specification

Signatures use HMAC-SHA256 and include an expiration timestamp to
prevent replay attacks.

**Signed data format** (colon-separated):

```
HMAC-SHA256(secret, "host:path:query:width:height:format:expiration")
```

Where:

- `host` — source origin hostname (e.g. `cdn.example.com`)
- `path` — source path (e.g. `/photos/cat.jpg`)
- `query` — source query string, empty string if none
- `width` — requested width in pixels, `0` for original
- `height` — requested height in pixels, `0` for original
- `format` — output format (jpeg, png, webp, avif, gif, orig)
- `expiration` — Unix timestamp when signature expires

**Example:** resize `https://cdn.example.com/photos/cat.jpg` to
800x600 WebP with expiration 1704067200:

1. Build input:
   `cdn.example.com:/photos/cat.jpg::800:600:webp:1704067200`
2. Compute HMAC-SHA256 with your secret key
3. Base64URL-encode the result
4. URL:
   `/v1/image/cdn.example.com/photos/cat.jpg/800x600.webp?sig=<base64url>&exp=1704067200`

**Whitelist patterns:**

- **Exact match**: `cdn.example.com` — matches only that host
- **Suffix match**: `.example.com` — matches `cdn.example.com`,
  `images.example.com`, and `example.com`

### Configuration

Configured via YAML file (`--config`). Key settings:

- `access_control_allow_origin` — CORS origin
- `source_host_whitelist` — list of allowed upstream hosts
- `upstream_fetch_timeout` — timeout for origin requests
- `upstream_max_response_size` — max origin response size
- `downstream_timeout` — client response timeout
- `signing_key` — HMAC secret for URL signatures

See `config.example.yml` for all options with defaults.

### Architecture

- **Dependency injection**: Uber fx
- **HTTP router**: go-chi
- **Image processing**: govips (CGO wrapper for libvips)
- **Database**: SQLite via modernc.org/sqlite
- **Static assets**: embedded via `//go:embed`
- **Metrics**: Prometheus
- **Logging**: stdlib slog

## TODO

See [TODO.md](TODO.md) for the full prioritized task list.

## License

GPL-3.0. See [LICENSE](LICENSE).

## Author

[@sneak](https://sneak.berlin)

README.md before this commit:

# pixa caching image reverse proxy server

This is a web service written in Go that is designed to proxy images from
source URLs, optionally resizing or transforming them, and to serve the
results. Both the source images and the transformed images are
cached. The images served to the client are cached for a configurable
interval, so that subsequent requests to the same path on the pixa server
are served from disk without origin server requests or additional processing.

# storage

* unaltered source file straight from upstream:
    * `<statedir>/cache/src-content/<ab>/<cd>/<abcdef0123... sha256 of source content>`
* source path metadata
    * `<statedir>/cache/src-metadata/<hostname>/<sha256 of path component>.json`
    * fetch time
    * all original response headers
    * original request
    * sha256 hash

Note that multiple source paths may reference the same content blob. We
won't do refcounting here; we'll use the state database for that.

* database:
    * `<statedir>/state.sqlite3`

* output documents:
    * `<statedir>/cache/dst-content/<ab>/<cd>/<abcd... sha256 of output content>`

While the database is the long-term authority on what we have in the output
cache, we must aggressively cache in-process the mapping between requests
and output content hashes so as to serve as a maximally efficient caching
proxy for extremely popular/hot request paths. The goal is the ability to
easily support 1-5k r/s.

# Routes

`/img/<size>/<orig host>/<orig path>?signature=<sig>&format=<format>`

Images are only fetched from origins using TLS. Origin certificates must be
valid at time of fetch.

`<format>` is one of `orig`, `png`, `jpeg`, `webp`.

`<size>` is one of `orig` or `<x resolution>x<y resolution>`.

# Source Hosts

Source hosts may be whitelisted in the pixa configuration. If a host is not
in the explicit whitelist, a signature using a shared secret must be appended.

## Signature Specification

Signatures use HMAC-SHA256 and include an expiration timestamp to prevent replay attacks.

### Signed Data Format

The signature is computed over a colon-separated string:

```
HMAC-SHA256(secret, "host:path:query:width:height:format:expiration")
```

Where:

- `host` - Source origin hostname (e.g., `cdn.example.com`)
- `path` - Source path (e.g., `/photos/cat.jpg`)
- `query` - Source query string, empty string if none
- `width` - Requested width in pixels, `0` for original
- `height` - Requested height in pixels, `0` for original
- `format` - Output format (jpeg, png, webp, avif, gif, orig)
- `expiration` - Unix timestamp when signature expires

### URL Format with Signature

```
/v1/image/<host>/<path>/<size>.<format>?sig=<signature>&exp=<expiration>
```

### Example

For a request to resize `https://cdn.example.com/photos/cat.jpg` to 800x600 WebP
with expiration at Unix timestamp 1704067200:

1. Build the signature input:
   ```
   cdn.example.com:/photos/cat.jpg::800:600:webp:1704067200
   ```

2. Compute HMAC-SHA256 with your secret key

3. Base64URL-encode the result

4. Final URL:
   ```
   /v1/image/cdn.example.com/photos/cat.jpg/800x600.webp?sig=<base64url>&exp=1704067200
   ```

### Whitelist Patterns

The whitelist supports two pattern types:

- **Exact match**: `cdn.example.com` - matches only that host
- **Suffix match**: `.example.com` - matches `cdn.example.com`, `images.example.com`, and `example.com`

# configuration

* access-control-allow-origin config
* source host whitelist
* upstream fetch timeout
* upstream max response size
* downstream timeout
* downstream max request size
* downstream max response size
* internal processing timeout
* referer blacklist

# Design Review & Recommendations

## Security Concerns

### Critical

- **HMAC signature scheme is undefined** - The "FIXME" for signature
  construction is a blocker. Recommend HMAC-SHA256 over the full path:
  `HMAC-SHA256(secret, "/<size>/<host>/<path>?format=<format>")`
- **No signature expiration** - Signatures should include a timestamp to
  prevent indefinite replay. Add `&expires=<unix_ts>` and include it in the
  HMAC input.
- **Path traversal risk** - Ensure `<orig path>` cannot contain `..`
  sequences or be used to access unintended resources on origin.
- **SSRF potential** - Even with the TLS requirement, internal/private IPs
  (10.x, 172.16.x, 192.168.x, 127.x, ::1, link-local) must be blocked to
  prevent server-side request forgery.
- **Open redirect via Host header** - Validate that requests cannot be
  manipulated to cache content under incorrect keys.

### Important

- **No authentication for cache purge** - If cache invalidation is needed, it must require authentication
- **Response header sanitization** - Strip sensitive headers from upstream before forwarding (X-Powered-By, Server, etc.)
- **Content-Type validation** - Verify upstream Content-Type matches expected image types before processing
- **Maximum image dimensions** - Limit output dimensions to prevent resource exhaustion (e.g., max 4096x4096)

## URL Route Improvements

Current: `/img/<size>/<orig host>/<orig path>?signature=<sig>&format=<format>`

### Recommended Scheme

```
/v1/image/<host>/<path>/<width>x<height>.<format>?sig=<sig>&exp=<expires>
```

The size+format segment (e.g., `800x600.webp`) is appended to the source path and stripped when constructing the upstream request. This pattern is unambiguous (regex: `(\d+x\d+|orig)\.(webp|jpg|jpeg|png|avif)$`) and won't collide with real paths.

**Size options:**

- `800x600.<format>` - resize to 800x600
- `0x0.<format>` - original size, format conversion only
- `orig.<format>` - original size, format conversion only (human-friendly alias)

**Benefits:**

- API versioning (`/v1/`) allows breaking changes later
- Human-readable URLs that can be manually constructed for whitelisted domains
- Format as extension is intuitive and CDN-friendly

### Examples

**Basic resize and convert:**

```
/v1/image/cdn.example.com/photos/cat.jpg/800x600.webp?sig=abc123&exp=1704067200
```

Fetches `https://cdn.example.com/photos/cat.jpg`, resizes to 800x600, converts to webp.

**Source URL with query parameters:**

```
/v1/image/cdn.example.com/photos/cat.jpg%3Farg1=val1%26arg2=val2/800x600.webp?sig=abc123&exp=1704067200
```

Fetches `https://cdn.example.com/photos/cat.jpg?arg1=val1&arg2=val2`, resizes to 800x600, converts to webp.

Note: The source query string must be URL-encoded (`?` → `%3F`, `&` → `%26`) to avoid ambiguity with pixa's own query parameters.

**Original size, format conversion only:**

```
/v1/image/cdn.example.com/photos/cat.jpg/orig.webp?sig=abc123&exp=1704067200
/v1/image/cdn.example.com/photos/cat.jpg/0x0.webp?sig=abc123&exp=1704067200
```

Both fetch the original image and convert to webp without resizing.

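The route examples above can be parsed with the size+format regex from the design notes. Here it is anchored with `^` so that it must match the whole final path segment. This is a minimal Go sketch. `imageRequest` and `parseImagePath` are hypothetical names. The function expects the still-escaped path (in `net/http`, `r.URL.EscapedPath()`), because decoding first would turn `%3F` back into a `?`.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"regexp"
	"strconv"
	"strings"
)

// sizeFormatRe is the size+format pattern from the design notes, anchored
// so it must match the entire final path segment.
var sizeFormatRe = regexp.MustCompile(`^(\d+x\d+|orig)\.(webp|jpg|jpeg|png|avif)$`)

// imageRequest is a hypothetical parsed form of an image route.
type imageRequest struct {
	Host          string
	Path          string // decoded source path
	Query         string // decoded source query, "" if none
	Width, Height int    // 0 means original size
	Format        string
}

// parseImagePath splits /v1/image/<host>/<path>/<size>.<format>.
// escaped must still be percent-encoded so %3F stays part of the source.
func parseImagePath(escaped string) (*imageRequest, error) {
	rest, ok := strings.CutPrefix(escaped, "/v1/image/")
	if !ok {
		return nil, errors.New("not an image route")
	}
	first := strings.Index(rest, "/")
	last := strings.LastIndex(rest, "/")
	if first <= 0 || first == last {
		return nil, errors.New("missing host or source path")
	}
	m := sizeFormatRe.FindStringSubmatch(rest[last+1:])
	if m == nil {
		return nil, errors.New("invalid size/format segment")
	}
	src, err := url.PathUnescape(rest[first:last])
	if err != nil {
		return nil, err
	}
	req := &imageRequest{Host: rest[:first], Format: m[2]}
	// Everything after the first decoded "?" is the upstream query string.
	req.Path, req.Query, _ = strings.Cut(src, "?")
	if m[1] != "orig" {
		ws, hs, _ := strings.Cut(m[1], "x")
		w, errW := strconv.Atoi(ws)
		h, errH := strconv.Atoi(hs)
		if errW != nil || errH != nil {
			return nil, errors.New("dimensions out of range")
		}
		req.Width, req.Height = w, h
	}
	return req, nil
}

func main() {
	req, err := parseImagePath("/v1/image/cdn.example.com/photos/cat.jpg%3Farg1=val1%26arg2=val2/800x600.webp")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	upstream := "https://" + req.Host + req.Path
	if req.Query != "" {
		upstream += "?" + req.Query
	}
	fmt.Println(upstream) // https://cdn.example.com/photos/cat.jpg?arg1=val1&arg2=val2
	fmt.Println(req.Width, req.Height, req.Format) // 800 600 webp
}
```

Only the last segment is treated as size+format, so the source path itself may contain any segments. Limits such as a maximum output dimension would still need to be checked after parsing.
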

## Additional Formats

### Output Formats to Support

- `avif` - Superior compression, growing browser support
- `gif` - For animated image passthrough (with frame limit)
- `svg` - Passthrough only, no resizing (vector)

### Input Format Whitelist (MIME types to accept)

- `image/jpeg`
- `image/png`
- `image/webp`
- `image/gif`
- `image/avif`
- `image/svg+xml` (passthrough or rasterize)
- **Reject all others** - Especially `image/x-*`, `application/*`

### Input Validation

- Verify magic bytes match declared Content-Type
- Maximum input file size (e.g., 50MB)
- Maximum input dimensions (e.g., 16384x16384)
- Reject files with embedded scripts (SVG sanitization)

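The "verify magic bytes match declared Content-Type" item could look roughly like this minimal Go sketch, which sniffs the whitelisted raster types from their file signatures. `sniffImage` and `checkUpstream` are hypothetical names. SVG is left out because it is text and would need its own sanitizing path. The AVIF check only inspects the major brand (`avif`/`avis`), so a file that lists AVIF only as a compatible brand would be rejected.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"mime"
)

// sniffImage returns the MIME type implied by the file signature, or ""
// if the data is not one of the whitelisted raster formats.
func sniffImage(b []byte) string {
	switch {
	case bytes.HasPrefix(b, []byte{0xFF, 0xD8, 0xFF}):
		return "image/jpeg"
	case bytes.HasPrefix(b, []byte("\x89PNG\r\n\x1a\n")):
		return "image/png"
	case bytes.HasPrefix(b, []byte("GIF87a")), bytes.HasPrefix(b, []byte("GIF89a")):
		return "image/gif"
	case len(b) >= 12 && string(b[0:4]) == "RIFF" && string(b[8:12]) == "WEBP":
		return "image/webp"
	case len(b) >= 12 && string(b[4:8]) == "ftyp" &&
		(string(b[8:12]) == "avif" || string(b[8:12]) == "avis"):
		// ISO-BMFF "ftyp" box; only the major brand is checked here.
		return "image/avif"
	}
	return ""
}

// checkUpstream rejects a response whose declared Content-Type does not
// match what the leading bytes say it is.
func checkUpstream(contentType string, body []byte) error {
	declared, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return fmt.Errorf("unparseable Content-Type %q: %w", contentType, err)
	}
	sniffed := sniffImage(body)
	if sniffed == "" {
		return errors.New("body is not a supported image type")
	}
	if sniffed != declared {
		return fmt.Errorf("declared %s but body looks like %s", declared, sniffed)
	}
	return nil
}

func main() {
	jpeg := []byte{0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10}
	fmt.Println(checkUpstream("image/jpeg", jpeg)) // <nil>
	fmt.Println(checkUpstream("image/png", jpeg))  // declared image/png but body looks like image/jpeg
}
```

In a handler, a failed check would map naturally to the `415` status from the error-handling list.
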

## Rate Limiting

### Per-IP Limits

- Requests per second (e.g., 10 req/s burst, 100 req/min sustained)
- Concurrent connections (e.g., 50 per IP)

### Global Limits

- Total concurrent upstream fetches (to avoid overwhelming origins)
- Per-origin fetch rate limiting (be a good citizen)
- Cache miss rate limiting (prevent cache-busting attacks)

### Response

- Return `429 Too Many Requests` with a `Retry-After` header
- Consider `X-RateLimit-*` headers for transparency


## Additional Features for 1.0

### Must Have

- **Health check endpoint** - `/health` or `/healthz` for load balancers
- **Metrics endpoint** - `/metrics` (Prometheus format) for observability
- **Graceful shutdown** - Drain connections on SIGTERM
- **Request ID/tracing** - `X-Request-ID` header propagation
- **Cache-Control headers** - Proper `Cache-Control`, `ETag`, `Last-Modified` on responses
- **Vary header** - `Vary: Accept` if doing content negotiation

### Should Have

- **Auto-format selection** - If `format=auto`, pick best format based on `Accept` header
- **Quality parameter** - `&q=85` for lossy format quality control
- **Fit modes** - `fit=cover|contain|fill|inside|outside` for resize behavior
- **Background color** - For transparent-to-JPEG conversion
- **Blur/sharpen** - Common post-resize operations
- **Watermarking** - Optional overlay support

### Nice to Have

- **Cache warming API** - Pre-populate cache for known images
- **Cache stats API** - Hit/miss rates, storage usage
- **Admin UI** - Simple dashboard for monitoring

## Configuration Additions

```yaml
server:
  listen: ":8080"
  read_timeout: 30s
  write_timeout: 60s
  max_header_bytes: 8192

cache:
  directory: "/var/cache/pixa"
  max_size_gb: 100
  ttl: 168h # 7 days
  negative_ttl: 5m # Cache 404s briefly

upstream:
  timeout: 30s
  max_response_size: 52428800 # 50MB
  max_concurrent: 100
  user_agent: "Pixa/1.0"

processing:
  max_input_pixels: 268435456 # 16384x16384
  max_output_dimension: 4096
  default_quality: 85
  strip_metadata: true # Remove EXIF etc.

security:
  hmac_secret: "${PIXA_HMAC_SECRET}" # From env
  signature_ttl: 3600 # 1 hour
  blocked_networks:
    - "10.0.0.0/8"
    - "172.16.0.0/12"
    - "192.168.0.0/16"
    - "127.0.0.0/8"
    - "::1/128"
    - "fc00::/7"

rate_limit:
  per_ip_rps: 10
  per_ip_burst: 50
  per_origin_rps: 100

cors:
  allowed_origins: ["*"] # Or specific list
  allowed_methods: ["GET", "HEAD", "OPTIONS"]
  max_age: 86400
```


## Error Handling

### HTTP Status Codes

- `400` - Bad request (invalid parameters, malformed URL)
- `403` - Forbidden (invalid/expired signature, blocked origin)
- `404` - Origin returned 404 (cache negative response briefly)
- `413` - Payload too large (origin image exceeds limits)
- `415` - Unsupported media type (origin returned non-image)
- `422` - Unprocessable (valid image but cannot transform as requested)
- `429` - Rate limited
- `500` - Internal error
- `502` - Bad gateway (origin connection failed)
- `503` - Service unavailable (overloaded)
- `504` - Gateway timeout (origin timeout)

### Error Response Format

```json
{
  "error": "invalid_signature",
  "message": "Signature has expired",
  "request_id": "abc123"
}
```


## Quick Wins

1. **Conditional requests** - Support `If-None-Match` / `If-Modified-Since` to return `304 Not Modified`
2. **HEAD support** - Allow clients to check image metadata without downloading
3. **Canonical URLs** - Redirect non-canonical requests to prevent cache fragmentation
4. **Debug header** - `X-Pixa-Cache: HIT|MISS|STALE` for debugging
5. **Robots.txt** - Serve a robots.txt to prevent search engine crawling of proxy URLs
