chore: restructure README with required policy sections

README.md (376 changes)

@@ -1,324 +1,138 @@

# pixa

pixa is a GPL-3.0-licensed Go web server by
[@sneak](https://sneak.berlin) that proxies images from upstream
sources, optionally resizing or transforming them, and serves the
results. Both source and transformed images are cached to disk so that
subsequent requests are served without origin fetches or additional
processing.

## Getting Started

```bash
# clone and build
git clone https://git.eeqj.de/sneak/pixa.git
cd pixa
make build

# run with a config file
./bin/pixad --config config.example.yml

# or build and run via Docker
make docker
docker run -p 8080:8080 pixad:latest
```

## Rationale

Image-heavy web applications need a fast, caching reverse proxy that
can resize and transcode images on the fly. pixa fills that role as a
single, self-contained binary with no external runtime dependencies
beyond libvips. It supports HMAC-SHA256 signed URLs with expiration to
prevent abuse, and a whitelist of source hosts that can be proxied
without a signature.

## Design

### Storage

- **Source content**:
  `<statedir>/cache/src-content/<ab>/<cd>/<sha256 of source content>`
- **Source metadata**:
  `<statedir>/cache/src-metadata/<hostname>/<sha256 of path>.json`
  (fetch time, original headers, request, content hash)
- **Database**: `<statedir>/state.sqlite3` (SQLite)
- **Output documents**:
  `<statedir>/cache/dst-content/<ab>/<cd>/<sha256 of output content>`

In the content paths, `<ab>` and `<cd>` are the first and second pairs of
hex digits of the hash.

Multiple source paths may reference the same content blob; the
database tracks references rather than using filesystem refcounting.
In-process caching of request-to-output mappings targets 1-5k requests
per second on popular paths.

### Routes

```
/v1/image/<host>/<path>/<size>.<format>?sig=<signature>&exp=<expiration>
```

- `<format>`: one of `orig`, `png`, `jpeg`, `webp`
- `<size>`: `orig` or `<width>x<height>` (e.g. `800x600`)

Images are only fetched from origins using TLS with valid certificates.

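A hypothetical parser for this route is sketched below. It makes three assumptions:

- The handler sees the already-decoded request path.
- Only the `<size>` and `<format>` values listed above are accepted.
- `orig` means width and height 0, matching the "`0` for original" convention used in the signature input.

`parseImagePath` and `imageRequest` are illustrative names. Path-traversal checks are left out.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// imageRequest is a hypothetical parsed form of a /v1/image/ request path.
type imageRequest struct {
	Host   string
	Path   string // source path, including its leading slash
	Width  int    // 0 means original size
	Height int    // 0 means original size
	Format string
}

// sizeFormatRe matches the final segment, e.g. "800x600.webp" or "orig.png".
var sizeFormatRe = regexp.MustCompile(`^(?:(\d+)x(\d+)|orig)\.(orig|png|jpeg|webp)$`)

func parseImagePath(p string) (imageRequest, error) {
	const prefix = "/v1/image/"
	if !strings.HasPrefix(p, prefix) {
		return imageRequest{}, errors.New("not a /v1/image/ path")
	}
	rest := p[len(prefix):]
	first := strings.Index(rest, "/")    // end of <host>
	last := strings.LastIndex(rest, "/") // start of <size>.<format>
	if first <= 0 || last == first {
		return imageRequest{}, errors.New("missing host, source path or size segment")
	}
	m := sizeFormatRe.FindStringSubmatch(rest[last+1:])
	if m == nil {
		return imageRequest{}, fmt.Errorf("invalid size/format segment %q", rest[last+1:])
	}
	req := imageRequest{Host: rest[:first], Path: rest[first:last], Format: m[3]}
	if m[1] != "" { // "orig" leaves the dimension groups empty
		w, errW := strconv.Atoi(m[1])
		h, errH := strconv.Atoi(m[2])
		if errW != nil || errH != nil {
			return imageRequest{}, errors.New("dimensions out of range")
		}
		req.Width, req.Height = w, h
	}
	return req, nil
}

func main() {
	req, err := parseImagePath("/v1/image/cdn.example.com/photos/cat.jpg/800x600.webp")
	fmt.Printf("%+v %v\n", req, err)
}
```

The size/format segment is always the last path component. Splitting on the first and last `/` is therefore enough to separate host, source path and size, even when the source path itself contains slashes.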
### Source Hosts

Source hosts may be whitelisted in the configuration. Non-whitelisted
hosts require an HMAC-SHA256 signature.

#### Signature Specification

Signatures use HMAC-SHA256 and include an expiration timestamp, so a
signed URL cannot be replayed after it expires.

**Signed data format** (colon-separated):

```
HMAC-SHA256(secret, "host:path:query:width:height:format:expiration")
```

Where:

- `host`: source origin hostname (e.g. `cdn.example.com`)
- `path`: source path (e.g. `/photos/cat.jpg`)
- `query`: source query string, empty string if none
- `width`: requested width in pixels, `0` for original
- `height`: requested height in pixels, `0` for original
- `format`: output format (`jpeg`, `png`, `webp`, `avif`, `gif`, `orig`)
- `expiration`: Unix timestamp when the signature expires

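Building the signing string is plain concatenation in the documented field order. The minimal Go sketch below reproduces the input from the worked example. The `signingInput` helper is hypothetical. The specification does not say how a `:` inside `path` or `query` would be escaped, so this sketch passes fields through unchanged.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// signingInput joins the signed fields with ':' in the documented order:
// host:path:query:width:height:format:expiration.
func signingInput(host, path, query string, width, height int, format string, expiration int64) string {
	return strings.Join([]string{
		host,
		path,
		query,                // empty string when the source URL has no query
		strconv.Itoa(width),  // 0 for original
		strconv.Itoa(height), // 0 for original
		format,
		strconv.FormatInt(expiration, 10),
	}, ":")
}

func main() {
	fmt.Println(signingInput("cdn.example.com", "/photos/cat.jpg", "", 800, 600, "webp", 1704067200))
	// prints cdn.example.com:/photos/cat.jpg::800:600:webp:1704067200
}
```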
**Example:** resize
`https://cdn.example.com/photos/cat.jpg` to 800x600 WebP with
expiration 1704067200:

1. Build input (the source URL has no query string, so the `query`
   field is empty and produces `::`):
   `cdn.example.com:/photos/cat.jpg::800:600:webp:1704067200`
2. Compute HMAC-SHA256 with your secret key
3. Base64URL-encode the result
4. URL:
   `/v1/image/cdn.example.com/photos/cat.jpg/800x600.webp?sig=<base64url>&exp=1704067200`

**Whitelist patterns:**

- **Exact match**: `cdn.example.com` matches only that host
- **Suffix match**: `.example.com` matches `cdn.example.com`,
  `images.example.com`, and `example.com`

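A minimal matcher for the two pattern types is sketched below. It assumes, beyond what the text states, that host comparison is case-insensitive and ignores a trailing dot. `hostAllowed` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// hostAllowed reports whether host matches a whitelist pattern.
// "cdn.example.com" is an exact match; ".example.com" matches example.com
// itself and every subdomain. Case folding and trailing-dot trimming are
// assumptions of this sketch.
func hostAllowed(host string, patterns []string) bool {
	host = strings.TrimSuffix(strings.ToLower(host), ".")
	for _, p := range patterns {
		p = strings.ToLower(p)
		if strings.HasPrefix(p, ".") {
			if host == p[1:] || strings.HasSuffix(host, p) {
				return true
			}
		} else if host == p {
			return true
		}
	}
	return false
}

func main() {
	wl := []string{".example.com"}
	for _, h := range []string{"cdn.example.com", "images.example.com", "example.com", "evilexample.com"} {
		fmt.Println(h, hostAllowed(h, wl))
	}
}
```

The suffix check keeps the leading dot, so `.example.com` does not match a look-alike such as `evilexample.com`.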
### Configuration

Configured via YAML file (`--config`). Key settings:

- `access_control_allow_origin`: CORS origin
- `source_host_whitelist`: list of allowed upstream hosts
- `upstream_fetch_timeout`: timeout for origin requests
- `upstream_max_response_size`: maximum origin response size
- `downstream_timeout`: client response timeout
- `signing_key`: HMAC secret for URL signatures

See `config.example.yml` for all options with defaults.

### Architecture

- **Dependency injection**: Uber fx
- **HTTP router**: go-chi
- **Image processing**: govips (CGO wrapper for libvips)
- **Database**: SQLite via modernc.org/sqlite
- **Static assets**: embedded via `//go:embed`
- **Metrics**: Prometheus
- **Logging**: stdlib slog

## Design Review & Recommendations

### Security Concerns

#### Critical

- **HMAC signature scheme is undefined**: the "FIXME" for signature
  construction is a blocker. Recommend HMAC-SHA256 over the full path:
  `HMAC-SHA256(secret, "/<size>/<host>/<path>?format=<format>")`
- **No signature expiration**: signatures should include a timestamp to
  prevent indefinite replay. Add `&expires=<unix_ts>` and include it in
  the HMAC input.
- **Path traversal risk**: ensure `<orig path>` cannot contain `..`
  sequences or be used to access unintended resources on the origin.
- **SSRF potential**: even with the TLS requirement, internal and private
  addresses (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8,
  `::1`, link-local) must be blocked to prevent server-side request
  forgery.
- **Open redirect via Host header**: validate that requests cannot be
  manipulated to cache content under incorrect keys.

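For the SSRF item, one Go approach is to vet the address inside `net.Dialer`'s `Control` hook. That hook runs after DNS resolution and just before the connection is made. As a result, a hostname that resolves to an internal address is refused even if an earlier lookup returned something else. This is a sketch under assumptions: `blockedAddr` and `dialControl` are hypothetical names. The sketch relies on `netip.Addr` helpers rather than an explicit CIDR list; `IsPrivate` covers 10/8, 172.16/12, 192.168/16 and fc00::/7.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/netip"
	"syscall"
)

// blockedAddr reports whether pixa should refuse to connect to a.
func blockedAddr(a netip.Addr) bool {
	a = a.Unmap() // treat ::ffff:10.0.0.1 like 10.0.0.1
	return a.IsLoopback() || a.IsPrivate() || a.IsUnspecified() ||
		a.IsLinkLocalUnicast() || a.IsLinkLocalMulticast() || a.IsMulticast()
}

// dialControl is called by net.Dialer with the resolved "ip:port" right
// before connecting, so the check applies to the address actually used.
func dialControl(network, address string, _ syscall.RawConn) error {
	host, _, err := net.SplitHostPort(address)
	if err != nil {
		return err
	}
	a, err := netip.ParseAddr(host)
	if err != nil {
		return err
	}
	if blockedAddr(a) {
		return fmt.Errorf("refusing to connect to blocked address %s", a)
	}
	return nil
}

func main() {
	// Upstream fetches would use a client whose dialer runs the check.
	client := &http.Client{Transport: &http.Transport{
		DialContext: (&net.Dialer{Control: dialControl}).DialContext,
	}}
	_ = client
	for _, addr := range []string{"10.1.2.3:443", "[::1]:443", "203.0.113.10:443"} {
		fmt.Println(addr, dialControl("tcp", addr, nil))
	}
}
```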
#### Important

- **No authentication for cache purge**: if cache invalidation is added, the purge endpoint must require authentication.
- **Response header sanitization**: strip sensitive upstream headers (`X-Powered-By`, `Server`, etc.) before forwarding.
- **Content-Type validation**: verify that the upstream Content-Type matches an expected image type before processing.
- **Maximum image dimensions**: limit output dimensions to prevent resource exhaustion (e.g. max 4096x4096).

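
The Content-Type item can be combined with a magic-byte check using the standard library's `http.DetectContentType`, which sniffs at most the first 512 bytes. This is a minimal sketch. It assumes a hypothetical `validateImage` helper and an allow-list limited to JPEG, PNG, GIF and WebP; AVIF and SVG handling are left out.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
)

// allowedTypes is a hypothetical allow-list of accepted upstream types.
var allowedTypes = map[string]bool{
	"image/jpeg": true,
	"image/png":  true,
	"image/gif":  true,
	"image/webp": true,
}

// validateImage checks the declared Content-Type against the allow-list
// and against the type sniffed from the body's leading bytes.
func validateImage(declared string, head []byte) error {
	mt, _, err := mime.ParseMediaType(declared) // lowercases the media type
	if err != nil {
		return fmt.Errorf("bad Content-Type %q: %w", declared, err)
	}
	if !allowedTypes[mt] {
		return fmt.Errorf("unsupported media type %q", mt)
	}
	if sniffed := http.DetectContentType(head); sniffed != mt {
		return fmt.Errorf("declared %q but content looks like %q", mt, sniffed)
	}
	return nil
}

func main() {
	png := []byte("\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR")
	fmt.Println(validateImage("image/png", png))  // <nil>
	fmt.Println(validateImage("image/jpeg", png)) // mismatch error
}
```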
### URL Route Improvements

Current: `/img/<size>/<orig host>/<orig path>?signature=<sig>&format=<format>`

#### Recommended Scheme

```
/v1/image/<host>/<path>/<width>x<height>.<format>?sig=<sig>&exp=<expires>
```

The size+format segment (e.g. `800x600.webp`) is appended to the source
path and stripped when constructing the upstream request. Because it is
always the last path segment, it can be matched unambiguously (regex:
`(\d+x\d+|orig)\.(webp|jpg|jpeg|png|avif)$`) without colliding with real
source paths.

**Size options:**

- `800x600.<format>`: resize to 800x600
- `0x0.<format>`: original size, format conversion only
- `orig.<format>`: original size, format conversion only (human-friendly alias)

**Benefits:**

- API versioning (`/v1/`) allows breaking changes later
- Human-readable URLs that can be constructed by hand for whitelisted domains
- Format as a file extension is intuitive and CDN-friendly

#### Examples

**Basic resize and convert:**

```
/v1/image/cdn.example.com/photos/cat.jpg/800x600.webp?sig=abc123&exp=1704067200
```

Fetches `https://cdn.example.com/photos/cat.jpg`, resizes it to 800x600 and converts it to WebP.

**Source URL with query parameters:**

```
/v1/image/cdn.example.com/photos/cat.jpg%3Farg1=val1%26arg2=val2/800x600.webp?sig=abc123&exp=1704067200
```

Fetches `https://cdn.example.com/photos/cat.jpg?arg1=val1&arg2=val2`, resizes it to 800x600 and converts it to WebP.

Note: the source query string must be URL-encoded (`?` → `%3F`, `&` → `%26`) to avoid ambiguity with pixa's own query parameters.

**Original size, format conversion only:**

```
/v1/image/cdn.example.com/photos/cat.jpg/orig.webp?sig=abc123&exp=1704067200
/v1/image/cdn.example.com/photos/cat.jpg/0x0.webp?sig=abc123&exp=1704067200
```

Both fetch the original image and convert it to WebP without resizing.

### Additional Formats

#### Output Formats to Support

- `avif`: superior compression, growing browser support
- `gif`: for animated image passthrough (with frame limit)
- `svg`: passthrough only, no resizing (vector)

#### Input Format Whitelist (MIME types to accept)

- `image/jpeg`
- `image/png`
- `image/webp`
- `image/gif`
- `image/avif`
- `image/svg+xml` (passthrough or rasterize)
- **Reject all others**, especially `image/x-*` and `application/*`

#### Input Validation

- Verify magic bytes match the declared Content-Type
- Maximum input file size (e.g. 50MB)
- Maximum input dimensions (e.g. 16384x16384)
- Reject files with embedded scripts (SVG sanitization)

### Rate Limiting

#### Per-IP Limits

- Requests per second (e.g. 10 req/s burst, 100 req/min sustained)
- Concurrent connections (e.g. 50 per IP)

#### Global Limits

- Total concurrent upstream fetches (prevent overwhelming origins)
- Per-origin fetch rate limiting (be a good citizen)
- Cache miss rate limiting (prevent cache-busting attacks)

#### Response

- Return `429 Too Many Requests` with a `Retry-After` header
- Consider `X-RateLimit-*` headers for transparency

### Additional Features for 1.0

#### Must Have

- **Health check endpoint**: `/health` or `/healthz` for load balancers
- **Metrics endpoint**: `/metrics` (Prometheus format) for observability
- **Graceful shutdown**: drain connections on SIGTERM
- **Request ID/tracing**: `X-Request-ID` header propagation
- **Cache-Control headers**: proper `Cache-Control`, `ETag` and `Last-Modified` on responses
- **Vary header**: `Vary: Accept` if doing content negotiation

#### Should Have

- **Auto-format selection**: if `format=auto`, pick the best format based on the `Accept` header
- **Quality parameter**: `&q=85` for lossy format quality control
- **Fit modes**: `fit=cover|contain|fill|inside|outside` for resize behavior
- **Background color**: for transparent-to-JPEG conversion
- **Blur/sharpen**: common post-resize operations
- **Watermarking**: optional overlay support

#### Nice to Have

- **Cache warming API**: pre-populate the cache for known images
- **Cache stats API**: hit/miss rates, storage usage
- **Admin UI**: simple dashboard for monitoring

### Configuration Additions

```yaml
server:
  listen: ":8080"
  read_timeout: 30s
  write_timeout: 60s
  max_header_bytes: 8192

cache:
  directory: "/var/cache/pixa"
  max_size_gb: 100
  ttl: 168h            # 7 days
  negative_ttl: 5m     # cache 404s briefly

upstream:
  timeout: 30s
  max_response_size: 52428800  # 50MB
  max_concurrent: 100
  user_agent: "Pixa/1.0"

processing:
  max_input_pixels: 268435456  # 16384x16384
  max_output_dimension: 4096
  default_quality: 85
  strip_metadata: true         # remove EXIF etc.

security:
  hmac_secret: "${PIXA_HMAC_SECRET}"  # from env
  signature_ttl: 3600                 # 1 hour
  blocked_networks:
    - "10.0.0.0/8"
    - "172.16.0.0/12"
    - "192.168.0.0/16"
    - "127.0.0.0/8"
    - "169.254.0.0/16"  # link-local
    - "::1/128"
    - "fc00::/7"
    - "fe80::/10"       # link-local

rate_limit:
  per_ip_rps: 10
  per_ip_burst: 50
  per_origin_rps: 100

cors:
  allowed_origins: ["*"]  # or a specific list
  allowed_methods: ["GET", "HEAD", "OPTIONS"]
  max_age: 86400
```

### Error Handling

#### HTTP Status Codes

- `400`: bad request (invalid parameters, malformed URL)
- `403`: forbidden (invalid/expired signature, blocked origin)
- `404`: origin returned 404 (cache the negative response briefly)
- `413`: payload too large (origin image exceeds limits)
- `415`: unsupported media type (origin returned a non-image)
- `422`: unprocessable (valid image, but it cannot be transformed as requested)
- `429`: rate limited
- `500`: internal error
- `502`: bad gateway (origin connection failed)
- `503`: service unavailable (overloaded)
- `504`: gateway timeout (origin timeout)

#### Error Response Format

```json
{
  "error": "invalid_signature",
  "message": "Signature has expired",
  "request_id": "abc123"
}
```

### Quick Wins

1. **Conditional requests**: support `If-None-Match` / `If-Modified-Since` to return `304 Not Modified`
2. **HEAD support**: allow clients to check image metadata without downloading
3. **Canonical URLs**: redirect non-canonical requests to prevent cache fragmentation
4. **Debug header**: `X-Pixa-Cache: HIT|MISS|STALE` for debugging
5. **Robots.txt**: serve a `robots.txt` to prevent search engine crawling of proxy URLs

## TODO

See [TODO.md](TODO.md) for the full prioritized task list.

## License

GPL-3.0. See [LICENSE](LICENSE).

## Author

[@sneak](https://sneak.berlin)