# pixa caching image reverse proxy server

This is a web service written in Go that proxies images from source URLs, optionally resizing or transforming them, and serves the results. Both the source images and the transformed images are cached. The images served to clients are cached for a configurable interval, so subsequent requests to the same path on the pixa server are served from disk without requests to the origin server or additional processing.

# storage

* unaltered source file straight from upstream:
    * `<statedir>/cache/src-content/<ab>/<cd>/<abcdef0123... sha256 of source content>`
* source path metadata:
    * `<statedir>/cache/src-metadata/<hostname>/<sha256 of path component>.json`
    * fetch time
    * all original response headers
    * original request
    * sha256 hash

Note that multiple source paths may reference the same content blob. We won't do refcounting on the filesystem; the state database will track that.

* database:
    * `<statedir>/state.sqlite3`

* output documents:
    * `<statedir>/cache/dst-content/<ab>/<cd>/<abcd... sha256 of output content>`

While the database is the long-term authority on what we have in the output cache, we must aggressively cache, in process, the mapping between requests and output content hashes so that pixa serves extremely popular ("hot") request paths as efficiently as possible. The goal is to comfortably sustain 1,000 to 5,000 requests per second.

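
One way to hold that in-process mapping is a bounded LRU map from request key to output digest. The sketch below is an assumption rather than pixa's design: `hotCache` is a hypothetical type, its keys are assumed to be canonical request strings built only after a request has passed signature or whitelist validation, and a single mutex guards everything for simplicity.

```go
package main

import (
	"container/list"
	"sync"
)

// hotEntry is one cached request -> output-hash mapping.
type hotEntry struct {
	key    string
	digest string
}

// hotCache is a fixed-capacity LRU map from a canonical request key to the
// hex SHA-256 digest of the output blob in dst-content.
// Assumption: keys are built only after the request has been validated.
type hotCache struct {
	mu       sync.Mutex
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element whose Value is *hotEntry
}

func newHotCache(capacity int) *hotCache {
	return &hotCache{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the output digest for key and marks the entry as recently used.
// It takes the exclusive lock because MoveToFront mutates the list.
func (c *hotCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*hotEntry).digest, true
}

// Put records key -> digest, evicting the least recently used entry when full.
func (c *hotCache) Put(key, digest string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		el.Value.(*hotEntry).digest = digest
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&hotEntry{key: key, digest: digest})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*hotEntry).key)
	}
}
```

If the single lock becomes a contention point at the target request rate, sharding the map by a hash of the key is a common remedy.
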
# Routes

`/img/<size>/<orig host>/<orig path>?signature=<sig>&format=<format>`

Images are fetched from origins only over TLS, and origin certificates must be valid at the time of fetch.

`<format>` is one of `orig`, `png`, `jpeg`, `webp`.

`<size>` is either `orig` or `<x resolution>x<y resolution>`.

# Source Hosts

Source hosts may be whitelisted in the pixa configuration. Requests for hosts that are not on the explicit whitelist must carry a signature computed with a shared secret.

## Signature Specification

Signatures use HMAC-SHA256 and include an expiration timestamp, which limits how long a captured URL can be replayed.

### Signed Data Format

The signature is computed over a colon-separated string:

```
HMAC-SHA256(secret, "host:path:query:width:height:format:expiration")
```

Where:

- `host` - Source origin hostname (e.g., `cdn.example.com`)
- `path` - Source path (e.g., `/photos/cat.jpg`)
- `query` - Source query string, empty string if none
- `width` - Requested width in pixels, `0` for original
- `height` - Requested height in pixels, `0` for original
- `format` - Output format (jpeg, png, webp, avif, gif, orig)
- `expiration` - Unix timestamp (seconds) at which the signature expires

### URL Format with Signature

```
/v1/image/<host>/<path>/<size>.<format>?sig=<signature>&exp=<expiration>
```

### Example

For a request to resize `https://cdn.example.com/photos/cat.jpg` to 800x600 WebP with expiration at Unix timestamp 1704067200:

1. Build the signature input (the source has no query string, so the `query` field is empty, which produces the `::`):

   ```
   cdn.example.com:/photos/cat.jpg::800:600:webp:1704067200
   ```

2. Compute HMAC-SHA256 over it with your secret key.

3. Base64URL-encode the result.

4. Final URL:

   ```
   /v1/image/cdn.example.com/photos/cat.jpg/800x600.webp?sig=<base64url>&exp=1704067200
   ```

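
The signing steps above might look like this in Go. It is a sketch under stated assumptions: the specification does not say whether base64url padding is kept, so this uses unpadded `base64.RawURLEncoding`, and `query` is passed without its leading `?`. `signingInput`, `sign`, and `verify` are hypothetical names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// signingInput joins the signed fields as "host:path:query:width:height:format:expiration".
// query is the source query string without the leading "?", or "" if none.
func signingInput(host, path, query string, width, height int, format string, exp int64) string {
	return fmt.Sprintf("%s:%s:%s:%d:%d:%s:%d", host, path, query, width, height, format, exp)
}

// sign returns the base64url-encoded HMAC-SHA256 of input.
// Assumption: the encoding is unpadded (RawURLEncoding).
func sign(secret []byte, input string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(input))
	return base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
}

// verify rejects expired signatures (now is Unix seconds), then compares the
// expected MAC with the supplied one in constant time.
func verify(secret []byte, input, sig string, exp, now int64) bool {
	if now > exp {
		return false
	}
	got, err := base64.RawURLEncoding.DecodeString(sig)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(input))
	return hmac.Equal(mac.Sum(nil), got)
}
```

`hmac.Equal` compares in constant time, so verification does not reveal how much of a forged signature was correct. Because `path` and `query` may themselves contain colons, different field values can in principle join to the same string (path `/a:` with query `b` versus path `/a` with query `:b`); length-prefixing each field would remove that edge case.
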
### Whitelist Patterns

The whitelist supports two pattern types:

- **Exact match**: `cdn.example.com` - matches only that host
- **Suffix match**: `.example.com` - matches `cdn.example.com`, `images.example.com`, and `example.com`

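
A matcher for the two pattern types could look like this. `hostAllowed` is a hypothetical name, and lowercasing the host and dropping a trailing dot are assumptions about how hosts should be normalized before comparison.

```go
package main

import "strings"

// hostAllowed reports whether host matches any whitelist pattern.
// Patterns starting with "." are suffix matches that also match the bare
// domain; all other patterns must match exactly.
// Assumption: hosts compare case-insensitively, ignoring a trailing dot.
func hostAllowed(host string, patterns []string) bool {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	for _, p := range patterns {
		p = strings.ToLower(p)
		if strings.HasPrefix(p, ".") {
			// ".example.com" matches "example.com" and "*.example.com".
			if host == p[1:] || strings.HasSuffix(host, p) {
				return true
			}
		} else if host == p {
			return true
		}
	}
	return false
}
```

Keeping the leading dot in the suffix comparison is what stops `.example.com` from matching a look-alike such as `evilexample.com`.
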
# configuration

* access-control-allow-origin config
* source host whitelist
* upstream fetch timeout
* upstream max response size
* downstream timeout
* downstream max request size
* downstream max response size
* internal processing timeout
* referer blacklist

# Design Review & Recommendations

## Security Concerns

### Critical

- **HMAC signature scheme** - Signature construction was originally a "FIXME" and a blocker; the recommendation was HMAC-SHA256 over the full path, `HMAC-SHA256(secret, "/<size>/<host>/<path>?format=<format>")`. The Signature Specification above now defines the scheme as HMAC-SHA256 over `host:path:query:width:height:format:expiration`.
- **Signature expiration** - Signatures should include a timestamp to prevent indefinite replay. The Signature Specification above does this with the `exp` parameter, which is included in the HMAC input.
- **Path traversal risk** - Ensure `<orig path>` cannot contain `..` sequences (including percent-encoded forms such as `%2e%2e`) or be used to access unintended resources on the origin
- **SSRF potential** - Even with the TLS requirement, internal and private addresses (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, ::1, link-local) must be blocked to prevent server-side request forgery
- **Open redirect via Host header** - Validate that requests cannot be manipulated to cache content under incorrect keys

### Important

- **No authentication for cache purge** - If a cache invalidation endpoint is added, it must require authentication
- **Response header sanitization** - Strip sensitive upstream headers (`X-Powered-By`, `Server`, etc.) before forwarding
- **Content-Type validation** - Verify that the upstream Content-Type is an expected image type before processing
- **Maximum image dimensions** - Limit output dimensions to prevent resource exhaustion (e.g., max 4096x4096)

## URL Route Improvements

Current: `/img/<size>/<orig host>/<orig path>?signature=<sig>&format=<format>`

### Recommended Scheme

```
/v1/image/<host>/<path>/<width>x<height>.<format>?sig=<sig>&exp=<expires>
```

The size+format segment (e.g., `800x600.webp`) is appended to the source path and stripped when constructing the upstream request. This pattern is unambiguous (regex: `(\d+x\d+|orig)\.(webp|jpg|jpeg|png|avif)$`) and won't collide with real paths.

**Size options:**

- `800x600.<format>` - resize to 800x600
- `0x0.<format>` - original size, format conversion only
- `orig.<format>` - original size, format conversion only (human-friendly alias)

**Benefits:**

- API versioning (`/v1/`) allows breaking changes later
- Human-readable URLs that can be manually constructed for whitelisted domains
- Format as extension is intuitive and CDN-friendly

### Examples

**Basic resize and convert:**

```
/v1/image/cdn.example.com/photos/cat.jpg/800x600.webp?sig=abc123&exp=1704067200
```

Fetches `https://cdn.example.com/photos/cat.jpg`, resizes to 800x600, converts to webp.

**Source URL with query parameters:**

```
/v1/image/cdn.example.com/photos/cat.jpg%3Farg1=val1%26arg2=val2/800x600.webp?sig=abc123&exp=1704067200
```

Fetches `https://cdn.example.com/photos/cat.jpg?arg1=val1&arg2=val2`, resizes to 800x600, converts to webp.

Note: The source query string must be URL-encoded (`?` → `%3F`, `&` → `%26`) to avoid ambiguity with pixa's own query parameters.

**Original size, format conversion only:**

```
/v1/image/cdn.example.com/photos/cat.jpg/orig.webp?sig=abc123&exp=1704067200
/v1/image/cdn.example.com/photos/cat.jpg/0x0.webp?sig=abc123&exp=1704067200
```

Both fetch the original image and convert to webp without resizing.

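
Reconstructing the origin URL from requests like these needs care in Go: `r.URL.Path` has already decoded `%3F` and `%26`, so the split between origin path and origin query has to be made on the escaped path (`r.URL.EscapedPath()`). A minimal sketch under that assumption; `upstreamURL` is a hypothetical helper that receives the escaped path with the `/v1/image/` prefix and the size/format segment already removed. It also rejects dot segments after decoding, which addresses the path traversal concern.

```go
package main

import (
	"errors"
	"net/url"
	"strings"
)

// upstreamURL rebuilds the origin URL from e.g.
// "cdn.example.com/photos/cat.jpg%3Farg1=val1%26arg2=val2".
// Assumption: the caller passes the escaped path (r.URL.EscapedPath()) with
// the "/v1/image/" prefix and the size/format segment already stripped.
func upstreamURL(escapedRest string) (*url.URL, error) {
	host, escPath, ok := strings.Cut(escapedRest, "/")
	if !ok || host == "" {
		return nil, errors.New("missing origin host")
	}
	decoded, err := url.PathUnescape("/" + escPath)
	if err != nil {
		return nil, err
	}
	// The first "?" (sent as %3F) separates the origin path from its query.
	p, query, _ := strings.Cut(decoded, "?")
	// Reject "." and ".." segments, including percent-encoded ones like %2e%2e.
	for _, seg := range strings.Split(p, "/") {
		if seg == "." || seg == ".." {
			return nil, errors.New("dot segment in source path")
		}
	}
	return &url.URL{Scheme: "https", Host: host, Path: p, RawQuery: query}, nil
}
```

The host returned here still has to pass the whitelist or signature check before anything is fetched.
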
## Additional Formats

### Output Formats to Support

- `avif` - Superior compression, growing browser support
- `gif` - For animated image passthrough (with frame limit)
- `svg` - Passthrough only, no resizing (vector)

### Input Format Whitelist (MIME types to accept)

- `image/jpeg`
- `image/png`
- `image/webp`
- `image/gif`
- `image/avif`
- `image/svg+xml` (passthrough or rasterize)
- **Reject all others** - Especially `image/x-*`, `application/*`

### Input Validation

- Verify magic bytes match declared Content-Type
- Maximum input file size (e.g., 50MB)
- Maximum input dimensions (e.g., 16384x16384)
- Reject files with embedded scripts (SVG sanitization)

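
Magic-byte checking for the raster formats in the input whitelist can be done without decoding the image. This is a simplified sketch: `sniffImageType` and `declaredTypeMatches` are hypothetical names, SVG is left out because it is text with no fixed binary signature, and the AVIF check only inspects the major brand.

```go
package main

import (
	"bytes"
	"mime"
)

// sniffImageType maps leading "magic" bytes to a whitelisted MIME type,
// or returns "" if none match.
func sniffImageType(b []byte) string {
	switch {
	case bytes.HasPrefix(b, []byte{0xFF, 0xD8, 0xFF}):
		return "image/jpeg"
	case bytes.HasPrefix(b, []byte("\x89PNG\r\n\x1a\n")):
		return "image/png"
	case bytes.HasPrefix(b, []byte("GIF87a")), bytes.HasPrefix(b, []byte("GIF89a")):
		return "image/gif"
	case len(b) >= 12 && string(b[0:4]) == "RIFF" && string(b[8:12]) == "WEBP":
		return "image/webp"
	// Simplified AVIF check: an ISO-BMFF "ftyp" box whose major brand is
	// avif/avis. Files listing avif only as a compatible brand are missed.
	case len(b) >= 12 && string(b[4:8]) == "ftyp" && (string(b[8:12]) == "avif" || string(b[8:12]) == "avis"):
		return "image/avif"
	}
	return ""
}

// declaredTypeMatches reports whether the upstream Content-Type header
// (parameters ignored) agrees with the sniffed type.
func declaredTypeMatches(contentType string, body []byte) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	sniffed := sniffImageType(body)
	return sniffed != "" && mediaType == sniffed
}
```
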
## Rate Limiting

### Per-IP Limits

- Requests per second (e.g., 10 req/s burst, 100 req/min sustained)
- Concurrent connections (e.g., 50 per IP)

### Global Limits

- Total concurrent upstream fetches (avoid overwhelming origins)
- Per-origin fetch rate limiting (be a good citizen)
- Cache miss rate limiting (prevent cache-busting attacks)

### Response

- Return `429 Too Many Requests` with a `Retry-After` header
- Consider `X-RateLimit-*` headers for transparency

## Additional Features for 1.0

### Must Have

- **Health check endpoint** - `/health` or `/healthz` for load balancers
- **Metrics endpoint** - `/metrics` (Prometheus format) for observability
- **Graceful shutdown** - Drain connections on SIGTERM
- **Request ID/tracing** - `X-Request-ID` header propagation
- **Cache-Control headers** - Proper `Cache-Control`, `ETag`, `Last-Modified` on responses
- **Vary header** - `Vary: Accept` if doing content negotiation

### Should Have

- **Auto-format selection** - If `format=auto`, pick the best format based on the `Accept` header
- **Quality parameter** - `&q=85` for lossy format quality control
- **Fit modes** - `fit=cover|contain|fill|inside|outside` for resize behavior
- **Background color** - For transparent-to-JPEG conversion
- **Blur/sharpen** - Common post-resize operations
- **Watermarking** - Optional overlay support

### Nice to Have

- **Cache warming API** - Pre-populate cache for known images
- **Cache stats API** - Hit/miss rates, storage usage
- **Admin UI** - Simple dashboard for monitoring

## Configuration Additions

```yaml
server:
  listen: ":8080"
  read_timeout: 30s
  write_timeout: 60s
  max_header_bytes: 8192

cache:
  directory: "/var/cache/pixa"
  max_size_gb: 100
  ttl: 168h          # 7 days
  negative_ttl: 5m   # Cache 404s briefly

upstream:
  timeout: 30s
  max_response_size: 52428800  # 50 MiB
  max_concurrent: 100
  user_agent: "Pixa/1.0"

processing:
  max_input_pixels: 268435456  # 16384x16384
  max_output_dimension: 4096
  default_quality: 85
  strip_metadata: true         # Remove EXIF etc.

security:
  hmac_secret: "${PIXA_HMAC_SECRET}"  # From env
  signature_ttl: 3600                 # 1 hour
  blocked_networks:
    - "10.0.0.0/8"
    - "172.16.0.0/12"
    - "192.168.0.0/16"
    - "127.0.0.0/8"
    - "169.254.0.0/16"  # link-local
    - "::1/128"
    - "fc00::/7"
    - "fe80::/10"       # link-local

rate_limit:
  per_ip_rps: 10
  per_ip_burst: 50
  per_origin_rps: 100

cors:
  allowed_origins: ["*"]  # Or specific list
  allowed_methods: ["GET", "HEAD", "OPTIONS"]
  max_age: 86400
```

## Error Handling

### HTTP Status Codes

- `400` - Bad request (invalid parameters, malformed URL)
- `403` - Forbidden (invalid/expired signature, blocked origin)
- `404` - Origin returned 404 (cache negative response briefly)
- `413` - Payload too large (origin image exceeds limits)
- `415` - Unsupported media type (origin returned non-image)
- `422` - Unprocessable (valid image but cannot transform as requested)
- `429` - Rate limited
- `500` - Internal error
- `502` - Bad gateway (origin connection failed)
- `503` - Service unavailable (overloaded)
- `504` - Gateway timeout (origin timeout)

### Error Response Format

```json
{
  "error": "invalid_signature",
  "message": "Signature has expired",
  "request_id": "abc123"
}
```

## Quick Wins

1. **Conditional requests** - Support `If-None-Match` / `If-Modified-Since` to return `304 Not Modified`
2. **HEAD support** - Allow clients to check image metadata without downloading
3. **Canonical URLs** - Redirect non-canonical requests to prevent cache fragmentation
4. **Debug header** - `X-Pixa-Cache: HIT|MISS|STALE` for debugging
5. **Robots.txt** - Serve a robots.txt to prevent search engine crawling of proxy URLs
