Add project documentation and linter config
parent 4ef9141960
commit 6071fd5bb7
117
.golangci.yml
Normal file
@@ -0,0 +1,117 @@
version: "2"

run:
  go: "1.24"
  tests: false

linters:
  enable:
    # Additional linters requested
    - testifylint # Checks usage of github.com/stretchr/testify
    - usetesting # usetesting is an analyzer that detects using os.Setenv instead of t.Setenv since Go 1.17
    # - tagliatelle # Disabled: we need snake_case for external API compatibility
    - nlreturn # nlreturn checks for a new line before return and branch statements
    - nilnil # Checks that there is no simultaneous return of nil error and an invalid value
    - nestif # Reports deeply nested if statements
    - mnd # An analyzer to detect magic numbers
    - lll # Reports long lines
    - intrange # intrange is a linter to find places where for loops could make use of an integer range
    - gochecknoglobals # Check that no global variables exist

    # Default/existing linters that are commonly useful
    - govet
    - errcheck
    - staticcheck
    - unused
    - ineffassign
    - misspell
    - revive
    - gosec
    - unconvert
    - unparam

linters-settings:
  lll:
    line-length: 120

  nestif:
    min-complexity: 4

  nlreturn:
    block-size: 2

  revive:
    rules:
      - name: var-naming
        arguments:
          - []
          - []
          - "upperCaseConst=true"

  tagliatelle:
    case:
      rules:
        json: snake
        yaml: snake
        xml: snake
        bson: snake

  testifylint:
    enable-all: true

  usetesting: {}

issues:
  max-issues-per-linter: 0
  max-same-issues: 0
  exclude-rules:
    # Exclude unused parameter warnings for cobra command signatures
    - text: "parameter '(args|cmd)' seems to be unused"
      linters:
        - revive

    # Allow ALL_CAPS constant names
    - text: "don't use ALL_CAPS in Go names"
      linters:
        - revive

    # Allow snake_case JSON tags for external API compatibility
    - path: "internal/types/ris.go"
      linters:
        - tagliatelle

    # Allow snake_case JSON tags for database models
    - path: "internal/database/models.go"
      linters:
        - tagliatelle

    # Allow generic package name for types that define data structures
    - path: "internal/types/"
      text: "avoid meaningless package names"
      linters:
        - revive

    # Allow globals in the globals package (by design)
    - path: "internal/globals/"
      linters:
        - gochecknoglobals

    # Allow globals in main (Version/Buildarch set by ldflags)
    - path: "cmd/"
      linters:
        - gochecknoglobals

    # Allow blank imports for driver registration
    - text: "blank-imports"
      linters:
        - revive

    # Allow unused fx.Lifecycle parameters (required by fx signature)
    - text: "parameter 'lc' seems to be unused"
      linters:
        - revive

    # Allow unused context parameters in fx hooks
    - text: "parameter 'ctx' seems to be unused"
      linters:
        - revive
16
CLAUDE.md
Normal file
@@ -0,0 +1,16 @@
# repository rules

* never use `git add -A` - add specific changes to a deliberate commit. a
  commit should contain one change. after each change, make a commit with a
  good one-line summary.

* NEVER modify the linter config without asking first.

* NEVER modify tests to exclude special cases or otherwise get them to pass
  without asking first. in almost all cases, the code should be changed,
  NOT the tests.

* when linting, assume the linter config is CORRECT, and that each item
  output by the linter is something that legitimately needs fixing in the
  code.

1267
CONVENTIONS.md
Normal file
File diff suppressed because it is too large
275
README.md
Normal file
@@ -0,0 +1,275 @@
# pixa caching image reverse proxy server

This is a web service written in Go that proxies images from source URLs,
optionally resizing or transforming them, and serves the results. Both the
source images and the transformed images are cached. The images served to
the client are cached for a configurable interval so that subsequent
requests to the same path on the pixa server are served from disk without
origin server requests or additional processing.

# storage

* unaltered source file straight from upstream:
    * `<statedir>/cache/src-content/<ab>/<cd>/<abcdef0123... sha256 of source content>`
* source path metadata
    * `<statedir>/cache/src-metadata/<hostname>/<sha256 of path component>.json`
    * fetch time
    * all original response headers
    * original request
    * sha256 hash

Note that multiple source paths may reference the same content blob. We
won't do refcounting here; we'll use the state database for that.

* database:
    * `<statedir>/state.sqlite3`

* output documents:
    * `<statedir>/cache/dst-content/<ab>/<cd>/<abcd... sha256 of output content>`

While the database is the long-term authority on what we have in the output
cache, we must aggressively cache in-process the mapping between requests
and output content hashes so as to serve as a maximally efficient caching
proxy for extremely popular/hot request paths. The goal is the ability to
easily support 1,000 to 5,000 requests per second.
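
One simple shape for that in-process mapping is a mutex-guarded map from a canonical request key to the output content hash. The sketch below is only an illustration: the `hotCache` type and its method names are hypothetical, and it has no eviction or expiry, both of which a real deployment would presumably need.

```go
package main

import (
	"fmt"
	"sync"
)

// hotCache maps a canonical request key to the sha256 hex of its output.
// It is unbounded; size limits and TTLs are left out of this sketch.
type hotCache struct {
	mu      sync.RWMutex
	entries map[string]string
}

func newHotCache() *hotCache {
	return &hotCache{entries: make(map[string]string)}
}

// lookup returns the cached output hash for key, if any.
func (c *hotCache) lookup(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()

	hash, ok := c.entries[key]

	return hash, ok
}

// store records the output hash for key, for example after a database
// hit or after a fresh render.
func (c *hotCache) store(key, hash string) {
	c.mu.Lock()
	defer c.mu.Unlock()

	c.entries[key] = hash
}

func main() {
	c := newHotCache()
	key := "/img/800x600/cdn.example.com/photos/cat.jpg?format=webp"
	c.store(key, "abcd")

	hash, ok := c.lookup(key)
	fmt.Println(hash, ok)
}
```

A read-write mutex keeps lookups for hot paths cheap, since concurrent readers do not block each other; only a cache miss takes the write lock.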

# Routes

`/img/<size>/<orig host>/<orig path>?signature=<sig>&format=<format>`

Images are only fetched from origins using TLS. Origin certificates must be
valid at time of fetch.

`<format>` is one of 'orig', 'png', 'jpeg', 'webp'

`<size>` is one of 'orig' or `<x resolution>x<y resolution>`

# Source Hosts

Source hosts may be whitelisted in the pixa configuration. If a host is not
in the explicit whitelist, a signature computed with a shared secret must be
appended to the request. It is constructed using HMAC in the following way:

FIXME

# configuration

* access-control-allow-origin config
* source host whitelist
* upstream fetch timeout
* upstream max response size
* downstream timeout
* downstream max request size
* downstream max response size
* internal processing timeout
* referer blacklist

# Design Review & Recommendations

## Security Concerns

### Critical
- **HMAC signature scheme is undefined** - The "FIXME" for signature
  construction is a blocker. Recommend HMAC-SHA256 over the full path:
  `HMAC-SHA256(secret, "/<size>/<host>/<path>?format=<format>")`
- **No signature expiration** - Signatures should include a timestamp to
  prevent indefinite replay. Add `&expires=<unix_ts>` and include it in the
  HMAC input
- **Path traversal risk** - Ensure `<orig path>` cannot contain `..`
  sequences or be used to access unintended resources on origin
- **SSRF potential** - Even with the TLS requirement, internal/private
  addresses (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, ::1,
  link-local) must be blocked to prevent server-side request forgery
- **Cache poisoning via Host header** - Validate that requests cannot be
  manipulated to cache content under incorrect keys

### Important
- **No authentication for cache purge** - If a cache invalidation endpoint is added, it must require authentication
- **Response header sanitization** - Strip sensitive headers from upstream before forwarding (X-Powered-By, Server, etc.)
- **Content-Type validation** - Verify upstream Content-Type matches expected image types before processing
- **Maximum image dimensions** - Limit output dimensions to prevent resource exhaustion (e.g., max 4096x4096)

## URL Route Improvements

Current: `/img/<size>/<orig host>/<orig path>?signature=<sig>&format=<format>`

### Recommended Scheme
```
/v1/image/<host>/<path>/<width>x<height>.<format>?sig=<sig>&exp=<expires>
```

The size+format segment (e.g., `800x600.webp`) is appended to the source path and stripped when constructing the upstream request. This pattern is unambiguous (regex: `(\d+x\d+|orig)\.(webp|jpg|jpeg|png|avif)$`) and won't collide with real paths.
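
A parser for this scheme might look like the sketch below. It reuses the regex quoted above, anchored to the final path segment; the `imageRequest` type and `parseImagePath` function are hypothetical names, and the input is assumed to be the already-decoded URL path without the query string.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// suffixRe splits "<host>/<path>/<size>.<format>" at the final segment.
var suffixRe = regexp.MustCompile(`^(.+)/(\d+x\d+|orig)\.(webp|jpg|jpeg|png|avif)$`)

// imageRequest is a hypothetical parsed form of a /v1/image/ URL path.
type imageRequest struct {
	Host, Path    string
	Width, Height int // 0x0 means "keep the original size"
	Format        string
}

func parseImagePath(p string) (imageRequest, error) {
	rest, ok := strings.CutPrefix(p, "/v1/image/")
	if !ok {
		return imageRequest{}, errors.New("missing /v1/image/ prefix")
	}

	m := suffixRe.FindStringSubmatch(rest)
	if m == nil {
		return imageRequest{}, errors.New("missing <size>.<format> suffix")
	}

	// The first segment is the origin host; the rest is the source path.
	host, path, ok := strings.Cut(m[1], "/")
	if !ok || host == "" || path == "" {
		return imageRequest{}, errors.New("missing host or source path")
	}

	req := imageRequest{Host: host, Path: "/" + path, Format: m[3]}
	if m[2] == "orig" {
		return req, nil
	}

	w, h, _ := strings.Cut(m[2], "x")

	var err error
	if req.Width, err = strconv.Atoi(w); err != nil {
		return imageRequest{}, err
	}
	if req.Height, err = strconv.Atoi(h); err != nil {
		return imageRequest{}, err
	}

	return req, nil
}

func main() {
	req, err := parseImagePath("/v1/image/cdn.example.com/photos/cat.jpg/800x600.webp")
	fmt.Printf("%+v %v\n", req, err)
}
```

Limits such as a maximum output dimension would still need to be checked on the parsed `Width` and `Height`; this sketch only handles the URL shape.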

**Size options:**
- `800x600.<format>` - resize to 800x600
- `0x0.<format>` - original size, format conversion only
- `orig.<format>` - original size, format conversion only (human-friendly alias)

**Benefits:**
- API versioning (`/v1/`) allows breaking changes later
- Human-readable URLs that can be manually constructed for whitelisted domains
- Format as extension is intuitive and CDN-friendly

### Examples

**Basic resize and convert:**
```
/v1/image/cdn.example.com/photos/cat.jpg/800x600.webp?sig=abc123&exp=1704067200
```
Fetches `https://cdn.example.com/photos/cat.jpg`, resizes to 800x600, converts to webp.

**Source URL with query parameters:**
```
/v1/image/cdn.example.com/photos/cat.jpg%3Farg1=val1%26arg2=val2/800x600.webp?sig=abc123&exp=1704067200
```
Fetches `https://cdn.example.com/photos/cat.jpg?arg1=val1&arg2=val2`, resizes to 800x600, converts to webp.

Note: The source query string must be URL-encoded (`?` → `%3F`, `&` → `%26`) to avoid ambiguity with pixa's own query parameters.

**Original size, format conversion only:**
```
/v1/image/cdn.example.com/photos/cat.jpg/orig.webp?sig=abc123&exp=1704067200
/v1/image/cdn.example.com/photos/cat.jpg/0x0.webp?sig=abc123&exp=1704067200
```
Both fetch the original image and convert to webp without resizing.

## Additional Formats

### Output Formats to Support
- `avif` - Superior compression, growing browser support
- `gif` - For animated image passthrough (with frame limit)
- `svg` - Passthrough only, no resizing (vector)

### Input Format Whitelist (MIME types to accept)
- `image/jpeg`
- `image/png`
- `image/webp`
- `image/gif`
- `image/avif`
- `image/svg+xml` (passthrough or rasterize)
- **Reject all others** - Especially `image/x-*`, `application/*`

### Input Validation
- Verify magic bytes match declared Content-Type
- Maximum input file size (e.g., 50MB)
- Maximum input dimensions (e.g., 16384x16384)
- Reject files with embedded scripts (SVG sanitization)
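
The magic-byte check in the first item could start from the standard library's content sniffer, as in this rough sketch. The helper name `sniffMatches` is hypothetical. As far as I know, `http.DetectContentType` recognizes JPEG, PNG, GIF, and WebP but not AVIF or SVG, so those two formats would need separate handling.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
)

// sniffMatches reports whether the media type sniffed from the leading
// bytes of body equals the media type declared in the Content-Type header.
// Parameters such as charset on the declared header are ignored.
func sniffMatches(declared string, body []byte) bool {
	mediaType, _, err := mime.ParseMediaType(declared)
	if err != nil {
		return false
	}

	// DetectContentType looks at no more than the first 512 bytes.
	return http.DetectContentType(body) == mediaType
}

func main() {
	pngHeader := []byte("\x89PNG\r\n\x1a\n")

	fmt.Println(sniffMatches("image/png", pngHeader))
	fmt.Println(sniffMatches("image/jpeg", pngHeader))
}
```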

## Rate Limiting

### Per-IP Limits
- Requests per second (e.g., 10 req/s burst, 100 req/min sustained)
- Concurrent connections (e.g., 50 per IP)

### Global Limits
- Total concurrent upstream fetches (prevent origin overwhelm)
- Per-origin fetch rate limiting (be a good citizen)
- Cache miss rate limiting (prevent cache-busting attacks)

### Response
- Return `429 Too Many Requests` with `Retry-After` header
- Consider `X-RateLimit-*` headers for transparency

## Additional Features for 1.0

### Must Have
- **Health check endpoint** - `/health` or `/healthz` for load balancers
- **Metrics endpoint** - `/metrics` (Prometheus format) for observability
- **Graceful shutdown** - Drain connections on SIGTERM
- **Request ID/tracing** - `X-Request-ID` header propagation
- **Cache-Control headers** - Proper `Cache-Control`, `ETag`, `Last-Modified` on responses
- **Vary header** - `Vary: Accept` if doing content negotiation

### Should Have
- **Auto-format selection** - If `format=auto`, pick best format based on `Accept` header
- **Quality parameter** - `&q=85` for lossy format quality control
- **Fit modes** - `fit=cover|contain|fill|inside|outside` for resize behavior
- **Background color** - For transparent-to-JPEG conversion
- **Blur/sharpen** - Common post-resize operations
- **Watermarking** - Optional overlay support

### Nice to Have
- **Cache warming API** - Pre-populate cache for known images
- **Cache stats API** - Hit/miss rates, storage usage
- **Admin UI** - Simple dashboard for monitoring

## Configuration Additions

```yaml
server:
  listen: ":8080"
  read_timeout: 30s
  write_timeout: 60s
  max_header_bytes: 8192

cache:
  directory: "/var/cache/pixa"
  max_size_gb: 100
  ttl: 168h # 7 days
  negative_ttl: 5m # Cache 404s briefly

upstream:
  timeout: 30s
  max_response_size: 52428800 # 50MB
  max_concurrent: 100
  user_agent: "Pixa/1.0"

processing:
  max_input_pixels: 268435456 # 16384x16384
  max_output_dimension: 4096
  default_quality: 85
  strip_metadata: true # Remove EXIF etc.

security:
  hmac_secret: "${PIXA_HMAC_SECRET}" # From env
  signature_ttl: 3600 # 1 hour
  blocked_networks:
    - "10.0.0.0/8"
    - "172.16.0.0/12"
    - "192.168.0.0/16"
    - "127.0.0.0/8"
    - "169.254.0.0/16" # IPv4 link-local
    - "::1/128"
    - "fc00::/7"
    - "fe80::/10" # IPv6 link-local

rate_limit:
  per_ip_rps: 10
  per_ip_burst: 50
  per_origin_rps: 100

cors:
  allowed_origins: ["*"] # Or specific list
  allowed_methods: ["GET", "HEAD", "OPTIONS"]
  max_age: 86400
```

## Error Handling

### HTTP Status Codes
- `400` - Bad request (invalid parameters, malformed URL)
- `403` - Forbidden (invalid/expired signature, blocked origin)
- `404` - Origin returned 404 (cache negative response briefly)
- `413` - Payload too large (origin image exceeds limits)
- `415` - Unsupported media type (origin returned non-image)
- `422` - Unprocessable (valid image but cannot transform as requested)
- `429` - Rate limited
- `500` - Internal error
- `502` - Bad gateway (origin connection failed)
- `503` - Service unavailable (overloaded)
- `504` - Gateway timeout (origin timeout)

### Error Response Format
```json
{
  "error": "invalid_signature",
  "message": "Signature has expired",
  "request_id": "abc123"
}
```

## Quick Wins

1. **Conditional requests** - Support `If-None-Match` / `If-Modified-Since` to return `304 Not Modified`
2. **HEAD support** - Allow clients to check image metadata without downloading
3. **Canonical URLs** - Redirect non-canonical requests to prevent cache fragmentation
4. **Debug header** - `X-Pixa-Cache: HIT|MISS|STALE` for debugging
5. **Robots.txt** - Serve a robots.txt to prevent search engine crawling of proxy URLs