feat: add observability improvements (metrics, audit log, structured logging)

- Add Prometheus metrics package (internal/metrics) with deployment,
  container health, webhook, HTTP request, and audit counters/histograms
- Add audit_log SQLite table via migration 007
- Add AuditEntry model with CRUD operations and query methods
- Add audit service (internal/service/audit) for recording user actions
- Instrument deploy service with deployment duration, count, and
  in-flight metrics; container health gauge updates on deploy completion
- Instrument webhook service with event counters by app/type/matched
- Instrument HTTP middleware with request count, duration, and response
  size metrics; also log response bytes in structured request logs
- Add audit logging to all key handler operations: login/logout, app
  CRUD, deploy, cancel, rollback, restart/stop/start, webhook receipt,
  and initial setup
- Add GET /api/audit endpoint for querying recent audit entries
- Make /metrics endpoint always available (optionally auth-protected)
- Add comprehensive tests for metrics, audit model, and audit service
- Update existing test infrastructure with metrics and audit dependencies
- Update README with Observability section documenting all metrics,
  audit log, and structured logging
clawbot
2026-03-17 02:23:44 -07:00
parent fd110e69db
commit f558e2cdd8
21 changed files with 1399 additions and 42 deletions


@@ -36,11 +36,13 @@ upaas/
 │   ├── handlers/          # HTTP request handlers
 │   ├── healthcheck/       # Health status service
 │   ├── logger/            # Structured logging (slog)
-│   ├── middleware/        # HTTP middleware (auth, logging, CORS)
+│   ├── metrics/           # Prometheus metrics registration
+│   ├── middleware/        # HTTP middleware (auth, logging, CORS, metrics)
 │   ├── models/            # Active Record style database models
 │   ├── server/            # HTTP server and routes
 │   ├── service/
 │   │   ├── app/           # App management service
+│   │   ├── audit/         # Audit logging service
 │   │   ├── auth/          # Authentication service
 │   │   ├── deploy/        # Deployment orchestration
 │   │   ├── notify/        # Notifications (ntfy, Slack)
@@ -58,11 +60,13 @@ Uses Uber fx for dependency injection. Components are wired in this order:
 2. `logger` - Structured logging
 3. `config` - Configuration loading
 4. `database` - SQLite connection + migrations
-5. `healthcheck` - Health status
-6. `auth` - Authentication service
-7. `app` - App management
-8. `docker` - Docker client
-9. `notify` - Notification service
+5. `metrics` - Prometheus metrics registration
+6. `healthcheck` - Health status
+7. `auth` - Authentication service
+8. `app` - App management
+9. `docker` - Docker client
+10. `notify` - Notification service
+11. `audit` - Audit logging service
 10. `deploy` - Deployment service
 11. `webhook` - Webhook processing
 12. `middleware` - HTTP middleware
@@ -211,6 +215,48 @@ Example: `HOST_DATA_DIR=/srv/upaas/data docker compose up -d`
 Session secrets are automatically generated on first startup and persisted to `$UPAAS_DATA_DIR/session.key`.

+## Observability
+
+### Prometheus Metrics
+
+All custom metrics are exposed under the `upaas_` namespace at `/metrics`. The
+endpoint is always available and can be optionally protected with basic auth via
+`METRICS_USERNAME` and `METRICS_PASSWORD`.
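The optional protection described above can be sketched with the standard library alone. This is an illustration, not the code from this commit: `protectMetrics` and `credentialsMatch` are hypothetical names, the inner handler stands in for the real Prometheus handler, and leaving the endpoint open unless both values are set is an assumption about the intended behavior.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// credentialsMatch checks both fields with subtle.ConstantTimeCompare so the
// comparison does not stop at the first mismatched byte.
func credentialsMatch(gotUser, gotPass, wantUser, wantPass string) bool {
	userOK := subtle.ConstantTimeCompare([]byte(gotUser), []byte(wantUser)) == 1
	passOK := subtle.ConstantTimeCompare([]byte(gotPass), []byte(wantPass)) == 1
	return userOK && passOK
}

// protectMetrics (hypothetical) wraps next with basic auth only when both
// credentials are configured. Assumption: otherwise the endpoint stays open.
func protectMetrics(next http.Handler, username, password string) http.Handler {
	if username == "" || password == "" {
		return next
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		user, pass, ok := r.BasicAuth()
		if !ok || !credentialsMatch(user, pass, username, password) {
			w.Header().Set("WWW-Authenticate", `Basic realm="metrics"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	// Stand-in for the real metrics handler.
	metrics := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "# metrics output would go here")
	})
	// Literal values stand in for METRICS_USERNAME and METRICS_PASSWORD.
	handler := protectMetrics(metrics, "prom", "s3cret")

	anonymous := httptest.NewRequest(http.MethodGet, "/metrics", nil)
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, anonymous)
	fmt.Println("no credentials:", rec.Code) // 401

	authed := httptest.NewRequest(http.MethodGet, "/metrics", nil)
	authed.SetBasicAuth("prom", "s3cret")
	rec = httptest.NewRecorder()
	handler.ServeHTTP(rec, authed)
	fmt.Println("valid credentials:", rec.Code) // 200
}
```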
+
+| Metric | Type | Labels | Description |
+|--------|------|--------|-------------|
+| `upaas_deployments_total` | Counter | `app`, `status` | Total deployments (success/failed/cancelled) |
+| `upaas_deployments_duration_seconds` | Histogram | `app`, `status` | Deployment duration |
+| `upaas_deployments_in_flight` | Gauge | `app` | Currently running deployments |
+| `upaas_container_healthy` | Gauge | `app` | Container health (1=healthy, 0=unhealthy) |
+| `upaas_webhook_events_total` | Counter | `app`, `event_type`, `matched` | Webhook events received |
+| `upaas_http_requests_total` | Counter | `method`, `status_code` | HTTP requests |
+| `upaas_http_request_duration_seconds` | Histogram | `method` | HTTP request latency |
+| `upaas_http_response_size_bytes` | Histogram | `method` | HTTP response sizes |
+| `upaas_audit_events_total` | Counter | `action` | Audit log events |
+
+### Audit Log
+
+All user-facing actions are recorded in an `audit_log` SQLite table with:
+
+- **Who**: user ID and username
+- **What**: action type and affected resource (app, deployment, session, etc.)
+- **Where**: client IP (via X-Real-IP/X-Forwarded-For/RemoteAddr)
+- **When**: timestamp
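The "Where" lookup order listed above (X-Real-IP, then X-Forwarded-For, then RemoteAddr) could look roughly like the following. This is a sketch, not the code from this commit: `resolveIP` and `clientIP` are hypothetical names, and taking the first entry of a comma-separated X-Forwarded-For chain is an assumption.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strings"
)

// resolveIP picks the client address in the order X-Real-IP,
// X-Forwarded-For, then the connection's remote address.
func resolveIP(realIP, forwardedFor, remoteAddr string) string {
	if ip := strings.TrimSpace(realIP); ip != "" {
		return ip
	}
	// X-Forwarded-For may hold a comma-separated chain of addresses.
	// Assumption: the first entry is the one to record.
	if first, _, _ := strings.Cut(forwardedFor, ","); strings.TrimSpace(first) != "" {
		return strings.TrimSpace(first)
	}
	// RemoteAddr is normally "host:port"; return it unchanged if it is not.
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return remoteAddr
	}
	return host
}

// clientIP applies resolveIP to an incoming request.
func clientIP(r *http.Request) string {
	return resolveIP(r.Header.Get("X-Real-IP"), r.Header.Get("X-Forwarded-For"), r.RemoteAddr)
}

func main() {
	r := &http.Request{Header: http.Header{}, RemoteAddr: "10.0.0.5:51234"}
	fmt.Println(clientIP(r)) // 10.0.0.5

	r.Header.Set("X-Forwarded-For", "203.0.113.7, 10.0.0.1")
	fmt.Println(clientIP(r)) // 203.0.113.7
}
```

Both headers are set by whoever sends the request, so they are only meaningful when the server sits behind a reverse proxy that overwrites them.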
+
+Audited actions include login/logout, app CRUD, deployments, container
+start/stop/restart, rollbacks, deployment cancellation, and webhook receipt.
+
+The audit log is available via the API at `GET /api/audit?limit=N` (max 500,
+default 50).
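One way to apply the `limit` bounds stated above (default 50, max 500) is shown below. This is a sketch under assumptions: `parseAuditLimit` is a hypothetical helper, and the text does not say whether an out-of-range or malformed value is clamped or rejected. Here it is clamped or replaced by the default.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	defaultAuditLimit = 50  // used when limit is missing or unusable
	maxAuditLimit     = 500 // upper bound on returned entries
)

// parseAuditLimit converts the raw ?limit= value into a bounded page size.
// Assumption: invalid or non-positive input falls back to the default and
// values above the maximum are clamped rather than rejected.
func parseAuditLimit(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return defaultAuditLimit
	}
	if n > maxAuditLimit {
		return maxAuditLimit
	}
	return n
}

func main() {
	for _, raw := range []string{"", "25", "9999", "abc", "-3"} {
		fmt.Printf("limit=%q -> %d\n", raw, parseAuditLimit(raw))
	}
}
```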
+
+### Structured Logging
+
+All operations use Go's `slog` structured logger. HTTP requests are logged with
+method, URL, status code, response size, latency, user agent, and client IP.
+Deployment events are logged with app name, status, and duration. Audit events
+are also logged to stdout for correlation with external log aggregators.
+
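A request-logging middleware that captures the fields listed above, including response size, might be structured as follows. This is a sketch, not the middleware from this commit: `countingWriter`, `logRequests`, `logOneRequest`, and the attribute keys are all hypothetical, and the client IP is taken directly from `RemoteAddr` for brevity.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log/slog"
	"net/http"
	"net/http/httptest"
	"time"
)

// countingWriter records the status code and the number of body bytes
// written so both can be logged after the handler returns.
type countingWriter struct {
	http.ResponseWriter
	status int
	bytes  int64
}

func (w *countingWriter) WriteHeader(code int) {
	w.status = code
	w.ResponseWriter.WriteHeader(code)
}

func (w *countingWriter) Write(p []byte) (int, error) {
	n, err := w.ResponseWriter.Write(p)
	w.bytes += int64(n)
	return n, err
}

// logRequests emits one structured log record per request.
func logRequests(logger *slog.Logger, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		// Status defaults to 200 for handlers that never call WriteHeader.
		cw := &countingWriter{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(cw, r)
		logger.Info("http request",
			slog.String("method", r.Method),
			slog.String("url", r.URL.String()),
			slog.Int("status", cw.status),
			slog.Int64("response_bytes", cw.bytes),
			slog.Duration("latency", time.Since(start)),
			slog.String("user_agent", r.UserAgent()),
			slog.String("client_ip", r.RemoteAddr),
		)
	})
}

// logOneRequest serves a single in-memory request through the middleware
// and returns the decoded JSON log record for inspection.
func logOneRequest() map[string]any {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	handler := logRequests(logger, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusCreated)
		w.Write([]byte("hello"))
	}))
	req := httptest.NewRequest(http.MethodGet, "/api/audit?limit=10", nil)
	handler.ServeHTTP(httptest.NewRecorder(), req)

	record := map[string]any{}
	if err := json.Unmarshal(buf.Bytes(), &record); err != nil {
		panic(err)
	}
	return record
}

func main() {
	record := logOneRequest()
	fmt.Println(record["method"], record["url"], record["status"], record["response_bytes"])
}
```

Wrapping the `ResponseWriter` keeps the byte count independent of any `Content-Length` header, so the same number can feed both the log record and a response-size histogram.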
 ## License

 WTFPL