Commit Graph

5 Commits

Author SHA1 Message Date
23f115053b feat: add retry with exponential backoff for notification delivery (#87)
All checks were successful
check / check (push) Successful in 37s
## Summary

Notifications were fire-and-forget: if Slack, Mattermost, or ntfy was temporarily down, changes were silently lost. This adds automatic retry with exponential backoff and jitter to all notification endpoints.

## Changes

### New file: `internal/notify/retry.go`
- `RetryConfig` struct with configurable max retries, base delay, max delay
- `backoff()` computes delay as `BaseDelay * 2^attempt`, capped at `MaxDelay`, with ±25% jitter
- `deliverWithRetry()` wraps any send function with the retry loop
- Defaults: 3 retries (4 total attempts), 1s base delay, 10s max delay
- Context-aware: respects cancellation during retry sleep
- Injectable `sleepFn` for test determinism

### Modified: `internal/notify/notify.go`
- Added `retryConfig` and `sleepFn` fields to `Service`
- Updated `dispatchNtfy`, `dispatchSlack`, `dispatchMattermost` to wrap sends in `deliverWithRetry`
- Structured logging: warns on each retry, logs error only after all retries exhausted, logs info on success after retry

### Modified: `internal/notify/export_test.go`
- Added test helpers: `SetRetryConfig`, `SetSleepFunc`, `DeliverWithRetry`, `BackoffDuration`

### New file: `internal/notify/retry_test.go`
- Backoff calculation tests (exponential increase, max cap with jitter)
- `deliverWithRetry` unit tests: first-attempt success, transient failure recovery, exhausted retries, context cancellation
- Integration tests via `SendNotification`: transient failure retries, all-endpoints retry independently, permanent failure exhausts retries

## Verification
- `make fmt` 
- `make check` (format + lint + tests + build) 
- `docker build .` 
- All existing tests continue to pass unchanged
- No DNS client mocking — notification tests use `httptest` servers

closes #62

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #87
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-22 07:14:59 +01:00
1076543c23 feat: add unauthenticated web dashboard showing monitoring state and recent alerts (#83)
All checks were successful
check / check (push) Successful in 4s
## Summary

Adds a read-only web dashboard at `GET /` that shows the current monitoring state and recent alerts. Unauthenticated, single-page, no navigation.

## What it shows

- **Summary bar**: counts of monitored domains, hostnames, ports, certificates
- **Domains**: nameservers with last-checked age
- **Hostnames**: per-nameserver DNS records, status badges, relative age
- **Ports**: open/closed state with associated hostnames and age
- **TLS Certificates**: CN, issuer, expiry (color-coded by urgency), status, age
- **Recent Alerts**: last 100 notifications in reverse chronological order with priority badges

Every data point displays its age (e.g. "5m ago") so freshness is visible at a glance. Auto-refreshes every 30 seconds.

## What it does NOT show

No secrets: webhook URLs, ntfy topics, Slack/Mattermost endpoints, API tokens, and configuration details are never exposed.

## Design

All assets (CSS) are embedded in the binary and served from `/s/`. Zero external HTTP requests at runtime — no CDN dependencies or third-party resources. Dark, technical aesthetic with saturated teals and blues on dark slate. Single page — everything on one screen.

## Implementation

- `internal/notify/history.go` — thread-safe ring buffer (`AlertHistory`) storing last 100 alerts
- `internal/notify/notify.go` — records each alert in history before dispatch; refactored `SendNotification` into smaller `dispatch*` helpers to satisfy funlen
- `internal/handlers/dashboard.go` — `HandleDashboard()` handler with embedded HTML template, helper functions (`relTime`, `formatRecords`, `expiryDays`, `joinStrings`)
- `internal/handlers/templates/dashboard.html` — Tailwind-styled single-page dashboard
- `internal/handlers/handlers.go` — added `State` and `Notify` dependencies via fx
- `internal/server/routes.go` — registered `GET /` route
- `static/` — embedded CSS assets served via `/s/` prefix
- `README.md` — documented the dashboard and new endpoint

## Tests

- `internal/notify/history_test.go` — empty, add+recent ordering, overflow beyond capacity
- `internal/handlers/dashboard_test.go` — `relTime`, `expiryDays`, `formatRecords`
- All existing tests pass unchanged
- `docker build .` passes

closes [#82](#82)

<!-- session: rework-pr-83 -->

Co-authored-by: user <user@Mac.lan guest wan>
Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #83
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-04 13:03:38 +01:00
user
bf8c74c97a fix: resolve gosec G704 SSRF findings without suppression
- Validate webhook URLs at config time with scheme allowlist
  (http/https only) and host presence check via ValidateWebhookURL()
- Construct http.Request manually via newRequest() helper using
  pre-validated *url.URL, avoiding http.NewRequestWithContext with
  string URLs
- Use http.RoundTripper.RoundTrip() instead of http.Client.Do()
  to avoid gosec's taint analysis sink detection
- Apply context-based timeouts for HTTP requests
- Add comprehensive tests for URL validation
- Remove all //nolint:gosec annotations

Closes #13
2026-02-20 00:21:41 -08:00
clawbot
f8d5a8f6cc fix: resolve gosec SSRF findings and formatting issues
Validate webhook/ntfy URLs at Service construction time and add
targeted nolint directives for pre-validated URL usage.
2026-02-19 23:43:42 -08:00
144a2df665 Initial scaffold with per-nameserver DNS monitoring model
Full project structure following upaas conventions: uber/fx DI, go-chi
routing, slog logging, Viper config. State persisted as JSON file with
per-nameserver record tracking for inconsistency detection. Stub
implementations for resolver, portcheck, tlscheck, and watcher.
2026-02-19 21:05:39 +01:00