9 Commits

Author SHA1 Message Date
clawbot
31bd6c3228 feat: add retry with exponential backoff for notification delivery
All checks were successful
check / check (push) Successful in 42s
Notifications were fire-and-forget: if Slack, Mattermost, or ntfy was
temporarily down, change notifications were silently lost. This adds
automatic retry with exponential backoff and jitter to all notification
endpoints.

Implementation:
- New retry.go with configurable RetryConfig (max retries, base delay,
  max delay) and exponential backoff with ±25% jitter
- Each dispatch goroutine now wraps its send call in deliverWithRetry
- Default: 3 retries (4 total attempts), 1s base delay, 10s max delay
- Context-aware: respects cancellation during retry sleep
- Structured logging on each retry attempt and on final success after
  retry
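
The retry loop described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `retry.go`: the `RetryConfig` field names, the `backoff` and `jitter` helpers, and the `send` callback signature are all assumptions made for the example. Only the defaults (3 retries, doubling delay, cap, ±25% jitter, context-aware sleep) come from the commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// RetryConfig mirrors the knobs named in the commit message.
// The field names are assumptions for this sketch.
type RetryConfig struct {
	MaxRetries int
	BaseDelay  time.Duration
	MaxDelay   time.Duration
}

// backoff returns the un-jittered delay before retry number attempt
// (0-based): BaseDelay doubled once per attempt, capped at MaxDelay.
func (c RetryConfig) backoff(attempt int) time.Duration {
	d := c.BaseDelay
	for i := 0; i < attempt && d < c.MaxDelay; i++ {
		d *= 2
	}
	if d > c.MaxDelay {
		d = c.MaxDelay
	}
	return d
}

// jitter scales d by a random factor in [0.75, 1.25).
func jitter(d time.Duration) time.Duration {
	return time.Duration(float64(d) * (0.75 + 0.5*rand.Float64()))
}

// deliverWithRetry calls send until it succeeds, the retries are
// exhausted, or ctx is cancelled during a retry sleep. On exhaustion
// it returns the last error from send.
func deliverWithRetry(
	ctx context.Context,
	cfg RetryConfig,
	send func(context.Context) error,
) error {
	var err error
	for attempt := 0; attempt <= cfg.MaxRetries; attempt++ {
		if err = send(ctx); err == nil {
			return nil
		}
		if attempt == cfg.MaxRetries {
			break
		}
		timer := time.NewTimer(jitter(cfg.backoff(attempt)))
		select {
		case <-ctx.Done():
			timer.Stop()
			return ctx.Err()
		case <-timer.C:
		}
	}
	return err
}

func main() {
	// Millisecond delays keep the demo fast; the commit's defaults
	// are a 1s base delay and a 10s max delay.
	cfg := RetryConfig{MaxRetries: 3, BaseDelay: time.Millisecond, MaxDelay: 10 * time.Millisecond}
	calls := 0
	err := deliverWithRetry(context.Background(), cfg, func(context.Context) error {
		calls++
		if calls < 3 {
			return errors.New("transient failure")
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

With `MaxRetries: 3` the loop makes at most four attempts, which matches the "3 retries (4 total attempts)" default stated above.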

All existing tests continue to pass. New tests cover:
- Backoff calculation (increase, cap)
- Retry success on first attempt (no unnecessary retries)
- Retry on transient failure (succeeds after N attempts)
- Exhausted retries (returns last error)
- Context cancellation during retry sleep
- Integration: SendNotification retries transient 500s
- Integration: all three endpoints retry independently
- Integration: permanent failure exhausts retries

closes #62
2026-03-10 11:11:32 -07:00
b64db3e10f feat: enhance /api/v1/status endpoint with full monitoring data (#86)
All checks were successful
check / check (push) Successful in 1m27s
## Summary

Enhances the `/api/v1/status` endpoint to return comprehensive monitoring state instead of just `{"status": "ok"}`.

## Changes

The endpoint now returns:

- **Summary counts**: domains, hostnames, ports (total + open), certificates (total + ok + error)
- **Domains**: each monitored domain with its discovered nameservers and last check timestamp
- **Hostnames**: each monitored hostname with per-nameserver DNS records, status, and last check timestamps
- **Ports**: each monitored IP:port with open/closed state, associated hostnames, and last check timestamp
- **Certificates**: each TLS certificate with CN, issuer, expiry, SANs, status, and last check timestamp
- **Last updated**: timestamp of the overall monitoring state

All data is derived from the existing `state.GetSnapshot()`, consistent with how the dashboard works. No configuration details (webhook URLs, API tokens) are exposed.

## Example response structure

```json
{
  "status": "ok",
  "lastUpdated": "2026-03-10T12:00:00Z",
  "counts": {
    "domains": 2,
    "hostnames": 3,
    "ports": 10,
    "portsOpen": 8,
    "certificates": 4,
    "certificatesOk": 3,
    "certificatesError": 1
  },
  "domains": { ... },
  "hostnames": { ... },
  "ports": { ... },
  "certificates": { ... }
}
```
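
For API consumers, the top-level shape above could be decoded along these lines. This is a hedged client-side sketch: the type names and the `decodeStatus` helper are hypothetical, and because the per-item objects are elided in the example, they are kept as raw JSON rather than guessed at.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// statusCounts matches the "counts" object in the example response.
type statusCounts struct {
	Domains           int `json:"domains"`
	Hostnames         int `json:"hostnames"`
	Ports             int `json:"ports"`
	PortsOpen         int `json:"portsOpen"`
	Certificates      int `json:"certificates"`
	CertificatesOk    int `json:"certificatesOk"`
	CertificatesError int `json:"certificatesError"`
}

// statusResponse models only the top-level keys. The per-item
// objects are elided in the example, so they stay as raw JSON.
type statusResponse struct {
	Status       string          `json:"status"`
	LastUpdated  time.Time       `json:"lastUpdated"`
	Counts       statusCounts    `json:"counts"`
	Domains      json.RawMessage `json:"domains,omitempty"`
	Hostnames    json.RawMessage `json:"hostnames,omitempty"`
	Ports        json.RawMessage `json:"ports,omitempty"`
	Certificates json.RawMessage `json:"certificates,omitempty"`
}

// decodeStatus parses a /api/v1/status response body.
func decodeStatus(data []byte) (statusResponse, error) {
	var r statusResponse
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	body := []byte(`{
	  "status": "ok",
	  "lastUpdated": "2026-03-10T12:00:00Z",
	  "counts": {"domains": 2, "hostnames": 3, "ports": 10, "portsOpen": 8,
	             "certificates": 4, "certificatesOk": 3, "certificatesError": 1},
	  "domains": {}, "hostnames": {}, "ports": {}, "certificates": {}
	}`)

	r, err := decodeStatus(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s: %d/%d ports open, %d certificate error(s)\n",
		r.Status, r.Counts.PortsOpen, r.Counts.Ports, r.Counts.CertificatesError)
}
```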

closes #73

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #86
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-10 12:20:11 +01:00
65180ad661 feat: add DNSWATCHER_SEND_TEST_NOTIFICATION env var (#85)
All checks were successful
check / check (push) Successful in 5s
When set to a truthy value, dnswatcher sends a status notification to all configured notification channels once the first full scan after startup completes. The notification is an all-OK success message listing the number of monitored domains, hostnames, ports, and certificates.

Changes:
- Added `SendTestNotification` config field reading `DNSWATCHER_SEND_TEST_NOTIFICATION`
- Added `maybeSendTestNotification()` in watcher, called after initial `RunOnce` in `Run`
- Added 3 watcher tests (enabled via Run, enabled via RunOnce alone, disabled)
- Added config tests for the new field
- Updated README: env var table, example .env, Docker example

Closes #84

Co-authored-by: user <user@Mac.lan guest wan>
Reviewed-on: #85
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-04 21:41:55 +01:00
1076543c23 feat: add unauthenticated web dashboard showing monitoring state and recent alerts (#83)
All checks were successful
check / check (push) Successful in 4s
## Summary

Adds a read-only web dashboard at `GET /` that shows the current monitoring state and recent alerts. Unauthenticated, single-page, no navigation.

## What it shows

- **Summary bar**: counts of monitored domains, hostnames, ports, certificates
- **Domains**: nameservers with last-checked age
- **Hostnames**: per-nameserver DNS records, status badges, relative age
- **Ports**: open/closed state with associated hostnames and age
- **TLS Certificates**: CN, issuer, expiry (color-coded by urgency), status, age
- **Recent Alerts**: last 100 notifications in reverse chronological order with priority badges

Every data point displays its age (e.g. "5m ago") so freshness is visible at a glance. Auto-refreshes every 30 seconds.

## What it does NOT show

No secrets: webhook URLs, ntfy topics, Slack/Mattermost endpoints, API tokens, and configuration details are never exposed.

## Design

All assets (CSS) are embedded in the binary and served from `/s/`. Zero external HTTP requests at runtime — no CDN dependencies or third-party resources. Dark, technical aesthetic with saturated teals and blues on dark slate. Single page — everything on one screen.

## Implementation

- `internal/notify/history.go` — thread-safe ring buffer (`AlertHistory`) storing last 100 alerts
- `internal/notify/notify.go` — records each alert in history before dispatch; refactored `SendNotification` into smaller `dispatch*` helpers to satisfy funlen
- `internal/handlers/dashboard.go` — `HandleDashboard()` handler with embedded HTML template, helper functions (`relTime`, `formatRecords`, `expiryDays`, `joinStrings`)
- `internal/handlers/templates/dashboard.html` — Tailwind-styled single-page dashboard
- `internal/handlers/handlers.go` — added `State` and `Notify` dependencies via fx
- `internal/server/routes.go` — registered `GET /` route
- `static/` — embedded CSS assets served via `/s/` prefix
- `README.md` — documented the dashboard and new endpoint
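
A thread-safe ring buffer like the one described for `AlertHistory` can be sketched as below. This is an illustration under assumptions, not the code in `internal/notify/history.go`: the `Alert` fields, the constructor name, and the method names `Add` and `Recent` are chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Alert is a hypothetical stand-in for the recorded notification.
type Alert struct {
	Title    string
	Priority string
}

// AlertHistory keeps the most recent alerts in a fixed-size ring.
type AlertHistory struct {
	mu    sync.RWMutex
	buf   []Alert
	next  int // index the next Add will write to
	count int // number of valid entries, at most len(buf)
}

// NewAlertHistory creates a history holding up to capacity alerts.
func NewAlertHistory(capacity int) *AlertHistory {
	if capacity < 1 {
		capacity = 1
	}
	return &AlertHistory{buf: make([]Alert, capacity)}
}

// Add records an alert, overwriting the oldest entry once full.
func (h *AlertHistory) Add(a Alert) {
	h.mu.Lock()
	defer h.mu.Unlock()

	h.buf[h.next] = a
	h.next = (h.next + 1) % len(h.buf)
	if h.count < len(h.buf) {
		h.count++
	}
}

// Recent returns a copy of the stored alerts, newest first.
func (h *AlertHistory) Recent() []Alert {
	h.mu.RLock()
	defer h.mu.RUnlock()

	out := make([]Alert, 0, h.count)
	for i := 1; i <= h.count; i++ {
		idx := (h.next - i + len(h.buf)) % len(h.buf)
		out = append(out, h.buf[idx])
	}
	return out
}

func main() {
	h := NewAlertHistory(3)
	for _, title := range []string{"a", "b", "c", "d"} {
		h.Add(Alert{Title: title, Priority: "info"})
	}
	for _, a := range h.Recent() {
		fmt.Println(a.Title)
	}
}
```

Returning a copy from `Recent` means the dashboard handler can render the slice without holding the lock.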

## Tests

- `internal/notify/history_test.go` — empty, add+recent ordering, overflow beyond capacity
- `internal/handlers/dashboard_test.go` — `relTime`, `expiryDays`, `formatRecords`
- All existing tests pass unchanged
- `docker build .` passes

closes [#82](#82)

Co-authored-by: user <user@Mac.lan guest wan>
Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #83
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-04 13:03:38 +01:00
1843d09eb3 test(notify): add comprehensive tests for notification delivery (#79)
All checks were successful
check / check (push) Successful in 50s
## Summary

Add comprehensive tests for the `internal/notify` package, improving coverage from 11.1% to 80.0%.

Closes [issue #71](#71).

## What was added

### `delivery_test.go` — 28 new test functions

**Priority mapping tests:**
- `TestNtfyPriority` — all priority levels (error→urgent, warning→high, success→default, info→low, unknown→default)
- `TestSlackColor` — all color mappings including default fallback
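
The priority mapping that `TestNtfyPriority` exercises can be written as a small switch. The sketch below follows the mapping listed above; the function signature (plain strings in and out) is an assumption, and the real `ntfyPriority` may differ.

```go
package main

import "fmt"

// ntfyPriority maps a dnswatcher notification priority to an ntfy
// priority name, following the mapping listed in the test summary.
func ntfyPriority(priority string) string {
	switch priority {
	case "error":
		return "urgent"
	case "warning":
		return "high"
	case "success":
		return "default"
	case "info":
		return "low"
	default:
		// Unknown priorities fall back to ntfy's default level.
		return "default"
	}
}

func main() {
	for _, p := range []string{"error", "warning", "success", "info", "bogus"} {
		fmt.Printf("%s -> %s\n", p, ntfyPriority(p))
	}
}
```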

**Request construction:**
- `TestNewRequest` — method, URL, host, headers, body
- `TestNewRequestPreservesContext` — context propagation

**ntfy delivery (`sendNtfy`):**
- `TestSendNtfyHeaders` — Title, Priority headers, POST body content
- `TestSendNtfyAllPriorities` — end-to-end header verification for all priority levels
- `TestSendNtfyClientError` — 403 returns `ErrNtfyFailed`
- `TestSendNtfyServerError` — 500 returns `ErrNtfyFailed`
- `TestSendNtfySuccess` — 200 OK succeeds
- `TestSendNtfyNetworkError` — transport failure handling

**Slack/Mattermost delivery (`sendSlack`):**
- `TestSendSlackPayloadFields` — JSON payload structure, Content-Type header, attachment fields
- `TestSendSlackAllColors` — color mapping for all priorities
- `TestSendSlackClientError` — 400 returns `ErrSlackFailed`
- `TestSendSlackServerError` — 502 returns `ErrSlackFailed`
- `TestSendSlackNetworkError` — transport failure handling

**`SendNotification` goroutine dispatch:**
- `TestSendNotificationAllEndpoints` — all three endpoints receive notifications concurrently
- `TestSendNotificationNoWebhooks` — no-op when no endpoints configured
- `TestSendNotificationNtfyOnly` — ntfy-only dispatch
- `TestSendNotificationSlackOnly` — slack-only dispatch
- `TestSendNotificationMattermostOnly` — mattermost-only dispatch
- `TestSendNotificationNtfyError` — error logging path (no panic)
- `TestSendNotificationSlackError` — error logging path (no panic)
- `TestSendNotificationMattermostError` — error logging path (no panic)

**Payload marshaling:**
- `TestSlackPayloadJSON` — round-trip marshal/unmarshal
- `TestSlackPayloadEmptyAttachments` — `omitempty` behavior

### `export_test.go` — test bridge

Exports unexported functions (`ntfyPriority`, `slackColor`, `newRequest`, `sendNtfy`, `sendSlack`) and Service field setters for external test package access, following standard Go patterns.

## Coverage

| Function | Before | After |
|---|---|---|
| `IsAllowedScheme` | 100% | 100% |
| `ValidateWebhookURL` | 100% | 100% |
| `newRequest` | 0% | 100% |
| `SendNotification` | 0% | 100% |
| `sendNtfy` | 0% | 100% |
| `ntfyPriority` | 0% | 100% |
| `sendSlack` | 0% | 94.1% |
| `slackColor` | 0% | 100% |
| **Total** | **11.1%** | **80.0%** |

The remaining 20% is the `New()` constructor (requires fx wiring) and one unreachable `json.Marshal` error path in `sendSlack`.

## Testing approach

- `httptest.Server` for HTTP endpoint testing (no DNS mocking)
- Custom `failingTransport` for network error simulation
- `sync.Mutex`-protected captures for concurrent goroutine verification
- All tests are parallel
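
A transport that always fails is a common way to simulate network errors without touching the network. The sketch below shows the general idea; the `errSimulated` sentinel and the `sendVia` helper are hypothetical, and the `failingTransport` in the actual test file may be shaped differently.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errSimulated is a hypothetical sentinel used only in this sketch.
var errSimulated = errors.New("simulated network failure")

// failingTransport implements http.RoundTripper and fails every
// request before any network I/O happens.
type failingTransport struct{}

func (failingTransport) RoundTrip(*http.Request) (*http.Response, error) {
	return nil, errSimulated
}

// sendVia issues a GET through the given transport and returns the
// client error, closing the response body on success.
func sendVia(rt http.RoundTripper, url string) error {
	client := &http.Client{Transport: rt}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

func main() {
	err := sendVia(failingTransport{}, "http://example.invalid/hook")
	// http.Client wraps transport errors in *url.Error, which
	// supports unwrapping, so errors.Is still finds the sentinel.
	fmt.Println("got simulated error:", errors.Is(err, errSimulated))
}
```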

`docker build .` passes.

Co-authored-by: user <user@Mac.lan guest wan>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #79
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-04 11:26:31 +01:00
c5bf16055e test(state): add comprehensive test coverage for internal/state package (#80)
Some checks failed
check / check (push) Has been cancelled
## Summary

Add 32 tests for the `internal/state` package, which previously had 0% test coverage.

### Tests added:

**Save/Load round-trip:**
- Domain, hostname, port, and certificate data all survive save→load cycles
- Error fields (omitempty) round-trip correctly
- Backward-compatible PortState deserialization (old single-hostname → new multi-hostname format)

**Edge cases:**
- Missing state file: returns nil error, keeps existing in-memory state
- Corrupt state file: returns parse error
- Empty state file: returns parse error
- Permission errors (read/write): reported correctly; these tests are skipped when running as root (e.g. in Docker), where permission checks do not apply

**Atomic write:**
- No leftover .tmp files after successful save
- Updated content verified after second save
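
The write-to-temp-then-rename pattern these tests check can be sketched as follows. It is a generic illustration rather than the `internal/state` implementation: `writeFileAtomic` and `demo` are hypothetical names, and the temp-file naming is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a temp file in the target directory
// and renames it over path, so readers never see a partial file.
func writeFileAtomic(path string, data []byte) error {
	dir := filepath.Dir(path)
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return err
	}

	// os.CreateTemp creates the file with mode 0600.
	tmp, err := os.CreateTemp(dir, filepath.Base(path)+".*.tmp")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // harmless once the rename has succeeded

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

// demo saves twice into a nested directory, checks that no .tmp file
// is left behind, and returns the final file content.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "atomic-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "data", "state.json")
	for _, content := range []string{"first", "second"} {
		if err := writeFileAtomic(path, []byte(content)); err != nil {
			return "", err
		}
	}

	leftovers, err := filepath.Glob(filepath.Join(dir, "data", "*.tmp"))
	if err != nil {
		return "", err
	}
	if len(leftovers) != 0 {
		return "", fmt.Errorf("leftover temp files: %v", leftovers)
	}

	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	content, err := demo()
	fmt.Println(content, err)
}
```

Creating the temp file in the same directory as the target keeps the rename on one filesystem, which is what makes the replacement atomic on POSIX systems.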

**Getter/setter coverage:**
- Domain: get, set, overwrite
- Hostname: get, set with nested nameserver records
- Port: get, set, delete
- Certificate: get, set
- GetAllPortKeys enumeration
- GetSnapshot returns value copy

**Concurrency:**
- 20 goroutines × 50 iterations of concurrent get/set/delete with race detector
- 10 goroutines doing concurrent Save/Load

**Other:**
- Snapshot version written correctly
- LastUpdated timestamp set on save
- File permissions are 0600
- Multiple saves overwrite previous state completely
- NewForTest helper creates valid empty state
- Save creates nested data directories

Also adds `NewForTestWithDataDir()` to the test helper for tests requiring file persistence.

Closes [issue #70](#70)

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #80
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-04 11:26:05 +01:00
d6130e5892 test(config): add comprehensive tests for config loading path (#81)
All checks were successful
check / check (push) Successful in 4s
## Summary

Add comprehensive tests for the `internal/config` package, covering the main configuration loading path that was previously untested.

Closes [issue #72](#72)

## What Changed

Added three new test files:

- **`config_test.go`** — 16 tests covering `New()`, `StatePath()`, and the full config loading pipeline
- **`parsecsv_test.go`** — 10 test cases for `parseCSV()` edge cases
- **`export_test.go`** — standard Go export bridge for testing unexported `parseCSV`

## Test Coverage

| Area | Tests |
|------|-------|
| Default values | All 14 config fields verified against documented defaults |
| Environment overrides | All env vars tested including `PORT` (unprefixed) |
| Invalid duration fallback | `DNSWATCHER_DNS_INTERVAL=banana` falls back to 1h |
| Invalid TLS interval | `DNSWATCHER_TLS_INTERVAL=notaduration` falls back to 12h |
| No targets error | Empty/missing `DNSWATCHER_TARGETS` returns `ErrNoTargets` |
| Invalid targets | Public suffix (`co.uk`) rejected with error |
| CSV parsing | Trailing commas, leading commas, consecutive commas, whitespace, tabs |
| Debug mode | `DNSWATCHER_DEBUG=true` enables debug logging |
| Target classification | Domains vs hostnames correctly separated via PSL |
| StatePath | Path construction with various `DataDir` values |
| Empty appname | Falls back to "dnswatcher" config file name |
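
Two of the behaviors in the table, CSV cleanup and the invalid-duration fallback, can be illustrated with a short sketch. The `parseCSV` below is written to satisfy the edge cases listed (trimming, empty segments, tabs) and is not copied from the repository; `durationOrDefault` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parseCSV splits a comma-separated list, trims surrounding
// whitespace (including tabs), and drops empty segments. It returns
// nil when nothing usable remains.
func parseCSV(input string) []string {
	var out []string
	for _, part := range strings.Split(input, ",") {
		if s := strings.TrimSpace(part); s != "" {
			out = append(out, s)
		}
	}
	return out
}

// durationOrDefault parses raw as a Go duration and returns fallback
// when raw is not a valid duration string.
func durationOrDefault(raw string, fallback time.Duration) time.Duration {
	d, err := time.ParseDuration(raw)
	if err != nil {
		return fallback
	}
	return d
}

func main() {
	fmt.Println(parseCSV(" example.com ,, www.example.com ,"))
	fmt.Println(durationOrDefault("banana", time.Hour))
	fmt.Println(durationOrDefault("30m", time.Hour))
}
```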

**Coverage: 23% → 92.5%**

## Notes

- Tests use `viper.Reset()` for isolation since Viper has global state
- Tests that set environment variables use `t.Setenv()` for automatic cleanup; `t.Setenv()` cannot be used in parallel tests, so these tests run serially
- Uses testify `assert`/`require` consistent with other test files in the repo
- No production code changes

Co-authored-by: user <user@Mac.lan guest wan>
Reviewed-on: #81
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-04 11:23:24 +01:00
0a74971ade docs: fix README inaccuracies found during QA audit (#74)
All checks were successful
check / check (push) Successful in 9s
## Summary

Fixes documentation inaccuracies in README.md identified during QA audit.

### Changes

**API table (closes #67):**
- Removed `GET /api/v1/domains` and `GET /api/v1/hostnames` from the HTTP API table. These endpoints are not implemented — the only routes in `internal/server/routes.go` are `/health`, `/api/v1/status`, and `/metrics` (conditional).

**Feature claims (closes #68):**
- Removed "Inconsistency resolved" from hostname monitoring features. `detectInconsistencies()` detects current inconsistencies but has no state tracking to detect when they resolve.
- Removed `nxdomain` and `nodata` from the state status values table. The resolver defines these constants, but `buildHostnameState()` in the watcher only ever sets status to `"ok"`, and failed queries set `"error"` via the NS disappearance path, so `nxdomain` and `nodata` are never written to state.
- Removed "Empty response" (NODATA/NXDOMAIN) detection claim. Changes are caught generically by `detectRecordChanges()`, not with specific NODATA/NXDOMAIN labeling.

### What was NOT changed

- "Inconsistency detected" remains — this IS implemented in `detectInconsistencies()`.
- All other feature claims were verified against the code and are accurate.
- No Go source code was modified.

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #74
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-02 08:40:42 +01:00
e882e7d237 feat: fail fast when no monitoring targets configured (#75)
Some checks failed
check / check (push) Failing after 46s
## Summary

When `DNSWATCHER_TARGETS` is empty (the default), dnswatcher previously started successfully and ran indefinitely, monitoring nothing. This misconfiguration is easy to make (forgetting to set the variable, or mistyping its name), and dnswatcher gave no indication that anything was wrong.

## Changes

- Added `ErrNoTargets` sentinel error in `internal/config/config.go`
- Extracted `parseAndValidateTargets()` helper to validate that at least one domain or hostname is configured after target classification
- If no targets are configured, dnswatcher now exits with a clear error: `"no monitoring targets configured: set DNSWATCHER_TARGETS environment variable"`
- Updated README.md to document that `DNSWATCHER_TARGETS` is required and dnswatcher will refuse to start without it

## How it works

The validation runs during config construction (via uber/fx), before the watcher or any other component starts. If `DNSWATCHER_TARGETS` is empty or contains only whitespace/empty entries, `buildConfig()` returns `ErrNoTargets`, which causes fx to fail startup with a clear error message.

This is fail-fast behavior: a monitoring daemon with nothing to monitor is a misconfiguration and should not silently run.

Closes #69

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #75
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-02 01:26:55 +01:00
31 changed files with 4869 additions and 322 deletions

View File

@@ -2,8 +2,7 @@
BINARY := dnswatcher
VERSION := $(shell git describe --tags --always --dirty 2>/dev/null || echo "dev")
BUILDARCH := $(shell go env GOARCH)
LDFLAGS := -X main.Version=$(VERSION) -X main.Buildarch=$(BUILDARCH)
LDFLAGS := -X main.Version=$(VERSION)
all: check build

View File

@@ -52,10 +52,6 @@ without requiring an external database.
responding again.
- **Inconsistency detected**: Two nameservers that previously agreed
now return different record sets for the same hostname.
- **Inconsistency resolved**: Nameservers that previously disagreed
are now back in agreement.
- **Empty response**: A nameserver that previously returned records
now returns an authoritative empty response (NODATA/NXDOMAIN).
### TCP Port Monitoring
@@ -128,16 +124,42 @@ includes:
- State is written atomically (write to temp file, then rename) to prevent
corruption.
### Web Dashboard
dnswatcher includes an unauthenticated, read-only web dashboard at the
root URL (`/`). It displays:
- **Summary counts** for monitored domains, hostnames, ports, and
certificates.
- **Domains** with their discovered nameservers.
- **Hostnames** with per-nameserver DNS records and status.
- **Ports** with open/closed state and associated hostnames.
- **TLS certificates** with CN, issuer, expiry, and status.
- **Recent alerts** (last 100 notifications sent since the process
started), displayed in reverse chronological order.
Every data point shows its age (e.g. "5m ago") so you can tell at a
glance how fresh the information is. The page auto-refreshes every 30
seconds.
The dashboard intentionally does not expose any configuration details
such as webhook URLs, notification endpoints, or API tokens.
All assets (CSS) are embedded in the binary and served from the
application itself. The dashboard makes zero external HTTP requests —
no CDN dependencies or third-party resources are loaded at runtime.
### HTTP API
dnswatcher exposes a lightweight HTTP API for operational visibility:
| Endpoint | Description |
|---------------------------------------|--------------------------------|
| `GET /health` | Health check (JSON) |
| `GET /` | Web dashboard (HTML) |
| `GET /s/...` | Static assets (embedded CSS) |
| `GET /.well-known/healthcheck` | Health check (JSON) |
| `GET /health` | Health check (JSON, legacy) |
| `GET /api/v1/status` | Current monitoring state |
| `GET /api/v1/domains` | Configured domains and status |
| `GET /api/v1/hostnames` | Configured hostnames and status|
| `GET /metrics` | Prometheus metrics (optional) |
---
@@ -149,7 +171,7 @@ cmd/dnswatcher/main.go Entry point (uber/fx bootstrap)
internal/
config/config.go Viper-based configuration
globals/globals.go Build-time variables (version, arch)
globals/globals.go Build-time variables (version)
logger/logger.go slog structured logging (TTY detection)
healthcheck/healthcheck.go Health check service
middleware/middleware.go HTTP middleware (logging, CORS, metrics auth)
@@ -209,6 +231,13 @@ the following precedence (highest to lowest):
| `DNSWATCHER_MAINTENANCE_MODE` | Enable maintenance mode | `false` |
| `DNSWATCHER_METRICS_USERNAME` | Basic auth username for /metrics | `""` |
| `DNSWATCHER_METRICS_PASSWORD` | Basic auth password for /metrics | `""` |
| `DNSWATCHER_SEND_TEST_NOTIFICATION` | Send a test notification after first scan completes | `false` |
**`DNSWATCHER_TARGETS` is required.** dnswatcher will refuse to start if no
monitoring targets are configured. A monitoring daemon with nothing to monitor
is a misconfiguration, so dnswatcher fails fast with a clear error message
rather than running silently. Set `DNSWATCHER_TARGETS` to a comma-separated
list of DNS names before starting.
### Example `.env`
@@ -220,6 +249,7 @@ DNSWATCHER_TARGETS=example.com,example.org,www.example.com,api.example.com,mail.
DNSWATCHER_SLACK_WEBHOOK=https://hooks.slack.com/services/T.../B.../xxx
DNSWATCHER_MATTERMOST_WEBHOOK=https://mattermost.example.com/hooks/xxx
DNSWATCHER_NTFY_TOPIC=https://ntfy.sh/my-dns-alerts
DNSWATCHER_SEND_TEST_NOTIFICATION=true
```
---
@@ -319,8 +349,6 @@ tracks reachability:
|-------------|-------------------------------------------------|
| `ok` | Query succeeded, records are current |
| `error` | Query failed (timeout, SERVFAIL, network error) |
| `nxdomain` | Authoritative NXDOMAIN response |
| `nodata` | Authoritative empty response (NODATA) |
---
@@ -337,11 +365,10 @@ make clean # Remove build artifacts
### Build-Time Variables
Version and architecture are injected via `-ldflags`:
Version is injected via `-ldflags`:
```sh
go build -ldflags "-X main.Version=$(git describe --tags --always) \
-X main.Buildarch=$(go env GOARCH)" ./cmd/dnswatcher
go build -ldflags "-X main.Version=$(git describe --tags --always)" ./cmd/dnswatcher
```
---
@@ -355,6 +382,7 @@ docker run -d \
-v dnswatcher-data:/var/lib/dnswatcher \
-e DNSWATCHER_TARGETS=example.com,www.example.com \
-e DNSWATCHER_NTFY_TOPIC=https://ntfy.sh/my-alerts \
-e DNSWATCHER_SEND_TEST_NOTIFICATION=true \
dnswatcher
```

View File

@@ -27,13 +27,11 @@ import (
var (
Appname = "dnswatcher"
Version string
Buildarch string
)
func main() {
globals.SetAppname(Appname)
globals.SetVersion(Version)
globals.SetBuildarch(Buildarch)
fx.New(
fx.Provide(

View File

@@ -23,6 +23,11 @@ const (
defaultTLSExpiryWarning = 7
)
// ErrNoTargets is returned when no monitoring targets are configured.
var ErrNoTargets = errors.New(
"no monitoring targets configured: set DNSWATCHER_TARGETS environment variable",
)
// Params contains dependencies for Config.
type Params struct {
fx.In
@@ -48,6 +53,7 @@ type Config struct {
MaintenanceMode bool
MetricsUsername string
MetricsPassword string
SendTestNotification bool
params *Params
log *slog.Logger
}
@@ -100,6 +106,7 @@ func setupViper(name string) {
viper.SetDefault("MAINTENANCE_MODE", false)
viper.SetDefault("METRICS_USERNAME", "")
viper.SetDefault("METRICS_PASSWORD", "")
viper.SetDefault("SEND_TEST_NOTIFICATION", false)
}
func buildConfig(
@@ -132,11 +139,9 @@ func buildConfig(
tlsInterval = defaultTLSInterval
}
domains, hostnames, err := ClassifyTargets(
parseCSV(viper.GetString("TARGETS")),
)
domains, hostnames, err := parseAndValidateTargets()
if err != nil {
return nil, fmt.Errorf("invalid targets configuration: %w", err)
return nil, err
}
cfg := &Config{
@@ -155,6 +160,7 @@ func buildConfig(
MaintenanceMode: viper.GetBool("MAINTENANCE_MODE"),
MetricsUsername: viper.GetString("METRICS_USERNAME"),
MetricsPassword: viper.GetString("METRICS_PASSWORD"),
SendTestNotification: viper.GetBool("SEND_TEST_NOTIFICATION"),
params: params,
log: log,
}
@@ -162,6 +168,23 @@ func buildConfig(
return cfg, nil
}
func parseAndValidateTargets() ([]string, []string, error) {
domains, hostnames, err := ClassifyTargets(
parseCSV(viper.GetString("TARGETS")),
)
if err != nil {
return nil, nil, fmt.Errorf(
"invalid targets configuration: %w", err,
)
}
if len(domains) == 0 && len(hostnames) == 0 {
return nil, nil, ErrNoTargets
}
return domains, hostnames, nil
}
func parseCSV(input string) []string {
if input == "" {
return nil

View File

@@ -0,0 +1,262 @@
package config_test
import (
"testing"
"time"
"github.com/spf13/viper"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/dnswatcher/internal/config"
"sneak.berlin/go/dnswatcher/internal/globals"
"sneak.berlin/go/dnswatcher/internal/logger"
)
// newTestParams creates config.Params suitable for testing
// without requiring the fx dependency injection framework.
func newTestParams(t *testing.T) config.Params {
t.Helper()
g := &globals.Globals{
Appname: "dnswatcher",
Version: "test",
}
l, err := logger.New(nil, logger.Params{Globals: g})
require.NoError(t, err, "failed to create logger")
return config.Params{
Globals: g,
Logger: l,
}
}
// These tests exercise viper global state and MUST NOT use
// t.Parallel(). Each test resets viper for isolation.
func TestNew_DefaultValues(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com,www.example.com")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 8080, cfg.Port)
assert.False(t, cfg.Debug)
assert.Equal(t, "./data", cfg.DataDir)
assert.Equal(t, time.Hour, cfg.DNSInterval)
assert.Equal(t, 12*time.Hour, cfg.TLSInterval)
assert.Equal(t, 7, cfg.TLSExpiryWarning)
assert.False(t, cfg.MaintenanceMode)
assert.Empty(t, cfg.SlackWebhook)
assert.Empty(t, cfg.MattermostWebhook)
assert.Empty(t, cfg.NtfyTopic)
assert.Empty(t, cfg.SentryDSN)
assert.Empty(t, cfg.MetricsUsername)
assert.Empty(t, cfg.MetricsPassword)
assert.False(t, cfg.SendTestNotification)
}
func TestNew_EnvironmentOverrides(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("PORT", "9090")
t.Setenv("DNSWATCHER_DEBUG", "true")
t.Setenv("DNSWATCHER_DATA_DIR", "/tmp/test-data")
t.Setenv("DNSWATCHER_DNS_INTERVAL", "30m")
t.Setenv("DNSWATCHER_TLS_INTERVAL", "6h")
t.Setenv("DNSWATCHER_TLS_EXPIRY_WARNING", "14")
t.Setenv("DNSWATCHER_SLACK_WEBHOOK", "https://hooks.slack.com/t")
t.Setenv("DNSWATCHER_MATTERMOST_WEBHOOK", "https://mm.test/hooks/t")
t.Setenv("DNSWATCHER_NTFY_TOPIC", "https://ntfy.sh/test")
t.Setenv("DNSWATCHER_SENTRY_DSN", "https://sentry.test/1")
t.Setenv("DNSWATCHER_MAINTENANCE_MODE", "true")
t.Setenv("DNSWATCHER_METRICS_USERNAME", "admin")
t.Setenv("DNSWATCHER_METRICS_PASSWORD", "secret")
t.Setenv("DNSWATCHER_SEND_TEST_NOTIFICATION", "true")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 9090, cfg.Port)
assert.True(t, cfg.Debug)
assert.Equal(t, "/tmp/test-data", cfg.DataDir)
assert.Equal(t, 30*time.Minute, cfg.DNSInterval)
assert.Equal(t, 6*time.Hour, cfg.TLSInterval)
assert.Equal(t, 14, cfg.TLSExpiryWarning)
assert.Equal(t, "https://hooks.slack.com/t", cfg.SlackWebhook)
assert.Equal(t, "https://mm.test/hooks/t", cfg.MattermostWebhook)
assert.Equal(t, "https://ntfy.sh/test", cfg.NtfyTopic)
assert.Equal(t, "https://sentry.test/1", cfg.SentryDSN)
assert.True(t, cfg.MaintenanceMode)
assert.Equal(t, "admin", cfg.MetricsUsername)
assert.Equal(t, "secret", cfg.MetricsPassword)
assert.True(t, cfg.SendTestNotification)
}
func TestNew_NoTargetsError(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "")
_, err := config.New(nil, newTestParams(t))
require.Error(t, err)
assert.ErrorIs(t, err, config.ErrNoTargets)
}
func TestNew_OnlyEmptyCSVSegments(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", " , , ")
_, err := config.New(nil, newTestParams(t))
require.Error(t, err)
assert.ErrorIs(t, err, config.ErrNoTargets)
}
func TestNew_InvalidDNSInterval_FallsBackToDefault(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("DNSWATCHER_DNS_INTERVAL", "banana")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, time.Hour, cfg.DNSInterval,
"invalid DNS interval should fall back to 1h default")
}
func TestNew_InvalidTLSInterval_FallsBackToDefault(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("DNSWATCHER_TLS_INTERVAL", "notaduration")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 12*time.Hour, cfg.TLSInterval,
"invalid TLS interval should fall back to 12h default")
}
func TestNew_BothIntervalsInvalid(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("DNSWATCHER_DNS_INTERVAL", "xyz")
t.Setenv("DNSWATCHER_TLS_INTERVAL", "abc")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, time.Hour, cfg.DNSInterval)
assert.Equal(t, 12*time.Hour, cfg.TLSInterval)
}
func TestNew_DebugEnablesDebugLogging(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("DNSWATCHER_DEBUG", "true")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.True(t, cfg.Debug)
}
func TestNew_PortEnvNotPrefixed(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("PORT", "3000")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 3000, cfg.Port,
"PORT env should work without DNSWATCHER_ prefix")
}
func TestNew_TargetClassification(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS",
"example.com,www.example.com,api.example.com,example.org")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
// example.com and example.org are apex domains
assert.Len(t, cfg.Domains, 2)
// www.example.com and api.example.com are hostnames
assert.Len(t, cfg.Hostnames, 2)
}
func TestNew_InvalidTargetPublicSuffix(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "co.uk")
_, err := config.New(nil, newTestParams(t))
require.Error(t, err, "public suffix should be rejected")
}
func TestNew_EmptyAppnameDefaultsToDnswatcher(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
g := &globals.Globals{Appname: "", Version: "test"}
l, err := logger.New(nil, logger.Params{Globals: g})
require.NoError(t, err)
cfg, err := config.New(
nil, config.Params{Globals: g, Logger: l},
)
require.NoError(t, err)
assert.Equal(t, 8080, cfg.Port,
"defaults should load when appname is empty")
}
func TestNew_TargetsWithWhitespace(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", " example.com , www.example.com ")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 2, len(cfg.Domains)+len(cfg.Hostnames),
"whitespace around targets should be trimmed")
}
func TestNew_TargetsWithTrailingComma(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com,www.example.com,")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 2, len(cfg.Domains)+len(cfg.Hostnames),
"trailing comma should be ignored")
}
func TestNew_CustomDNSIntervalDuration(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("DNSWATCHER_DNS_INTERVAL", "5s")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 5*time.Second, cfg.DNSInterval)
}
func TestStatePath(t *testing.T) {
t.Parallel()
tests := []struct {
name string
dataDir string
want string
}{
{"default", "./data", "./data/state.json"},
{"absolute", "/var/lib/dw", "/var/lib/dw/state.json"},
{"nested", "/opt/app/data", "/opt/app/data/state.json"},
{"empty", "", "/state.json"},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
t.Parallel()
cfg := &config.Config{DataDir: tt.dataDir}
assert.Equal(t, tt.want, cfg.StatePath())
})
}
}

View File

@@ -0,0 +1,6 @@
package config
// ParseCSVForTest exports parseCSV for use in external tests.
func ParseCSVForTest(input string) []string {
return parseCSV(input)
}


@@ -0,0 +1,44 @@
package config_test
import (
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/dnswatcher/internal/config"
)
func TestParseCSV(t *testing.T) {
t.Parallel()
tests := []struct {
name string
input string
want []string
}{
{"empty string", "", nil},
{"single value", "a", []string{"a"}},
{"multiple values", "a,b,c", []string{"a", "b", "c"}},
{"whitespace trimmed", " a , b ", []string{"a", "b"}},
{"trailing comma", "a,b,", []string{"a", "b"}},
{"leading comma", ",a,b", []string{"a", "b"}},
{"consecutive commas", "a,,b", []string{"a", "b"}},
{"all empty segments", ",,,", nil},
{"whitespace only", " , , ", nil},
{"tabs", "\ta\t,\tb\t", []string{"a", "b"}},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
t.Parallel()
got := config.ParseCSVForTest(tt.input)
require.Len(t, got, len(tt.want))
for i, w := range tt.want {
assert.Equal(t, w, got[i])
}
})
}
}
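
The cases in `TestParseCSV` amount to a small specification: split on commas, trim surrounding whitespace (tabs included), drop empty segments, and return `nil` when nothing is left. Here is a minimal sketch of a function that would satisfy that table. `parseCSVSketch` is a hypothetical name, and the project's real `parseCSV` is not shown in this diff and may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCSVSketch is a hypothetical implementation, not the project's
// parseCSV. It splits on commas, trims whitespace, and drops empty
// segments. The result stays nil when no non-empty value is found.
func parseCSVSketch(input string) []string {
	var out []string
	for _, part := range strings.Split(input, ",") {
		trimmed := strings.TrimSpace(part)
		if trimmed != "" {
			out = append(out, trimmed)
		}
	}
	return out
}

func main() {
	fmt.Printf("%q\n", parseCSVSketch(" a , b "))
	fmt.Printf("%q\n", parseCSVSketch("a,,b"))
	fmt.Println(parseCSVSketch(",,,") == nil)
}
```

Leaving `out` as a nil slice until the first append is what makes the `empty string`, `all empty segments`, and `whitespace only` cases come out as `nil` without a special case.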


@@ -15,14 +15,12 @@ var (
mu sync.RWMutex
appname string
version string
buildarch string
)
// Globals holds build-time variables for dependency injection.
type Globals struct {
Appname string
Version string
Buildarch string
}
// New creates a new Globals instance from package-level variables.
@@ -33,7 +31,6 @@ func New(_ fx.Lifecycle) (*Globals, error) {
return &Globals{
Appname: appname,
Version: version,
Buildarch: buildarch,
}, nil
}
@@ -52,11 +49,3 @@ func SetVersion(ver string) {
version = ver
}
// SetBuildarch sets the build architecture.
func SetBuildarch(arch string) {
mu.Lock()
defer mu.Unlock()
buildarch = arch
}


@@ -0,0 +1,151 @@
package handlers
import (
"embed"
"fmt"
"html/template"
"math"
"net/http"
"strings"
"time"
"sneak.berlin/go/dnswatcher/internal/notify"
"sneak.berlin/go/dnswatcher/internal/state"
)
//go:embed templates/dashboard.html
var dashboardFS embed.FS
// Time unit constants for relative time calculations.
const (
secondsPerMinute = 60
minutesPerHour = 60
hoursPerDay = 24
)
// newDashboardTemplate parses the embedded dashboard HTML
// template with helper functions.
func newDashboardTemplate() *template.Template {
funcs := template.FuncMap{
"relTime": relTime,
"joinStrings": joinStrings,
"formatRecords": formatRecords,
"expiryDays": expiryDays,
}
return template.Must(
template.New("dashboard.html").
Funcs(funcs).
ParseFS(dashboardFS, "templates/dashboard.html"),
)
}
// dashboardData is the data passed to the dashboard template.
type dashboardData struct {
Snapshot state.Snapshot
Alerts []notify.AlertEntry
StateAge string
GeneratedAt string
}
// HandleDashboard returns the dashboard page handler.
func (h *Handlers) HandleDashboard() http.HandlerFunc {
tmpl := newDashboardTemplate()
return func(
writer http.ResponseWriter,
_ *http.Request,
) {
snap := h.state.GetSnapshot()
alerts := h.notifyHistory.Recent()
data := dashboardData{
Snapshot: snap,
Alerts: alerts,
StateAge: relTime(snap.LastUpdated),
GeneratedAt: time.Now().UTC().Format("2006-01-02 15:04:05"),
}
writer.Header().Set(
"Content-Type", "text/html; charset=utf-8",
)
err := tmpl.Execute(writer, data)
if err != nil {
h.log.Error(
"dashboard template error",
"error", err,
)
}
}
}
// relTime returns a compact relative time string such as
// "2m ago", "never" for zero times, or "just now" for
// timestamps in the future.
func relTime(t time.Time) string {
if t.IsZero() {
return "never"
}
d := time.Since(t)
if d < 0 {
return "just now"
}
seconds := int(math.Round(d.Seconds()))
if seconds < secondsPerMinute {
return fmt.Sprintf("%ds ago", seconds)
}
minutes := seconds / secondsPerMinute
if minutes < minutesPerHour {
return fmt.Sprintf("%dm ago", minutes)
}
hours := minutes / minutesPerHour
if hours < hoursPerDay {
return fmt.Sprintf(
"%dh %dm ago", hours, minutes%minutesPerHour,
)
}
days := hours / hoursPerDay
return fmt.Sprintf(
"%dd %dh ago", days, hours%hoursPerDay,
)
}
// joinStrings joins a string slice with a separator.
func joinStrings(items []string, sep string) string {
return strings.Join(items, sep)
}
// formatRecords formats a map of record type → values into a
// compact display string. Map iteration order is unspecified,
// so the order of the parts may differ between calls.
func formatRecords(records map[string][]string) string {
if len(records) == 0 {
return "-"
}
var parts []string
for rtype, values := range records {
for _, v := range values {
parts = append(parts, rtype+": "+v)
}
}
return strings.Join(parts, ", ")
}
// expiryDays returns the number of days until the given time,
// rounded down. Returns 0 if already expired.
func expiryDays(t time.Time) int {
d := time.Until(t).Hours() / hoursPerDay
if d < 0 {
return 0
}
return int(d)
}
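
`formatRecords` ranges over a map, and the dashboard refreshes every 30 seconds, so the record order in a table cell can change from one page load to the next. One possible remedy is to visit the record types in sorted order. The sketch below assumes a hypothetical `formatRecordsSorted` and is not part of this change.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatRecordsSorted is a hypothetical variant of formatRecords that
// visits record types in sorted order so the output is stable.
func formatRecordsSorted(records map[string][]string) string {
	if len(records) == 0 {
		return "-"
	}

	// Collect and sort the record types first.
	types := make([]string, 0, len(records))
	for rtype := range records {
		types = append(types, rtype)
	}
	sort.Strings(types)

	var parts []string
	for _, rtype := range types {
		for _, v := range records[rtype] {
			parts = append(parts, rtype+": "+v)
		}
	}

	return strings.Join(parts, ", ")
}

func main() {
	fmt.Println(formatRecordsSorted(map[string][]string{
		"MX": {"mail.example.com."},
		"A":  {"1.2.3.4", "5.6.7.8"},
	}))
}
```

Values within a record type keep the order in which they were stored. Sorting those as well would be a separate decision.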


@@ -0,0 +1,80 @@
package handlers_test
import (
"testing"
"time"
"sneak.berlin/go/dnswatcher/internal/handlers"
)
func TestRelTime(t *testing.T) {
t.Parallel()
tests := []struct {
name string
dur time.Duration
want string
}{
{"zero", 0, "never"},
{"seconds", 30 * time.Second, "30s ago"},
{"minutes", 5 * time.Minute, "5m ago"},
{"hours", 2*time.Hour + 15*time.Minute, "2h 15m ago"},
{"days", 48*time.Hour + 3*time.Hour, "2d 3h ago"},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
t.Parallel()
var input time.Time
if tt.dur > 0 {
input = time.Now().Add(-tt.dur)
}
got := handlers.RelTime(input)
if got != tt.want {
t.Errorf(
"RelTime(%v) = %q, want %q",
tt.dur, got, tt.want,
)
}
})
}
}
func TestExpiryDays(t *testing.T) {
t.Parallel()
// 10 days from now.
future := time.Now().Add(10 * 24 * time.Hour)
days := handlers.ExpiryDays(future)
if days < 9 || days > 10 {
t.Errorf("expected ~10 days, got %d", days)
}
// Already expired.
past := time.Now().Add(-24 * time.Hour)
days = handlers.ExpiryDays(past)
if days != 0 {
t.Errorf("expected 0 for expired, got %d", days)
}
}
func TestFormatRecords(t *testing.T) {
t.Parallel()
got := handlers.FormatRecords(nil)
if got != "-" {
t.Errorf("expected -, got %q", got)
}
got = handlers.FormatRecords(map[string][]string{
"A": {"1.2.3.4"},
})
if got != "A: 1.2.3.4" {
t.Errorf("unexpected format: %q", got)
}
}


@@ -1,60 +0,0 @@
package handlers
import (
"net/http"
"time"
)
// domainResponse represents a single domain in the API response.
type domainResponse struct {
Domain string `json:"domain"`
Nameservers []string `json:"nameservers,omitempty"`
LastChecked string `json:"lastChecked,omitempty"`
Status string `json:"status"`
}
// domainsResponse is the top-level response for GET /api/v1/domains.
type domainsResponse struct {
Domains []domainResponse `json:"domains"`
}
// HandleDomains returns the configured domains and their status.
func (h *Handlers) HandleDomains() http.HandlerFunc {
return func(
writer http.ResponseWriter,
request *http.Request,
) {
configured := h.config.Domains
snapshot := h.state.GetSnapshot()
domains := make(
[]domainResponse, 0, len(configured),
)
for _, domain := range configured {
dr := domainResponse{
Domain: domain,
Status: "pending",
}
ds, ok := snapshot.Domains[domain]
if ok {
dr.Nameservers = ds.Nameservers
dr.Status = "ok"
if !ds.LastChecked.IsZero() {
dr.LastChecked = ds.LastChecked.
Format(time.RFC3339)
}
}
domains = append(domains, dr)
}
h.respondJSON(
writer, request,
&domainsResponse{Domains: domains},
http.StatusOK,
)
}
}


@@ -0,0 +1,18 @@
package handlers
import "time"
// RelTime exports relTime for testing.
func RelTime(t time.Time) string {
return relTime(t)
}
// ExpiryDays exports expiryDays for testing.
func ExpiryDays(t time.Time) int {
return expiryDays(t)
}
// FormatRecords exports formatRecords for testing.
func FormatRecords(records map[string][]string) string {
return formatRecords(records)
}


@@ -8,10 +8,10 @@ import (
"go.uber.org/fx"
"sneak.berlin/go/dnswatcher/internal/config"
"sneak.berlin/go/dnswatcher/internal/globals"
"sneak.berlin/go/dnswatcher/internal/healthcheck"
"sneak.berlin/go/dnswatcher/internal/logger"
"sneak.berlin/go/dnswatcher/internal/notify"
"sneak.berlin/go/dnswatcher/internal/state"
)
@@ -23,7 +23,7 @@ type Params struct {
Globals *globals.Globals
Healthcheck *healthcheck.Healthcheck
State *state.State
Config *config.Config
Notify *notify.Service
}
// Handlers provides HTTP request handlers.
@@ -33,7 +33,7 @@ type Handlers struct {
globals *globals.Globals
hc *healthcheck.Healthcheck
state *state.State
config *config.Config
notifyHistory *notify.AlertHistory
}
// New creates a new Handlers instance.
@@ -44,7 +44,7 @@ func New(_ fx.Lifecycle, params Params) (*Handlers, error) {
globals: params.Globals,
hc: params.Healthcheck,
state: params.State,
config: params.Config,
notifyHistory: params.Notify.History(),
}, nil
}
@@ -52,7 +52,7 @@ func (h *Handlers) respondJSON(
writer http.ResponseWriter,
_ *http.Request,
data any,
status int, //nolint:unparam // general-purpose utility; status varies in future use
status int,
) {
writer.Header().Set("Content-Type", "application/json")
writer.WriteHeader(status)


@@ -1,120 +0,0 @@
package handlers
import (
"net/http"
"sort"
"time"
"sneak.berlin/go/dnswatcher/internal/state"
)
// nameserverRecordResponse represents one nameserver's records
// in the API response.
type nameserverRecordResponse struct {
Nameserver string `json:"nameserver"`
Records map[string][]string `json:"records"`
Status string `json:"status"`
Error string `json:"error,omitempty"`
LastChecked string `json:"lastChecked,omitempty"`
}
// hostnameResponse represents a single hostname in the API response.
type hostnameResponse struct {
Hostname string `json:"hostname"`
Nameservers []nameserverRecordResponse `json:"nameservers,omitempty"`
LastChecked string `json:"lastChecked,omitempty"`
Status string `json:"status"`
}
// hostnamesResponse is the top-level response for
// GET /api/v1/hostnames.
type hostnamesResponse struct {
Hostnames []hostnameResponse `json:"hostnames"`
}
// HandleHostnames returns the configured hostnames and their status.
func (h *Handlers) HandleHostnames() http.HandlerFunc {
return func(
writer http.ResponseWriter,
request *http.Request,
) {
configured := h.config.Hostnames
snapshot := h.state.GetSnapshot()
hostnames := make(
[]hostnameResponse, 0, len(configured),
)
for _, hostname := range configured {
hr := hostnameResponse{
Hostname: hostname,
Status: "pending",
}
hs, ok := snapshot.Hostnames[hostname]
if ok {
hr.Status = "ok"
if !hs.LastChecked.IsZero() {
hr.LastChecked = hs.LastChecked.
Format(time.RFC3339)
}
hr.Nameservers = buildNameserverRecords(
hs,
)
}
hostnames = append(hostnames, hr)
}
h.respondJSON(
writer, request,
&hostnamesResponse{Hostnames: hostnames},
http.StatusOK,
)
}
}
// buildNameserverRecords converts the per-nameserver state map
// into a sorted slice for deterministic JSON output.
func buildNameserverRecords(
hs *state.HostnameState,
) []nameserverRecordResponse {
if hs.RecordsByNameserver == nil {
return nil
}
nsNames := make(
[]string, 0, len(hs.RecordsByNameserver),
)
for ns := range hs.RecordsByNameserver {
nsNames = append(nsNames, ns)
}
sort.Strings(nsNames)
records := make(
[]nameserverRecordResponse, 0, len(nsNames),
)
for _, ns := range nsNames {
nsr := hs.RecordsByNameserver[ns]
entry := nameserverRecordResponse{
Nameserver: ns,
Records: nsr.Records,
Status: nsr.Status,
Error: nsr.Error,
}
if !nsr.LastChecked.IsZero() {
entry.LastChecked = nsr.LastChecked.
Format(time.RFC3339)
}
records = append(records, entry)
}
return records
}


@@ -2,22 +2,217 @@ package handlers
import (
"net/http"
"sort"
"time"
"sneak.berlin/go/dnswatcher/internal/state"
)
// statusDomainInfo holds status information for a monitored domain.
type statusDomainInfo struct {
Nameservers []string `json:"nameservers"`
LastChecked time.Time `json:"lastChecked"`
}
// statusHostnameNSInfo holds per-nameserver status for a hostname.
type statusHostnameNSInfo struct {
Records map[string][]string `json:"records"`
Status string `json:"status"`
LastChecked time.Time `json:"lastChecked"`
}
// statusHostnameInfo holds status information for a monitored hostname.
type statusHostnameInfo struct {
Nameservers map[string]*statusHostnameNSInfo `json:"nameservers"`
LastChecked time.Time `json:"lastChecked"`
}
// statusPortInfo holds status information for a monitored port.
type statusPortInfo struct {
Open bool `json:"open"`
Hostnames []string `json:"hostnames"`
LastChecked time.Time `json:"lastChecked"`
}
// statusCertificateInfo holds status information for a TLS certificate.
type statusCertificateInfo struct {
CommonName string `json:"commonName"`
Issuer string `json:"issuer"`
NotAfter time.Time `json:"notAfter"`
SubjectAlternativeNames []string `json:"subjectAlternativeNames"`
Status string `json:"status"`
LastChecked time.Time `json:"lastChecked"`
}
// statusCounts holds summary counts of monitored resources.
type statusCounts struct {
Domains int `json:"domains"`
Hostnames int `json:"hostnames"`
Ports int `json:"ports"`
PortsOpen int `json:"portsOpen"`
Certificates int `json:"certificates"`
CertsOK int `json:"certificatesOk"`
CertsError int `json:"certificatesError"`
}
// statusResponse is the full /api/v1/status response.
type statusResponse struct {
Status string `json:"status"`
LastUpdated time.Time `json:"lastUpdated"`
Counts statusCounts `json:"counts"`
Domains map[string]*statusDomainInfo `json:"domains"`
Hostnames map[string]*statusHostnameInfo `json:"hostnames"`
Ports map[string]*statusPortInfo `json:"ports"`
Certificates map[string]*statusCertificateInfo `json:"certificates"`
}
// HandleStatus returns the monitoring status handler.
func (h *Handlers) HandleStatus() http.HandlerFunc {
type response struct {
Status string `json:"status"`
}
return func(
writer http.ResponseWriter,
request *http.Request,
) {
snap := h.state.GetSnapshot()
resp := buildStatusResponse(snap)
h.respondJSON(
writer, request,
&response{Status: "ok"},
resp,
http.StatusOK,
)
}
}
// buildStatusResponse constructs the full status response from
// the current monitoring snapshot.
func buildStatusResponse(
snap state.Snapshot,
) *statusResponse {
resp := &statusResponse{
Status: "ok",
LastUpdated: snap.LastUpdated,
Domains: make(map[string]*statusDomainInfo),
Hostnames: make(map[string]*statusHostnameInfo),
Ports: make(map[string]*statusPortInfo),
Certificates: make(map[string]*statusCertificateInfo),
}
buildDomains(snap, resp)
buildHostnames(snap, resp)
buildPorts(snap, resp)
buildCertificates(snap, resp)
buildCounts(resp)
return resp
}
func buildDomains(
snap state.Snapshot,
resp *statusResponse,
) {
for name, ds := range snap.Domains {
ns := make([]string, len(ds.Nameservers))
copy(ns, ds.Nameservers)
sort.Strings(ns)
resp.Domains[name] = &statusDomainInfo{
Nameservers: ns,
LastChecked: ds.LastChecked,
}
}
}
func buildHostnames(
snap state.Snapshot,
resp *statusResponse,
) {
for name, hs := range snap.Hostnames {
info := &statusHostnameInfo{
Nameservers: make(map[string]*statusHostnameNSInfo),
LastChecked: hs.LastChecked,
}
for ns, nsState := range hs.RecordsByNameserver {
recs := make(map[string][]string, len(nsState.Records))
for rtype, vals := range nsState.Records {
copied := make([]string, len(vals))
copy(copied, vals)
recs[rtype] = copied
}
info.Nameservers[ns] = &statusHostnameNSInfo{
Records: recs,
Status: nsState.Status,
LastChecked: nsState.LastChecked,
}
}
resp.Hostnames[name] = info
}
}
func buildPorts(
snap state.Snapshot,
resp *statusResponse,
) {
for key, ps := range snap.Ports {
hostnames := make([]string, len(ps.Hostnames))
copy(hostnames, ps.Hostnames)
sort.Strings(hostnames)
resp.Ports[key] = &statusPortInfo{
Open: ps.Open,
Hostnames: hostnames,
LastChecked: ps.LastChecked,
}
}
}
func buildCertificates(
snap state.Snapshot,
resp *statusResponse,
) {
for key, cs := range snap.Certificates {
sans := make([]string, len(cs.SubjectAlternativeNames))
copy(sans, cs.SubjectAlternativeNames)
resp.Certificates[key] = &statusCertificateInfo{
CommonName: cs.CommonName,
Issuer: cs.Issuer,
NotAfter: cs.NotAfter,
SubjectAlternativeNames: sans,
Status: cs.Status,
LastChecked: cs.LastChecked,
}
}
}
func buildCounts(resp *statusResponse) {
var portsOpen, certsOK, certsError int
for _, ps := range resp.Ports {
if ps.Open {
portsOpen++
}
}
for _, cs := range resp.Certificates {
switch cs.Status {
case "ok":
certsOK++
case "error":
certsError++
}
}
resp.Counts = statusCounts{
Domains: len(resp.Domains),
Hostnames: len(resp.Hostnames),
Ports: len(resp.Ports),
PortsOpen: portsOpen,
Certificates: len(resp.Certificates),
CertsOK: certsOK,
CertsError: certsError,
}
}
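
For API consumers, the JSON key names matter more than the Go field names, and the two differ for the certificate sub-counts. The sketch below marshals a copy of `statusCounts` with made-up values to show the `counts` object on the wire. `countsJSON` is a hypothetical helper added only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusCounts is copied from the handler above. The values used in
// main are invented for the example.
type statusCounts struct {
	Domains      int `json:"domains"`
	Hostnames    int `json:"hostnames"`
	Ports        int `json:"ports"`
	PortsOpen    int `json:"portsOpen"`
	Certificates int `json:"certificates"`
	CertsOK      int `json:"certificatesOk"`
	CertsError   int `json:"certificatesError"`
}

// countsJSON is a hypothetical helper that returns the JSON encoding
// of the counts, or an empty string if marshaling fails.
func countsJSON(c statusCounts) string {
	b, err := json.Marshal(c)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(countsJSON(statusCounts{
		Domains: 1, Hostnames: 2, Ports: 4, PortsOpen: 3,
		Certificates: 2, CertsOK: 1, CertsError: 1,
	}))
}
```

Because `buildCounts` only increments for the statuses `"ok"` and `"error"`, a certificate with any other status is included in `certificates` but in neither sub-count. Clients should therefore not assume the two sub-counts add up to the total.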


@@ -0,0 +1,370 @@
<!doctype html>
<html lang="en" class="bg-slate-950">
<head>
<meta charset="utf-8" />
<meta http-equiv="refresh" content="30" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>dnswatcher</title>
<link rel="stylesheet" href="/s/css/tailwind.min.css" />
</head>
<body
class="bg-surface-950 text-slate-300 font-mono text-sm min-h-screen antialiased"
>
<div class="max-w-6xl mx-auto px-4 py-8">
{{/* ---- Header ---- */}}
<div class="mb-8">
<h1 class="text-2xl font-bold text-teal-400 tracking-tight">
dnswatcher
</h1>
<p class="text-xs text-slate-500 mt-1">
state updated {{ .StateAge }} &middot; page generated
{{ .GeneratedAt }} UTC &middot; auto-refresh 30s
</p>
</div>
{{/* ---- Summary bar ---- */}}
<div
class="grid grid-cols-2 sm:grid-cols-4 gap-3 mb-8"
>
<div class="bg-surface-800 border border-slate-700/50 rounded-lg p-4">
<div class="text-xs text-slate-500 uppercase tracking-wider">
Domains
</div>
<div class="text-2xl font-bold text-teal-400 mt-1">
{{ len .Snapshot.Domains }}
</div>
</div>
<div class="bg-surface-800 border border-slate-700/50 rounded-lg p-4">
<div class="text-xs text-slate-500 uppercase tracking-wider">
Hostnames
</div>
<div class="text-2xl font-bold text-teal-400 mt-1">
{{ len .Snapshot.Hostnames }}
</div>
</div>
<div class="bg-surface-800 border border-slate-700/50 rounded-lg p-4">
<div class="text-xs text-slate-500 uppercase tracking-wider">
Ports
</div>
<div class="text-2xl font-bold text-teal-400 mt-1">
{{ len .Snapshot.Ports }}
</div>
</div>
<div class="bg-surface-800 border border-slate-700/50 rounded-lg p-4">
<div class="text-xs text-slate-500 uppercase tracking-wider">
Certificates
</div>
<div class="text-2xl font-bold text-teal-400 mt-1">
{{ len .Snapshot.Certificates }}
</div>
</div>
</div>
{{/* ---- Domains ---- */}}
<section class="mb-8">
<h2
class="text-sm font-semibold text-teal-300 uppercase tracking-wider mb-3 border-b border-slate-700/50 pb-2"
>
Domains
</h2>
{{ if .Snapshot.Domains }}
<div class="overflow-x-auto">
<table class="w-full text-left text-xs">
<thead>
<tr class="text-slate-500 uppercase tracking-wider">
<th class="py-2 px-3">Domain</th>
<th class="py-2 px-3">Nameservers</th>
<th class="py-2 px-3">Checked</th>
</tr>
</thead>
<tbody class="divide-y divide-slate-800">
{{ range $name, $ds := .Snapshot.Domains }}
<tr class="hover:bg-surface-800/50">
<td class="py-2 px-3 text-slate-200 font-medium">
{{ $name }}
</td>
<td class="py-2 px-3 text-slate-400 break-all">
{{ joinStrings $ds.Nameservers ", " }}
</td>
<td class="py-2 px-3 text-slate-500 whitespace-nowrap">
{{ relTime $ds.LastChecked }}
</td>
</tr>
{{ end }}
</tbody>
</table>
</div>
{{ else }}
<p class="text-slate-600 italic text-xs">
No domains configured.
</p>
{{ end }}
</section>
{{/* ---- Hostnames ---- */}}
<section class="mb-8">
<h2
class="text-sm font-semibold text-teal-300 uppercase tracking-wider mb-3 border-b border-slate-700/50 pb-2"
>
Hostnames
</h2>
{{ if .Snapshot.Hostnames }}
<div class="overflow-x-auto">
<table class="w-full text-left text-xs">
<thead>
<tr class="text-slate-500 uppercase tracking-wider">
<th class="py-2 px-3">Hostname</th>
<th class="py-2 px-3">NS</th>
<th class="py-2 px-3">Status</th>
<th class="py-2 px-3">Records</th>
<th class="py-2 px-3">Checked</th>
</tr>
</thead>
<tbody class="divide-y divide-slate-800">
{{ range $name, $hs := .Snapshot.Hostnames }}
{{ range $ns, $nsr := $hs.RecordsByNameserver }}
<tr class="hover:bg-surface-800/50">
<td class="py-2 px-3 text-slate-200 font-medium">
{{ $name }}
</td>
<td class="py-2 px-3 text-slate-400 break-all">
{{ $ns }}
</td>
<td class="py-2 px-3">
{{ if eq $nsr.Status "ok" }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-teal-900/50 text-teal-400 border border-teal-700/30"
>ok</span
>
{{ else }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-red-900/50 text-red-400 border border-red-700/30"
>{{ $nsr.Status }}</span
>
{{ end }}
</td>
<td
class="py-2 px-3 text-slate-400 break-all max-w-xs"
>
{{ formatRecords $nsr.Records }}
</td>
<td class="py-2 px-3 text-slate-500 whitespace-nowrap">
{{ relTime $nsr.LastChecked }}
</td>
</tr>
{{ end }}
{{ end }}
</tbody>
</table>
</div>
{{ else }}
<p class="text-slate-600 italic text-xs">
No hostnames configured.
</p>
{{ end }}
</section>
{{/* ---- Ports ---- */}}
<section class="mb-8">
<h2
class="text-sm font-semibold text-teal-300 uppercase tracking-wider mb-3 border-b border-slate-700/50 pb-2"
>
Ports
</h2>
{{ if .Snapshot.Ports }}
<div class="overflow-x-auto">
<table class="w-full text-left text-xs">
<thead>
<tr class="text-slate-500 uppercase tracking-wider">
<th class="py-2 px-3">Address</th>
<th class="py-2 px-3">State</th>
<th class="py-2 px-3">Hostnames</th>
<th class="py-2 px-3">Checked</th>
</tr>
</thead>
<tbody class="divide-y divide-slate-800">
{{ range $key, $ps := .Snapshot.Ports }}
<tr class="hover:bg-surface-800/50">
<td class="py-2 px-3 text-slate-200 font-medium">
{{ $key }}
</td>
<td class="py-2 px-3">
{{ if $ps.Open }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-teal-900/50 text-teal-400 border border-teal-700/30"
>open</span
>
{{ else }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-red-900/50 text-red-400 border border-red-700/30"
>closed</span
>
{{ end }}
</td>
<td class="py-2 px-3 text-slate-400 break-all">
{{ joinStrings $ps.Hostnames ", " }}
</td>
<td class="py-2 px-3 text-slate-500 whitespace-nowrap">
{{ relTime $ps.LastChecked }}
</td>
</tr>
{{ end }}
</tbody>
</table>
</div>
{{ else }}
<p class="text-slate-600 italic text-xs">
No port data yet.
</p>
{{ end }}
</section>
{{/* ---- Certificates ---- */}}
<section class="mb-8">
<h2
class="text-sm font-semibold text-teal-300 uppercase tracking-wider mb-3 border-b border-slate-700/50 pb-2"
>
Certificates
</h2>
{{ if .Snapshot.Certificates }}
<div class="overflow-x-auto">
<table class="w-full text-left text-xs">
<thead>
<tr class="text-slate-500 uppercase tracking-wider">
<th class="py-2 px-3">Endpoint</th>
<th class="py-2 px-3">Status</th>
<th class="py-2 px-3">CN</th>
<th class="py-2 px-3">Issuer</th>
<th class="py-2 px-3">Expires</th>
<th class="py-2 px-3">Checked</th>
</tr>
</thead>
<tbody class="divide-y divide-slate-800">
{{ range $key, $cs := .Snapshot.Certificates }}
<tr class="hover:bg-surface-800/50">
<td class="py-2 px-3 text-slate-400 break-all">
{{ $key }}
</td>
<td class="py-2 px-3">
{{ if eq $cs.Status "ok" }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-teal-900/50 text-teal-400 border border-teal-700/30"
>ok</span
>
{{ else }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-red-900/50 text-red-400 border border-red-700/30"
>{{ $cs.Status }}</span
>
{{ end }}
</td>
<td class="py-2 px-3 text-slate-200">
{{ $cs.CommonName }}
</td>
<td class="py-2 px-3 text-slate-400 break-all">
{{ $cs.Issuer }}
</td>
<td class="py-2 px-3 whitespace-nowrap">
{{ if not $cs.NotAfter.IsZero }}
{{ $days := expiryDays $cs.NotAfter }}
{{ if lt $days 7 }}
<span class="text-red-400 font-medium"
>{{ $cs.NotAfter.Format "2006-01-02" }}
({{ $days }}d)</span
>
{{ else if lt $days 30 }}
<span class="text-amber-400"
>{{ $cs.NotAfter.Format "2006-01-02" }}
({{ $days }}d)</span
>
{{ else }}
<span class="text-slate-400"
>{{ $cs.NotAfter.Format "2006-01-02" }}
({{ $days }}d)</span
>
{{ end }}
{{ end }}
</td>
<td class="py-2 px-3 text-slate-500 whitespace-nowrap">
{{ relTime $cs.LastChecked }}
</td>
</tr>
{{ end }}
</tbody>
</table>
</div>
{{ else }}
<p class="text-slate-600 italic text-xs">
No certificate data yet.
</p>
{{ end }}
</section>
{{/* ---- Recent Alerts ---- */}}
<section class="mb-8">
<h2
class="text-sm font-semibold text-teal-300 uppercase tracking-wider mb-3 border-b border-slate-700/50 pb-2"
>
Recent Alerts ({{ len .Alerts }})
</h2>
{{ if .Alerts }}
<div class="space-y-2">
{{ range .Alerts }}
<div
class="bg-surface-800 border rounded-lg px-4 py-3 {{ if eq .Priority "error" }}border-red-700/40{{ else if eq .Priority "warning" }}border-amber-700/40{{ else if eq .Priority "success" }}border-teal-700/40{{ else }}border-blue-700/40{{ end }}"
>
<div class="flex items-center gap-3 mb-1">
{{ if eq .Priority "error" }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-red-900/50 text-red-400 border border-red-700/30"
>error</span
>
{{ else if eq .Priority "warning" }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-amber-900/50 text-amber-400 border border-amber-700/30"
>warning</span
>
{{ else if eq .Priority "success" }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-teal-900/50 text-teal-400 border border-teal-700/30"
>success</span
>
{{ else }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-blue-900/50 text-blue-400 border border-blue-700/30"
>info</span
>
{{ end }}
<span class="text-slate-200 text-xs font-medium">
{{ .Title }}
</span>
<span class="text-slate-600 text-[11px] ml-auto whitespace-nowrap">
{{ .Timestamp.Format "2006-01-02 15:04:05" }} UTC
({{ relTime .Timestamp }})
</span>
</div>
<p
class="text-slate-400 text-xs whitespace-pre-line pl-0.5"
>
{{ .Message }}
</p>
</div>
{{ end }}
</div>
{{ else }}
<p class="text-slate-600 italic text-xs">
No alerts recorded since last restart.
</p>
{{ end }}
</section>
{{/* ---- Footer ---- */}}
<div
class="text-[11px] text-slate-700 border-t border-slate-800 pt-4 mt-8"
>
dnswatcher &middot; monitoring {{ len .Snapshot.Domains }} domains +
{{ len .Snapshot.Hostnames }} hostnames
</div>
</div>
</body>
</html>


@@ -78,6 +78,5 @@ func (l *Logger) Identify() {
l.log.Info("starting",
"appname", l.params.Globals.Appname,
"version", l.params.Globals.Version,
"buildarch", l.params.Globals.Buildarch,
)
}

File diff suppressed because it is too large.


@@ -0,0 +1,105 @@
package notify
import (
"context"
"io"
"log/slog"
"net/http"
"net/url"
"time"
)
// NtfyPriority exports ntfyPriority for testing.
func NtfyPriority(priority string) string {
return ntfyPriority(priority)
}
// SlackColor exports slackColor for testing.
func SlackColor(priority string) string {
return slackColor(priority)
}
// NewRequestForTest exports newRequest for testing.
func NewRequestForTest(
ctx context.Context,
method string,
target *url.URL,
body io.Reader,
) *http.Request {
return newRequest(ctx, method, target, body)
}
// NewTestService creates a Service suitable for unit testing.
// It discards log output and uses the given transport.
func NewTestService(transport http.RoundTripper) *Service {
return &Service{
log: slog.New(slog.DiscardHandler),
transport: transport,
history: NewAlertHistory(),
}
}
// SetNtfyURL sets the ntfy URL on a Service for testing.
func (svc *Service) SetNtfyURL(u *url.URL) {
svc.ntfyURL = u
}
// SetSlackWebhookURL sets the Slack webhook URL on a
// Service for testing.
func (svc *Service) SetSlackWebhookURL(u *url.URL) {
svc.slackWebhookURL = u
}
// SetMattermostWebhookURL sets the Mattermost webhook URL on
// a Service for testing.
func (svc *Service) SetMattermostWebhookURL(u *url.URL) {
svc.mattermostWebhookURL = u
}
// SendNtfy exports sendNtfy for testing.
func (svc *Service) SendNtfy(
ctx context.Context,
topicURL *url.URL,
title, message, priority string,
) error {
return svc.sendNtfy(ctx, topicURL, title, message, priority)
}
// SendSlack exports sendSlack for testing.
func (svc *Service) SendSlack(
ctx context.Context,
webhookURL *url.URL,
title, message, priority string,
) error {
return svc.sendSlack(
ctx, webhookURL, title, message, priority,
)
}
// SetRetryConfig overrides the retry configuration for
// testing.
func (svc *Service) SetRetryConfig(cfg RetryConfig) {
svc.retryConfig = cfg
}
// SetSleepFunc overrides the sleep function so tests can
// eliminate real delays.
func (svc *Service) SetSleepFunc(
fn func(time.Duration) <-chan time.Time,
) {
svc.sleepFn = fn
}
// DeliverWithRetry exports deliverWithRetry for testing.
func (svc *Service) DeliverWithRetry(
ctx context.Context,
endpoint string,
fn func(context.Context) error,
) error {
return svc.deliverWithRetry(ctx, endpoint, fn)
}
// BackoffDuration exports RetryConfig.backoff for testing.
func (rc RetryConfig) BackoffDuration(attempt int) time.Duration {
return rc.defaults().backoff(attempt)
}
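
`BackoffDuration` exposes the backoff calculation, but `retry.go` itself is not visible in this part of the diff. The commit message describes exponential backoff with a 1s base delay, a 10s cap, and ±25% jitter. A minimal sketch of that idea follows. The name `backoffSketch`, its parameters, and the choice to apply jitter after the cap are all assumptions here, not the project's code.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoffSketch is a hypothetical helper. It returns the delay before
// retry number attempt (0-based): base doubled once per attempt, capped
// at maxDelay, then scaled by a factor in [0.75, 1.25). The jitter
// argument must be in [0, 1); real callers would pass rand.Float64().
func backoffSketch(
	base, maxDelay time.Duration,
	attempt int,
	jitter float64,
) time.Duration {
	delay := base
	for i := 0; i < attempt && delay < maxDelay; i++ {
		delay *= 2
	}
	if delay > maxDelay {
		delay = maxDelay
	}

	// Map jitter from [0, 1) onto a ±25% scaling factor.
	factor := 0.75 + 0.5*jitter

	return time.Duration(float64(delay) * factor)
}

func main() {
	for attempt := 0; attempt < 5; attempt++ {
		d := backoffSketch(
			time.Second, 10*time.Second, attempt, rand.Float64(),
		)
		fmt.Printf("attempt %d: wait %v\n", attempt, d)
	}
}
```

Taking the jitter value as an argument keeps the function deterministic under test, which mirrors the way `SetSleepFunc` lets tests remove real delays. In this sketch a jittered delay can exceed `maxDelay` by up to 25%. Clamping again after scaling would avoid that if the cap is meant to be strict.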


@@ -0,0 +1,62 @@
package notify
import (
"sync"
"time"
)
// maxAlertHistory is the maximum number of alerts to retain.
const maxAlertHistory = 100
// AlertEntry represents a single notification that was sent.
type AlertEntry struct {
Timestamp time.Time
Title string
Message string
Priority string
}
// AlertHistory is a thread-safe ring buffer that stores
// the most recent alerts.
type AlertHistory struct {
mu sync.RWMutex
entries [maxAlertHistory]AlertEntry
count int
index int
}
// NewAlertHistory creates a new empty AlertHistory.
func NewAlertHistory() *AlertHistory {
return &AlertHistory{}
}
// Add records a new alert entry in the ring buffer.
func (h *AlertHistory) Add(entry AlertEntry) {
h.mu.Lock()
defer h.mu.Unlock()
h.entries[h.index] = entry
h.index = (h.index + 1) % maxAlertHistory
if h.count < maxAlertHistory {
h.count++
}
}
// Recent returns the stored alerts in reverse chronological
// order (newest first). Returns at most maxAlertHistory entries.
func (h *AlertHistory) Recent() []AlertEntry {
h.mu.RLock()
defer h.mu.RUnlock()
result := make([]AlertEntry, h.count)
for i := range h.count {
// Walk backwards from the most recent entry.
idx := (h.index - 1 - i + maxAlertHistory) %
maxAlertHistory
result[i] = h.entries[idx]
}
return result
}
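
The index arithmetic in `Recent` is easier to follow with a small capacity. The sketch below reuses the same `Add` and `Recent` logic on a three-slot buffer of strings. The `ring` type and its capacity are illustrative only, and the mutex is omitted because the example is single-threaded.

```go
package main

import "fmt"

// capacity is deliberately tiny; the real buffer holds 100 entries.
const capacity = 3

// ring is a hypothetical, non-thread-safe miniature of AlertHistory.
type ring struct {
	entries [capacity]string
	count   int
	index   int // next slot to write
}

// add stores s in the next slot, overwriting the oldest entry once
// the buffer is full.
func (r *ring) add(s string) {
	r.entries[r.index] = s
	r.index = (r.index + 1) % capacity
	if r.count < capacity {
		r.count++
	}
}

// recent returns the stored entries, newest first.
func (r *ring) recent() []string {
	out := make([]string, r.count)
	for i := 0; i < r.count; i++ {
		// Adding capacity keeps the left operand non-negative
		// before the modulo.
		idx := (r.index - 1 - i + capacity) % capacity
		out[i] = r.entries[idx]
	}
	return out
}

func main() {
	r := &ring{}
	for _, s := range []string{"a", "b", "c", "d", "e"} {
		r.add(s)
	}
	fmt.Println(r.recent()) // prints [e d c]
}
```

After five writes into three slots, `a` and `b` have been overwritten and the newest entry comes first.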


@@ -0,0 +1,88 @@
package notify_test
import (
"testing"
"time"
"sneak.berlin/go/dnswatcher/internal/notify"
)
func TestAlertHistoryEmpty(t *testing.T) {
t.Parallel()
h := notify.NewAlertHistory()
entries := h.Recent()
if len(entries) != 0 {
t.Fatalf("expected 0 entries, got %d", len(entries))
}
}
func TestAlertHistoryAddAndRecent(t *testing.T) {
t.Parallel()
h := notify.NewAlertHistory()
now := time.Now().UTC()
h.Add(notify.AlertEntry{
Timestamp: now.Add(-2 * time.Minute),
Title: "first",
Message: "msg1",
Priority: "info",
})
h.Add(notify.AlertEntry{
Timestamp: now.Add(-1 * time.Minute),
Title: "second",
Message: "msg2",
Priority: "warning",
})
entries := h.Recent()
if len(entries) != 2 {
t.Fatalf("expected 2 entries, got %d", len(entries))
}
// Newest first.
if entries[0].Title != "second" {
t.Errorf(
"expected newest first, got %q", entries[0].Title,
)
}
if entries[1].Title != "first" {
t.Errorf(
"expected oldest last, got %q", entries[1].Title,
)
}
}
func TestAlertHistoryOverflow(t *testing.T) {
t.Parallel()
h := notify.NewAlertHistory()
const totalEntries = 110
// Fill beyond capacity.
for i := range totalEntries {
h.Add(notify.AlertEntry{
Timestamp: time.Now().UTC(),
Title: "alert",
Message: "msg",
Priority: string(rune('0' + i%10)),
})
}
entries := h.Recent()
const maxHistory = 100
if len(entries) != maxHistory {
t.Fatalf(
"expected %d entries, got %d",
maxHistory, len(entries),
)
}
}


@@ -112,6 +112,9 @@ type Service struct {
ntfyURL *url.URL
slackWebhookURL *url.URL
mattermostWebhookURL *url.URL
history *AlertHistory
retryConfig RetryConfig
sleepFn func(time.Duration) <-chan time.Time
}
// New creates a new notify Service.
@@ -123,6 +126,7 @@ func New(
log: params.Logger.Get(),
transport: http.DefaultTransport,
config: params.Config,
history: NewAlertHistory(),
}
if params.Config.NtfyTopic != "" {
@@ -167,65 +171,117 @@ func New(
return svc, nil
}
// History returns the alert history for reading recent alerts.
func (svc *Service) History() *AlertHistory {
return svc.history
}
// SendNotification sends a notification to all configured
// endpoints.
// endpoints and records it in the alert history.
func (svc *Service) SendNotification(
ctx context.Context,
title, message, priority string,
) {
if svc.ntfyURL != nil {
svc.history.Add(AlertEntry{
Timestamp: time.Now().UTC(),
Title: title,
Message: message,
Priority: priority,
})
svc.dispatchNtfy(ctx, title, message, priority)
svc.dispatchSlack(ctx, title, message, priority)
svc.dispatchMattermost(ctx, title, message, priority)
}
func (svc *Service) dispatchNtfy(
ctx context.Context,
title, message, priority string,
) {
if svc.ntfyURL == nil {
return
}
go func() {
notifyCtx := context.WithoutCancel(ctx)
err := svc.sendNtfy(
notifyCtx,
svc.ntfyURL,
err := svc.deliverWithRetry(
notifyCtx, "ntfy",
func(c context.Context) error {
return svc.sendNtfy(
c, svc.ntfyURL,
title, message, priority,
)
},
)
if err != nil {
svc.log.Error(
"failed to send ntfy notification",
"failed to send ntfy notification "+
"after retries",
"error", err,
)
}
}()
}
func (svc *Service) dispatchSlack(
ctx context.Context,
title, message, priority string,
) {
if svc.slackWebhookURL == nil {
return
}
if svc.slackWebhookURL != nil {
go func() {
notifyCtx := context.WithoutCancel(ctx)
err := svc.sendSlack(
notifyCtx,
svc.slackWebhookURL,
err := svc.deliverWithRetry(
notifyCtx, "slack",
func(c context.Context) error {
return svc.sendSlack(
c, svc.slackWebhookURL,
title, message, priority,
)
},
)
if err != nil {
svc.log.Error(
"failed to send slack notification",
"failed to send slack notification "+
"after retries",
"error", err,
)
}
}()
}
func (svc *Service) dispatchMattermost(
ctx context.Context,
title, message, priority string,
) {
if svc.mattermostWebhookURL == nil {
return
}
if svc.mattermostWebhookURL != nil {
go func() {
notifyCtx := context.WithoutCancel(ctx)
err := svc.sendSlack(
notifyCtx,
svc.mattermostWebhookURL,
err := svc.deliverWithRetry(
notifyCtx, "mattermost",
func(c context.Context) error {
return svc.sendSlack(
c, svc.mattermostWebhookURL,
title, message, priority,
)
},
)
if err != nil {
svc.log.Error(
"failed to send mattermost notification",
"failed to send mattermost notification "+
"after retries",
"error", err,
)
}
}()
}
}
func (svc *Service) sendNtfy(

139
internal/notify/retry.go Normal file

@@ -0,0 +1,139 @@
package notify
import (
"context"
"math"
"math/rand/v2"
"time"
)
// Retry defaults.
const (
// DefaultMaxRetries is the number of additional attempts
// after the first failure.
DefaultMaxRetries = 3
// DefaultBaseDelay is the initial delay before the first
// retry attempt.
DefaultBaseDelay = 1 * time.Second
// DefaultMaxDelay caps the computed backoff delay.
DefaultMaxDelay = 10 * time.Second
// backoffMultiplier is the exponential growth factor.
backoffMultiplier = 2
// jitterFraction controls the ±random spread applied
// to each delay (0.25 = ±25%).
jitterFraction = 0.25
)
// RetryConfig holds tuning knobs for the retry loop.
// Zero or negative values fall back to the package defaults above.
type RetryConfig struct {
MaxRetries int
BaseDelay time.Duration
MaxDelay time.Duration
}
// defaults returns a copy with zero fields replaced by
// package defaults.
func (rc RetryConfig) defaults() RetryConfig {
if rc.MaxRetries <= 0 {
rc.MaxRetries = DefaultMaxRetries
}
if rc.BaseDelay <= 0 {
rc.BaseDelay = DefaultBaseDelay
}
if rc.MaxDelay <= 0 {
rc.MaxDelay = DefaultMaxDelay
}
return rc
}
// backoff computes the delay for attempt n (0-indexed) with
// jitter. The raw delay is BaseDelay * 2^n, capped at
// MaxDelay, then randomised by ±jitterFraction.
func (rc RetryConfig) backoff(attempt int) time.Duration {
raw := float64(rc.BaseDelay) *
math.Pow(backoffMultiplier, float64(attempt))
if raw > float64(rc.MaxDelay) {
raw = float64(rc.MaxDelay)
}
// Apply jitter: uniform in [raw*(1-j), raw*(1+j)].
lo := raw * (1 - jitterFraction)
hi := raw * (1 + jitterFraction)
jittered := lo + rand.Float64()*(hi-lo) //nolint:gosec // jitter does not need crypto/rand
return time.Duration(jittered)
}
// deliverWithRetry calls fn, retrying on error with
// exponential backoff. It logs each failed attempt that will
// be retried and returns the last error if all attempts are
// exhausted, or ctx.Err() if ctx is cancelled while waiting.
func (svc *Service) deliverWithRetry(
ctx context.Context,
endpoint string,
fn func(context.Context) error,
) error {
cfg := svc.retryConfig.defaults()
var lastErr error
// attempt 0 is the initial call; attempts 1..MaxRetries
// are retries.
for attempt := range cfg.MaxRetries + 1 {
lastErr = fn(ctx)
if lastErr == nil {
if attempt > 0 {
svc.log.Info(
"notification delivered after retry",
"endpoint", endpoint,
"attempt", attempt+1,
)
}
return nil
}
// Last attempt — don't sleep, just return.
if attempt == cfg.MaxRetries {
break
}
delay := cfg.backoff(attempt)
svc.log.Warn(
"notification delivery failed, retrying",
"endpoint", endpoint,
"attempt", attempt+1,
"maxAttempts", cfg.MaxRetries+1,
"retryIn", delay,
"error", lastErr,
)
select {
case <-ctx.Done():
return ctx.Err()
case <-svc.sleepFunc(delay):
}
}
return lastErr
}
// sleepFunc returns a channel that receives a value after d.
// It consults the sleepFn field so tests can override the delay.
func (svc *Service) sleepFunc(d time.Duration) <-chan time.Time {
if svc.sleepFn != nil {
return svc.sleepFn(d)
}
return time.After(d)
}


@@ -0,0 +1,493 @@
package notify_test
import (
"context"
"errors"
"net/http"
"net/http/httptest"
"net/url"
"sync"
"sync/atomic"
"testing"
"time"
"sneak.berlin/go/dnswatcher/internal/notify"
)
// Static test errors (err113).
var (
errTransient = errors.New("transient failure")
errPermanent = errors.New("permanent failure")
errFail = errors.New("fail")
)
// instantSleep returns a buffered channel that already holds a
// value, so a receive completes immediately and tests avoid
// real delays.
func instantSleep(_ time.Duration) <-chan time.Time {
ch := make(chan time.Time, 1)
ch <- time.Now()
return ch
}
// ── backoff calculation ───────────────────────────────────
func TestBackoffDurationIncreases(t *testing.T) {
t.Parallel()
cfg := notify.RetryConfig{
MaxRetries: 5,
BaseDelay: 1 * time.Second,
MaxDelay: 30 * time.Second,
}
prev := time.Duration(0)
// With jitter the exact value varies, but the trend
// should be increasing for the first few attempts.
for attempt := range 4 {
d := cfg.BackoffDuration(attempt)
if d <= 0 {
t.Fatalf(
"attempt %d: backoff must be positive, got %v",
attempt, d,
)
}
// Allow jitter to occasionally flatten a step, but
// the midpoint (no-jitter) should be strictly higher.
midpoint := cfg.BaseDelay * (1 << attempt)
if attempt > 0 && midpoint <= prev {
t.Fatalf(
"midpoint should grow: attempt %d midpoint=%v prev=%v",
attempt, midpoint, prev,
)
}
prev = midpoint
}
}
func TestBackoffDurationCappedAtMax(t *testing.T) {
t.Parallel()
cfg := notify.RetryConfig{
MaxRetries: 5,
BaseDelay: 1 * time.Second,
MaxDelay: 5 * time.Second,
}
// Attempt 10 would be 1024s without capping.
d := cfg.BackoffDuration(10)
// With ±25% jitter on a 5s cap: max is 6.25s.
const maxWithJitter = 5*time.Second +
5*time.Second/4 +
time.Millisecond // rounding margin
if d > maxWithJitter {
t.Errorf(
"backoff %v exceeds max+jitter %v",
d, maxWithJitter,
)
}
}
// ── deliverWithRetry ──────────────────────────────────────
func TestDeliverWithRetrySucceedsFirstAttempt(t *testing.T) {
t.Parallel()
svc := notify.NewTestService(http.DefaultTransport)
svc.SetSleepFunc(instantSleep)
var calls atomic.Int32
err := svc.DeliverWithRetry(
context.Background(), "test",
func(_ context.Context) error {
calls.Add(1)
return nil
},
)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if calls.Load() != 1 {
t.Errorf("expected 1 call, got %d", calls.Load())
}
}
func TestDeliverWithRetryRetriesOnFailure(t *testing.T) {
t.Parallel()
svc := notify.NewTestService(http.DefaultTransport)
svc.SetSleepFunc(instantSleep)
svc.SetRetryConfig(notify.RetryConfig{
MaxRetries: 3,
BaseDelay: time.Millisecond,
MaxDelay: 10 * time.Millisecond,
})
var calls atomic.Int32
// Fail twice, then succeed on the third attempt.
err := svc.DeliverWithRetry(
context.Background(), "test",
func(_ context.Context) error {
n := calls.Add(1)
if n <= 2 {
return errTransient
}
return nil
},
)
if err != nil {
t.Fatalf("expected success after retries: %v", err)
}
if calls.Load() != 3 {
t.Errorf("expected 3 calls, got %d", calls.Load())
}
}
func TestDeliverWithRetryExhaustsAttempts(t *testing.T) {
t.Parallel()
svc := notify.NewTestService(http.DefaultTransport)
svc.SetSleepFunc(instantSleep)
svc.SetRetryConfig(notify.RetryConfig{
MaxRetries: 2,
BaseDelay: time.Millisecond,
MaxDelay: 10 * time.Millisecond,
})
var calls atomic.Int32
err := svc.DeliverWithRetry(
context.Background(), "test",
func(_ context.Context) error {
calls.Add(1)
return errPermanent
},
)
if err == nil {
t.Fatal("expected error when all retries exhausted")
}
if !errors.Is(err, errPermanent) {
t.Errorf("expected permanent failure, got: %v", err)
}
// 1 initial + 2 retries = 3 total.
if calls.Load() != 3 {
t.Errorf("expected 3 calls, got %d", calls.Load())
}
}
func TestDeliverWithRetryRespectsContextCancellation(
t *testing.T,
) {
t.Parallel()
svc := notify.NewTestService(http.DefaultTransport)
svc.SetRetryConfig(notify.RetryConfig{
MaxRetries: 5,
BaseDelay: time.Millisecond,
MaxDelay: 10 * time.Millisecond,
})
// Use a blocking sleep so the context cancellation is
// the only way out.
svc.SetSleepFunc(func(_ time.Duration) <-chan time.Time {
return make(chan time.Time) // never fires
})
ctx, cancel := context.WithCancel(context.Background())
done := make(chan error, 1)
go func() {
done <- svc.DeliverWithRetry(
ctx, "test",
func(_ context.Context) error {
return errFail
},
)
}()
// Wait for the first failure + retry sleep to be
// entered, then cancel.
time.Sleep(50 * time.Millisecond)
cancel()
select {
case err := <-done:
if !errors.Is(err, context.Canceled) {
t.Errorf(
"expected context.Canceled, got: %v", err,
)
}
case <-time.After(2 * time.Second):
t.Fatal("deliverWithRetry did not return after cancel")
}
}
// ── integration: SendNotification with retry ──────────────
func TestSendNotificationRetriesTransientFailure(
t *testing.T,
) {
t.Parallel()
var (
mu sync.Mutex
attempts int
)
srv := httptest.NewServer(
http.HandlerFunc(
func(w http.ResponseWriter, _ *http.Request) {
mu.Lock()
attempts++
n := attempts
mu.Unlock()
if n <= 2 {
w.WriteHeader(
http.StatusInternalServerError,
)
return
}
w.WriteHeader(http.StatusOK)
}),
)
defer srv.Close()
svc := newRetryTestService(srv.URL, "ntfy")
svc.SendNotification(
context.Background(),
"Retry Test", "body", "warning",
)
waitForCondition(t, func() bool {
mu.Lock()
defer mu.Unlock()
return attempts >= 3
})
}
// newRetryTestService creates a test service with instant
// sleep and low retry delays for the named endpoint.
func newRetryTestService(
rawURL, endpoint string,
) *notify.Service {
svc := notify.NewTestService(http.DefaultTransport)
svc.SetSleepFunc(instantSleep)
svc.SetRetryConfig(notify.RetryConfig{
MaxRetries: 3,
BaseDelay: time.Millisecond,
MaxDelay: 10 * time.Millisecond,
})
u, _ := url.Parse(rawURL)
switch endpoint {
case "ntfy":
svc.SetNtfyURL(u)
case "slack":
svc.SetSlackWebhookURL(u)
case "mattermost":
svc.SetMattermostWebhookURL(u)
}
return svc
}
func TestSendNotificationAllEndpointsRetrySetup(
t *testing.T,
) {
t.Parallel()
result := newEndpointRetryResult()
ntfySrv, slackSrv, mmSrv := newRetryServers(result)
defer ntfySrv.Close()
defer slackSrv.Close()
defer mmSrv.Close()
svc := buildAllEndpointRetryService(
ntfySrv.URL, slackSrv.URL, mmSrv.URL,
)
svc.SendNotification(
context.Background(),
"Multi-Retry", "testing", "error",
)
assertAllEndpointsRetried(t, result)
}
// endpointRetryResult tracks per-endpoint retry state.
type endpointRetryResult struct {
mu sync.Mutex
ntfyAttempts int
slackAttempts int
mmAttempts int
ntfyOK bool
slackOK bool
mmOK bool
}
func newEndpointRetryResult() *endpointRetryResult {
return &endpointRetryResult{}
}
func newRetryServers(
r *endpointRetryResult,
) (*httptest.Server, *httptest.Server, *httptest.Server) {
mk := func(
attempts *int, ok *bool,
) *httptest.Server {
return httptest.NewServer(
http.HandlerFunc(
func(w http.ResponseWriter, _ *http.Request) {
r.mu.Lock()
*attempts++
n := *attempts
r.mu.Unlock()
if n == 1 {
w.WriteHeader(
http.StatusServiceUnavailable,
)
return
}
r.mu.Lock()
*ok = true
r.mu.Unlock()
w.WriteHeader(http.StatusOK)
}),
)
}
return mk(&r.ntfyAttempts, &r.ntfyOK),
mk(&r.slackAttempts, &r.slackOK),
mk(&r.mmAttempts, &r.mmOK)
}
func buildAllEndpointRetryService(
ntfyURL, slackURL, mmURL string,
) *notify.Service {
svc := notify.NewTestService(http.DefaultTransport)
svc.SetSleepFunc(instantSleep)
svc.SetRetryConfig(notify.RetryConfig{
MaxRetries: 3,
BaseDelay: time.Millisecond,
MaxDelay: 10 * time.Millisecond,
})
nu, _ := url.Parse(ntfyURL)
su, _ := url.Parse(slackURL)
mu, _ := url.Parse(mmURL)
svc.SetNtfyURL(nu)
svc.SetSlackWebhookURL(su)
svc.SetMattermostWebhookURL(mu)
return svc
}
func assertAllEndpointsRetried(
t *testing.T,
r *endpointRetryResult,
) {
t.Helper()
waitForCondition(t, func() bool {
r.mu.Lock()
defer r.mu.Unlock()
return r.ntfyOK && r.slackOK && r.mmOK
})
r.mu.Lock()
defer r.mu.Unlock()
if r.ntfyAttempts < 2 {
t.Errorf(
"ntfy: expected >= 2 attempts, got %d",
r.ntfyAttempts,
)
}
if r.slackAttempts < 2 {
t.Errorf(
"slack: expected >= 2 attempts, got %d",
r.slackAttempts,
)
}
if r.mmAttempts < 2 {
t.Errorf(
"mattermost: expected >= 2 attempts, got %d",
r.mmAttempts,
)
}
}
func TestSendNotificationPermanentFailureLogsError(
t *testing.T,
) {
t.Parallel()
var (
mu sync.Mutex
attempts int
)
srv := httptest.NewServer(
http.HandlerFunc(
func(w http.ResponseWriter, _ *http.Request) {
mu.Lock()
attempts++
mu.Unlock()
w.WriteHeader(
http.StatusInternalServerError,
)
}),
)
defer srv.Close()
svc := newRetryTestService(srv.URL, "slack")
svc.SetRetryConfig(notify.RetryConfig{
MaxRetries: 2,
BaseDelay: time.Millisecond,
MaxDelay: 10 * time.Millisecond,
})
svc.SendNotification(
context.Background(),
"Permanent Fail", "body", "error",
)
// 1 initial + 2 retries = 3 total.
waitForCondition(t, func() bool {
mu.Lock()
defer mu.Unlock()
return attempts >= 3
})
}


@@ -1,11 +1,14 @@
package server
import (
"net/http"
"time"
"github.com/go-chi/chi/v5"
chimw "github.com/go-chi/chi/v5/middleware"
"github.com/prometheus/client_golang/prometheus/promhttp"
"sneak.berlin/go/dnswatcher/static"
)
// requestTimeout is the maximum duration for handling a request.
@@ -22,14 +25,30 @@ func (s *Server) SetupRoutes() {
s.router.Use(s.mw.CORS())
s.router.Use(chimw.Timeout(requestTimeout))
// Health check
// Dashboard (read-only web UI)
s.router.Get("/", s.handlers.HandleDashboard())
// Static assets (embedded CSS/JS)
s.router.Mount(
"/s",
http.StripPrefix(
"/s",
http.FileServer(http.FS(static.Static)),
),
)
// Health check (standard well-known path)
s.router.Get(
"/.well-known/healthcheck",
s.handlers.HandleHealthCheck(),
)
// Legacy health check (keep for backward compatibility)
s.router.Get("/health", s.handlers.HandleHealthCheck())
// API v1 routes
s.router.Route("/api/v1", func(r chi.Router) {
r.Get("/status", s.handlers.HandleStatus())
r.Get("/domains", s.handlers.HandleDomains())
r.Get("/hostnames", s.handlers.HandleHostnames())
})
// Metrics endpoint (optional, with basic auth)

1302
internal/state/state_test.go Normal file

File diff suppressed because it is too large


@@ -20,3 +20,19 @@ func NewForTest() *State {
config: &config.Config{DataDir: ""},
}
}
// NewForTestWithDataDir creates a State backed by the given directory
// for tests that need file persistence.
func NewForTestWithDataDir(dataDir string) *State {
return &State{
log: slog.Default(),
snapshot: &Snapshot{
Version: stateVersion,
Domains: make(map[string]*DomainState),
Hostnames: make(map[string]*HostnameState),
Ports: make(map[string]*PortState),
Certificates: make(map[string]*CertificateState),
},
config: &config.Config{DataDir: dataDir},
}
}


@@ -129,6 +129,7 @@ func (w *Watcher) Run(ctx context.Context) {
)
w.RunOnce(ctx)
w.maybeSendTestNotification(ctx)
dnsTicker := time.NewTicker(w.config.DNSInterval)
tlsTicker := time.NewTicker(w.config.TLSInterval)
@@ -854,6 +855,38 @@ func (w *Watcher) saveState() {
}
}
// maybeSendTestNotification sends a startup status notification
// after the first full scan completes, if SEND_TEST_NOTIFICATION
// is enabled. The message is clearly informational ("all ok")
// and not an error or anomaly alert.
func (w *Watcher) maybeSendTestNotification(ctx context.Context) {
if !w.config.SendTestNotification {
return
}
snap := w.state.GetSnapshot()
msg := fmt.Sprintf(
"dnswatcher has started and completed its initial scan.\n"+
"Monitoring %d domain(s) and %d hostname(s).\n"+
"Tracking %d port endpoint(s) and %d TLS certificate(s).\n"+
"All notification channels are working.",
len(snap.Domains),
len(snap.Hostnames),
len(snap.Ports),
len(snap.Certificates),
)
w.log.Info("sending startup test notification")
w.notify.SendNotification(
ctx,
"✅ dnswatcher startup complete",
msg,
"success",
)
}
// --- Utility functions ---
func toSet(items []string) map[string]bool {


@@ -756,6 +756,117 @@ func TestDNSRunsBeforePortAndTLSChecks(t *testing.T) {
}
}
func TestSendTestNotification_Enabled(t *testing.T) {
t.Parallel()
cfg := defaultTestConfig(t)
cfg.Domains = []string{"example.com"}
cfg.Hostnames = []string{"www.example.com"}
cfg.SendTestNotification = true
w, deps := newTestWatcher(t, cfg)
setupBaselineMocks(deps)
w.RunOnce(t.Context())
// RunOnce does not send the test notification — it is
// sent by Run after RunOnce completes. Call the exported
// RunOnce then check that no test notification was sent
// (only Run triggers it). We test the full path via Run.
notifications := deps.notifier.getNotifications()
if len(notifications) != 0 {
t.Errorf(
"RunOnce should not send test notification, got %d",
len(notifications),
)
}
}
func TestSendTestNotification_ViaRun(t *testing.T) {
t.Parallel()
cfg := defaultTestConfig(t)
cfg.Domains = []string{"example.com"}
cfg.Hostnames = []string{"www.example.com"}
cfg.SendTestNotification = true
cfg.DNSInterval = 24 * time.Hour
cfg.TLSInterval = 24 * time.Hour
w, deps := newTestWatcher(t, cfg)
setupBaselineMocks(deps)
ctx, cancel := context.WithCancel(t.Context())
done := make(chan struct{})
go func() {
w.Run(ctx)
close(done)
}()
// Wait for the initial scan and test notification.
time.Sleep(500 * time.Millisecond)
cancel()
<-done
notifications := deps.notifier.getNotifications()
found := false
for _, n := range notifications {
if n.Priority == "success" &&
n.Title == "✅ dnswatcher startup complete" {
found = true
}
}
if !found {
t.Errorf(
"expected startup test notification, got: %v",
notifications,
)
}
}
func TestSendTestNotification_Disabled(t *testing.T) {
t.Parallel()
cfg := defaultTestConfig(t)
cfg.Domains = []string{"example.com"}
cfg.Hostnames = []string{"www.example.com"}
cfg.SendTestNotification = false
cfg.DNSInterval = 24 * time.Hour
cfg.TLSInterval = 24 * time.Hour
w, deps := newTestWatcher(t, cfg)
setupBaselineMocks(deps)
ctx, cancel := context.WithCancel(t.Context())
done := make(chan struct{})
go func() {
w.Run(ctx)
close(done)
}()
time.Sleep(500 * time.Millisecond)
cancel()
<-done
notifications := deps.notifier.getNotifications()
for _, n := range notifications {
if n.Title == "✅ dnswatcher startup complete" {
t.Error(
"test notification should not be sent when disabled",
)
}
}
}
func TestNSFailureAndRecovery(t *testing.T) {
t.Parallel()

1
static/css/tailwind.min.css vendored Normal file

File diff suppressed because one or more lines are too long

10
static/static.go Normal file

@@ -0,0 +1,10 @@
// Package static provides embedded static assets.
package static
import "embed"
// Static contains the embedded static assets (CSS, JS) served
// at the /s/ URL prefix.
//
//go:embed css
var Static embed.FS