17 Commits

Author SHA1 Message Date
b64db3e10f feat: enhance /api/v1/status endpoint with full monitoring data (#86)
All checks were successful
check / check (push) Successful in 1m27s
## Summary

Enhances the `/api/v1/status` endpoint to return comprehensive monitoring state instead of just `{"status": "ok"}`.

## Changes

The endpoint now returns:

- **Summary counts**: domains, hostnames, ports (total + open), certificates (total + ok + error)
- **Domains**: each monitored domain with its discovered nameservers and last check timestamp
- **Hostnames**: each monitored hostname with per-nameserver DNS records, status, and last check timestamps
- **Ports**: each monitored IP:port with open/closed state, associated hostnames, and last check timestamp
- **Certificates**: each TLS certificate with CN, issuer, expiry, SANs, status, and last check timestamp
- **Last updated**: timestamp of the overall monitoring state

All data is derived from the existing `state.GetSnapshot()`, consistent with how the dashboard works. No configuration details (webhook URLs, API tokens) are exposed.

## Example response structure

```json
{
  "status": "ok",
  "lastUpdated": "2026-03-10T12:00:00Z",
  "counts": {
    "domains": 2,
    "hostnames": 3,
    "ports": 10,
    "portsOpen": 8,
    "certificates": 4,
    "certificatesOk": 3,
    "certificatesError": 1
  },
  "domains": { ... },
  "hostnames": { ... },
  "ports": { ... },
  "certificates": { ... }
}
```

closes #73

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #86
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-10 12:20:11 +01:00
65180ad661 feat: add DNSWATCHER_SEND_TEST_NOTIFICATION env var (#85)
All checks were successful
check / check (push) Successful in 5s
When set to a truthy value, sends a startup status notification to all configured notification channels after the first full scan completes on application startup. The notification is clearly an all-ok/success message showing the number of monitored domains, hostnames, ports, and certificates.

Changes:
- Added `SendTestNotification` config field reading `DNSWATCHER_SEND_TEST_NOTIFICATION`
- Added `maybeSendTestNotification()` in watcher, called after initial `RunOnce` in `Run`
- Added 3 watcher tests (enabled via Run, enabled via RunOnce alone, disabled)
- Added config tests for the new field
- Updated README: env var table, example .env, Docker example

Closes #84

Co-authored-by: user <user@Mac.lan guest wan>
Reviewed-on: #85
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-04 21:41:55 +01:00
1076543c23 feat: add unauthenticated web dashboard showing monitoring state and recent alerts (#83)
All checks were successful
check / check (push) Successful in 4s
## Summary

Adds a read-only web dashboard at `GET /` that shows the current monitoring state and recent alerts. Unauthenticated, single-page, no navigation.

## What it shows

- **Summary bar**: counts of monitored domains, hostnames, ports, certificates
- **Domains**: nameservers with last-checked age
- **Hostnames**: per-nameserver DNS records, status badges, relative age
- **Ports**: open/closed state with associated hostnames and age
- **TLS Certificates**: CN, issuer, expiry (color-coded by urgency), status, age
- **Recent Alerts**: last 100 notifications in reverse chronological order with priority badges

Every data point displays its age (e.g. "5m ago") so freshness is visible at a glance. Auto-refreshes every 30 seconds.

## What it does NOT show

No secrets: webhook URLs, ntfy topics, Slack/Mattermost endpoints, API tokens, and configuration details are never exposed.

## Design

All assets (CSS) are embedded in the binary and served from `/s/`. Zero external HTTP requests at runtime — no CDN dependencies or third-party resources. Dark, technical aesthetic with saturated teals and blues on dark slate. Single page — everything on one screen.

## Implementation

- `internal/notify/history.go` — thread-safe ring buffer (`AlertHistory`) storing last 100 alerts
- `internal/notify/notify.go` — records each alert in history before dispatch; refactored `SendNotification` into smaller `dispatch*` helpers to satisfy funlen
- `internal/handlers/dashboard.go` — `HandleDashboard()` handler with embedded HTML template, helper functions (`relTime`, `formatRecords`, `expiryDays`, `joinStrings`)
- `internal/handlers/templates/dashboard.html` — Tailwind-styled single-page dashboard
- `internal/handlers/handlers.go` — added `State` and `Notify` dependencies via fx
- `internal/server/routes.go` — registered `GET /` route
- `static/` — embedded CSS assets served via `/s/` prefix
- `README.md` — documented the dashboard and new endpoint

## Tests

- `internal/notify/history_test.go` — empty, add+recent ordering, overflow beyond capacity
- `internal/handlers/dashboard_test.go` — `relTime`, `expiryDays`, `formatRecords`
- All existing tests pass unchanged
- `docker build .` passes

closes [#82](#82)

<!-- session: rework-pr-83 -->

Co-authored-by: user <user@Mac.lan guest wan>
Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #83
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-04 13:03:38 +01:00
1843d09eb3 test(notify): add comprehensive tests for notification delivery (#79)
All checks were successful
check / check (push) Successful in 50s
## Summary

Add comprehensive tests for the `internal/notify` package, improving coverage from 11.1% to 80.0%.

Closes [issue #71](#71).

## What was added

### `delivery_test.go` — 28 new test functions

**Priority mapping tests:**
- `TestNtfyPriority` — all priority levels (error→urgent, warning→high, success→default, info→low, unknown→default)
- `TestSlackColor` — all color mappings including default fallback

**Request construction:**
- `TestNewRequest` — method, URL, host, headers, body
- `TestNewRequestPreservesContext` — context propagation

**ntfy delivery (`sendNtfy`):**
- `TestSendNtfyHeaders` — Title, Priority headers, POST body content
- `TestSendNtfyAllPriorities` — end-to-end header verification for all priority levels
- `TestSendNtfyClientError` — 403 returns `ErrNtfyFailed`
- `TestSendNtfyServerError` — 500 returns `ErrNtfyFailed`
- `TestSendNtfySuccess` — 200 OK succeeds
- `TestSendNtfyNetworkError` — transport failure handling

**Slack/Mattermost delivery (`sendSlack`):**
- `TestSendSlackPayloadFields` — JSON payload structure, Content-Type header, attachment fields
- `TestSendSlackAllColors` — color mapping for all priorities
- `TestSendSlackClientError` — 400 returns `ErrSlackFailed`
- `TestSendSlackServerError` — 502 returns `ErrSlackFailed`
- `TestSendSlackNetworkError` — transport failure handling

**`SendNotification` goroutine dispatch:**
- `TestSendNotificationAllEndpoints` — all three endpoints receive notifications concurrently
- `TestSendNotificationNoWebhooks` — no-op when no endpoints configured
- `TestSendNotificationNtfyOnly` — ntfy-only dispatch
- `TestSendNotificationSlackOnly` — slack-only dispatch
- `TestSendNotificationMattermostOnly` — mattermost-only dispatch
- `TestSendNotificationNtfyError` — error logging path (no panic)
- `TestSendNotificationSlackError` — error logging path (no panic)
- `TestSendNotificationMattermostError` — error logging path (no panic)

**Payload marshaling:**
- `TestSlackPayloadJSON` — round-trip marshal/unmarshal
- `TestSlackPayloadEmptyAttachments` — `omitempty` behavior

### `export_test.go` — test bridge

Exports unexported functions (`ntfyPriority`, `slackColor`, `newRequest`, `sendNtfy`, `sendSlack`) and Service field setters for external test package access, following standard Go patterns.

## Coverage

| Function | Before | After |
|---|---|---|
| `IsAllowedScheme` | 100% | 100% |
| `ValidateWebhookURL` | 100% | 100% |
| `newRequest` | 0% | 100% |
| `SendNotification` | 0% | 100% |
| `sendNtfy` | 0% | 100% |
| `ntfyPriority` | 0% | 100% |
| `sendSlack` | 0% | 94.1% |
| `slackColor` | 0% | 100% |
| **Total** | **11.1%** | **80.0%** |

The remaining 20% is the `New()` constructor (requires fx wiring) and one unreachable `json.Marshal` error path in `sendSlack`.

## Testing approach

- `httptest.Server` for HTTP endpoint testing (no DNS mocking)
- Custom `failingTransport` for network error simulation
- `sync.Mutex`-protected captures for concurrent goroutine verification
- All tests are parallel

`docker build .` passes 

<!-- session: agent:sdlc-manager:subagent:6158e09a-aba4-4778-89ca-c12b22014ccd -->

Co-authored-by: user <user@Mac.lan guest wan>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #79
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-04 11:26:31 +01:00
c5bf16055e test(state): add comprehensive test coverage for internal/state package (#80)
Some checks failed
check / check (push) Has been cancelled
## Summary

Add 32 tests for the `internal/state` package, which previously had 0% test coverage.

### Tests added:

**Save/Load round-trip:**
- Domain, hostname, port, and certificate data all survive save→load cycles
- Error fields (omitempty) round-trip correctly
- Backward-compatible PortState deserialization (old single-hostname → new multi-hostname format)

**Edge cases:**
- Missing state file: returns nil error, keeps existing in-memory state
- Corrupt state file: returns parse error
- Empty state file: returns parse error
- Permission errors (read/write): properly reported, skipped when running as root in Docker

**Atomic write:**
- No leftover .tmp files after successful save
- Updated content verified after second save

**Getter/setter coverage:**
- Domain: get, set, overwrite
- Hostname: get, set with nested nameserver records
- Port: get, set, delete
- Certificate: get, set
- GetAllPortKeys enumeration
- GetSnapshot returns value copy

**Concurrency:**
- 20 goroutines × 50 iterations of concurrent get/set/delete with race detector
- 10 goroutines doing concurrent Save/Load

**Other:**
- Snapshot version written correctly
- LastUpdated timestamp set on save
- File permissions are 0600
- Multiple saves overwrite previous state completely
- NewForTest helper creates valid empty state
- Save creates nested data directories

Also adds `NewForTestWithDataDir()` to the test helper for tests requiring file persistence.

Closes [issue #70](#70)

<!-- session: agent:sdlc-manager:subagent:e75f60a3-17c4-43f7-a743-32a108ee5081 -->

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #80
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-04 11:26:05 +01:00
d6130e5892 test(config): add comprehensive tests for config loading path (#81)
All checks were successful
check / check (push) Successful in 4s
## Summary

Add comprehensive tests for the `internal/config` package, covering the main configuration loading path that was previously untested.

Closes [issue #72](#72)

## What Changed

Added three new test files:

- **`config_test.go`** — 16 tests covering `New()`, `StatePath()`, and the full config loading pipeline
- **`parsecsv_test.go`** — 10 test cases for `parseCSV()` edge cases
- **`export_test.go`** — standard Go export bridge for testing unexported `parseCSV`

## Test Coverage

| Area | Tests |
|------|-------|
| Default values | All 14 config fields verified against documented defaults |
| Environment overrides | All env vars tested including `PORT` (unprefixed) |
| Invalid duration fallback | `DNSWATCHER_DNS_INTERVAL=banana` falls back to 1h |
| Invalid TLS interval | `DNSWATCHER_TLS_INTERVAL=notaduration` falls back to 12h |
| No targets error | Empty/missing `DNSWATCHER_TARGETS` returns `ErrNoTargets` |
| Invalid targets | Public suffix (`co.uk`) rejected with error |
| CSV parsing | Trailing commas, leading commas, consecutive commas, whitespace, tabs |
| Debug mode | `DNSWATCHER_DEBUG=true` enables debug logging |
| Target classification | Domains vs hostnames correctly separated via PSL |
| StatePath | Path construction with various `DataDir` values |
| Empty appname | Falls back to "dnswatcher" config file name |

**Coverage: 23% → 92.5%**

## Notes

- Tests use `viper.Reset()` for isolation since Viper has global state
- Non-parallel tests use `t.Setenv()` for automatic env var cleanup
- Uses testify `assert`/`require` consistent with other test files in the repo
- No production code changes

<!-- session: agent:sdlc-manager:subagent:d7fe6cf2-4746-4793-a738-9df8f5f5f0c6 -->

Co-authored-by: user <user@Mac.lan guest wan>
Reviewed-on: #81
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-04 11:23:24 +01:00
0a74971ade docs: fix README inaccuracies found during QA audit (#74)
All checks were successful
check / check (push) Successful in 9s
## Summary

Fixes documentation inaccuracies in README.md identified during QA audit.

### Changes

**API table (closes #67):**
- Removed `GET /api/v1/domains` and `GET /api/v1/hostnames` from the HTTP API table. These endpoints are not implemented — the only routes in `internal/server/routes.go` are `/health`, `/api/v1/status`, and `/metrics` (conditional).

**Feature claims (closes #68):**
- Removed "Inconsistency resolved" from hostname monitoring features. `detectInconsistencies()` detects current inconsistencies but has no state tracking to detect when they resolve.
- Removed `nxdomain` and `nodata` from the state status values table. While the resolver defines these constants, `buildHostnameState()` in the watcher only ever sets status to `"ok"`. Failed queries set `"error"` via the NS disappearance path. These values are never written to state.
- Removed "Empty response" (NODATA/NXDOMAIN) detection claim. Changes are caught generically by `detectRecordChanges()`, not with specific NODATA/NXDOMAIN labeling.

### What was NOT changed

- "Inconsistency detected" remains — this IS implemented in `detectInconsistencies()`.
- All other feature claims were verified against the code and are accurate.
- No Go source code was modified.

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #74
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-02 08:40:42 +01:00
e882e7d237 feat: fail fast when no monitoring targets configured (#75)
Some checks failed
check / check (push) Failing after 46s
## Summary

When `DNSWATCHER_TARGETS` is empty (the default), dnswatcher previously started successfully and ran indefinitely monitoring nothing. This is a common misconfiguration — forgetting to set the variable or making a typo in its name — and gave no indication anything was wrong.

## Changes

- Added `ErrNoTargets` sentinel error in `internal/config/config.go`
- Extracted `parseAndValidateTargets()` helper to validate that at least one domain or hostname is configured after target classification
- If no targets are configured, dnswatcher now exits with a clear error: `"no monitoring targets configured: set DNSWATCHER_TARGETS environment variable"`
- Updated README.md to document that `DNSWATCHER_TARGETS` is required and dnswatcher will refuse to start without it

## How it works

The validation runs during config construction (via uber/fx), before the watcher or any other component starts. If `DNSWATCHER_TARGETS` is empty or contains only whitespace/empty entries, `buildConfig()` returns `ErrNoTargets`, which causes fx to fail startup with a clear error message.

This is fail-fast behavior: a monitoring daemon with nothing to monitor is a misconfiguration and should not silently run.

Closes #69

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #75
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-02 01:26:55 +01:00
6ebc4ffa04 fix: use context.Background() for watcher goroutine lifetime (#63)
All checks were successful
check / check (push) Successful in 31s
## Summary

The `OnStart` hook previously derived the watcher's context from the fx startup context (`startCtx`) via `context.WithoutCancel()`. While `WithoutCancel` strips cancellation and deadline, using `context.Background()` makes the intent explicit: the watcher's monitoring loop must outlive the fx startup phase and is controlled solely by the `cancel` func called in `OnStop`.

## Changes

- Replace `context.WithCancel(context.WithoutCancel(startCtx))` with `context.WithCancel(context.Background())`
- Add explanatory comment documenting why the watcher context is not derived from the startup context
- Unused `startCtx` parameter changed to `_`

Closes #53

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #63
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-02 00:39:08 +01:00
b20e75459f fix: track multiple hostnames per IP:port in port state (#65)
All checks were successful
check / check (push) Successful in 34s
## Summary

Port state keys are `ip:port` with a single `hostname` field. When multiple hostnames resolve to the same IP (shared hosting, CDN), only one hostname was associated. This caused orphaned port state when that hostname removed the IP from DNS while the IP remained valid for other hostnames.

## Changes

### State (`internal/state/state.go`)
- `PortState.Hostname` (string) → `PortState.Hostnames` ([]string)
- Custom `UnmarshalJSON` for backward compatibility: reads old single `hostname` field and migrates to a single-element `hostnames` slice
- Added `DeletePortState` and `GetAllPortKeys` methods for cleanup

### Watcher (`internal/watcher/watcher.go`)
- Refactored `checkAllPorts` into three phases:
  1. Build IP:port → hostname associations from current DNS data
  2. Check each unique IP:port once with all associated hostnames
  3. Clean up stale port state entries with no hostname references
- Port change notifications now list all associated hostnames (`Hosts:` instead of `Host:`)
- Added `buildPortAssociations`, `parsePortKey`, and `cleanupStalePorts` helper functions

### README
- Updated state file format example: `hostname` → `hostnames` (array)
- Updated notification description to reflect multiple hostnames

## Backward Compatibility

Existing state files with the old single `hostname` string are handled gracefully via custom JSON unmarshaling — they are read as single-element `hostnames` slices.

Closes #55

Co-authored-by: clawbot <clawbot@noreply.eeqj.de>
Reviewed-on: #65
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-02 00:32:27 +01:00
ee14bd01ae fix: enforce DNS-first ordering for port and TLS checks (#64)
All checks were successful
check / check (push) Successful in 8s
## Summary

DNS checks now always complete before port or TLS checks begin, ensuring those checks use freshly resolved IP addresses instead of potentially stale ones from a previous cycle.

## Problem

Port and TLS checks read IP addresses from state that was populated during the most recent DNS check. If DNS changes between cycles, port/TLS checks may target stale IPs. In particular, when the TLS ticker fired (every 12h), it ran `runTLSChecks` without refreshing DNS first — meaning TLS checks could use IPs that were up to 12 hours old.

## Changes

- **Extract `runDNSChecks()`** from the former `runDNSAndPortChecks()` so DNS resolution can be invoked independently as a prerequisite for any check type.
- **TLS ticker now runs DNS first**: When the TLS ticker fires, DNS checks run before TLS checks, ensuring fresh IPs.
- **`RunOnce` uses explicit 3-phase ordering**: DNS → ports → TLS. Port checks must complete before TLS because TLS checks only target IPs where port 443 is open.
- **New test `TestDNSRunsBeforePortAndTLSChecks`**: Verifies that when DNS IPs change between cycles, port and TLS checks pick up the new IPs.
- **README updated**: Monitoring lifecycle section now documents the DNS-first ordering guarantee.

## Check ordering

| Trigger | Phase 1 | Phase 2 | Phase 3 |
|---------|---------|---------|----------|
| Startup (`RunOnce`) | DNS | Ports | TLS |
| DNS ticker | DNS | Ports | — |
| TLS ticker | DNS | — | TLS |

closes #58

Co-authored-by: user <user@Mac.lan guest wan>
Reviewed-on: #64
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-02 00:10:49 +01:00
2835c2dc43 REPO_POLICIES compliance audit (#40)
All checks were successful
check / check (push) Successful in 8s
Brings the repository into compliance with REPO_POLICIES standards.

Closes [issue #39](#39).

## Changes

### Added files
- **REPO_POLICIES.md** — fetched from `sneak/prompts` (last_modified: 2026-02-22)
- **.editorconfig** — fetched from `sneak/prompts`, enforces 4-space indents (tabs for Makefile)
- **.dockerignore** — standard Go exclusions (.git/, bin/, *.md, LICENSE, .editorconfig, .gitignore)

### Makefile updates
- Added `fmt-check` target (read-only gofmt check)
- Added `hooks` target (installs pre-commit hook running `make check`)
- Added `docker` target (runs `docker build .`)
- Added `-timeout 30s` to both `test` and `check` targets
- Updated `.PHONY` list with all new targets

### Removed files
- **CLAUDE.md** — superseded by REPO_POLICIES.md
- **CONVENTIONS.md** — superseded by REPO_POLICIES.md

### README updates
- First line now includes project name, purpose, category (daemon), and author per REPO_POLICIES format
- Updated CONVENTIONS.md reference to REPO_POLICIES.md
- Added **License** section (pending author choice)
- Added **Author** section: [@sneak](https://sneak.berlin)

### Intentionally skipped
- **LICENSE file** — not created; license choice (MIT, GPL, or WTFPL) requires sneak's input

## Verification
- `docker build .` passes (all checks green: fmt, lint, tests, build)
- No changes to `.golangci.yml`, test assertions, or linter config

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #40
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-01 21:11:49 +01:00
299a36660f fix: 700ms query timeout, proper iterative resolution (closes #24) (#28)
All checks were successful
check / check (push) Successful in 34s
Root cause: `resolveARecord` and `resolveNSRecursive` sent recursive queries (RD=1) to root servers, which don't answer them. This caused 5s timeouts × 2 retries × 3 servers = hanging tests.

Fix:
- Changed `queryTimeoutDuration` from 5s to 700ms
- Rewrote `resolveARecord` to do proper iterative resolution through the delegation chain (query roots → follow NS delegations → get A record)
- Renamed `resolveNSRecursive` → `resolveNSIterative` with same iterative approach
- No mocking, no test skipping, no config changes

`make check` passes: all 29 resolver tests pass with real DNS in ~10s.

Co-authored-by: clawbot <clawbot@git.eeqj.de>
Reviewed-on: #28
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-01 21:10:38 +01:00
02ca796085 Merge pull request 'Simplify CI: docker build instead of manual toolchain setup' (#38) from fix/simplify-ci into main
Some checks failed
check / check (push) Failing after 40s
Reviewed-on: #38
2026-02-28 13:04:31 +01:00
clawbot
2e3526986f simplify CI to docker build, pin all image refs by SHA
Some checks failed
check / check (push) Failing after 1m23s
- Replace convoluted CI workflow (setup-go, install golangci-lint, install
  goimports, make check) with simple 'docker build .' per repo policy
- Pin Docker base images by SHA256 hash instead of mutable tags
- Pin golangci-lint and goimports by commit hash instead of @latest
- Add binutils-gold for linker compatibility on alpine
- Run on all pushes, not just main/PR branches
2026-02-28 03:54:11 -08:00
55c6c21b5a Merge pull request 'fix: distinguish timeout from negative DNS responses (closes #35)' (#37) from fix/timeout-vs-negative-response into main
Some checks are pending
Check / check (push) Waiting to run
Reviewed-on: #37
2026-02-28 12:38:18 +01:00
clawbot
2993911883 fix: distinguish timeout from negative DNS responses (closes #35)
Some checks failed
Check / check (pull_request) Failing after 5m41s
2026-02-28 03:35:54 -08:00
38 changed files with 4871 additions and 1539 deletions

6
.dockerignore Normal file
View File

@@ -0,0 +1,6 @@
.git/
bin/
*.md
LICENSE
.editorconfig
.gitignore

12
.editorconfig Normal file
View File

@@ -0,0 +1,12 @@
root = true
[*]
indent_style = space
indent_size = 4
end_of_line = lf
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
[Makefile]
indent_style = tab

View File

@@ -1,26 +1,9 @@
name: Check
on:
push:
branches: [main]
pull_request:
branches: [main]
name: check
on: [push]
jobs:
check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5
with:
go-version-file: go.mod
- name: Install golangci-lint
run: go install github.com/golangci/golangci-lint/v2/cmd/golangci-lint@5d1e709b7be35cb2025444e19de266b056b7b7ee # v2.10.1
- name: Install goimports
run: go install golang.org/x/tools/cmd/goimports@009367f5c17a8d4c45a961a3a509277190a9a6f0 # v0.42.0
- name: Run make check
run: make check
# actions/checkout v4.2.2, 2026-02-28
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
- run: docker build .

View File

@@ -1,68 +0,0 @@
# Repository Rules
Last Updated 2026-01-08
These rules MUST be followed at all times, it is very important.
* Never use `git add -A` - add specific changes to a deliberate commit. A
commit should contain one change. After each change, make a commit with a
good one-line summary.
* NEVER modify the linter config without asking first.
* NEVER modify tests to exclude special cases or otherwise get them to pass
without asking first. In almost all cases, the code should be changed,
NOT the tests. If you think the test needs to be changed, make your case
for that and ask for permission to proceed, then stop. You need explicit
user approval to modify existing tests. (You do not need user approval
for writing NEW tests.)
* When linting, assume the linter config is CORRECT, and that each item
output by the linter is something that legitimately needs fixing in the
code.
* When running tests, use `make test`.
* Before commits, run `make check`. This runs `make lint` and `make test`
and `make check-fmt`. Any issues discovered MUST be resolved before
committing unless explicitly told otherwise.
* When fixing a bug, write a failing test for the bug FIRST. Add
appropriate logging to the test to ensure it is written correctly. Commit
that. Then go about fixing the bug until the test passes (without
modifying the test further). Then commit that.
* When adding a new feature, do the same - implement a test first (TDD). It
doesn't have to be super complex. Commit the test, then commit the
feature.
* When adding a new feature, use a feature branch. When the feature is
completely finished and the code is up to standards (passes `make check`)
then and only then can the feature branch be merged into `main` and the
branch deleted.
* Write godoc documentation comments for all exported types and functions as
you go along.
* ALWAYS be consistent in naming. If you name something one thing in one
place, name it the EXACT SAME THING in another place.
* Be descriptive and specific in naming. `wl` is bad;
`SourceHostWhitelist` is good. `ConnsPerHost` is bad;
`MaxConnectionsPerHost` is good.
* This is not prototype or teaching code - this is designed for production.
Any security issues (such as denial of service) or other web
vulnerabilities are P1 bugs and must be added to TODO.md at the top.
* As this is production code, no stubbing of implementations unless
specifically instructed. We need working implementations.
* Avoid vendoring deps unless specifically instructed to. NEVER commit
the vendor directory, NEVER commit compiled binaries. If these
directories or files exist, add them to .gitignore (and commit the
.gitignore) if they are not already in there. Keep the entire git
repository (with history) small - under 20MiB, unless you specifically
must commit larger files (e.g. test fixture example media files). Only
OUR source code and immediately supporting files (such as test examples)
goes into the repo/history.

File diff suppressed because it is too large Load Diff

View File

@@ -1,11 +1,13 @@
# Build stage
FROM golang:1.25-alpine AS builder
# golang 1.25-alpine, 2026-02-28
FROM golang@sha256:f6751d823c26342f9506c03797d2527668d095b0a15f1862cddb4d927a7a4ced AS builder
RUN apk add --no-cache git make gcc musl-dev
RUN apk add --no-cache git make gcc musl-dev binutils-gold
# Install golangci-lint v2
RUN go install github.com/golangci/golangci-lint/v2/cmd/golangci-lint@latest
RUN go install golang.org/x/tools/cmd/goimports@latest
# golangci-lint v2.10.1
RUN go install github.com/golangci/golangci-lint/v2/cmd/golangci-lint@5d1e709b7be35cb2025444e19de266b056b7b7ee
# goimports v0.42.0
RUN go install golang.org/x/tools/cmd/goimports@009367f5c17a8d4c45a961a3a509277190a9a6f0
WORKDIR /src
COPY go.mod go.sum ./
@@ -20,7 +22,8 @@ RUN make check
RUN make build
# Runtime stage
FROM alpine:3.21
# alpine 3.21, 2026-02-28
FROM alpine@sha256:c3f8e73fdb79deaebaa2037150150191b9dcbfba68b4a46d70103204c53f4709
RUN apk add --no-cache ca-certificates tzdata

View File

@@ -1,9 +1,8 @@
.PHONY: all build lint fmt test check clean
.PHONY: all build lint fmt fmt-check test check clean hooks docker
BINARY := dnswatcher
VERSION := $(shell git describe --tags --always --dirty 2>/dev/null || echo "dev")
BUILDARCH := $(shell go env GOARCH)
LDFLAGS := -X main.Version=$(VERSION) -X main.Buildarch=$(BUILDARCH)
LDFLAGS := -X main.Version=$(VERSION)
all: check build
@@ -17,8 +16,11 @@ fmt:
gofmt -s -w .
goimports -w .
fmt-check:
@test -z "$$(gofmt -l .)" || (echo "gofmt: files not formatted:" && gofmt -l . && exit 1)
test:
go test -v -race -cover ./...
go test -v -race -timeout 30s -cover ./...
# Check runs all validation without making changes
# Used by CI and Docker build - fails if anything is wrong
@@ -28,10 +30,19 @@ check:
@echo "==> Running linter..."
golangci-lint run --config .golangci.yml ./...
@echo "==> Running tests..."
go test -v -race ./...
go test -v -race -timeout 30s ./...
@echo "==> Building..."
go build -ldflags "$(LDFLAGS)" -o /dev/null ./cmd/dnswatcher
@echo "==> All checks passed!"
clean:
rm -rf bin/
hooks:
@echo '#!/bin/sh' > .git/hooks/pre-commit
@echo 'make check' >> .git/hooks/pre-commit
@chmod +x .git/hooks/pre-commit
@echo "Pre-commit hook installed."
docker:
docker build .

View File

@@ -1,9 +1,10 @@
# dnswatcher
dnswatcher is a pre-1.0 Go daemon by [@sneak](https://sneak.berlin) that monitors DNS records, TCP port availability, and TLS certificates, delivering real-time change notifications via Slack, Mattermost, and ntfy webhooks.
> ⚠️ Pre-1.0 software. APIs, configuration, and behavior may change without notice.
dnswatcher is a production DNS and infrastructure monitoring daemon written in
Go. It watches configured DNS domains and hostnames for changes, monitors TCP
dnswatcher watches configured DNS domains and hostnames for changes, monitors TCP
port availability, tracks TLS certificate expiry, and delivers real-time
notifications via Slack, Mattermost, and/or ntfy webhooks.
@@ -51,10 +52,6 @@ without requiring an external database.
responding again.
- **Inconsistency detected**: Two nameservers that previously agreed
now return different record sets for the same hostname.
- **Inconsistency resolved**: Nameservers that previously disagreed
are now back in agreement.
- **Empty response**: A nameserver that previously returned records
now returns an authoritative empty response (NODATA/NXDOMAIN).
### TCP Port Monitoring
@@ -109,8 +106,8 @@ includes:
- **NS recoveries**: Which nameserver recovered, which hostname/domain.
- **NS inconsistencies**: Which nameservers disagree, what each one
returned, which hostname affected.
- **Port changes**: Which IP:port, old state, new state, associated
hostname.
- **Port changes**: Which IP:port, old state, new state, all associated
hostnames.
- **TLS expiry warnings**: Which certificate, days remaining, CN,
issuer, associated hostname and IP.
- **TLS certificate changes**: Old and new CN/issuer/SANs, associated
@@ -127,16 +124,42 @@ includes:
- State is written atomically (write to temp file, then rename) to prevent
corruption.
### Web Dashboard
dnswatcher includes an unauthenticated, read-only web dashboard at the
root URL (`/`). It displays:
- **Summary counts** for monitored domains, hostnames, ports, and
certificates.
- **Domains** with their discovered nameservers.
- **Hostnames** with per-nameserver DNS records and status.
- **Ports** with open/closed state and associated hostnames.
- **TLS certificates** with CN, issuer, expiry, and status.
- **Recent alerts** (last 100 notifications sent since the process
started), displayed in reverse chronological order.
Every data point shows its age (e.g. "5m ago") so you can tell at a
glance how fresh the information is. The page auto-refreshes every 30
seconds.
The dashboard intentionally does not expose any configuration details
such as webhook URLs, notification endpoints, or API tokens.
All assets (CSS) are embedded in the binary and served from the
application itself. The dashboard makes zero external HTTP requests —
no CDN dependencies or third-party resources are loaded at runtime.
### HTTP API
dnswatcher exposes a lightweight HTTP API for operational visibility:
| Endpoint | Description |
|---------------------------------------|--------------------------------|
| `GET /health` | Health check (JSON) |
| `GET /` | Web dashboard (HTML) |
| `GET /s/...` | Static assets (embedded CSS) |
| `GET /.well-known/healthcheck` | Health check (JSON) |
| `GET /health` | Health check (JSON, legacy) |
| `GET /api/v1/status` | Current monitoring state |
| `GET /api/v1/domains` | Configured domains and status |
| `GET /api/v1/hostnames` | Configured hostnames and status|
| `GET /metrics` | Prometheus metrics (optional) |
---
@@ -148,7 +171,7 @@ cmd/dnswatcher/main.go Entry point (uber/fx bootstrap)
internal/
config/config.go Viper-based configuration
globals/globals.go Build-time variables (version, arch)
globals/globals.go Build-time variables (version)
logger/logger.go slog structured logging (TTY detection)
healthcheck/healthcheck.go Health check service
middleware/middleware.go HTTP middleware (logging, CORS, metrics auth)
@@ -208,6 +231,13 @@ the following precedence (highest to lowest):
| `DNSWATCHER_MAINTENANCE_MODE` | Enable maintenance mode | `false` |
| `DNSWATCHER_METRICS_USERNAME` | Basic auth username for /metrics | `""` |
| `DNSWATCHER_METRICS_PASSWORD` | Basic auth password for /metrics | `""` |
| `DNSWATCHER_SEND_TEST_NOTIFICATION` | Send a test notification after first scan completes | `false` |
**`DNSWATCHER_TARGETS` is required.** dnswatcher will refuse to start if no
monitoring targets are configured. A monitoring daemon with nothing to monitor
is a misconfiguration, so dnswatcher fails fast with a clear error message
rather than running silently. Set `DNSWATCHER_TARGETS` to a comma-separated
list of DNS names before starting.
### Example `.env`
@@ -219,6 +249,7 @@ DNSWATCHER_TARGETS=example.com,example.org,www.example.com,api.example.com,mail.
DNSWATCHER_SLACK_WEBHOOK=https://hooks.slack.com/services/T.../B.../xxx
DNSWATCHER_MATTERMOST_WEBHOOK=https://mattermost.example.com/hooks/xxx
DNSWATCHER_NTFY_TOPIC=https://ntfy.sh/my-dns-alerts
DNSWATCHER_SEND_TEST_NOTIFICATION=true
```
---
@@ -289,12 +320,12 @@ not as a merged view, to enable inconsistency detection.
"ports": {
"93.184.216.34:80": {
"open": true,
"hostname": "www.example.com",
"hostnames": ["www.example.com"],
"lastChecked": "2026-02-19T12:00:00Z"
},
"93.184.216.34:443": {
"open": true,
"hostname": "www.example.com",
"hostnames": ["www.example.com"],
"lastChecked": "2026-02-19T12:00:00Z"
}
},
@@ -318,8 +349,6 @@ tracks reachability:
|-------------|-------------------------------------------------|
| `ok` | Query succeeded, records are current |
| `error` | Query failed (timeout, SERVFAIL, network error) |
| `nxdomain` | Authoritative NXDOMAIN response |
| `nodata` | Authoritative empty response (NODATA) |
---
@@ -336,11 +365,10 @@ make clean # Remove build artifacts
### Build-Time Variables
Version and architecture are injected via `-ldflags`:
Version is injected via `-ldflags`:
```sh
go build -ldflags "-X main.Version=$(git describe --tags --always) \
-X main.Buildarch=$(go env GOARCH)" ./cmd/dnswatcher
go build -ldflags "-X main.Version=$(git describe --tags --always)" ./cmd/dnswatcher
```
---
@@ -354,6 +382,7 @@ docker run -d \
-v dnswatcher-data:/var/lib/dnswatcher \
-e DNSWATCHER_TARGETS=example.com,www.example.com \
-e DNSWATCHER_NTFY_TOPIC=https://ntfy.sh/my-alerts \
-e DNSWATCHER_SEND_TEST_NOTIFICATION=true \
dnswatcher
```
@@ -366,9 +395,15 @@ docker run -d \
triggering change notifications).
2. **Initial check**: Immediately perform all DNS, port, and TLS checks
on startup.
3. **Periodic checks**:
- DNS and port checks: every `DNSWATCHER_DNS_INTERVAL` (default 1h).
- TLS checks: every `DNSWATCHER_TLS_INTERVAL` (default 12h).
3. **Periodic checks** (DNS always runs first):
- DNS checks: every `DNSWATCHER_DNS_INTERVAL` (default 1h). Also
re-run before every TLS check cycle to ensure fresh IPs.
- Port checks: every `DNSWATCHER_DNS_INTERVAL`, after DNS completes.
- TLS checks: every `DNSWATCHER_TLS_INTERVAL` (default 12h), after
DNS completes.
- Port and TLS checks always use freshly resolved IP addresses from
the DNS phase that immediately precedes them — never stale IPs
from a previous cycle.
4. **On change detection**: Send notifications to all configured
endpoints, update in-memory state, persist to disk.
5. **Shutdown**: Persist final state to disk, complete in-flight
@@ -385,7 +420,18 @@ docker run -d \
## Project Structure
Follows the conventions defined in `CONVENTIONS.md`, adapted from the
Follows the conventions defined in `REPO_POLICIES.md`, adapted from the
[upaas](https://git.eeqj.de/sneak/upaas) project template. Uses uber/fx
for dependency injection, go-chi for HTTP routing, slog for logging, and
Viper for configuration.
---
## License
License has not yet been chosen for this project. Pending decision by the
author (MIT, GPL, or WTFPL).
## Author
[@sneak](https://sneak.berlin)

188
REPO_POLICIES.md Normal file
View File

@@ -0,0 +1,188 @@
---
title: Repository Policies
last_modified: 2026-02-22
---
This document covers repository structure, tooling, and workflow standards. Code
style conventions are in separate documents:
- [Code Styleguide](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/CODE_STYLEGUIDE.md)
(general, bash, Docker)
- [Go](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/CODE_STYLEGUIDE_GO.md)
- [JavaScript](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/CODE_STYLEGUIDE_JS.md)
- [Python](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/CODE_STYLEGUIDE_PYTHON.md)
- [Go HTTP Server Conventions](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/GO_HTTP_SERVER_CONVENTIONS.md)
---
- Cross-project documentation (such as this file) must include
`last_modified: YYYY-MM-DD` in the YAML front matter so it can be kept in sync
with the authoritative source as policies evolve.
- **ALL external references must be pinned by cryptographic hash.** This
includes Docker base images, Go modules, npm packages, GitHub Actions, and
anything else fetched from a remote source. Version tags (`@v4`, `@latest`,
`:3.21`, etc.) are server-mutable and therefore remote code execution
vulnerabilities. The ONLY acceptable way to reference an external dependency
is by its content hash (Docker `@sha256:...`, Go module hash in `go.sum`, npm
integrity hash in lockfile, GitHub Actions `@<commit-sha>`). No exceptions.
This also means never `curl | bash` to install tools like pyenv, nvm, rustup,
etc. Instead, download a specific release archive from GitHub, verify its hash
(hardcoded in the Dockerfile or script), and only then install. Unverified
install scripts are arbitrary remote code execution. This is the single most
important rule in this document. Double-check every external reference in
every file before committing. There are zero exceptions to this rule.
- Every repo with software must have a root `Makefile` with these targets:
`make test`, `make lint`, `make fmt` (writes), `make fmt-check` (read-only),
`make check` (prereqs: `test`, `lint`, `fmt-check`), `make docker`, and
`make hooks` (installs pre-commit hook). A model Makefile is at
`https://git.eeqj.de/sneak/prompts/raw/branch/main/Makefile`.
- Always use Makefile targets (`make fmt`, `make test`, `make lint`, etc.)
instead of invoking the underlying tools directly. The Makefile is the single
source of truth for how these operations are run.
- The Makefile is authoritative documentation for how the repo is used. Beyond
the required targets above, it should have targets for every common operation:
running a local development server (`make run`, `make dev`), re-initializing
or migrating the database (`make db-reset`, `make migrate`), building
artifacts (`make build`), generating code, seeding data, or anything else a
developer would do regularly. If someone checks out the repo and types
`make<tab>`, they should see every meaningful operation available. A new
contributor should be able to understand the entire development workflow by
reading the Makefile.
- Every repo should have a `Dockerfile`. All Dockerfiles must run `make check`
as a build step so the build fails if the branch is not green. For non-server
repos, the Dockerfile should bring up a development environment and run
`make check`. For server repos, `make check` should run as an early build
stage before the final image is assembled.
- Every repo should have a Gitea Actions workflow (`.gitea/workflows/`) that
runs `docker build .` on push. Since the Dockerfile already runs `make check`,
a successful build implies all checks pass.
- Use platform-standard formatters: `black` for Python, `prettier` for
JS/CSS/Markdown/HTML, `go fmt` for Go. Always use default configuration with
two exceptions: four-space indents (except Go), and `proseWrap: always` for
Markdown (hard-wrap at 80 columns). Documentation and writing repos (Markdown,
HTML, CSS) should also have `.prettierrc` and `.prettierignore`.
- Pre-commit hook: `make check` if local testing is possible, otherwise
`make lint && make fmt-check`. The Makefile should provide a `make hooks`
target to install the pre-commit hook.
- All repos with software must have tests that run via the platform-standard
test framework (`go test`, `pytest`, `jest`/`vitest`, etc.). If no meaningful
tests exist yet, add the most minimal test possible — e.g. importing the
module under test to verify it compiles/parses. There is no excuse for
`make test` to be a no-op.
- `make test` must complete in under 20 seconds. Add a 30-second timeout in the
Makefile.
- Docker builds must complete in under 5 minutes.
- `make check` must not modify any files in the repo. Tests may use temporary
directories.
- `main` must always pass `make check`, no exceptions.
- Never commit secrets. `.env` files, credentials, API keys, and private keys
must be in `.gitignore`. No exceptions.
- `.gitignore` should be comprehensive from the start: OS files (`.DS_Store`),
editor files (`.swp`, `*~`), language build artifacts, and `node_modules/`.
Fetch the standard `.gitignore` from
`https://git.eeqj.de/sneak/prompts/raw/branch/main/.gitignore` when setting up
a new repo.
- Never use `git add -A` or `git add .`. Always stage files explicitly by name.
- Never force-push to `main`.
- Make all changes on a feature branch. You can do whatever you want on a
feature branch.
- `.golangci.yml` is standardized and must _NEVER_ be modified by an agent, only
manually by the user. Fetch from
`https://git.eeqj.de/sneak/prompts/raw/branch/main/.golangci.yml`.
- When pinning images or packages by hash, add a comment above the reference
with the version and date (YYYY-MM-DD).
- Use `yarn`, not `npm`.
- Write all dates as YYYY-MM-DD (ISO 8601).
- Simple projects should be configured with environment variables.
- Dockerized web services listen on port 8080 by default, overridable with
`PORT`.
- `README.md` is the primary documentation. Required sections:
- **Description**: First line must include the project name, purpose,
category (web server, SPA, CLI tool, etc.), license, and author. Example:
"µPaaS is an MIT-licensed Go web application by @sneak that receives
git-frontend webhooks and deploys applications via Docker in realtime."
- **Getting Started**: Copy-pasteable install/usage code block.
- **Rationale**: Why does this exist?
- **Design**: How is the program structured?
- **TODO**: Update meticulously, even between commits. When planning, put
the todo list in the README so a new agent can pick up where the last one
left off.
- **License**: MIT, GPL, or WTFPL. Ask the user for new projects. Include a
`LICENSE` file in the repo root and a License section in the README.
- **Author**: [@sneak](https://sneak.berlin).
- First commit of a new repo should contain only `README.md`.
- Go module root: `sneak.berlin/go/<name>`. Always run `go mod tidy` before
committing.
- Use SemVer.
- Database migrations live in `internal/db/migrations/` and must be embedded in
the binary.
- `000_migration.sql` — contains ONLY the creation of the migrations tracking
table itself. Nothing else.
- `001_schema.sql` — the full application schema.
- **Pre-1.0.0:** never add additional migration files (002, 003, etc.). There
is no installed base to migrate. Edit `001_schema.sql` directly.
- **Post-1.0.0:** add new numbered migration files for each schema change.
Never edit existing migrations after release.
- All repos should have an `.editorconfig` enforcing the project's indentation
settings.
- Avoid putting files in the repo root unless necessary. Root should contain
only project-level config files (`README.md`, `Makefile`, `Dockerfile`,
`LICENSE`, `.gitignore`, `.editorconfig`, `REPO_POLICIES.md`, and
language-specific config). Everything else goes in a subdirectory. Canonical
subdirectory names:
- `bin/` — executable scripts and tools
- `cmd/` — Go command entrypoints
- `configs/` — configuration templates and examples
- `deploy/` — deployment manifests (k8s, compose, terraform)
- `docs/` — documentation and markdown (README.md stays in root)
- `internal/` — Go internal packages
- `internal/db/migrations/` — database migrations
- `pkg/` — Go library packages
- `share/` — systemd units, data files
- `static/` — static assets (images, fonts, etc.)
- `web/` — web frontend source
- When setting up a new repo, files from the `prompts` repo may be used as
templates. Fetch them from
`https://git.eeqj.de/sneak/prompts/raw/branch/main/<path>`.
- New repos must contain at minimum:
- `README.md`, `.git`, `.gitignore`, `.editorconfig`
- `LICENSE`, `REPO_POLICIES.md` (copy from the `prompts` repo)
- `Makefile`
- `Dockerfile`, `.dockerignore`
- `.gitea/workflows/check.yml`
- Go: `go.mod`, `go.sum`, `.golangci.yml`
- JS: `package.json`, `yarn.lock`, `.prettierrc`, `.prettierignore`
- Python: `pyproject.toml`

View File

@@ -27,13 +27,11 @@ import (
var (
Appname = "dnswatcher"
Version string
Buildarch string
)
func main() {
globals.SetAppname(Appname)
globals.SetVersion(Version)
globals.SetBuildarch(Buildarch)
fx.New(
fx.Provide(

View File

@@ -23,6 +23,11 @@ const (
defaultTLSExpiryWarning = 7
)
// ErrNoTargets is returned when no monitoring targets are configured.
var ErrNoTargets = errors.New(
"no monitoring targets configured: set DNSWATCHER_TARGETS environment variable",
)
// Params contains dependencies for Config.
type Params struct {
fx.In
@@ -48,6 +53,7 @@ type Config struct {
MaintenanceMode bool
MetricsUsername string
MetricsPassword string
SendTestNotification bool
params *Params
log *slog.Logger
}
@@ -100,6 +106,7 @@ func setupViper(name string) {
viper.SetDefault("MAINTENANCE_MODE", false)
viper.SetDefault("METRICS_USERNAME", "")
viper.SetDefault("METRICS_PASSWORD", "")
viper.SetDefault("SEND_TEST_NOTIFICATION", false)
}
func buildConfig(
@@ -132,11 +139,9 @@ func buildConfig(
tlsInterval = defaultTLSInterval
}
domains, hostnames, err := ClassifyTargets(
parseCSV(viper.GetString("TARGETS")),
)
domains, hostnames, err := parseAndValidateTargets()
if err != nil {
return nil, fmt.Errorf("invalid targets configuration: %w", err)
return nil, err
}
cfg := &Config{
@@ -155,6 +160,7 @@ func buildConfig(
MaintenanceMode: viper.GetBool("MAINTENANCE_MODE"),
MetricsUsername: viper.GetString("METRICS_USERNAME"),
MetricsPassword: viper.GetString("METRICS_PASSWORD"),
SendTestNotification: viper.GetBool("SEND_TEST_NOTIFICATION"),
params: params,
log: log,
}
@@ -162,6 +168,23 @@ func buildConfig(
return cfg, nil
}
func parseAndValidateTargets() ([]string, []string, error) {
domains, hostnames, err := ClassifyTargets(
parseCSV(viper.GetString("TARGETS")),
)
if err != nil {
return nil, nil, fmt.Errorf(
"invalid targets configuration: %w", err,
)
}
if len(domains) == 0 && len(hostnames) == 0 {
return nil, nil, ErrNoTargets
}
return domains, hostnames, nil
}
func parseCSV(input string) []string {
if input == "" {
return nil

View File

@@ -0,0 +1,262 @@
package config_test
import (
"testing"
"time"
"github.com/spf13/viper"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/dnswatcher/internal/config"
"sneak.berlin/go/dnswatcher/internal/globals"
"sneak.berlin/go/dnswatcher/internal/logger"
)
// newTestParams creates config.Params suitable for testing
// without requiring the fx dependency injection framework.
func newTestParams(t *testing.T) config.Params {
t.Helper()
g := &globals.Globals{
Appname: "dnswatcher",
Version: "test",
}
l, err := logger.New(nil, logger.Params{Globals: g})
require.NoError(t, err, "failed to create logger")
return config.Params{
Globals: g,
Logger: l,
}
}
// These tests exercise viper global state and MUST NOT use
// t.Parallel(). Each test resets viper for isolation.
func TestNew_DefaultValues(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com,www.example.com")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 8080, cfg.Port)
assert.False(t, cfg.Debug)
assert.Equal(t, "./data", cfg.DataDir)
assert.Equal(t, time.Hour, cfg.DNSInterval)
assert.Equal(t, 12*time.Hour, cfg.TLSInterval)
assert.Equal(t, 7, cfg.TLSExpiryWarning)
assert.False(t, cfg.MaintenanceMode)
assert.Empty(t, cfg.SlackWebhook)
assert.Empty(t, cfg.MattermostWebhook)
assert.Empty(t, cfg.NtfyTopic)
assert.Empty(t, cfg.SentryDSN)
assert.Empty(t, cfg.MetricsUsername)
assert.Empty(t, cfg.MetricsPassword)
assert.False(t, cfg.SendTestNotification)
}
func TestNew_EnvironmentOverrides(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("PORT", "9090")
t.Setenv("DNSWATCHER_DEBUG", "true")
t.Setenv("DNSWATCHER_DATA_DIR", "/tmp/test-data")
t.Setenv("DNSWATCHER_DNS_INTERVAL", "30m")
t.Setenv("DNSWATCHER_TLS_INTERVAL", "6h")
t.Setenv("DNSWATCHER_TLS_EXPIRY_WARNING", "14")
t.Setenv("DNSWATCHER_SLACK_WEBHOOK", "https://hooks.slack.com/t")
t.Setenv("DNSWATCHER_MATTERMOST_WEBHOOK", "https://mm.test/hooks/t")
t.Setenv("DNSWATCHER_NTFY_TOPIC", "https://ntfy.sh/test")
t.Setenv("DNSWATCHER_SENTRY_DSN", "https://sentry.test/1")
t.Setenv("DNSWATCHER_MAINTENANCE_MODE", "true")
t.Setenv("DNSWATCHER_METRICS_USERNAME", "admin")
t.Setenv("DNSWATCHER_METRICS_PASSWORD", "secret")
t.Setenv("DNSWATCHER_SEND_TEST_NOTIFICATION", "true")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 9090, cfg.Port)
assert.True(t, cfg.Debug)
assert.Equal(t, "/tmp/test-data", cfg.DataDir)
assert.Equal(t, 30*time.Minute, cfg.DNSInterval)
assert.Equal(t, 6*time.Hour, cfg.TLSInterval)
assert.Equal(t, 14, cfg.TLSExpiryWarning)
assert.Equal(t, "https://hooks.slack.com/t", cfg.SlackWebhook)
assert.Equal(t, "https://mm.test/hooks/t", cfg.MattermostWebhook)
assert.Equal(t, "https://ntfy.sh/test", cfg.NtfyTopic)
assert.Equal(t, "https://sentry.test/1", cfg.SentryDSN)
assert.True(t, cfg.MaintenanceMode)
assert.Equal(t, "admin", cfg.MetricsUsername)
assert.Equal(t, "secret", cfg.MetricsPassword)
assert.True(t, cfg.SendTestNotification)
}
func TestNew_NoTargetsError(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "")
_, err := config.New(nil, newTestParams(t))
require.Error(t, err)
assert.ErrorIs(t, err, config.ErrNoTargets)
}
func TestNew_OnlyEmptyCSVSegments(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", " , , ")
_, err := config.New(nil, newTestParams(t))
require.Error(t, err)
assert.ErrorIs(t, err, config.ErrNoTargets)
}
func TestNew_InvalidDNSInterval_FallsBackToDefault(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("DNSWATCHER_DNS_INTERVAL", "banana")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, time.Hour, cfg.DNSInterval,
"invalid DNS interval should fall back to 1h default")
}
func TestNew_InvalidTLSInterval_FallsBackToDefault(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("DNSWATCHER_TLS_INTERVAL", "notaduration")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 12*time.Hour, cfg.TLSInterval,
"invalid TLS interval should fall back to 12h default")
}
func TestNew_BothIntervalsInvalid(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("DNSWATCHER_DNS_INTERVAL", "xyz")
t.Setenv("DNSWATCHER_TLS_INTERVAL", "abc")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, time.Hour, cfg.DNSInterval)
assert.Equal(t, 12*time.Hour, cfg.TLSInterval)
}
func TestNew_DebugEnablesDebugLogging(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("DNSWATCHER_DEBUG", "true")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.True(t, cfg.Debug)
}
func TestNew_PortEnvNotPrefixed(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("PORT", "3000")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 3000, cfg.Port,
"PORT env should work without DNSWATCHER_ prefix")
}
func TestNew_TargetClassification(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS",
"example.com,www.example.com,api.example.com,example.org")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
// example.com and example.org are apex domains
assert.Len(t, cfg.Domains, 2)
// www.example.com and api.example.com are hostnames
assert.Len(t, cfg.Hostnames, 2)
}
func TestNew_InvalidTargetPublicSuffix(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "co.uk")
_, err := config.New(nil, newTestParams(t))
require.Error(t, err, "public suffix should be rejected")
}
func TestNew_EmptyAppnameDefaultsToDnswatcher(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
g := &globals.Globals{Appname: "", Version: "test"}
l, err := logger.New(nil, logger.Params{Globals: g})
require.NoError(t, err)
cfg, err := config.New(
nil, config.Params{Globals: g, Logger: l},
)
require.NoError(t, err)
assert.Equal(t, 8080, cfg.Port,
"defaults should load when appname is empty")
}
func TestNew_TargetsWithWhitespace(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", " example.com , www.example.com ")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 2, len(cfg.Domains)+len(cfg.Hostnames),
"whitespace around targets should be trimmed")
}
func TestNew_TargetsWithTrailingComma(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com,www.example.com,")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 2, len(cfg.Domains)+len(cfg.Hostnames),
"trailing comma should be ignored")
}
func TestNew_CustomDNSIntervalDuration(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("DNSWATCHER_DNS_INTERVAL", "5s")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 5*time.Second, cfg.DNSInterval)
}
func TestStatePath(t *testing.T) {
t.Parallel()
tests := []struct {
name string
dataDir string
want string
}{
{"default", "./data", "./data/state.json"},
{"absolute", "/var/lib/dw", "/var/lib/dw/state.json"},
{"nested", "/opt/app/data", "/opt/app/data/state.json"},
{"empty", "", "/state.json"},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
t.Parallel()
cfg := &config.Config{DataDir: tt.dataDir}
assert.Equal(t, tt.want, cfg.StatePath())
})
}
}

View File

@@ -0,0 +1,6 @@
package config
// ParseCSVForTest exports parseCSV for use in external tests.
func ParseCSVForTest(input string) []string {
return parseCSV(input)
}

View File

@@ -0,0 +1,44 @@
package config_test
import (
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/dnswatcher/internal/config"
)
func TestParseCSV(t *testing.T) {
t.Parallel()
tests := []struct {
name string
input string
want []string
}{
{"empty string", "", nil},
{"single value", "a", []string{"a"}},
{"multiple values", "a,b,c", []string{"a", "b", "c"}},
{"whitespace trimmed", " a , b ", []string{"a", "b"}},
{"trailing comma", "a,b,", []string{"a", "b"}},
{"leading comma", ",a,b", []string{"a", "b"}},
{"consecutive commas", "a,,b", []string{"a", "b"}},
{"all empty segments", ",,,", nil},
{"whitespace only", " , , ", nil},
{"tabs", "\ta\t,\tb\t", []string{"a", "b"}},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
t.Parallel()
got := config.ParseCSVForTest(tt.input)
require.Len(t, got, len(tt.want))
for i, w := range tt.want {
assert.Equal(t, w, got[i])
}
})
}
}

View File

@@ -15,14 +15,12 @@ var (
mu sync.RWMutex
appname string
version string
buildarch string
)
// Globals holds build-time variables for dependency injection.
type Globals struct {
Appname string
Version string
Buildarch string
}
// New creates a new Globals instance from package-level variables.
@@ -33,7 +31,6 @@ func New(_ fx.Lifecycle) (*Globals, error) {
return &Globals{
Appname: appname,
Version: version,
Buildarch: buildarch,
}, nil
}
@@ -52,11 +49,3 @@ func SetVersion(ver string) {
version = ver
}
// SetBuildarch sets the build architecture.
func SetBuildarch(arch string) {
mu.Lock()
defer mu.Unlock()
buildarch = arch
}

View File

@@ -0,0 +1,151 @@
package handlers
import (
"embed"
"fmt"
"html/template"
"math"
"net/http"
"strings"
"time"
"sneak.berlin/go/dnswatcher/internal/notify"
"sneak.berlin/go/dnswatcher/internal/state"
)
//go:embed templates/dashboard.html
var dashboardFS embed.FS
// Time unit constants for relative time calculations.
const (
secondsPerMinute = 60
minutesPerHour = 60
hoursPerDay = 24
)
// newDashboardTemplate parses the embedded dashboard HTML
// template with helper functions.
func newDashboardTemplate() *template.Template {
funcs := template.FuncMap{
"relTime": relTime,
"joinStrings": joinStrings,
"formatRecords": formatRecords,
"expiryDays": expiryDays,
}
return template.Must(
template.New("dashboard.html").
Funcs(funcs).
ParseFS(dashboardFS, "templates/dashboard.html"),
)
}
// dashboardData is the data passed to the dashboard template.
type dashboardData struct {
Snapshot state.Snapshot
Alerts []notify.AlertEntry
StateAge string
GeneratedAt string
}
// HandleDashboard returns the dashboard page handler.
func (h *Handlers) HandleDashboard() http.HandlerFunc {
tmpl := newDashboardTemplate()
return func(
writer http.ResponseWriter,
_ *http.Request,
) {
snap := h.state.GetSnapshot()
alerts := h.notifyHistory.Recent()
data := dashboardData{
Snapshot: snap,
Alerts: alerts,
StateAge: relTime(snap.LastUpdated),
GeneratedAt: time.Now().UTC().Format("2006-01-02 15:04:05"),
}
writer.Header().Set(
"Content-Type", "text/html; charset=utf-8",
)
err := tmpl.Execute(writer, data)
if err != nil {
h.log.Error(
"dashboard template error",
"error", err,
)
}
}
}
// relTime returns a human-readable relative time string such
// as "2 minutes ago" or "never" for zero times.
func relTime(t time.Time) string {
if t.IsZero() {
return "never"
}
d := time.Since(t)
if d < 0 {
return "just now"
}
seconds := int(math.Round(d.Seconds()))
if seconds < secondsPerMinute {
return fmt.Sprintf("%ds ago", seconds)
}
minutes := seconds / secondsPerMinute
if minutes < minutesPerHour {
return fmt.Sprintf("%dm ago", minutes)
}
hours := minutes / minutesPerHour
if hours < hoursPerDay {
return fmt.Sprintf(
"%dh %dm ago", hours, minutes%minutesPerHour,
)
}
days := hours / hoursPerDay
return fmt.Sprintf(
"%dd %dh ago", days, hours%hoursPerDay,
)
}
// joinStrings joins a string slice with a separator.
func joinStrings(items []string, sep string) string {
return strings.Join(items, sep)
}
// formatRecords formats a map of record type → values into a
// compact display string.
func formatRecords(records map[string][]string) string {
if len(records) == 0 {
return "-"
}
var parts []string
for rtype, values := range records {
for _, v := range values {
parts = append(parts, rtype+": "+v)
}
}
return strings.Join(parts, ", ")
}
// expiryDays returns the number of days until the given time,
// rounded down. Returns 0 if already expired.
func expiryDays(t time.Time) int {
d := time.Until(t).Hours() / hoursPerDay
if d < 0 {
return 0
}
return int(d)
}

View File

@@ -0,0 +1,80 @@
package handlers_test
import (
"testing"
"time"
"sneak.berlin/go/dnswatcher/internal/handlers"
)
func TestRelTime(t *testing.T) {
t.Parallel()
tests := []struct {
name string
dur time.Duration
want string
}{
{"zero", 0, "never"},
{"seconds", 30 * time.Second, "30s ago"},
{"minutes", 5 * time.Minute, "5m ago"},
{"hours", 2*time.Hour + 15*time.Minute, "2h 15m ago"},
{"days", 48*time.Hour + 3*time.Hour, "2d 3h ago"},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
t.Parallel()
var input time.Time
if tt.dur > 0 {
input = time.Now().Add(-tt.dur)
}
got := handlers.RelTime(input)
if got != tt.want {
t.Errorf(
"RelTime(%v) = %q, want %q",
tt.dur, got, tt.want,
)
}
})
}
}
func TestExpiryDays(t *testing.T) {
t.Parallel()
// 10 days from now.
future := time.Now().Add(10 * 24 * time.Hour)
days := handlers.ExpiryDays(future)
if days < 9 || days > 10 {
t.Errorf("expected ~10 days, got %d", days)
}
// Already expired.
past := time.Now().Add(-24 * time.Hour)
days = handlers.ExpiryDays(past)
if days != 0 {
t.Errorf("expected 0 for expired, got %d", days)
}
}
func TestFormatRecords(t *testing.T) {
t.Parallel()
got := handlers.FormatRecords(nil)
if got != "-" {
t.Errorf("expected -, got %q", got)
}
got = handlers.FormatRecords(map[string][]string{
"A": {"1.2.3.4"},
})
if got != "A: 1.2.3.4" {
t.Errorf("unexpected format: %q", got)
}
}

View File

@@ -0,0 +1,18 @@
package handlers
import "time"
// RelTime exports relTime for testing.
func RelTime(t time.Time) string {
return relTime(t)
}
// ExpiryDays exports expiryDays for testing.
func ExpiryDays(t time.Time) int {
return expiryDays(t)
}
// FormatRecords exports formatRecords for testing.
func FormatRecords(records map[string][]string) string {
return formatRecords(records)
}

View File

@@ -11,6 +11,8 @@ import (
"sneak.berlin/go/dnswatcher/internal/globals"
"sneak.berlin/go/dnswatcher/internal/healthcheck"
"sneak.berlin/go/dnswatcher/internal/logger"
"sneak.berlin/go/dnswatcher/internal/notify"
"sneak.berlin/go/dnswatcher/internal/state"
)
// Params contains dependencies for Handlers.
@@ -20,6 +22,8 @@ type Params struct {
Logger *logger.Logger
Globals *globals.Globals
Healthcheck *healthcheck.Healthcheck
State *state.State
Notify *notify.Service
}
// Handlers provides HTTP request handlers.
@@ -28,6 +32,8 @@ type Handlers struct {
params *Params
globals *globals.Globals
hc *healthcheck.Healthcheck
state *state.State
notifyHistory *notify.AlertHistory
}
// New creates a new Handlers instance.
@@ -37,6 +43,8 @@ func New(_ fx.Lifecycle, params Params) (*Handlers, error) {
params: &params,
globals: params.Globals,
hc: params.Healthcheck,
state: params.State,
notifyHistory: params.Notify.History(),
}, nil
}

View File

@@ -2,22 +2,217 @@ package handlers
import (
"net/http"
"sort"
"time"
"sneak.berlin/go/dnswatcher/internal/state"
)
// statusDomainInfo holds status information for a monitored domain.
type statusDomainInfo struct {
Nameservers []string `json:"nameservers"`
LastChecked time.Time `json:"lastChecked"`
}
// statusHostnameNSInfo holds per-nameserver status for a hostname.
type statusHostnameNSInfo struct {
Records map[string][]string `json:"records"`
Status string `json:"status"`
LastChecked time.Time `json:"lastChecked"`
}
// statusHostnameInfo holds status information for a monitored hostname.
type statusHostnameInfo struct {
Nameservers map[string]*statusHostnameNSInfo `json:"nameservers"`
LastChecked time.Time `json:"lastChecked"`
}
// statusPortInfo holds status information for a monitored port.
type statusPortInfo struct {
Open bool `json:"open"`
Hostnames []string `json:"hostnames"`
LastChecked time.Time `json:"lastChecked"`
}
// statusCertificateInfo holds status information for a TLS certificate.
type statusCertificateInfo struct {
CommonName string `json:"commonName"`
Issuer string `json:"issuer"`
NotAfter time.Time `json:"notAfter"`
SubjectAlternativeNames []string `json:"subjectAlternativeNames"`
Status string `json:"status"`
LastChecked time.Time `json:"lastChecked"`
}
// statusCounts holds summary counts of monitored resources.
type statusCounts struct {
Domains int `json:"domains"`
Hostnames int `json:"hostnames"`
Ports int `json:"ports"`
PortsOpen int `json:"portsOpen"`
Certificates int `json:"certificates"`
CertsOK int `json:"certificatesOk"`
CertsError int `json:"certificatesError"`
}
// statusResponse is the full /api/v1/status response.
type statusResponse struct {
Status string `json:"status"`
LastUpdated time.Time `json:"lastUpdated"`
Counts statusCounts `json:"counts"`
Domains map[string]*statusDomainInfo `json:"domains"`
Hostnames map[string]*statusHostnameInfo `json:"hostnames"`
Ports map[string]*statusPortInfo `json:"ports"`
Certificates map[string]*statusCertificateInfo `json:"certificates"`
}
// HandleStatus returns the monitoring status handler.
func (h *Handlers) HandleStatus() http.HandlerFunc {
type response struct {
Status string `json:"status"`
}
return func(
writer http.ResponseWriter,
request *http.Request,
) {
snap := h.state.GetSnapshot()
resp := buildStatusResponse(snap)
h.respondJSON(
writer, request,
&response{Status: "ok"},
resp,
http.StatusOK,
)
}
}
// buildStatusResponse constructs the full status response from
// the current monitoring snapshot.
func buildStatusResponse(
snap state.Snapshot,
) *statusResponse {
resp := &statusResponse{
Status: "ok",
LastUpdated: snap.LastUpdated,
Domains: make(map[string]*statusDomainInfo),
Hostnames: make(map[string]*statusHostnameInfo),
Ports: make(map[string]*statusPortInfo),
Certificates: make(map[string]*statusCertificateInfo),
}
buildDomains(snap, resp)
buildHostnames(snap, resp)
buildPorts(snap, resp)
buildCertificates(snap, resp)
buildCounts(resp)
return resp
}
func buildDomains(
snap state.Snapshot,
resp *statusResponse,
) {
for name, ds := range snap.Domains {
ns := make([]string, len(ds.Nameservers))
copy(ns, ds.Nameservers)
sort.Strings(ns)
resp.Domains[name] = &statusDomainInfo{
Nameservers: ns,
LastChecked: ds.LastChecked,
}
}
}
func buildHostnames(
snap state.Snapshot,
resp *statusResponse,
) {
for name, hs := range snap.Hostnames {
info := &statusHostnameInfo{
Nameservers: make(map[string]*statusHostnameNSInfo),
LastChecked: hs.LastChecked,
}
for ns, nsState := range hs.RecordsByNameserver {
recs := make(map[string][]string, len(nsState.Records))
for rtype, vals := range nsState.Records {
copied := make([]string, len(vals))
copy(copied, vals)
recs[rtype] = copied
}
info.Nameservers[ns] = &statusHostnameNSInfo{
Records: recs,
Status: nsState.Status,
LastChecked: nsState.LastChecked,
}
}
resp.Hostnames[name] = info
}
}
func buildPorts(
snap state.Snapshot,
resp *statusResponse,
) {
for key, ps := range snap.Ports {
hostnames := make([]string, len(ps.Hostnames))
copy(hostnames, ps.Hostnames)
sort.Strings(hostnames)
resp.Ports[key] = &statusPortInfo{
Open: ps.Open,
Hostnames: hostnames,
LastChecked: ps.LastChecked,
}
}
}
func buildCertificates(
snap state.Snapshot,
resp *statusResponse,
) {
for key, cs := range snap.Certificates {
sans := make([]string, len(cs.SubjectAlternativeNames))
copy(sans, cs.SubjectAlternativeNames)
resp.Certificates[key] = &statusCertificateInfo{
CommonName: cs.CommonName,
Issuer: cs.Issuer,
NotAfter: cs.NotAfter,
SubjectAlternativeNames: sans,
Status: cs.Status,
LastChecked: cs.LastChecked,
}
}
}
func buildCounts(resp *statusResponse) {
var portsOpen, certsOK, certsError int
for _, ps := range resp.Ports {
if ps.Open {
portsOpen++
}
}
for _, cs := range resp.Certificates {
switch cs.Status {
case "ok":
certsOK++
case "error":
certsError++
}
}
resp.Counts = statusCounts{
Domains: len(resp.Domains),
Hostnames: len(resp.Hostnames),
Ports: len(resp.Ports),
PortsOpen: portsOpen,
Certificates: len(resp.Certificates),
CertsOK: certsOK,
CertsError: certsError,
}
}

View File

@@ -0,0 +1,370 @@
<!doctype html>
<html lang="en" class="bg-slate-950">
<head>
<meta charset="utf-8" />
<meta http-equiv="refresh" content="30" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>dnswatcher</title>
<link rel="stylesheet" href="/s/css/tailwind.min.css" />
</head>
<body
class="bg-surface-950 text-slate-300 font-mono text-sm min-h-screen antialiased"
>
<div class="max-w-6xl mx-auto px-4 py-8">
{{/* ---- Header ---- */}}
<div class="mb-8">
<h1 class="text-2xl font-bold text-teal-400 tracking-tight">
dnswatcher
</h1>
<p class="text-xs text-slate-500 mt-1">
state updated {{ .StateAge }} &middot; page generated
{{ .GeneratedAt }} UTC &middot; auto-refresh 30s
</p>
</div>
{{/* ---- Summary bar ---- */}}
<div
class="grid grid-cols-2 sm:grid-cols-4 gap-3 mb-8"
>
<div class="bg-surface-800 border border-slate-700/50 rounded-lg p-4">
<div class="text-xs text-slate-500 uppercase tracking-wider">
Domains
</div>
<div class="text-2xl font-bold text-teal-400 mt-1">
{{ len .Snapshot.Domains }}
</div>
</div>
<div class="bg-surface-800 border border-slate-700/50 rounded-lg p-4">
<div class="text-xs text-slate-500 uppercase tracking-wider">
Hostnames
</div>
<div class="text-2xl font-bold text-teal-400 mt-1">
{{ len .Snapshot.Hostnames }}
</div>
</div>
<div class="bg-surface-800 border border-slate-700/50 rounded-lg p-4">
<div class="text-xs text-slate-500 uppercase tracking-wider">
Ports
</div>
<div class="text-2xl font-bold text-teal-400 mt-1">
{{ len .Snapshot.Ports }}
</div>
</div>
<div class="bg-surface-800 border border-slate-700/50 rounded-lg p-4">
<div class="text-xs text-slate-500 uppercase tracking-wider">
Certificates
</div>
<div class="text-2xl font-bold text-teal-400 mt-1">
{{ len .Snapshot.Certificates }}
</div>
</div>
</div>
{{/* ---- Domains ---- */}}
<section class="mb-8">
<h2
class="text-sm font-semibold text-teal-300 uppercase tracking-wider mb-3 border-b border-slate-700/50 pb-2"
>
Domains
</h2>
{{ if .Snapshot.Domains }}
<div class="overflow-x-auto">
<table class="w-full text-left text-xs">
<thead>
<tr class="text-slate-500 uppercase tracking-wider">
<th class="py-2 px-3">Domain</th>
<th class="py-2 px-3">Nameservers</th>
<th class="py-2 px-3">Checked</th>
</tr>
</thead>
<tbody class="divide-y divide-slate-800">
{{ range $name, $ds := .Snapshot.Domains }}
<tr class="hover:bg-surface-800/50">
<td class="py-2 px-3 text-slate-200 font-medium">
{{ $name }}
</td>
<td class="py-2 px-3 text-slate-400 break-all">
{{ joinStrings $ds.Nameservers ", " }}
</td>
<td class="py-2 px-3 text-slate-500 whitespace-nowrap">
{{ relTime $ds.LastChecked }}
</td>
</tr>
{{ end }}
</tbody>
</table>
</div>
{{ else }}
<p class="text-slate-600 italic text-xs">
No domains configured.
</p>
{{ end }}
</section>
{{/* ---- Hostnames ---- */}}
<section class="mb-8">
<h2
class="text-sm font-semibold text-teal-300 uppercase tracking-wider mb-3 border-b border-slate-700/50 pb-2"
>
Hostnames
</h2>
{{ if .Snapshot.Hostnames }}
<div class="overflow-x-auto">
<table class="w-full text-left text-xs">
<thead>
<tr class="text-slate-500 uppercase tracking-wider">
<th class="py-2 px-3">Hostname</th>
<th class="py-2 px-3">NS</th>
<th class="py-2 px-3">Status</th>
<th class="py-2 px-3">Records</th>
<th class="py-2 px-3">Checked</th>
</tr>
</thead>
<tbody class="divide-y divide-slate-800">
{{ range $name, $hs := .Snapshot.Hostnames }}
{{ range $ns, $nsr := $hs.RecordsByNameserver }}
<tr class="hover:bg-surface-800/50">
<td class="py-2 px-3 text-slate-200 font-medium">
{{ $name }}
</td>
<td class="py-2 px-3 text-slate-400 break-all">
{{ $ns }}
</td>
<td class="py-2 px-3">
{{ if eq $nsr.Status "ok" }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-teal-900/50 text-teal-400 border border-teal-700/30"
>ok</span
>
{{ else }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-red-900/50 text-red-400 border border-red-700/30"
>{{ $nsr.Status }}</span
>
{{ end }}
</td>
<td
class="py-2 px-3 text-slate-400 break-all max-w-xs"
>
{{ formatRecords $nsr.Records }}
</td>
<td class="py-2 px-3 text-slate-500 whitespace-nowrap">
{{ relTime $nsr.LastChecked }}
</td>
</tr>
{{ end }}
{{ end }}
</tbody>
</table>
</div>
{{ else }}
<p class="text-slate-600 italic text-xs">
No hostnames configured.
</p>
{{ end }}
</section>
{{/* ---- Ports ---- */}}
<section class="mb-8">
<h2
class="text-sm font-semibold text-teal-300 uppercase tracking-wider mb-3 border-b border-slate-700/50 pb-2"
>
Ports
</h2>
{{ if .Snapshot.Ports }}
<div class="overflow-x-auto">
<table class="w-full text-left text-xs">
<thead>
<tr class="text-slate-500 uppercase tracking-wider">
<th class="py-2 px-3">Address</th>
<th class="py-2 px-3">State</th>
<th class="py-2 px-3">Hostnames</th>
<th class="py-2 px-3">Checked</th>
</tr>
</thead>
<tbody class="divide-y divide-slate-800">
{{ range $key, $ps := .Snapshot.Ports }}
<tr class="hover:bg-surface-800/50">
<td class="py-2 px-3 text-slate-200 font-medium">
{{ $key }}
</td>
<td class="py-2 px-3">
{{ if $ps.Open }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-teal-900/50 text-teal-400 border border-teal-700/30"
>open</span
>
{{ else }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-red-900/50 text-red-400 border border-red-700/30"
>closed</span
>
{{ end }}
</td>
<td class="py-2 px-3 text-slate-400 break-all">
{{ joinStrings $ps.Hostnames ", " }}
</td>
<td class="py-2 px-3 text-slate-500 whitespace-nowrap">
{{ relTime $ps.LastChecked }}
</td>
</tr>
{{ end }}
</tbody>
</table>
</div>
{{ else }}
<p class="text-slate-600 italic text-xs">
No port data yet.
</p>
{{ end }}
</section>
{{/* ---- Certificates ---- */}}
<section class="mb-8">
<h2
class="text-sm font-semibold text-teal-300 uppercase tracking-wider mb-3 border-b border-slate-700/50 pb-2"
>
Certificates
</h2>
{{ if .Snapshot.Certificates }}
<div class="overflow-x-auto">
<table class="w-full text-left text-xs">
<thead>
<tr class="text-slate-500 uppercase tracking-wider">
<th class="py-2 px-3">Endpoint</th>
<th class="py-2 px-3">Status</th>
<th class="py-2 px-3">CN</th>
<th class="py-2 px-3">Issuer</th>
<th class="py-2 px-3">Expires</th>
<th class="py-2 px-3">Checked</th>
</tr>
</thead>
<tbody class="divide-y divide-slate-800">
{{ range $key, $cs := .Snapshot.Certificates }}
<tr class="hover:bg-surface-800/50">
<td class="py-2 px-3 text-slate-400 break-all">
{{ $key }}
</td>
<td class="py-2 px-3">
{{ if eq $cs.Status "ok" }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-teal-900/50 text-teal-400 border border-teal-700/30"
>ok</span
>
{{ else }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-red-900/50 text-red-400 border border-red-700/30"
>{{ $cs.Status }}</span
>
{{ end }}
</td>
<td class="py-2 px-3 text-slate-200">
{{ $cs.CommonName }}
</td>
<td class="py-2 px-3 text-slate-400 break-all">
{{ $cs.Issuer }}
</td>
<td class="py-2 px-3 whitespace-nowrap">
{{ if not $cs.NotAfter.IsZero }}
{{ $days := expiryDays $cs.NotAfter }}
{{ if lt $days 7 }}
<span class="text-red-400 font-medium"
>{{ $cs.NotAfter.Format "2006-01-02" }}
({{ $days }}d)</span
>
{{ else if lt $days 30 }}
<span class="text-amber-400"
>{{ $cs.NotAfter.Format "2006-01-02" }}
({{ $days }}d)</span
>
{{ else }}
<span class="text-slate-400"
>{{ $cs.NotAfter.Format "2006-01-02" }}
({{ $days }}d)</span
>
{{ end }}
{{ end }}
</td>
<td class="py-2 px-3 text-slate-500 whitespace-nowrap">
{{ relTime $cs.LastChecked }}
</td>
</tr>
{{ end }}
</tbody>
</table>
</div>
{{ else }}
<p class="text-slate-600 italic text-xs">
No certificate data yet.
</p>
{{ end }}
</section>
{{/* ---- Recent Alerts ---- */}}
<section class="mb-8">
<h2
class="text-sm font-semibold text-teal-300 uppercase tracking-wider mb-3 border-b border-slate-700/50 pb-2"
>
Recent Alerts ({{ len .Alerts }})
</h2>
{{ if .Alerts }}
<div class="space-y-2">
{{ range .Alerts }}
<div
class="bg-surface-800 border rounded-lg px-4 py-3 {{ if eq .Priority "error" }}border-red-700/40{{ else if eq .Priority "warning" }}border-amber-700/40{{ else if eq .Priority "success" }}border-teal-700/40{{ else }}border-blue-700/40{{ end }}"
>
<div class="flex items-center gap-3 mb-1">
{{ if eq .Priority "error" }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-red-900/50 text-red-400 border border-red-700/30"
>error</span
>
{{ else if eq .Priority "warning" }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-amber-900/50 text-amber-400 border border-amber-700/30"
>warning</span
>
{{ else if eq .Priority "success" }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-teal-900/50 text-teal-400 border border-teal-700/30"
>success</span
>
{{ else }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-blue-900/50 text-blue-400 border border-blue-700/30"
>info</span
>
{{ end }}
<span class="text-slate-200 text-xs font-medium">
{{ .Title }}
</span>
<span class="text-slate-600 text-[11px] ml-auto whitespace-nowrap">
{{ .Timestamp.Format "2006-01-02 15:04:05" }} UTC
({{ relTime .Timestamp }})
</span>
</div>
<p
class="text-slate-400 text-xs whitespace-pre-line pl-0.5"
>
{{ .Message }}
</p>
</div>
{{ end }}
</div>
{{ else }}
<p class="text-slate-600 italic text-xs">
No alerts recorded since last restart.
</p>
{{ end }}
</section>
{{/* ---- Footer ---- */}}
<div
class="text-[11px] text-slate-700 border-t border-slate-800 pt-4 mt-8"
>
dnswatcher &middot; monitoring {{ len .Snapshot.Domains }} domains +
{{ len .Snapshot.Hostnames }} hostnames
</div>
</div>
</body>
</html>

View File

@@ -78,6 +78,5 @@ func (l *Logger) Identify() {
l.log.Info("starting",
"appname", l.params.Globals.Appname,
"version", l.params.Globals.Version,
"buildarch", l.params.Globals.Buildarch,
)
}

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,76 @@
package notify
import (
"context"
"io"
"log/slog"
"net/http"
"net/url"
)
// NtfyPriority exports ntfyPriority for testing.
func NtfyPriority(priority string) string {
return ntfyPriority(priority)
}
// SlackColor exports slackColor for testing.
func SlackColor(priority string) string {
return slackColor(priority)
}
// NewRequestForTest exports newRequest for testing.
func NewRequestForTest(
ctx context.Context,
method string,
target *url.URL,
body io.Reader,
) *http.Request {
return newRequest(ctx, method, target, body)
}
// NewTestService creates a Service suitable for unit testing.
// It discards log output and uses the given transport.
func NewTestService(transport http.RoundTripper) *Service {
return &Service{
log: slog.New(slog.DiscardHandler),
transport: transport,
history: NewAlertHistory(),
}
}
// SetNtfyURL sets the ntfy URL on a Service for testing.
func (svc *Service) SetNtfyURL(u *url.URL) {
svc.ntfyURL = u
}
// SetSlackWebhookURL sets the Slack webhook URL on a
// Service for testing.
func (svc *Service) SetSlackWebhookURL(u *url.URL) {
svc.slackWebhookURL = u
}
// SetMattermostWebhookURL sets the Mattermost webhook URL on
// a Service for testing.
func (svc *Service) SetMattermostWebhookURL(u *url.URL) {
svc.mattermostWebhookURL = u
}
// SendNtfy exports sendNtfy for testing.
func (svc *Service) SendNtfy(
ctx context.Context,
topicURL *url.URL,
title, message, priority string,
) error {
return svc.sendNtfy(ctx, topicURL, title, message, priority)
}
// SendSlack exports sendSlack for testing.
func (svc *Service) SendSlack(
ctx context.Context,
webhookURL *url.URL,
title, message, priority string,
) error {
return svc.sendSlack(
ctx, webhookURL, title, message, priority,
)
}

View File

@@ -0,0 +1,62 @@
package notify
import (
"sync"
"time"
)
// maxAlertHistory is the maximum number of alerts to retain.
const maxAlertHistory = 100
// AlertEntry represents a single notification that was sent.
type AlertEntry struct {
Timestamp time.Time
Title string
Message string
Priority string
}
// AlertHistory is a thread-safe ring buffer that stores
// the most recent alerts.
type AlertHistory struct {
mu sync.RWMutex
entries [maxAlertHistory]AlertEntry
count int
index int
}
// NewAlertHistory creates a new empty AlertHistory.
func NewAlertHistory() *AlertHistory {
return &AlertHistory{}
}
// Add records a new alert entry in the ring buffer.
func (h *AlertHistory) Add(entry AlertEntry) {
h.mu.Lock()
defer h.mu.Unlock()
h.entries[h.index] = entry
h.index = (h.index + 1) % maxAlertHistory
if h.count < maxAlertHistory {
h.count++
}
}
// Recent returns the stored alerts in reverse chronological
// order (newest first). Returns at most maxAlertHistory entries.
func (h *AlertHistory) Recent() []AlertEntry {
h.mu.RLock()
defer h.mu.RUnlock()
result := make([]AlertEntry, h.count)
for i := range h.count {
// Walk backwards from the most recent entry.
idx := (h.index - 1 - i + maxAlertHistory) %
maxAlertHistory
result[i] = h.entries[idx]
}
return result
}

View File

@@ -0,0 +1,88 @@
package notify_test
import (
"testing"
"time"
"sneak.berlin/go/dnswatcher/internal/notify"
)
func TestAlertHistoryEmpty(t *testing.T) {
t.Parallel()
h := notify.NewAlertHistory()
entries := h.Recent()
if len(entries) != 0 {
t.Fatalf("expected 0 entries, got %d", len(entries))
}
}
func TestAlertHistoryAddAndRecent(t *testing.T) {
t.Parallel()
h := notify.NewAlertHistory()
now := time.Now().UTC()
h.Add(notify.AlertEntry{
Timestamp: now.Add(-2 * time.Minute),
Title: "first",
Message: "msg1",
Priority: "info",
})
h.Add(notify.AlertEntry{
Timestamp: now.Add(-1 * time.Minute),
Title: "second",
Message: "msg2",
Priority: "warning",
})
entries := h.Recent()
if len(entries) != 2 {
t.Fatalf("expected 2 entries, got %d", len(entries))
}
// Newest first.
if entries[0].Title != "second" {
t.Errorf(
"expected newest first, got %q", entries[0].Title,
)
}
if entries[1].Title != "first" {
t.Errorf(
"expected oldest second, got %q", entries[1].Title,
)
}
}
func TestAlertHistoryOverflow(t *testing.T) {
t.Parallel()
h := notify.NewAlertHistory()
const totalEntries = 110
// Fill beyond capacity.
for i := range totalEntries {
h.Add(notify.AlertEntry{
Timestamp: time.Now().UTC(),
Title: "alert",
Message: "msg",
Priority: string(rune('0' + i%10)),
})
}
entries := h.Recent()
const maxHistory = 100
if len(entries) != maxHistory {
t.Fatalf(
"expected %d entries, got %d",
maxHistory, len(entries),
)
}
}

View File

@@ -112,6 +112,7 @@ type Service struct {
ntfyURL *url.URL
slackWebhookURL *url.URL
mattermostWebhookURL *url.URL
history *AlertHistory
}
// New creates a new notify Service.
@@ -123,6 +124,7 @@ func New(
log: params.Logger.Get(),
transport: http.DefaultTransport,
config: params.Config,
history: NewAlertHistory(),
}
if params.Config.NtfyTopic != "" {
@@ -167,19 +169,42 @@ func New(
return svc, nil
}
// History returns the alert history for reading recent alerts.
func (svc *Service) History() *AlertHistory {
return svc.history
}
// SendNotification sends a notification to all configured
// endpoints.
// endpoints and records it in the alert history.
func (svc *Service) SendNotification(
ctx context.Context,
title, message, priority string,
) {
if svc.ntfyURL != nil {
svc.history.Add(AlertEntry{
Timestamp: time.Now().UTC(),
Title: title,
Message: message,
Priority: priority,
})
svc.dispatchNtfy(ctx, title, message, priority)
svc.dispatchSlack(ctx, title, message, priority)
svc.dispatchMattermost(ctx, title, message, priority)
}
func (svc *Service) dispatchNtfy(
ctx context.Context,
title, message, priority string,
) {
if svc.ntfyURL == nil {
return
}
go func() {
notifyCtx := context.WithoutCancel(ctx)
err := svc.sendNtfy(
notifyCtx,
svc.ntfyURL,
notifyCtx, svc.ntfyURL,
title, message, priority,
)
if err != nil {
@@ -191,13 +216,19 @@ func (svc *Service) SendNotification(
}()
}
if svc.slackWebhookURL != nil {
func (svc *Service) dispatchSlack(
ctx context.Context,
title, message, priority string,
) {
if svc.slackWebhookURL == nil {
return
}
go func() {
notifyCtx := context.WithoutCancel(ctx)
err := svc.sendSlack(
notifyCtx,
svc.slackWebhookURL,
notifyCtx, svc.slackWebhookURL,
title, message, priority,
)
if err != nil {
@@ -209,13 +240,19 @@ func (svc *Service) SendNotification(
}()
}
if svc.mattermostWebhookURL != nil {
func (svc *Service) dispatchMattermost(
ctx context.Context,
title, message, priority string,
) {
if svc.mattermostWebhookURL == nil {
return
}
go func() {
notifyCtx := context.WithoutCancel(ctx)
err := svc.sendSlack(
notifyCtx,
svc.mattermostWebhookURL,
notifyCtx, svc.mattermostWebhookURL,
title, message, priority,
)
if err != nil {
@@ -226,7 +263,6 @@ func (svc *Service) SendNotification(
}
}()
}
}
func (svc *Service) sendNtfy(
ctx context.Context,

View File

@@ -4,7 +4,6 @@ import (
"context"
"errors"
"fmt"
"math/rand"
"net"
"sort"
"strings"
@@ -42,22 +41,6 @@ func rootServerList() []string {
}
}
const maxRootServers = 3
// randomRootServers returns a shuffled subset of root servers.
func randomRootServers() []string {
all := rootServerList()
rand.Shuffle(len(all), func(i, j int) {
all[i], all[j] = all[j], all[i]
})
if len(all) > maxRootServers {
return all[:maxRootServers]
}
return all
}
func checkCtx(ctx context.Context) error {
err := ctx.Err()
if err != nil {
@@ -244,7 +227,7 @@ func (r *Resolver) followDelegation(
authNS := extractNSSet(resp.Ns)
if len(authNS) == 0 {
return r.resolveNSRecursive(ctx, domain)
return r.resolveNSIterative(ctx, domain)
}
glue := extractGlue(resp.Extra)
@@ -308,60 +291,84 @@ func (r *Resolver) resolveNSIPs(
return ips
}
// resolveNSRecursive queries for NS records using recursive
// resolution as a fallback for intercepted environments.
func (r *Resolver) resolveNSRecursive(
// resolveNSIterative queries for NS records using iterative
// resolution as a fallback when followDelegation finds no
// authoritative answer in the delegation chain.
func (r *Resolver) resolveNSIterative(
ctx context.Context,
domain string,
) ([]string, error) {
domain = dns.Fqdn(domain)
msg := new(dns.Msg)
msg.SetQuestion(domain, dns.TypeNS)
msg.RecursionDesired = true
for _, ip := range randomRootServers() {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
addr := net.JoinHostPort(ip, "53")
domain = dns.Fqdn(domain)
servers := rootServerList()
resp, _, err := r.client.ExchangeContext(ctx, msg, addr)
for range maxDelegation {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
resp, err := r.queryServers(
ctx, servers, domain, dns.TypeNS,
)
if err != nil {
continue
return nil, err
}
nsNames := extractNSSet(resp.Answer)
if len(nsNames) > 0 {
return nsNames, nil
}
// Follow delegation.
authNS := extractNSSet(resp.Ns)
if len(authNS) == 0 {
break
}
glue := extractGlue(resp.Extra)
nextServers := glueIPs(authNS, glue)
if len(nextServers) == 0 {
break
}
servers = nextServers
}
return nil, ErrNoNameservers
}
// resolveARecord resolves a hostname to IPv4 addresses.
// resolveARecord resolves a hostname to IPv4 addresses using
// iterative resolution through the delegation chain.
func (r *Resolver) resolveARecord(
ctx context.Context,
hostname string,
) ([]string, error) {
hostname = dns.Fqdn(hostname)
msg := new(dns.Msg)
msg.SetQuestion(hostname, dns.TypeA)
msg.RecursionDesired = true
for _, ip := range randomRootServers() {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
addr := net.JoinHostPort(ip, "53")
hostname = dns.Fqdn(hostname)
servers := rootServerList()
resp, _, err := r.client.ExchangeContext(ctx, msg, addr)
if err != nil {
continue
for range maxDelegation {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
resp, err := r.queryServers(
ctx, servers, hostname, dns.TypeA,
)
if err != nil {
return nil, fmt.Errorf(
"resolving %s: %w", hostname, err,
)
}
// Check for A records in the answer section.
var ips []string
for _, rr := range resp.Answer {
@@ -373,6 +380,24 @@ func (r *Resolver) resolveARecord(
if len(ips) > 0 {
return ips, nil
}
// Follow delegation if present.
authNS := extractNSSet(resp.Ns)
if len(authNS) == 0 {
break
}
glue := extractGlue(resp.Extra)
nextServers := glueIPs(authNS, glue)
if len(nextServers) == 0 {
// Resolve NS IPs iteratively — but guard
// against infinite recursion by using only
// already-resolved servers.
break
}
servers = nextServers
}
return nil, fmt.Errorf(
@@ -402,7 +427,7 @@ func (r *Resolver) FindAuthoritativeNameservers(
candidate := strings.Join(labels[i:], ".") + "."
nsNames, err := r.followDelegation(
ctx, candidate, randomRootServers(),
ctx, candidate, rootServerList(),
)
if err == nil && len(nsNames) > 0 {
sort.Strings(nsNames)
@@ -435,6 +460,23 @@ func (r *Resolver) QueryNameserver(
return r.queryAllTypes(ctx, nsHostname, nsIPs[0], hostname)
}
// QueryNameserverIP queries a nameserver by its IP address directly,
// bypassing NS hostname resolution.
func (r *Resolver) QueryNameserverIP(
ctx context.Context,
nsHostname string,
nsIP string,
hostname string,
) (*NameserverResponse, error) {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
hostname = dns.Fqdn(hostname)
return r.queryAllTypes(ctx, nsHostname, nsIP, hostname)
}
func (r *Resolver) queryAllTypes(
ctx context.Context,
nsHostname string,
@@ -462,6 +504,7 @@ func (r *Resolver) queryAllTypes(
type queryState struct {
gotNXDomain bool
gotSERVFAIL bool
gotTimeout bool
hasRecords bool
}
@@ -499,6 +542,10 @@ func (r *Resolver) querySingleType(
) {
msg, err := r.queryDNS(ctx, nsIP, hostname, qtype)
if err != nil {
if isTimeout(err) {
state.gotTimeout = true
}
return
}
@@ -536,12 +583,26 @@ func collectAnswerRecords(
}
}
// isTimeout checks whether an error is a network timeout.
func isTimeout(err error) bool {
var netErr net.Error
if errors.As(err, &netErr) {
return netErr.Timeout()
}
return false
}
func classifyResponse(resp *NameserverResponse, state queryState) {
switch {
case state.gotNXDomain && !state.hasRecords:
resp.Status = StatusNXDomain
case state.gotTimeout && !state.hasRecords:
resp.Status = StatusTimeout
resp.Error = "all queries timed out"
case state.gotSERVFAIL && !state.hasRecords:
resp.Status = StatusError
resp.Error = "server returned SERVFAIL"
case !state.hasRecords && !state.gotNXDomain:
resp.Status = StatusNoData
}

View File

@@ -17,6 +17,7 @@ const (
StatusError = "error"
StatusNXDomain = "nxdomain"
StatusNoData = "nodata"
StatusTimeout = "timeout"
)
// MaxCNAMEDepth is the maximum CNAME chain depth to follow.

View File

@@ -10,6 +10,7 @@ import (
"testing"
"time"
"github.com/miekg/dns"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
@@ -622,6 +623,59 @@ func TestQueryAllNameservers_ContextCanceled(t *testing.T) {
assert.Error(t, err)
}
// ----------------------------------------------------------------
// Timeout tests
// ----------------------------------------------------------------
func TestQueryNameserverIP_Timeout(t *testing.T) {
t.Parallel()
log := slog.New(slog.NewTextHandler(
os.Stderr,
&slog.HandlerOptions{Level: slog.LevelDebug},
))
r := resolver.NewFromLoggerWithClient(
log, &timeoutClient{},
)
ctx, cancel := context.WithTimeout(
context.Background(), 10*time.Second,
)
t.Cleanup(cancel)
// Query any IP — the client always returns a timeout error.
resp, err := r.QueryNameserverIP(
ctx, "unreachable.test.", "192.0.2.1",
"example.com",
)
require.NoError(t, err)
assert.Equal(t, resolver.StatusTimeout, resp.Status)
assert.NotEmpty(t, resp.Error)
}
// timeoutClient simulates DNS timeout errors for testing.
type timeoutClient struct{}
func (c *timeoutClient) ExchangeContext(
_ context.Context,
_ *dns.Msg,
_ string,
) (*dns.Msg, time.Duration, error) {
return nil, 0, &net.OpError{
Op: "read",
Net: "udp",
Err: &timeoutError{},
}
}
type timeoutError struct{}
func (e *timeoutError) Error() string { return "i/o timeout" }
func (e *timeoutError) Timeout() bool { return true }
func (e *timeoutError) Temporary() bool { return true }
func TestResolveIPAddresses_ContextCanceled(t *testing.T) {
t.Parallel()

View File

@@ -1,11 +1,14 @@
package server
import (
"net/http"
"time"
"github.com/go-chi/chi/v5"
chimw "github.com/go-chi/chi/v5/middleware"
"github.com/prometheus/client_golang/prometheus/promhttp"
"sneak.berlin/go/dnswatcher/static"
)
// requestTimeout is the maximum duration for handling a request.
@@ -22,7 +25,25 @@ func (s *Server) SetupRoutes() {
s.router.Use(s.mw.CORS())
s.router.Use(chimw.Timeout(requestTimeout))
// Health check
// Dashboard (read-only web UI)
s.router.Get("/", s.handlers.HandleDashboard())
// Static assets (embedded CSS/JS)
s.router.Mount(
"/s",
http.StripPrefix(
"/s",
http.FileServer(http.FS(static.Static)),
),
)
// Health check (standard well-known path)
s.router.Get(
"/.well-known/healthcheck",
s.handlers.HandleHealthCheck(),
)
// Legacy health check (keep for backward compatibility)
s.router.Get("/health", s.handlers.HandleHealthCheck())
// API v1 routes

View File

@@ -57,10 +57,49 @@ type HostnameState struct {
// PortState holds the monitoring state for a port.
type PortState struct {
Open bool `json:"open"`
Hostname string `json:"hostname"`
Hostnames []string `json:"hostnames"`
LastChecked time.Time `json:"lastChecked"`
}
// UnmarshalJSON implements custom unmarshaling to handle both
// the old single-hostname format and the new multi-hostname
// format for backward compatibility with existing state files.
func (ps *PortState) UnmarshalJSON(data []byte) error {
// Use an alias to prevent infinite recursion.
type portStateAlias struct {
Open bool `json:"open"`
Hostnames []string `json:"hostnames"`
LastChecked time.Time `json:"lastChecked"`
}
var alias portStateAlias
err := json.Unmarshal(data, &alias)
if err != nil {
return fmt.Errorf("unmarshaling port state: %w", err)
}
ps.Open = alias.Open
ps.Hostnames = alias.Hostnames
ps.LastChecked = alias.LastChecked
// If Hostnames is empty, try reading the old single-hostname
// format for backward compatibility.
if len(ps.Hostnames) == 0 {
var old struct {
Hostname string `json:"hostname"`
}
// Best-effort: ignore errors since the main unmarshal
// already succeeded.
if json.Unmarshal(data, &old) == nil && old.Hostname != "" {
ps.Hostnames = []string{old.Hostname}
}
}
return nil
}
// CertificateState holds TLS certificate monitoring state.
type CertificateState struct {
CommonName string `json:"commonName"`
@@ -263,6 +302,27 @@ func (s *State) GetPortState(key string) (*PortState, bool) {
return ps, ok
}
// DeletePortState removes a port state entry.
func (s *State) DeletePortState(key string) {
s.mu.Lock()
defer s.mu.Unlock()
delete(s.snapshot.Ports, key)
}
// GetAllPortKeys returns all port state keys.
func (s *State) GetAllPortKeys() []string {
s.mu.RLock()
defer s.mu.RUnlock()
keys := make([]string, 0, len(s.snapshot.Ports))
for k := range s.snapshot.Ports {
keys = append(keys, k)
}
return keys
}
// SetCertificateState updates the state for a certificate.
func (s *State) SetCertificateState(
key string,

1302
internal/state/state_test.go Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -20,3 +20,19 @@ func NewForTest() *State {
config: &config.Config{DataDir: ""},
}
}
// NewForTestWithDataDir creates a State backed by the given directory
// for tests that need file persistence.
func NewForTestWithDataDir(dataDir string) *State {
return &State{
log: slog.Default(),
snapshot: &Snapshot{
Version: stateVersion,
Domains: make(map[string]*DomainState),
Hostnames: make(map[string]*HostnameState),
Ports: make(map[string]*PortState),
Certificates: make(map[string]*CertificateState),
},
config: &config.Config{DataDir: dataDir},
}
}

View File

@@ -72,13 +72,15 @@ func New(
}
lifecycle.Append(fx.Hook{
OnStart: func(startCtx context.Context) error {
ctx, cancel := context.WithCancel(
context.WithoutCancel(startCtx),
)
OnStart: func(_ context.Context) error {
// Use context.Background() — the fx startup context
// expires after startup completes, so deriving from it
// would cancel the watcher immediately. The watcher's
// lifetime is controlled by w.cancel in OnStop.
ctx, cancel := context.WithCancel(context.Background())
w.cancel = cancel
go w.Run(ctx)
go w.Run(ctx) //nolint:contextcheck // intentionally not derived from startCtx
return nil
},
@@ -127,6 +129,7 @@ func (w *Watcher) Run(ctx context.Context) {
)
w.RunOnce(ctx)
w.maybeSendTestNotification(ctx)
dnsTicker := time.NewTicker(w.config.DNSInterval)
tlsTicker := time.NewTicker(w.config.TLSInterval)
@@ -141,9 +144,16 @@ func (w *Watcher) Run(ctx context.Context) {
return
case <-dnsTicker.C:
w.runDNSAndPortChecks(ctx)
w.runDNSChecks(ctx)
w.checkAllPorts(ctx)
w.saveState()
case <-tlsTicker.C:
// Run DNS first so TLS checks use freshly
// resolved IP addresses, not stale ones from
// a previous cycle.
w.runDNSChecks(ctx)
w.runTLSChecks(ctx)
w.saveState()
}
@@ -151,10 +161,26 @@ func (w *Watcher) Run(ctx context.Context) {
}
// RunOnce performs a single complete monitoring cycle.
// DNS checks run first so that port and TLS checks use
// freshly resolved IP addresses. Port checks run before
// TLS because TLS checks only target IPs with an open
// port 443.
func (w *Watcher) RunOnce(ctx context.Context) {
w.detectFirstRun()
w.runDNSAndPortChecks(ctx)
// Phase 1: DNS resolution must complete first so that
// subsequent checks use fresh IP addresses.
w.runDNSChecks(ctx)
// Phase 2: Port checks populate port state that TLS
// checks depend on (TLS only targets IPs where port
// 443 is open).
w.checkAllPorts(ctx)
// Phase 3: TLS checks use fresh DNS IPs and current
// port state.
w.runTLSChecks(ctx)
w.saveState()
w.firstRun = false
}
@@ -171,7 +197,11 @@ func (w *Watcher) detectFirstRun() {
}
}
func (w *Watcher) runDNSAndPortChecks(ctx context.Context) {
// runDNSChecks performs DNS resolution for all configured domains
// and hostnames, updating state with freshly resolved records.
// This must complete before port or TLS checks run so those
// checks operate on current IP addresses.
func (w *Watcher) runDNSChecks(ctx context.Context) {
for _, domain := range w.config.Domains {
w.checkDomain(ctx, domain)
}
@@ -179,8 +209,6 @@ func (w *Watcher) runDNSAndPortChecks(ctx context.Context) {
for _, hostname := range w.config.Hostnames {
w.checkHostname(ctx, hostname)
}
w.checkAllPorts(ctx)
}
func (w *Watcher) checkDomain(
@@ -448,24 +476,94 @@ func (w *Watcher) detectInconsistencies(
}
func (w *Watcher) checkAllPorts(ctx context.Context) {
for _, hostname := range w.config.Hostnames {
w.checkPortsForHostname(ctx, hostname)
// Phase 1: Build current IP:port → hostname associations
// from fresh DNS data.
associations := w.buildPortAssociations()
// Phase 2: Check each unique IP:port and update state
// with the full set of associated hostnames.
for key, hostnames := range associations {
ip, port := parsePortKey(key)
if port == 0 {
continue
}
for _, domain := range w.config.Domains {
w.checkPortsForHostname(ctx, domain)
}
w.checkSinglePort(ctx, ip, port, hostnames)
}
func (w *Watcher) checkPortsForHostname(
ctx context.Context,
hostname string,
) {
ips := w.collectIPs(hostname)
// Phase 3: Remove port state entries that no longer have
// any hostname referencing them.
w.cleanupStalePorts(associations)
}
// buildPortAssociations constructs a map from IP:port keys to
// the sorted set of hostnames currently resolving to that IP.
func (w *Watcher) buildPortAssociations() map[string][]string {
assoc := make(map[string]map[string]bool)
allNames := make(
[]string, 0,
len(w.config.Hostnames)+len(w.config.Domains),
)
allNames = append(allNames, w.config.Hostnames...)
allNames = append(allNames, w.config.Domains...)
for _, name := range allNames {
ips := w.collectIPs(name)
for _, ip := range ips {
for _, port := range monitoredPorts {
w.checkSinglePort(ctx, ip, port, hostname)
key := fmt.Sprintf("%s:%d", ip, port)
if assoc[key] == nil {
assoc[key] = make(map[string]bool)
}
assoc[key][name] = true
}
}
}
result := make(map[string][]string, len(assoc))
for key, set := range assoc {
hostnames := make([]string, 0, len(set))
for h := range set {
hostnames = append(hostnames, h)
}
sort.Strings(hostnames)
result[key] = hostnames
}
return result
}
// parsePortKey splits an "ip:port" key into its components.
func parsePortKey(key string) (string, int) {
lastColon := strings.LastIndex(key, ":")
if lastColon < 0 {
return key, 0
}
ip := key[:lastColon]
var p int
_, err := fmt.Sscanf(key[lastColon+1:], "%d", &p)
if err != nil {
return ip, 0
}
return ip, p
}
// cleanupStalePorts removes port state entries that are no
// longer referenced by any hostname in the current DNS data.
func (w *Watcher) cleanupStalePorts(
currentAssociations map[string][]string,
) {
for _, key := range w.state.GetAllPortKeys() {
if _, exists := currentAssociations[key]; !exists {
w.state.DeletePortState(key)
}
}
}
@@ -502,7 +600,7 @@ func (w *Watcher) checkSinglePort(
ctx context.Context,
ip string,
port int,
hostname string,
hostnames []string,
) {
result, err := w.portCheck.CheckPort(ctx, ip, port)
if err != nil {
@@ -527,8 +625,8 @@ func (w *Watcher) checkSinglePort(
}
msg := fmt.Sprintf(
"Host: %s\nAddress: %s\nPort now %s",
hostname, key, stateStr,
"Hosts: %s\nAddress: %s\nPort now %s",
strings.Join(hostnames, ", "), key, stateStr,
)
w.notify.SendNotification(
@@ -541,7 +639,7 @@ func (w *Watcher) checkSinglePort(
w.state.SetPortState(key, &state.PortState{
Open: result.Open,
Hostname: hostname,
Hostnames: hostnames,
LastChecked: now,
})
}
@@ -757,6 +855,38 @@ func (w *Watcher) saveState() {
}
}
// maybeSendTestNotification sends a startup status notification
// after the first full scan completes, if SEND_TEST_NOTIFICATION
// is enabled. The message is clearly informational ("all ok")
// and not an error or anomaly alert.
func (w *Watcher) maybeSendTestNotification(ctx context.Context) {
if !w.config.SendTestNotification {
return
}
snap := w.state.GetSnapshot()
msg := fmt.Sprintf(
"dnswatcher has started and completed its initial scan.\n"+
"Monitoring %d domain(s) and %d hostname(s).\n"+
"Tracking %d port endpoint(s) and %d TLS certificate(s).\n"+
"All notification channels are working.",
len(snap.Domains),
len(snap.Hostnames),
len(snap.Ports),
len(snap.Certificates),
)
w.log.Info("sending startup test notification")
w.notify.SendNotification(
ctx,
"✅ dnswatcher startup complete",
msg,
"success",
)
}
// --- Utility functions ---
func toSet(items []string) map[string]bool {

View File

@@ -682,6 +682,191 @@ func TestGracefulShutdown(t *testing.T) {
}
}
func setupHostnameIP(
deps *testDeps,
hostname, ip string,
) {
deps.resolver.allRecords[hostname] = map[string]map[string][]string{
"ns1.example.com.": {"A": {ip}},
}
deps.portChecker.results[ip+":80"] = true
deps.portChecker.results[ip+":443"] = true
deps.tlsChecker.certs[ip+":"+hostname] = &tlscheck.CertificateInfo{
CommonName: hostname,
Issuer: "DigiCert",
NotAfter: time.Now().Add(90 * 24 * time.Hour),
SubjectAlternativeNames: []string{hostname},
}
}
func updateHostnameIP(deps *testDeps, hostname, ip string) {
deps.resolver.mu.Lock()
deps.resolver.allRecords[hostname] = map[string]map[string][]string{
"ns1.example.com.": {"A": {ip}},
}
deps.resolver.mu.Unlock()
deps.portChecker.mu.Lock()
deps.portChecker.results[ip+":80"] = true
deps.portChecker.results[ip+":443"] = true
deps.portChecker.mu.Unlock()
deps.tlsChecker.mu.Lock()
deps.tlsChecker.certs[ip+":"+hostname] = &tlscheck.CertificateInfo{
CommonName: hostname,
Issuer: "DigiCert",
NotAfter: time.Now().Add(90 * 24 * time.Hour),
SubjectAlternativeNames: []string{hostname},
}
deps.tlsChecker.mu.Unlock()
}
func TestDNSRunsBeforePortAndTLSChecks(t *testing.T) {
t.Parallel()
cfg := defaultTestConfig(t)
cfg.Hostnames = []string{"www.example.com"}
w, deps := newTestWatcher(t, cfg)
setupHostnameIP(deps, "www.example.com", "10.0.0.1")
ctx := t.Context()
w.RunOnce(ctx)
snap := deps.state.GetSnapshot()
if _, ok := snap.Ports["10.0.0.1:80"]; !ok {
t.Fatal("expected port state for 10.0.0.1:80")
}
// DNS changes to a new IP; port and TLS must pick it up.
updateHostnameIP(deps, "www.example.com", "10.0.0.2")
w.RunOnce(ctx)
snap = deps.state.GetSnapshot()
if _, ok := snap.Ports["10.0.0.2:80"]; !ok {
t.Error("port check used stale DNS: missing 10.0.0.2:80")
}
certKey := "10.0.0.2:443:www.example.com"
if _, ok := snap.Certificates[certKey]; !ok {
t.Error("TLS check used stale DNS: missing " + certKey)
}
}
func TestSendTestNotification_Enabled(t *testing.T) {
t.Parallel()
cfg := defaultTestConfig(t)
cfg.Domains = []string{"example.com"}
cfg.Hostnames = []string{"www.example.com"}
cfg.SendTestNotification = true
w, deps := newTestWatcher(t, cfg)
setupBaselineMocks(deps)
w.RunOnce(t.Context())
// RunOnce does not send the test notification — it is
// sent by Run after RunOnce completes. Call the exported
// RunOnce then check that no test notification was sent
// (only Run triggers it). We test the full path via Run.
notifications := deps.notifier.getNotifications()
if len(notifications) != 0 {
t.Errorf(
"RunOnce should not send test notification, got %d",
len(notifications),
)
}
}
func TestSendTestNotification_ViaRun(t *testing.T) {
t.Parallel()
cfg := defaultTestConfig(t)
cfg.Domains = []string{"example.com"}
cfg.Hostnames = []string{"www.example.com"}
cfg.SendTestNotification = true
cfg.DNSInterval = 24 * time.Hour
cfg.TLSInterval = 24 * time.Hour
w, deps := newTestWatcher(t, cfg)
setupBaselineMocks(deps)
ctx, cancel := context.WithCancel(t.Context())
done := make(chan struct{})
go func() {
w.Run(ctx)
close(done)
}()
// Wait for the initial scan and test notification.
time.Sleep(500 * time.Millisecond)
cancel()
<-done
notifications := deps.notifier.getNotifications()
found := false
for _, n := range notifications {
if n.Priority == "success" &&
n.Title == "✅ dnswatcher startup complete" {
found = true
}
}
if !found {
t.Errorf(
"expected startup test notification, got: %v",
notifications,
)
}
}
func TestSendTestNotification_Disabled(t *testing.T) {
t.Parallel()
cfg := defaultTestConfig(t)
cfg.Domains = []string{"example.com"}
cfg.Hostnames = []string{"www.example.com"}
cfg.SendTestNotification = false
cfg.DNSInterval = 24 * time.Hour
cfg.TLSInterval = 24 * time.Hour
w, deps := newTestWatcher(t, cfg)
setupBaselineMocks(deps)
ctx, cancel := context.WithCancel(t.Context())
done := make(chan struct{})
go func() {
w.Run(ctx)
close(done)
}()
time.Sleep(500 * time.Millisecond)
cancel()
<-done
notifications := deps.notifier.getNotifications()
for _, n := range notifications {
if n.Title == "✅ dnswatcher startup complete" {
t.Error(
"test notification should not be sent when disabled",
)
}
}
}
func TestNSFailureAndRecovery(t *testing.T) {
t.Parallel()

1
static/css/tailwind.min.css vendored Normal file

File diff suppressed because one or more lines are too long

10
static/static.go Normal file
View File

@@ -0,0 +1,10 @@
// Package static provides embedded static assets.
package static
import "embed"
// Static contains the embedded static assets (CSS, JS) served
// at the /s/ URL prefix.
//
//go:embed css
var Static embed.FS