When set to a truthy value, sends a startup status notification to all configured notification channels after the first full scan completes on application startup. The notification is clearly an all-ok/success message showing the number of monitored domains, hostnames, ports, and certificates.
Changes:
- Added `SendTestNotification` config field reading `DNSWATCHER_SEND_TEST_NOTIFICATION`
- Added `maybeSendTestNotification()` in watcher, called after initial `RunOnce` in `Run`
- Added 3 watcher tests (enabled via Run, enabled via RunOnce alone, disabled)
- Added config tests for the new field
- Updated README: env var table, example .env, Docker example
Closes#84
Co-authored-by: user <user@Mac.lan guest wan>
Reviewed-on: #85
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
## Summary
Adds a read-only web dashboard at `GET /` that shows the current monitoring state and recent alerts. Unauthenticated, single-page, no navigation.
## What it shows
- **Summary bar**: counts of monitored domains, hostnames, ports, certificates
- **Domains**: nameservers with last-checked age
- **Hostnames**: per-nameserver DNS records, status badges, relative age
- **Ports**: open/closed state with associated hostnames and age
- **TLS Certificates**: CN, issuer, expiry (color-coded by urgency), status, age
- **Recent Alerts**: last 100 notifications in reverse chronological order with priority badges
Every data point displays its age (e.g. "5m ago") so freshness is visible at a glance. Auto-refreshes every 30 seconds.
## What it does NOT show
No secrets: webhook URLs, ntfy topics, Slack/Mattermost endpoints, API tokens, and configuration details are never exposed.
## Design
All assets (CSS) are embedded in the binary and served from `/s/`. Zero external HTTP requests at runtime — no CDN dependencies or third-party resources. Dark, technical aesthetic with saturated teals and blues on dark slate. Single page — everything on one screen.
## Implementation
- `internal/notify/history.go` — thread-safe ring buffer (`AlertHistory`) storing last 100 alerts
- `internal/notify/notify.go` — records each alert in history before dispatch; refactored `SendNotification` into smaller `dispatch*` helpers to satisfy funlen
- `internal/handlers/dashboard.go` — `HandleDashboard()` handler with embedded HTML template, helper functions (`relTime`, `formatRecords`, `expiryDays`, `joinStrings`)
- `internal/handlers/templates/dashboard.html` — Tailwind-styled single-page dashboard
- `internal/handlers/handlers.go` — added `State` and `Notify` dependencies via fx
- `internal/server/routes.go` — registered `GET /` route
- `static/` — embedded CSS assets served via `/s/` prefix
- `README.md` — documented the dashboard and new endpoint
## Tests
- `internal/notify/history_test.go` — empty, add+recent ordering, overflow beyond capacity
- `internal/handlers/dashboard_test.go` — `relTime`, `expiryDays`, `formatRecords`
- All existing tests pass unchanged
- `docker build .` passes
closes [#82](#82)
<!-- session: rework-pr-83 -->
Co-authored-by: user <user@Mac.lan guest wan>
Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #83
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
## Summary
Add comprehensive tests for the `internal/config` package, covering the main configuration loading path that was previously untested.
Closes [issue #72](#72)
## What Changed
Added three new test files:
- **`config_test.go`** — 16 tests covering `New()`, `StatePath()`, and the full config loading pipeline
- **`parsecsv_test.go`** — 10 test cases for `parseCSV()` edge cases
- **`export_test.go`** — standard Go export bridge for testing unexported `parseCSV`
## Test Coverage
| Area | Tests |
|------|-------|
| Default values | All 14 config fields verified against documented defaults |
| Environment overrides | All env vars tested including `PORT` (unprefixed) |
| Invalid duration fallback | `DNSWATCHER_DNS_INTERVAL=banana` falls back to 1h |
| Invalid TLS interval | `DNSWATCHER_TLS_INTERVAL=notaduration` falls back to 12h |
| No targets error | Empty/missing `DNSWATCHER_TARGETS` returns `ErrNoTargets` |
| Invalid targets | Public suffix (`co.uk`) rejected with error |
| CSV parsing | Trailing commas, leading commas, consecutive commas, whitespace, tabs |
| Debug mode | `DNSWATCHER_DEBUG=true` enables debug logging |
| Target classification | Domains vs hostnames correctly separated via PSL |
| StatePath | Path construction with various `DataDir` values |
| Empty appname | Falls back to "dnswatcher" config file name |
**Coverage: 23% → 92.5%**
## Notes
- Tests use `viper.Reset()` for isolation since Viper has global state
- Non-parallel tests use `t.Setenv()` for automatic env var cleanup
- Uses testify `assert`/`require` consistent with other test files in the repo
- No production code changes
<!-- session: agent:sdlc-manager:subagent:d7fe6cf2-4746-4793-a738-9df8f5f5f0c6 -->
Co-authored-by: user <user@Mac.lan guest wan>
Reviewed-on: #81
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
## Summary
Fixes documentation inaccuracies in README.md identified during QA audit.
### Changes
**API table (closes #67):**
- Removed `GET /api/v1/domains` and `GET /api/v1/hostnames` from the HTTP API table. These endpoints are not implemented — the only routes in `internal/server/routes.go` are `/health`, `/api/v1/status`, and `/metrics` (conditional).
**Feature claims (closes #68):**
- Removed "Inconsistency resolved" from hostname monitoring features. `detectInconsistencies()` detects current inconsistencies but has no state tracking to detect when they resolve.
- Removed `nxdomain` and `nodata` from the state status values table. While the resolver defines these constants, `buildHostnameState()` in the watcher only ever sets status to `"ok"`. Failed queries set `"error"` via the NS disappearance path. These values are never written to state.
- Removed "Empty response" (NODATA/NXDOMAIN) detection claim. Changes are caught generically by `detectRecordChanges()`, not with specific NODATA/NXDOMAIN labeling.
### What was NOT changed
- "Inconsistency detected" remains — this IS implemented in `detectInconsistencies()`.
- All other feature claims were verified against the code and are accurate.
- No Go source code was modified.
Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #74
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
## Summary
When `DNSWATCHER_TARGETS` is empty (the default), dnswatcher previously started successfully and ran indefinitely monitoring nothing. This is a common misconfiguration — forgetting to set the variable or making a typo in its name — and gave no indication anything was wrong.
## Changes
- Added `ErrNoTargets` sentinel error in `internal/config/config.go`
- Extracted `parseAndValidateTargets()` helper to validate that at least one domain or hostname is configured after target classification
- If no targets are configured, dnswatcher now exits with a clear error: `"no monitoring targets configured: set DNSWATCHER_TARGETS environment variable"`
- Updated README.md to document that `DNSWATCHER_TARGETS` is required and dnswatcher will refuse to start without it
## How it works
The validation runs during config construction (via uber/fx), before the watcher or any other component starts. If `DNSWATCHER_TARGETS` is empty or contains only whitespace/empty entries, `buildConfig()` returns `ErrNoTargets`, which causes fx to fail startup with a clear error message.
This is fail-fast behavior: a monitoring daemon with nothing to monitor is a misconfiguration and should not silently run.
Closes #69
Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Reviewed-on: #75
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
## Summary
The `OnStart` hook previously derived the watcher's context from the fx startup context (`startCtx`) via `context.WithoutCancel()`. While `WithoutCancel` strips cancellation and deadline, using `context.Background()` makes the intent explicit: the watcher's monitoring loop must outlive the fx startup phase and is controlled solely by the `cancel` func called in `OnStop`.
## Changes
- Replace `context.WithCancel(context.WithoutCancel(startCtx))` with `context.WithCancel(context.Background())`
- Add explanatory comment documenting why the watcher context is not derived from the startup context
- Unused `startCtx` parameter changed to `_`
Closes#53
Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #63
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
## Summary
Port state keys are `ip:port` with a single `hostname` field. When multiple hostnames resolve to the same IP (shared hosting, CDN), only one hostname was associated. This caused orphaned port state when that hostname removed the IP from DNS while the IP remained valid for other hostnames.
## Changes
### State (`internal/state/state.go`)
- `PortState.Hostname` (string) → `PortState.Hostnames` ([]string)
- Custom `UnmarshalJSON` for backward compatibility: reads old single `hostname` field and migrates to a single-element `hostnames` slice
- Added `DeletePortState` and `GetAllPortKeys` methods for cleanup
### Watcher (`internal/watcher/watcher.go`)
- Refactored `checkAllPorts` into three phases:
1. Build IP:port → hostname associations from current DNS data
2. Check each unique IP:port once with all associated hostnames
3. Clean up stale port state entries with no hostname references
- Port change notifications now list all associated hostnames (`Hosts:` instead of `Host:`)
- Added `buildPortAssociations`, `parsePortKey`, and `cleanupStalePorts` helper functions
### README
- Updated state file format example: `hostname` → `hostnames` (array)
- Updated notification description to reflect multiple hostnames
## Backward Compatibility
Existing state files with the old single `hostname` string are handled gracefully via custom JSON unmarshaling — they are read as single-element `hostnames` slices.
Closes #55
Co-authored-by: clawbot <clawbot@noreply.eeqj.de>
Reviewed-on: #65
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
## Summary
DNS checks now always complete before port or TLS checks begin, ensuring those checks use freshly resolved IP addresses instead of potentially stale ones from a previous cycle.
## Problem
Port and TLS checks read IP addresses from state that was populated during the most recent DNS check. If DNS changes between cycles, port/TLS checks may target stale IPs. In particular, when the TLS ticker fired (every 12h), it ran `runTLSChecks` without refreshing DNS first — meaning TLS checks could use IPs that were up to 12 hours old.
## Changes
- **Extract `runDNSChecks()`** from the former `runDNSAndPortChecks()` so DNS resolution can be invoked independently as a prerequisite for any check type.
- **TLS ticker now runs DNS first**: When the TLS ticker fires, DNS checks run before TLS checks, ensuring fresh IPs.
- **`RunOnce` uses explicit 3-phase ordering**: DNS → ports → TLS. Port checks must complete before TLS because TLS checks only target IPs where port 443 is open.
- **New test `TestDNSRunsBeforePortAndTLSChecks`**: Verifies that when DNS IPs change between cycles, port and TLS checks pick up the new IPs.
- **README updated**: Monitoring lifecycle section now documents the DNS-first ordering guarantee.
## Check ordering
| Trigger | Phase 1 | Phase 2 | Phase 3 |
|---------|---------|---------|----------|
| Startup (`RunOnce`) | DNS | Ports | TLS |
| DNS ticker | DNS | Ports | — |
| TLS ticker | DNS | — | TLS |
closes #58
Co-authored-by: user <user@Mac.lan guest wan>
Reviewed-on: #64
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
Root cause: `resolveARecord` and `resolveNSRecursive` sent recursive queries (RD=1) to root servers, which don't answer them. This caused 5s timeouts × 2 retries × 3 servers = hanging tests.
Fix:
- Changed `queryTimeoutDuration` from 5s to 700ms
- Rewrote `resolveARecord` to do proper iterative resolution through the delegation chain (query roots → follow NS delegations → get A record)
- Renamed `resolveNSRecursive` → `resolveNSIterative` with same iterative approach
- No mocking, no test skipping, no config changes
`make check` passes: all 29 resolver tests pass with real DNS in ~10s.
Co-authored-by: clawbot <clawbot@git.eeqj.de>
Reviewed-on: #28
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
- Replace convoluted CI workflow (setup-go, install golangci-lint, install
goimports, make check) with simple 'docker build .' per repo policy
- Pin Docker base images by SHA256 hash instead of mutable tags
- Pin golangci-lint and goimports by commit hash instead of @latest
- Add binutils-gold for linker compatibility on alpine
- Run on all pushes, not just main/PR branches
Documents the project testing philosophy: all resolver tests must
use live DNS queries. Mocking the DNS client layer is not permitted.
Includes rationale and anti-patterns to avoid.
The ErrNotImplemented sentinel error was dead code left over from
initial scaffolding. The resolver performs real iterative DNS
lookups from root servers, PortCheck does TCP connection checks,
and TLSCheck verifies TLS certificates and expiry. Removed the
unused error constant.
checkTLSExpiry fired every monitoring cycle with no deduplication,
causing notification spam for expiring certificates. Added an
in-memory map tracking the last notification time per domain/IP
pair, suppressing re-notification within the TLS check interval.
Added TestTLSExpiryWarningDedup to verify deduplication works.
collectIPs only reads HostnameState, but checkDomain only stored
DomainState (nameservers). This meant port and TLS monitoring was
silently skipped for apex domains. Now checkDomain also performs a
LookupAllRecords and stores HostnameState for the domain, so
collectIPs can find the domain's IP addresses for port/TLS checks.
Added TestDomainPortAndTLSChecks to verify the fix.
State.Save() was using RLock but mutating s.snapshot.LastUpdated,
which is a write operation. This created a data race since other
goroutines could also hold a read lock and observe a partially
written timestamp. Changed to full Lock to ensure exclusive access
during the mutation.
- extractCertInfo now returns an error (ErrNoPeerCertificates) instead
of an empty struct when there are no peer certificates
- SubjectAlternativeNames now includes both DNS names and IP addresses
from cert.IPAddresses
Addresses review feedback on PR #7.
Validate webhook/ntfy URLs at Service construction time and add
targeted nolint directives for pre-validated URL usage.
Fix goimports formatting in tlscheck_test.go.
Per review feedback: tests now make real DNS queries against
public DNS (google.com, cloudflare.com) instead of using a
mock DNS client. The DNSClient interface and mock infrastructure
have been removed.
- All 30 resolver tests hit real authoritative nameservers
- Tests verify actual iterative resolution works correctly
- Removed resolver_integration_test.go (merged into main tests)
- Context timeout increased to 60s for iterative resolution
- Extract DNSClient interface from resolver to allow dependency injection
- Convert all resolver methods from package-level to receiver methods
using the injectable DNS client
- Rewrite resolver_test.go with a mock DNS client that simulates the
full delegation chain (root → TLD → authoritative) in-process
- Move 2 integration tests (real DNS) behind //go:build integration tag
- Add NewFromLoggerWithClient constructor for test injection
- Add LookupAllRecords implementation (was returning ErrNotImplemented)
All unit tests are hermetic (no network) and complete in <1s.
Total make check passes in ~5s.
Closes#12
Implement full iterative DNS resolution from root servers through TLD
and domain nameservers using github.com/miekg/dns.
- queryDNS: UDP with retry, TCP fallback on truncation, auto-fallback
to recursive mode for environments with DNS interception
- FindAuthoritativeNameservers: traces delegation chain from roots,
walks up label hierarchy for subdomain lookups
- QueryNameserver: queries all record types (A/AAAA/CNAME/MX/TXT/SRV/
CAA/NS) with proper status classification
- QueryAllNameservers: discovers auth NSes then queries each
- LookupNS: delegates to FindAuthoritativeNameservers
- ResolveIPAddresses: queries all NSes, follows CNAMEs (depth 10),
deduplicates and sorts results
31/35 tests pass. 4 NXDOMAIN tests fail due to wildcard DNS on
sneak.cloud (nxdomain-surely-does-not-exist.dns.sneak.cloud resolves
to datavi.be/162.55.148.94 via catch-all). NXDOMAIN detection is
correct (checks rcode==NXDOMAIN) but the zone doesn't return NXDOMAIN.
35 tests define the full resolver contract using live DNS queries
against *.dns.sneak.cloud (Cloudflare). Tests cover:
- FindAuthoritativeNameservers: iterative NS discovery, sorting,
determinism, trailing dot handling, TLD and subdomain cases
- QueryNameserver: A, AAAA, CNAME, MX, TXT, NXDOMAIN, per-NS
response model with status field, sorted record values
- QueryAllNameservers: independent per-NS queries, consistency
verification, NXDOMAIN from all NS
- LookupNS: NS record lookup matching FindAuthoritative
- ResolveIPAddresses: basic, multi-A, IPv6, dual-stack, CNAME
following, deduplication, sorting, NXDOMAIN returns empty
- Context cancellation for all methods
- Iterative resolution proof (resolves example.com from root)
Also adds DNSSEC validation to planned future features in README.