# dnswatcher

dnswatcher is a pre-1.0 Go daemon by [@sneak](https://sneak.berlin) that monitors DNS records, TCP port availability, and TLS certificates, delivering real-time change notifications via Slack, Mattermost, and ntfy webhooks.

> ⚠️ Pre-1.0 software. APIs, configuration, and behavior may change without notice.

dnswatcher watches configured DNS domains and hostnames for changes, monitors TCP
port availability, tracks TLS certificate expiry, and delivers real-time
notifications via Slack, Mattermost, and/or ntfy webhooks.

It performs all DNS resolution itself via iterative (non-recursive) queries,
tracing from root nameservers to authoritative servers directly, never relying
on upstream recursive resolvers.

State is persisted to a local JSON file so that monitoring survives restarts
without requiring an external database.

---
## Features
### DNS Domain Monitoring (Apex Domains)
- Accepts a list of DNS domain names (apex domains, identified via the
[Public Suffix List](https://publicsuffix.org/)).
- Every **1 hour**, performs a full iterative trace from root servers to
discover all authoritative nameservers (NS records) for each domain.
- Queries **every** discovered authoritative nameserver independently.
- Stores the NS record set as observed by the delegation chain.
- Any change triggers a notification:
  - NS added to or removed from the delegation.
  - NS IP address changed (glue record change).
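The NS comparison described above amounts to a set difference between two observations. The following is a minimal sketch of that idea, not the project's actual code; the function name `diffNS` is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// diffNS compares two nameserver sets and reports which names were
// added and which were removed. Input order does not matter.
// Hypothetical helper for illustration only.
func diffNS(previous, current []string) (added, removed []string) {
	prev := make(map[string]bool, len(previous))
	for _, ns := range previous {
		prev[ns] = true
	}
	cur := make(map[string]bool, len(current))
	for _, ns := range current {
		// Skip duplicates so each added name is reported once.
		if !prev[ns] && !cur[ns] {
			added = append(added, ns)
		}
		cur[ns] = true
	}
	for ns := range prev {
		if !cur[ns] {
			removed = append(removed, ns)
		}
	}
	// Sort for stable, readable notification text.
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	added, removed := diffNS(
		[]string{"ns1.example.com.", "ns2.example.com."},
		[]string{"ns2.example.com.", "ns3.example.com."},
	)
	fmt.Println("added:", added, "removed:", removed)
}
```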
### DNS Hostname Monitoring (Subdomains)
- Accepts a list of DNS hostnames (subdomains, distinguished from apex
domains via the Public Suffix List).
- Every **1 hour**, performs a full iterative trace to discover the
authoritative nameservers for the hostname's parent domain.
- Queries **each** authoritative nameserver independently for **all**
record types: A, AAAA, CNAME, MX, TXT, SRV, CAA, NS.
- Stores results **per nameserver**. The state for a hostname is not a
merged view — it is a map from nameserver to record set.
- Any observable change in any nameserver's response triggers a
  notification. This includes:
  - **Record change**: A nameserver returns different records than it
    did on the previous check (additions, removals, value changes).
  - **NS query failure**: A nameserver that previously responded
    becomes unreachable (timeout, SERVFAIL, REFUSED, network error).
    This is distinct from "responded with no records."
  - **NS recovery**: A previously-unreachable nameserver starts
    responding again.
  - **Inconsistency detected**: Two nameservers that previously agreed
    now return different record sets for the same hostname.
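Because state is kept per nameserver, both record changes and inconsistencies reduce to comparing record sets. Here is a rough sketch under stated assumptions: the names `RecordSet`, `consistent`, and `changedNameservers` are hypothetical, and record values are assumed to be sorted before comparison.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// RecordSet maps a record type ("A", "MX", ...) to its values.
// Values are assumed to be sorted so that comparison is order-independent.
type RecordSet map[string][]string

// consistent reports whether every nameserver returned the same record set.
func consistent(byNS map[string]RecordSet) bool {
	var first RecordSet
	seen := false
	for _, rs := range byNS {
		if !seen {
			first, seen = rs, true
			continue
		}
		if !reflect.DeepEqual(first, rs) {
			return false
		}
	}
	return true
}

// changedNameservers returns the nameservers present in both checks
// whose answers differ between the previous and the current check.
func changedNameservers(previous, current map[string]RecordSet) []string {
	var changed []string
	for ns, cur := range current {
		if prev, ok := previous[ns]; ok && !reflect.DeepEqual(prev, cur) {
			changed = append(changed, ns)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	before := map[string]RecordSet{
		"ns1.example.com.": {"A": {"93.184.216.34"}},
		"ns2.example.com.": {"A": {"93.184.216.34"}},
	}
	after := map[string]RecordSet{
		"ns1.example.com.": {"A": {"93.184.216.34"}},
		"ns2.example.com.": {"A": {"203.0.113.7"}},
	}
	fmt.Println("changed:", changedNameservers(before, after))
	fmt.Println("consistent:", consistent(after))
}
```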
### TCP Port Monitoring
- For every configured domain and hostname, constructs a deduplicated list
of all IPv4 and IPv6 addresses resolved via A, AAAA, and CNAME chain
resolution across all authoritative nameservers.
- Checks TCP connectivity on ports **80** and **443** for each IP address.
- Every **1 hour**, re-checks all ports.
- Any change in port availability triggers a notification:
  - Port transitioned from open to closed (or vice versa).
  - New IP appeared (from DNS change) and its port state was recorded.
  - IP disappeared (from DNS change): noted in the DNS change
    notification; port state for that IP is removed.
### TLS Certificate Monitoring
- Every **12 hours**, for each IP address listening on port 443, connects
via TLS using the correct SNI hostname.
- Records the certificate's Subject CN, SANs, issuer, and expiry date.
- The following events trigger a notification:
  - Certificate is expiring within **7 days** (warning, repeated each
    check until renewed or expired).
  - Certificate CN, issuer, or SANs changed (replacement detected,
    reports old and new values).
  - TLS connection failure to a previously-reachable IP:443 (handshake
    error, timeout, connection refused after previously succeeding).
  - TLS recovery: a previously-failing IP:443 now completes a
    handshake again.
### Notifications
**Every observable state change produces a notification.** dnswatcher is
designed as a real-time change feed — degradations, failures, recoveries,
and routine changes are all reported equally.

Supported notification backends:
| Backend | Configuration | Payload Format |
|----------------|--------------------------|------------------------------|
| **Slack** | Incoming Webhook URL | Attachments with color |
| **Mattermost** | Incoming Webhook URL | Slack-compatible attachments |
| **ntfy** | Topic URL (e.g. `https://ntfy.sh/mytopic`) | Title + body + priority |

All configured endpoints receive every notification. Notification content
includes:
- **DNS record changes**: Which hostname, which nameserver, what record
type, old values, new values.
- **DNS NS changes**: Which domain, which nameservers were added/removed.
- **NS query failures**: Which nameserver failed, error type (timeout,
SERVFAIL, REFUSED, network error), which hostname/domain affected.
- **NS recoveries**: Which nameserver recovered, which hostname/domain.
- **NS inconsistencies**: Which nameservers disagree, what each one
returned, which hostname affected.
- **Port changes**: Which IP:port, old state, new state, all associated
hostnames.
- **TLS expiry warnings**: Which certificate, days remaining, CN,
issuer, associated hostname and IP.
- **TLS certificate changes**: Old and new CN/issuer/SANs, associated
hostname and IP.
- **TLS connection failures/recoveries**: Which IP:port, error details,
associated hostname.
### State Management
- All monitoring state is kept in memory and persisted to a JSON file on
disk (`DATA_DIR/state.json`).
- State is loaded on startup to resume monitoring without triggering
false-positive change notifications.
- State is written atomically (write to temp file, then rename) to prevent
corruption.
### Web Dashboard
dnswatcher includes an unauthenticated, read-only web dashboard at the
root URL (`/`). It displays:
- **Summary counts** for monitored domains, hostnames, ports, and
certificates.
- **Domains** with their discovered nameservers.
- **Hostnames** with per-nameserver DNS records and status.
- **Ports** with open/closed state and associated hostnames.
- **TLS certificates** with CN, issuer, expiry, and status.
- **Recent alerts** (last 100 notifications sent since the process
  started), displayed in reverse chronological order.

Every data point shows its age (e.g. "5m ago") so you can tell at a
glance how fresh the information is. The page auto-refreshes every 30
seconds.

The dashboard intentionally does not expose any configuration details
such as webhook URLs, notification endpoints, or API tokens.

All assets (CSS) are embedded in the binary and served from the
application itself. The dashboard makes zero external HTTP requests:
no CDN dependencies or third-party resources are loaded at runtime.
### HTTP API
dnswatcher exposes a lightweight HTTP API for operational visibility:
| Endpoint | Description |
|---------------------------------------|--------------------------------|
| `GET /` | Web dashboard (HTML) |
| `GET /s/...` | Static assets (embedded CSS) |
| `GET /.well-known/healthcheck` | Health check (JSON) |
| `GET /health` | Health check (JSON, legacy) |
| `GET /api/v1/status` | Current monitoring state |
| `GET /metrics` | Prometheus metrics (optional) |
---
## Architecture
```
cmd/dnswatcher/main.go          Entry point (uber/fx bootstrap)
internal/
  config/config.go              Viper-based configuration
  globals/globals.go            Build-time variables (version)
  logger/logger.go              slog structured logging (TTY detection)
  healthcheck/healthcheck.go    Health check service
  middleware/middleware.go      HTTP middleware (logging, CORS, metrics auth)
  handlers/handlers.go          HTTP request handlers
  server/
    server.go                   HTTP server lifecycle
    routes.go                   Route definitions
  state/state.go                JSON file state persistence
  resolver/resolver.go          Iterative DNS resolution engine
  portcheck/portcheck.go        TCP port connectivity checker
  tlscheck/tlscheck.go          TLS certificate inspector
  notify/notify.go              Notification service (Slack, Mattermost, ntfy)
  watcher/watcher.go            Main monitoring orchestrator and scheduler
```
### Design Principles
- **No recursive resolvers**: All DNS resolution is performed iteratively,
tracing from root nameservers through the delegation chain to
authoritative servers.
- **No external database**: State is persisted as a single JSON file.
- **Dependency injection**: All components are wired via
[uber/fx](https://github.com/uber-go/fx).
- **Structured logging**: All logs use `log/slog` with JSON output in
production (TTY detection for development).
- **Graceful shutdown**: All background goroutines respect context
cancellation and the fx lifecycle.
---
## Configuration
Configuration is loaded via [Viper](https://github.com/spf13/viper) with
the following precedence (highest to lowest):
1. Environment variables (prefixed with `DNSWATCHER_`, except `PORT`)
2. `.env` file (loaded via godotenv)
3. Config file: `/etc/dnswatcher/dnswatcher.yaml`,
`~/.config/dnswatcher/dnswatcher.yaml`, or `./dnswatcher.yaml`
4. Defaults
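The precedence order above boils down to "the first layer that defines a key wins". The sketch below illustrates only that rule with plain maps; it is not how Viper is wired internally, and the function name `lookup` and the single shared key spelling across layers are simplifying assumptions.

```go
package main

import "fmt"

// lookup returns the value from the first layer that defines key.
// Layers must be passed from highest to lowest precedence.
func lookup(key string, layers ...map[string]string) (string, bool) {
	for _, layer := range layers {
		if v, ok := layer[key]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	env := map[string]string{"DNSWATCHER_DNS_INTERVAL": "30m"}
	dotenv := map[string]string{"DNSWATCHER_DNS_INTERVAL": "45m", "DNSWATCHER_DEBUG": "true"}
	file := map[string]string{}
	defaults := map[string]string{
		"DNSWATCHER_DNS_INTERVAL": "1h",
		"DNSWATCHER_TLS_INTERVAL": "12h",
		"DNSWATCHER_DEBUG":        "false",
	}

	for _, key := range []string{"DNSWATCHER_DNS_INTERVAL", "DNSWATCHER_DEBUG", "DNSWATCHER_TLS_INTERVAL"} {
		v, _ := lookup(key, env, dotenv, file, defaults)
		fmt.Println(key, "=", v)
	}
}
```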
### Environment Variables
| Variable | Description | Default |
|---------------------------------|--------------------------------------------|-------------|
| `PORT` | HTTP listen port | `8080` |
| `DNSWATCHER_DEBUG` | Enable debug logging | `false` |
| `DNSWATCHER_DATA_DIR` | Directory for state file | `./data` |
| `DNSWATCHER_TARGETS` | Comma-separated DNS names (auto-classified via PSL) | `""` |
| `DNSWATCHER_SLACK_WEBHOOK` | Slack incoming webhook URL | `""` |
| `DNSWATCHER_MATTERMOST_WEBHOOK` | Mattermost incoming webhook URL | `""` |
| `DNSWATCHER_NTFY_TOPIC` | ntfy topic URL | `""` |
| `DNSWATCHER_DNS_INTERVAL` | DNS check interval | `1h` |
| `DNSWATCHER_TLS_INTERVAL` | TLS check interval | `12h` |
| `DNSWATCHER_TLS_EXPIRY_WARNING` | Days before expiry to warn | `7` |
| `DNSWATCHER_SENTRY_DSN` | Sentry DSN for error reporting | `""` |
| `DNSWATCHER_MAINTENANCE_MODE` | Enable maintenance mode | `false` |
| `DNSWATCHER_METRICS_USERNAME` | Basic auth username for /metrics | `""` |
| `DNSWATCHER_METRICS_PASSWORD` | Basic auth password for /metrics | `""` |
| `DNSWATCHER_SEND_TEST_NOTIFICATION` | Send a test notification after first scan completes | `false` |

**`DNSWATCHER_TARGETS` is required.** dnswatcher will refuse to start if no
monitoring targets are configured. A monitoring daemon with nothing to monitor
is a misconfiguration, so dnswatcher fails fast with a clear error message
rather than running silently. Set `DNSWATCHER_TARGETS` to a comma-separated
list of DNS names before starting.
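Parsing the comma-separated list and failing fast on an empty result can be sketched like this. The function name `parseTargets`, the lowercasing, and the de-duplication are assumptions; classification into apex domains and hostnames is left out because it needs a Public Suffix List library outside the standard library.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseTargets splits a comma-separated list of DNS names, trims
// whitespace, drops empty entries and duplicates, and returns an error
// when nothing is left to monitor.
func parseTargets(raw string) ([]string, error) {
	var targets []string
	seen := map[string]bool{}
	for _, part := range strings.Split(raw, ",") {
		name := strings.ToLower(strings.TrimSpace(part))
		if name == "" || seen[name] {
			continue
		}
		seen[name] = true
		targets = append(targets, name)
	}
	if len(targets) == 0 {
		return nil, errors.New("DNSWATCHER_TARGETS is empty: no monitoring targets configured")
	}
	return targets, nil
}

func main() {
	targets, err := parseTargets("example.com, www.example.com,,API.example.com")
	fmt.Println(targets, err)

	_, err = parseTargets(" , ")
	fmt.Println("empty input:", err)
}
```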
### Example `.env`
```sh
PORT=8080
DNSWATCHER_DEBUG=false
DNSWATCHER_DATA_DIR=./data
DNSWATCHER_TARGETS=example.com,example.org,www.example.com,api.example.com,mail.example.org
DNSWATCHER_SLACK_WEBHOOK=https://hooks.slack.com/services/T.../B.../xxx
DNSWATCHER_MATTERMOST_WEBHOOK=https://mattermost.example.com/hooks/xxx
DNSWATCHER_NTFY_TOPIC=https://ntfy.sh/my-dns-alerts
DNSWATCHER_SEND_TEST_NOTIFICATION=true
```
---
## DNS Resolution Strategy
dnswatcher never uses the system's configured recursive resolver. Instead,
it performs full iterative resolution:
1. **Root servers**: Starts from the IANA root nameserver list (hardcoded,
with periodic refresh).
2. **TLD delegation**: Queries root servers for the TLD NS records.
3. **Domain delegation**: Queries TLD nameservers for the domain's NS
records.
4. **Authoritative query**: Queries all discovered authoritative
   nameservers directly for the requested records.

This approach ensures:
- Independence from any upstream resolver's cache or filtering.
- Ability to detect split-horizon or inconsistent responses across
authoritative servers.
- Visibility into the full delegation chain.
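The delegation walk in the numbered steps above can be illustrated without any network traffic. The sketch below replaces real DNS queries with a lookup table (`delegations`, hypothetical sample data) and only shows the order in which zones are visited; a real implementation would send NS queries to the servers found at each step.

```go
package main

import (
	"fmt"
	"strings"
)

// delegations is a stand-in for live DNS responses: it maps a zone to
// the nameservers its parent delegates it to. Sample data only.
var delegations = map[string][]string{
	".":            {"a.root-servers.net."},
	"com.":         {"a.gtld-servers.net."},
	"example.com.": {"ns1.example.com.", "ns2.example.com."},
}

// trace walks from the root toward name, one label at a time, and
// returns the deepest delegated zone together with its nameservers.
func trace(name string) (zone string, ns []string) {
	labels := strings.Split(strings.TrimSuffix(name, "."), ".")
	zone, ns = ".", delegations["."]
	for i := len(labels) - 1; i >= 0; i-- {
		candidate := strings.Join(labels[i:], ".") + "."
		// A label without its own delegation stays inside the current
		// zone, so the walk simply continues with the next label.
		if next, ok := delegations[candidate]; ok {
			zone, ns = candidate, next
		}
	}
	return zone, ns
}

func main() {
	zone, ns := trace("www.example.com")
	fmt.Println("authoritative zone:", zone)
	fmt.Println("nameservers to query:", ns)
}
```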
For hostname monitoring, the resolver follows CNAME chains (with a
depth limit to prevent loops) before collecting terminal A/AAAA records.

---
## State File Format
The state file (`DATA_DIR/state.json`) contains the complete monitoring
snapshot. Hostname records are stored **per authoritative nameserver**,
not as a merged view, to enable inconsistency detection.
```json
{
  "version": 1,
  "lastUpdated": "2026-02-19T12:00:00Z",
  "domains": {
    "example.com": {
      "nameservers": ["ns1.example.com.", "ns2.example.com."],
      "lastChecked": "2026-02-19T12:00:00Z"
    }
  },
  "hostnames": {
    "www.example.com": {
      "recordsByNameserver": {
        "ns1.example.com.": {
          "records": {
            "A": ["93.184.216.34"],
            "AAAA": ["2606:2800:220:1:248:1893:25c8:1946"]
          },
          "status": "ok",
          "lastChecked": "2026-02-19T12:00:00Z"
        },
        "ns2.example.com.": {
          "records": {
            "A": ["93.184.216.34"],
            "AAAA": ["2606:2800:220:1:248:1893:25c8:1946"]
          },
          "status": "ok",
          "lastChecked": "2026-02-19T12:00:00Z"
        }
      },
      "lastChecked": "2026-02-19T12:00:00Z"
    }
  },
  "ports": {
    "93.184.216.34:80": {
      "open": true,
      "hostnames": ["www.example.com"],
      "lastChecked": "2026-02-19T12:00:00Z"
    },
    "93.184.216.34:443": {
      "open": true,
      "hostnames": ["www.example.com"],
      "lastChecked": "2026-02-19T12:00:00Z"
    }
  },
  "certificates": {
    "93.184.216.34:443:www.example.com": {
      "commonName": "www.example.com",
      "issuer": "DigiCert TLS RSA SHA256 2020 CA1",
      "notAfter": "2027-01-15T23:59:59Z",
      "subjectAlternativeNames": ["www.example.com"],
      "status": "ok",
      "lastChecked": "2026-02-19T06:00:00Z"
    }
  }
}
```
The `status` field for each per-nameserver entry and certificate entry
tracks reachability:
| Status | Meaning |
|-------------|-------------------------------------------------|
| `ok` | Query succeeded, records are current |
| `error` | Query failed (timeout, SERVFAIL, network error) |
---
## Building
```sh
make build # Build binary to bin/dnswatcher
make test # Run tests with race detector
make lint # Run golangci-lint
make fmt # Format code
make check # Run all checks (format, lint, test, build)
make clean # Remove build artifacts
```
### Build-Time Variables
Version is injected via `-ldflags`:
```sh
go build -ldflags "-X main.Version=$(git describe --tags --always)" ./cmd/dnswatcher
```
---
## Docker
```sh
docker build -t dnswatcher .
docker run -d \
  -p 8080:8080 \
  -v dnswatcher-data:/var/lib/dnswatcher \
  -e DNSWATCHER_TARGETS=example.com,www.example.com \
  -e DNSWATCHER_NTFY_TOPIC=https://ntfy.sh/my-alerts \
  -e DNSWATCHER_SEND_TEST_NOTIFICATION=true \
  dnswatcher
```
---
## Monitoring Lifecycle
1. **Startup**: Load state from disk. If no state file exists, start
with empty state (first check will establish baseline without
triggering change notifications).
2. **Initial check**: Immediately perform all DNS, port, and TLS checks
on startup.
3. **Periodic checks** (DNS always runs first):
   - DNS checks: every `DNSWATCHER_DNS_INTERVAL` (default 1h). Also
     re-run before every TLS check cycle to ensure fresh IPs.
   - Port checks: every `DNSWATCHER_DNS_INTERVAL`, after DNS completes.
   - TLS checks: every `DNSWATCHER_TLS_INTERVAL` (default 12h), after
     DNS completes.
   - Port and TLS checks always use freshly resolved IP addresses from
     the DNS phase that immediately precedes them, never stale IPs
     from a previous cycle.
4. **On change detection**: Send notifications to all configured
endpoints, update in-memory state, persist to disk.
5. **Shutdown**: Persist final state to disk, complete in-flight
notifications, stop gracefully.
---
## Planned Future Features (Post-1.0)
- **DNSSEC validation**: Validate the DNSSEC chain of trust during
iterative resolution and report DNSSEC failures as notifications.
---
## Project Structure
Follows the conventions defined in `REPO_POLICIES.md`, adapted from the
[upaas](https://git.eeqj.de/sneak/upaas) project template. Uses uber/fx
for dependency injection, go-chi for HTTP routing, slog for logging, and
Viper for configuration.

---
## License
A license has not yet been chosen for this project. The decision is pending
with the author (MIT, GPL, or WTFPL).
## Author
[@sneak](https://sneak.berlin)