# dnswatcher

> ⚠️ **Pre-1.0 software.** APIs, configuration, and behavior may change
> without notice.

dnswatcher is a production DNS and infrastructure monitoring daemon written in Go. It watches configured DNS domains and hostnames for changes, monitors TCP port availability, tracks TLS certificate expiry, and delivers real-time notifications via Slack, Mattermost, and/or ntfy webhooks.

It performs all DNS resolution itself via iterative (non-recursive) queries, tracing from root nameservers to authoritative servers directly, never relying on upstream recursive resolvers. State is persisted to a local JSON file so that monitoring survives restarts without requiring an external database.

---

## Features

### Target Classification

All monitored DNS names are provided via a single `DNSWATCHER_TARGETS` list. dnswatcher uses the [Public Suffix List](https://publicsuffix.org/) to automatically classify each entry as an apex domain (eTLD+1, e.g. `example.com`, `example.co.uk`) or a hostname (subdomain, e.g. `www.example.com`). Apex domains receive NS delegation monitoring; hostnames receive per-nameserver record monitoring. Both receive port and TLS checks.

### DNS Domain Monitoring (Apex Domains)

- Apex domains are identified automatically via the PSL.
- Every **1 hour**, performs a full iterative trace from root servers to discover all authoritative nameservers (NS records) for each domain.
- Queries **every** discovered authoritative nameserver independently.
- Stores the NS record set as observed by the delegation chain.
- Any change triggers a notification:
  - NS added to or removed from the delegation.
  - NS IP address changed (glue record change).

### DNS Hostname Monitoring (Subdomains)

- Accepts a list of DNS hostnames (subdomains, distinguished from apex domains via the Public Suffix List).
- Every **1 hour**, performs a full iterative trace to discover the authoritative nameservers for the hostname's parent domain.
- Queries **each** authoritative nameserver independently for **all** record types: A, AAAA, CNAME, MX, TXT, SRV, CAA, NS.
- Stores results **per nameserver**. The state for a hostname is not a merged view; it is a map from nameserver to record set.
- Any observable change in any nameserver's response triggers a notification. This includes:
  - **Record change**: A nameserver returns different records than it did on the previous check (additions, removals, value changes).
  - **NS query failure**: A nameserver that previously responded becomes unreachable (timeout, SERVFAIL, REFUSED, network error). This is distinct from "responded with no records."
  - **NS recovery**: A previously-unreachable nameserver starts responding again.
  - **Inconsistency detected**: Two nameservers that previously agreed now return different record sets for the same hostname.
  - **Inconsistency resolved**: Nameservers that previously disagreed are now back in agreement.
  - **Empty response**: A nameserver that previously returned records now returns an authoritative empty response (NODATA/NXDOMAIN).

### TCP Port Monitoring

- For every configured domain and hostname, constructs a deduplicated list of all IPv4 and IPv6 addresses resolved via A, AAAA, and CNAME chain resolution across all authoritative nameservers.
- Checks TCP connectivity on ports **80** and **443** for each IP address.
- Every **1 hour**, re-checks all ports.
- Any change in port availability triggers a notification:
  - Port transitioned from open to closed (or vice versa).
  - New IP appeared (from DNS change) and its port state was recorded.
  - IP disappeared (from DNS change): noted in the DNS change notification; port state for that IP is removed.

### TLS Certificate Monitoring

- Every **12 hours**, for each IP address listening on port 443, connects via TLS using the correct SNI hostname.
- Records the certificate's Subject CN, SANs, issuer, and expiry date.
- Any change triggers a notification:
  - Certificate is expiring within **7 days** (warning, repeated each check until renewed or expired).
  - Certificate CN, issuer, or SANs changed (replacement detected, reports old and new values).
  - TLS connection failure to a previously-reachable IP:443 (handshake error, timeout, connection refused after previously succeeding).
  - TLS recovery: a previously-failing IP:443 now completes a handshake again.

### Notifications

**Every observable state change produces a notification.** dnswatcher is designed as a real-time change feed: degradations, failures, recoveries, and routine changes are all reported equally.

Supported notification backends:

| Backend        | Configuration                              | Payload Format               |
|----------------|--------------------------------------------|------------------------------|
| **Slack**      | Incoming Webhook URL                       | Attachments with color       |
| **Mattermost** | Incoming Webhook URL                       | Slack-compatible attachments |
| **ntfy**       | Topic URL (e.g. `https://ntfy.sh/mytopic`) | Title + body + priority      |

All configured endpoints receive every notification. Notification content includes:

- **DNS record changes**: Which hostname, which nameserver, what record type, old values, new values.
- **DNS NS changes**: Which domain, which nameservers were added/removed.
- **NS query failures**: Which nameserver failed, error type (timeout, SERVFAIL, REFUSED, network error), which hostname/domain affected.
- **NS recoveries**: Which nameserver recovered, which hostname/domain.
- **NS inconsistencies**: Which nameservers disagree, what each one returned, which hostname affected.
- **Port changes**: Which IP:port, old state, new state, associated hostname.
- **TLS expiry warnings**: Which certificate, days remaining, CN, issuer, associated hostname and IP.
- **TLS certificate changes**: Old and new CN/issuer/SANs, associated hostname and IP.
- **TLS connection failures/recoveries**: Which IP:port, error details, associated hostname.
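
As an illustration of the payload formats listed in the backend table, the following is a minimal sketch and not dnswatcher's actual `notify` package. The `Event` type, its `Critical` flag, the color values, and both function names are assumptions made for the example. The Slack body uses the attachments-with-color shape that Mattermost also accepts; the ntfy request carries the message in the body and the title and priority in headers.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Event is a hypothetical, simplified change event. dnswatcher's real
// internal types are not shown in this README.
type Event struct {
	Title    string
	Body     string
	Critical bool // hypothetical severity flag
}

// slackPayload builds an incoming-webhook body with a single colored
// attachment. Mattermost accepts the same JSON shape.
func slackPayload(e Event) ([]byte, error) {
	color := "#36a64f" // green for routine changes (assumed color choice)
	if e.Critical {
		color = "#d00000" // red for failures (assumed color choice)
	}
	payload := map[string]any{
		"attachments": []map[string]string{
			{"color": color, "title": e.Title, "text": e.Body},
		},
	}
	return json.Marshal(payload)
}

// ntfyRequest builds a POST to an ntfy topic URL. ntfy takes the message
// from the request body and reads Title and Priority from headers.
func ntfyRequest(topicURL string, e Event) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, topicURL, bytes.NewBufferString(e.Body))
	if err != nil {
		return nil, err
	}
	priority := "default"
	if e.Critical {
		priority = "high"
	}
	req.Header.Set("Title", e.Title)
	req.Header.Set("Priority", priority)
	return req, nil
}

func main() {
	e := Event{
		Title:    "NS query failure: www.example.com",
		Body:     "ns2.example.com. timed out",
		Critical: true,
	}
	body, err := slackPayload(e)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))

	req, err := ntfyRequest("https://ntfy.sh/mytopic", e)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL, req.Header.Get("Priority"))
}
```

The sketch only builds the payloads; sending them would be a plain `http.Client` call with a timeout, repeated for every configured endpoint.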

### State Management

- All monitoring state is kept in memory and persisted to a JSON file on disk (`DATA_DIR/state.json`).
- State is loaded on startup to resume monitoring without triggering false-positive change notifications.
- State is written atomically (write to temp file, then rename) to prevent corruption.

### HTTP API

dnswatcher exposes a lightweight HTTP API for operational visibility:

| Endpoint                | Description                     |
|-------------------------|---------------------------------|
| `GET /health`           | Health check (JSON)             |
| `GET /api/v1/status`    | Current monitoring state        |
| `GET /api/v1/domains`   | Configured domains and status   |
| `GET /api/v1/hostnames` | Configured hostnames and status |
| `GET /metrics`          | Prometheus metrics (optional)   |

---

## Architecture

```
cmd/dnswatcher/main.go          Entry point (uber/fx bootstrap)

internal/
  config/config.go              Viper-based configuration
  globals/globals.go            Build-time variables (version, arch)
  logger/logger.go              slog structured logging (TTY detection)
  healthcheck/healthcheck.go    Health check service
  middleware/middleware.go      HTTP middleware (logging, CORS, metrics auth)
  handlers/handlers.go          HTTP request handlers
  server/
    server.go                   HTTP server lifecycle
    routes.go                   Route definitions
  state/state.go                JSON file state persistence
  resolver/resolver.go          Iterative DNS resolution engine
  portcheck/portcheck.go        TCP port connectivity checker
  tlscheck/tlscheck.go          TLS certificate inspector
  notify/notify.go              Notification service (Slack, Mattermost, ntfy)
  watcher/watcher.go            Main monitoring orchestrator and scheduler
```

### Design Principles

- **No recursive resolvers**: All DNS resolution is performed iteratively, tracing from root nameservers through the delegation chain to authoritative servers.
- **No external database**: State is persisted as a single JSON file.
- **Dependency injection**: All components are wired via [uber/fx](https://github.com/uber-go/fx).
- **Structured logging**: All logs use `log/slog` with JSON output in production (TTY detection for development).
- **Graceful shutdown**: All background goroutines respect context cancellation and the fx lifecycle.

---

## Configuration

Configuration is loaded via [Viper](https://github.com/spf13/viper) with the following precedence (highest to lowest):

1. Environment variables (prefixed with `DNSWATCHER_`)
2. `.env` file (loaded via godotenv)
3. Config file: `/etc/dnswatcher/dnswatcher.yaml`, `~/.config/dnswatcher/dnswatcher.yaml`, or `./dnswatcher.yaml`
4. Defaults

### Environment Variables

| Variable                        | Description                                  | Default  |
|---------------------------------|----------------------------------------------|----------|
| `PORT`                          | HTTP listen port                             | `8080`   |
| `DNSWATCHER_DEBUG`              | Enable debug logging                         | `false`  |
| `DNSWATCHER_DATA_DIR`           | Directory for state file                     | `./data` |
| `DNSWATCHER_TARGETS`            | Comma-separated list of DNS names to monitor | `""`     |
| `DNSWATCHER_SLACK_WEBHOOK`      | Slack incoming webhook URL                   | `""`     |
| `DNSWATCHER_MATTERMOST_WEBHOOK` | Mattermost incoming webhook URL              | `""`     |
| `DNSWATCHER_NTFY_TOPIC`         | ntfy topic URL                               | `""`     |
| `DNSWATCHER_DNS_INTERVAL`       | DNS check interval                           | `1h`     |
| `DNSWATCHER_TLS_INTERVAL`       | TLS check interval                           | `12h`    |
| `DNSWATCHER_TLS_EXPIRY_WARNING` | Days before expiry to warn                   | `7`      |
| `DNSWATCHER_SENTRY_DSN`         | Sentry DSN for error reporting               | `""`     |
| `DNSWATCHER_MAINTENANCE_MODE`   | Enable maintenance mode                      | `false`  |
| `DNSWATCHER_METRICS_USERNAME`   | Basic auth username for /metrics             | `""`     |
| `DNSWATCHER_METRICS_PASSWORD`   | Basic auth password for /metrics             | `""`     |

### Example `.env`

```sh
PORT=8080
DNSWATCHER_DEBUG=false
DNSWATCHER_DATA_DIR=./data
DNSWATCHER_TARGETS=example.com,example.org,www.example.com,api.example.com,mail.example.org
DNSWATCHER_SLACK_WEBHOOK=https://hooks.slack.com/services/T.../B.../xxx
DNSWATCHER_MATTERMOST_WEBHOOK=https://mattermost.example.com/hooks/xxx
DNSWATCHER_NTFY_TOPIC=https://ntfy.sh/my-dns-alerts
```

---

## DNS Resolution Strategy

dnswatcher never uses the system's configured recursive resolver. Instead, it performs full iterative resolution:

1. **Root servers**: Starts from the IANA root nameserver list (hardcoded, with periodic refresh).
2. **TLD delegation**: Queries root servers for the TLD NS records.
3. **Domain delegation**: Queries TLD nameservers for the domain's NS records.
4. **Authoritative query**: Queries all discovered authoritative nameservers directly for the requested records.

This approach ensures:

- Independence from any upstream resolver's cache or filtering.
- Ability to detect split-horizon or inconsistent responses across authoritative servers.
- Visibility into the full delegation chain.

For hostname monitoring, the resolver follows CNAME chains (with a depth limit to prevent loops) before collecting terminal A/AAAA records.

---

## State File Format

The state file (`DATA_DIR/state.json`) contains the complete monitoring snapshot. Hostname records are stored **per authoritative nameserver**, not as a merged view, to enable inconsistency detection.
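
To show why the per-nameserver layout matters, here is a minimal sketch of inconsistency detection over such a map. It is an illustration, not dnswatcher's actual code: `RecordSet`, `fingerprint`, and `inconsistent` are hypothetical names, and the canonical-string comparison is one possible approach among several.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// RecordSet maps a record type ("A", "AAAA", ...) to its values, as
// returned by one nameserver. Hypothetical type for this sketch.
type RecordSet map[string][]string

// fingerprint reduces a record set to a canonical string so that two
// nameservers returning the same records in a different order compare
// equal. Sketch limitation: values containing "," or ";" (possible in
// TXT records) could collide; real code would need a safer encoding.
func fingerprint(rs RecordSet) string {
	parts := make([]string, 0, len(rs))
	for rtype, values := range rs {
		sorted := append([]string(nil), values...) // copy, leave caller's slice untouched
		sort.Strings(sorted)
		parts = append(parts, rtype+"="+strings.Join(sorted, ","))
	}
	sort.Strings(parts)
	return strings.Join(parts, ";")
}

// inconsistent reports whether at least two nameservers returned
// different record sets for the same hostname.
func inconsistent(byNameserver map[string]RecordSet) bool {
	distinct := make(map[string]struct{})
	for _, rs := range byNameserver {
		distinct[fingerprint(rs)] = struct{}{}
	}
	return len(distinct) > 1
}

func main() {
	byNS := map[string]RecordSet{
		"ns1.example.com.": {"A": {"192.0.2.10"}},
		"ns2.example.com.": {"A": {"192.0.2.99"}},
	}
	fmt.Println("inconsistent:", inconsistent(byNS))
}
```

A merged view would hide this case: both addresses would simply appear in one combined A record list. The example state file in this section shows two nameservers in agreement.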

```json
{
  "version": 1,
  "lastUpdated": "2026-02-19T12:00:00Z",
  "domains": {
    "example.com": {
      "nameservers": ["ns1.example.com.", "ns2.example.com."],
      "lastChecked": "2026-02-19T12:00:00Z"
    }
  },
  "hostnames": {
    "www.example.com": {
      "recordsByNameserver": {
        "ns1.example.com.": {
          "records": {
            "A": ["93.184.216.34"],
            "AAAA": ["2606:2800:220:1:248:1893:25c8:1946"]
          },
          "status": "ok",
          "lastChecked": "2026-02-19T12:00:00Z"
        },
        "ns2.example.com.": {
          "records": {
            "A": ["93.184.216.34"],
            "AAAA": ["2606:2800:220:1:248:1893:25c8:1946"]
          },
          "status": "ok",
          "lastChecked": "2026-02-19T12:00:00Z"
        }
      },
      "lastChecked": "2026-02-19T12:00:00Z"
    }
  },
  "ports": {
    "93.184.216.34:80": {
      "open": true,
      "hostname": "www.example.com",
      "lastChecked": "2026-02-19T12:00:00Z"
    },
    "93.184.216.34:443": {
      "open": true,
      "hostname": "www.example.com",
      "lastChecked": "2026-02-19T12:00:00Z"
    }
  },
  "certificates": {
    "93.184.216.34:443:www.example.com": {
      "commonName": "www.example.com",
      "issuer": "DigiCert TLS RSA SHA256 2020 CA1",
      "notAfter": "2027-01-15T23:59:59Z",
      "subjectAlternativeNames": ["www.example.com"],
      "status": "ok",
      "lastChecked": "2026-02-19T06:00:00Z"
    }
  }
}
```

The `status` field for each per-nameserver entry and certificate entry tracks reachability:

| Status     | Meaning                                         |
|------------|-------------------------------------------------|
| `ok`       | Query succeeded, records are current            |
| `error`    | Query failed (timeout, SERVFAIL, network error) |
| `nxdomain` | Authoritative NXDOMAIN response                 |
| `nodata`   | Authoritative empty response (NODATA)           |

---

## Building

```sh
make build    # Build binary to bin/dnswatcher
make test     # Run tests with race detector
make lint     # Run golangci-lint
make fmt      # Format code
make check    # Run all checks (format, lint, test, build)
make clean    # Remove build artifacts
```

### Build-Time Variables

Version and architecture are injected via `-ldflags`:

```sh
go build -ldflags "-X main.Version=$(git describe --tags --always) \
  -X main.Buildarch=$(go env GOARCH)" ./cmd/dnswatcher
```

---

## Docker

```sh
docker build -t dnswatcher .

docker run -d \
  -p 8080:8080 \
  -v dnswatcher-data:/var/lib/dnswatcher \
  -e DNSWATCHER_TARGETS=example.com,www.example.com \
  -e DNSWATCHER_NTFY_TOPIC=https://ntfy.sh/my-alerts \
  dnswatcher
```

---

## Monitoring Lifecycle

1. **Startup**: Load state from disk. If no state file exists, start with empty state (first check will establish baseline without triggering change notifications).
2. **Initial check**: Immediately perform all DNS, port, and TLS checks on startup.
3. **Periodic checks**:
   - DNS and port checks: every `DNSWATCHER_DNS_INTERVAL` (default 1h).
   - TLS checks: every `DNSWATCHER_TLS_INTERVAL` (default 12h).
4. **On change detection**: Send notifications to all configured endpoints, update in-memory state, persist to disk.
5. **Shutdown**: Persist final state to disk, complete in-flight notifications, stop gracefully.

---

## Project Structure

Follows the conventions defined in `CONVENTIONS.md`, adapted from the [upaas](https://git.eeqj.de/sneak/upaas) project template. Uses uber/fx for dependency injection, go-chi for HTTP routing, slog for logging, and Viper for configuration.