dnswatcher
⚠️ Pre-1.0 software. APIs, configuration, and behavior may change without notice.
dnswatcher is a production DNS and infrastructure monitoring daemon written in Go. It watches configured DNS domains and hostnames for changes, monitors TCP port availability, tracks TLS certificate expiry, and delivers real-time notifications via Slack, Mattermost, and/or ntfy webhooks.
It performs all DNS resolution itself via iterative (non-recursive) queries, tracing from root nameservers to authoritative servers directly—never relying on upstream recursive resolvers.
State is persisted to a local JSON file so that monitoring survives restarts without requiring an external database.
Features
DNS Domain Monitoring (Apex Domains)
- Accepts a list of DNS domain names (apex domains, identified via the Public Suffix List).
- Every 1 hour, performs a full iterative trace from root servers to discover all authoritative nameservers (NS records) for each domain.
- Queries every discovered authoritative nameserver independently.
- Stores the NS record set as observed by the delegation chain.
- Any change triggers a notification:
  - NS added to or removed from the delegation.
  - NS IP address changed (glue record change).
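The added/removed comparison described above can be sketched as a plain set difference. This is a minimal illustration, not the project's code; the function name `diffSets` is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// toSet converts a list of nameserver names into a set.
func toSet(names []string) map[string]bool {
	set := make(map[string]bool, len(names))
	for _, n := range names {
		set[n] = true
	}
	return set
}

// diffSets (hypothetical name) compares two NS lists as sets and returns the
// names present only in cur (added) and only in prev (removed), both sorted.
func diffSets(prev, cur []string) (added, removed []string) {
	prevSet, curSet := toSet(prev), toSet(cur)
	for ns := range curSet {
		if !prevSet[ns] {
			added = append(added, ns)
		}
	}
	for ns := range prevSet {
		if !curSet[ns] {
			removed = append(removed, ns)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	prev := []string{"ns1.example.com.", "ns2.example.com."}
	cur := []string{"ns2.example.com.", "ns3.example.com."}
	added, removed := diffSets(prev, cur)
	fmt.Println("added:", added, "removed:", removed)
}
```

A non-empty `added` or `removed` slice would correspond to one NS change notification for the domain.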
DNS Hostname Monitoring (Subdomains)
- Accepts a list of DNS hostnames (subdomains, distinguished from apex domains via the Public Suffix List).
- Every 1 hour, performs a full iterative trace to discover the authoritative nameservers for the hostname's parent domain.
- Queries each authoritative nameserver independently for all record types: A, AAAA, CNAME, MX, TXT, SRV, CAA, NS.
- Stores results per nameserver. The state for a hostname is not a merged view — it is a map from nameserver to record set.
- Any observable change in any nameserver's response triggers a notification. This includes:
  - Record change: A nameserver returns different records than it did on the previous check (additions, removals, value changes).
  - NS query failure: A nameserver that previously responded becomes unreachable (timeout, SERVFAIL, REFUSED, network error). This is distinct from "responded with no records."
  - NS recovery: A previously-unreachable nameserver starts responding again.
  - Inconsistency detected: Two nameservers that previously agreed now return different record sets for the same hostname.
  - Inconsistency resolved: Nameservers that previously disagreed are now back in agreement.
  - Empty response: A nameserver that previously returned records now returns an authoritative empty response (NODATA/NXDOMAIN).
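Because state is kept per nameserver, inconsistency detection reduces to comparing the record sets side by side. The sketch below assumes a hypothetical `RecordSet` type (record type to values) and treats ordering as irrelevant; the real comparison rules in dnswatcher may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// RecordSet (hypothetical type) maps a record type such as "A" to its values.
type RecordSet map[string][]string

// fingerprint returns a canonical string for a record set, so two sets with
// the same contents compare equal regardless of ordering.
func fingerprint(rs RecordSet) string {
	types := make([]string, 0, len(rs))
	for t, vals := range rs {
		if len(vals) > 0 { // an empty list counts the same as an absent type
			types = append(types, t)
		}
	}
	sort.Strings(types)
	var b strings.Builder
	for _, t := range types {
		vals := append([]string(nil), rs[t]...) // copy before sorting
		sort.Strings(vals)
		b.WriteString(t + "=" + strings.Join(vals, ",") + ";")
	}
	return b.String()
}

// consistent reports whether every nameserver returned the same record set.
func consistent(byNS map[string]RecordSet) bool {
	seen := ""
	first := true
	for _, rs := range byNS {
		fp := fingerprint(rs)
		if first {
			seen, first = fp, false
			continue
		}
		if fp != seen {
			return false
		}
	}
	return true
}

func main() {
	byNS := map[string]RecordSet{
		"ns1.example.com.": {"A": {"192.0.2.1", "192.0.2.2"}},
		"ns2.example.com.": {"A": {"192.0.2.2", "192.0.2.1"}},
	}
	fmt.Println("consistent:", consistent(byNS))
}
```

Comparing the result of `consistent` between two successive checks gives the "inconsistency detected" and "inconsistency resolved" transitions.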
TCP Port Monitoring
- For every configured domain and hostname, constructs a deduplicated list of all IPv4 and IPv6 addresses resolved via A, AAAA, and CNAME chain resolution across all authoritative nameservers.
- Checks TCP connectivity on ports 80 and 443 for each IP address.
- Every 1 hour, re-checks all ports.
- Any change in port availability triggers a notification:
  - Port transitioned from open to closed (or vice versa).
  - New IP appeared (from DNS change) and its port state was recorded.
  - IP disappeared (from DNS change): noted in the DNS change notification; port state for that IP is removed.
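A TCP connectivity check of this kind can be sketched with the standard library's `net.DialTimeout`. The two-second timeout and the helper names are assumptions; the source does not state a timeout. The demo uses a throwaway loopback listener so it does not touch the network.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"time"
)

// dialTimeout is an assumed value for illustration only.
const dialTimeout = 2 * time.Second

// portOpen reports whether a TCP connection to ip:port can be established.
func portOpen(ip string, port int) bool {
	addr := net.JoinHostPort(ip, strconv.Itoa(port)) // brackets IPv6 literals
	conn, err := net.DialTimeout("tcp", addr, dialTimeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// localListener opens a listener on an ephemeral loopback port for the demo.
func localListener() (net.Listener, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	return ln, ln.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	ln, port, err := localListener()
	if err != nil {
		fmt.Println("listen:", err)
		return
	}
	before := portOpen("127.0.0.1", port)
	ln.Close()
	after := portOpen("127.0.0.1", port)
	if before != after {
		fmt.Printf("127.0.0.1:%d changed: open=%v -> open=%v\n", port, before, after)
	}
}
```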
TLS Certificate Monitoring
- Every 12 hours, for each IP address listening on port 443, connects via TLS using the correct SNI hostname.
- Records the certificate's Subject CN, SANs, issuer, and expiry date.
- Any change triggers a notification:
  - Certificate is expiring within 7 days (warning, repeated each check until renewed or expired).
  - Certificate CN, issuer, or SANs changed (replacement detected, reports old and new values).
  - TLS connection failure to a previously-reachable IP:443 (handshake error, timeout, connection refused after previously succeeding).
  - TLS recovery: a previously-failing IP:443 now completes a handshake again.
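Connecting to a specific IP while presenting the monitored hostname as SNI can be sketched with `crypto/tls`. The names `fetchLeaf`, `certInfo`, and `expiringSoon`, and the ten-second dial timeout, are assumptions for illustration. The sketch leaves certificate verification on, which the source does not specify either way.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net"
	"os"
	"time"
)

// certInfo (hypothetical type) holds the fields the section says are recorded.
type certInfo struct {
	CommonName string
	Issuer     string
	SANs       []string
	NotAfter   time.Time
}

// fetchLeaf dials ip:443, sends hostname as SNI, and returns the leaf certificate.
func fetchLeaf(ip, hostname string) (*x509.Certificate, error) {
	dialer := &net.Dialer{Timeout: 10 * time.Second} // assumed timeout
	conn, err := tls.DialWithDialer(dialer, "tcp", net.JoinHostPort(ip, "443"),
		&tls.Config{ServerName: hostname})
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	return conn.ConnectionState().PeerCertificates[0], nil
}

// summarize extracts the recorded fields from a parsed certificate.
func summarize(c *x509.Certificate) certInfo {
	return certInfo{
		CommonName: c.Subject.CommonName,
		Issuer:     c.Issuer.CommonName,
		SANs:       c.DNSNames,
		NotAfter:   c.NotAfter,
	}
}

// expiringSoon reports whether the certificate is still valid at now but
// expires within warnDays days.
func expiringSoon(notAfter, now time.Time, warnDays int) bool {
	left := notAfter.Sub(now)
	return left > 0 && left <= time.Duration(warnDays)*24*time.Hour
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: tlscheck <ip> <hostname>")
		return
	}
	leaf, err := fetchLeaf(os.Args[1], os.Args[2])
	if err != nil {
		fmt.Println("TLS failure:", err)
		return
	}
	info := summarize(leaf)
	fmt.Printf("%+v expiring=%v\n", info, expiringSoon(info.NotAfter, time.Now(), 7))
}
```

Comparing a fresh `certInfo` against the stored one would surface CN, issuer, or SAN changes.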
Notifications
Every observable state change produces a notification. dnswatcher is designed as a real-time change feed — degradations, failures, recoveries, and routine changes are all reported equally.
Supported notification backends:
| Backend | Configuration | Payload Format |
|---|---|---|
| Slack | Incoming Webhook URL | Attachments with color |
| Mattermost | Incoming Webhook URL | Slack-compatible attachments |
| ntfy | Topic URL (e.g. https://ntfy.sh/mytopic) | Title + body + priority |
All configured endpoints receive every notification. Notification content includes:
- DNS record changes: Which hostname, which nameserver, what record type, old values, new values.
- DNS NS changes: Which domain, which nameservers were added/removed.
- NS query failures: Which nameserver failed, error type (timeout, SERVFAIL, REFUSED, network error), which hostname/domain affected.
- NS recoveries: Which nameserver recovered, which hostname/domain.
- NS inconsistencies: Which nameservers disagree, what each one returned, which hostname affected.
- Port changes: Which IP:port, old state, new state, associated hostname.
- TLS expiry warnings: Which certificate, days remaining, CN, issuer, associated hostname and IP.
- TLS certificate changes: Old and new CN/issuer/SANs, associated hostname and IP.
- TLS connection failures/recoveries: Which IP:port, error details, associated hostname.
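The two payload shapes in the backend table can be sketched as follows. The Slack attachment fields (`color`, `title`, `text`) and the ntfy `Title` and `Priority` headers are part of those services' public webhook interfaces, but the helper names and the exact fields dnswatcher sends are assumptions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type slackAttachment struct {
	Color string `json:"color"`
	Title string `json:"title"`
	Text  string `json:"text"`
}

type slackMessage struct {
	Attachments []slackAttachment `json:"attachments"`
}

// slackPayload builds a JSON body usable for Slack and, being
// Slack-compatible, Mattermost incoming webhooks.
func slackPayload(title, text, color string) ([]byte, error) {
	msg := slackMessage{Attachments: []slackAttachment{
		{Color: color, Title: title, Text: text},
	}}
	return json.Marshal(msg)
}

// ntfyRequest builds a POST to an ntfy topic URL: the message goes in the
// body, title and priority in request headers.
func ntfyRequest(topicURL, title, body, priority string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, topicURL, bytes.NewBufferString(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Title", title)
	req.Header.Set("Priority", priority)
	return req, nil
}

func main() {
	body, err := slackPayload("NS added", "ns3.example.com. added to example.com", "#36a64f")
	if err != nil {
		fmt.Println("marshal:", err)
		return
	}
	fmt.Println(string(body))

	req, err := ntfyRequest("https://ntfy.sh/mytopic", "NS added", "ns3.example.com. added", "high")
	if err != nil {
		fmt.Println("request:", err)
		return
	}
	fmt.Println(req.Method, req.URL, req.Header.Get("Title"))
}
```

The sketch only builds the requests; sending them would go through an `http.Client` with a timeout.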
State Management
- All monitoring state is kept in memory and persisted to a JSON file on disk (DATA_DIR/state.json).
- State is loaded on startup to resume monitoring without triggering false-positive change notifications.
- State is written atomically (write to temp file, then rename) to prevent corruption.
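The write-to-temp-then-rename pattern can be sketched with the standard library alone. The function names and the temp-file pattern are hypothetical. The temp file is created in the same directory as the target so that the rename stays on one filesystem.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// writeAtomic writes data to a temp file next to path, then renames it over
// path, so readers see either the old file or the new one, never a partial write.
func writeAtomic(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".state-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // cleans up on failure; harmless after rename

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil { // flush to disk before the rename
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// loadState reads the state file; a missing file yields nil, meaning
// "start with empty state".
func loadState(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		return nil, nil
	}
	return data, err
}

func main() {
	dir, err := os.MkdirTemp("", "dnswatcher-demo")
	if err != nil {
		fmt.Println("mkdir:", err)
		return
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "state.json")
	if err := writeAtomic(path, []byte(`{"version":1}`)); err != nil {
		fmt.Println("write:", err)
		return
	}
	data, err := loadState(path)
	fmt.Println(string(data), err)
}
```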
HTTP API
dnswatcher exposes a lightweight HTTP API for operational visibility:
| Endpoint | Description |
|---|---|
| GET /health | Health check (JSON) |
| GET /api/v1/status | Current monitoring state |
| GET /api/v1/domains | Configured domains and status |
| GET /api/v1/hostnames | Configured hostnames and status |
| GET /metrics | Prometheus metrics (optional) |
Architecture
cmd/dnswatcher/main.go          Entry point (uber/fx bootstrap)
internal/
  config/config.go              Viper-based configuration
  globals/globals.go            Build-time variables (version, arch)
  logger/logger.go              slog structured logging (TTY detection)
  healthcheck/healthcheck.go    Health check service
  middleware/middleware.go      HTTP middleware (logging, CORS, metrics auth)
  handlers/handlers.go          HTTP request handlers
  server/
    server.go                   HTTP server lifecycle
    routes.go                   Route definitions
  state/state.go                JSON file state persistence
  resolver/resolver.go          Iterative DNS resolution engine
  portcheck/portcheck.go        TCP port connectivity checker
  tlscheck/tlscheck.go          TLS certificate inspector
  notify/notify.go              Notification service (Slack, Mattermost, ntfy)
  watcher/watcher.go            Main monitoring orchestrator and scheduler
Design Principles
- No recursive resolvers: All DNS resolution is performed iteratively, tracing from root nameservers through the delegation chain to authoritative servers.
- No external database: State is persisted as a single JSON file.
- Dependency injection: All components are wired via uber/fx.
- Structured logging: All logs use log/slog with JSON output in production (TTY detection for development).
- Graceful shutdown: All background goroutines respect context cancellation and the fx lifecycle.
Configuration
Configuration is loaded via Viper with the following precedence (highest to lowest):
- Environment variables (prefixed with DNSWATCHER_)
- .env file (loaded via godotenv)
- Config file: /etc/dnswatcher/dnswatcher.yaml, ~/.config/dnswatcher/dnswatcher.yaml, or ./dnswatcher.yaml
- Defaults
Environment Variables
| Variable | Description | Default |
|---|---|---|
| PORT | HTTP listen port | 8080 |
| DNSWATCHER_DEBUG | Enable debug logging | false |
| DNSWATCHER_DATA_DIR | Directory for state file | ./data |
| DNSWATCHER_TARGETS | Comma-separated DNS names (auto-classified via PSL) | "" |
| DNSWATCHER_SLACK_WEBHOOK | Slack incoming webhook URL | "" |
| DNSWATCHER_MATTERMOST_WEBHOOK | Mattermost incoming webhook URL | "" |
| DNSWATCHER_NTFY_TOPIC | ntfy topic URL | "" |
| DNSWATCHER_DNS_INTERVAL | DNS check interval | 1h |
| DNSWATCHER_TLS_INTERVAL | TLS check interval | 12h |
| DNSWATCHER_TLS_EXPIRY_WARNING | Days before expiry to warn | 7 |
| DNSWATCHER_SENTRY_DSN | Sentry DSN for error reporting | "" |
| DNSWATCHER_MAINTENANCE_MODE | Enable maintenance mode | false |
| DNSWATCHER_METRICS_USERNAME | Basic auth username for /metrics | "" |
| DNSWATCHER_METRICS_PASSWORD | Basic auth password for /metrics | "" |
Example .env
PORT=8080
DNSWATCHER_DEBUG=false
DNSWATCHER_DATA_DIR=./data
DNSWATCHER_TARGETS=example.com,example.org,www.example.com,api.example.com,mail.example.org
DNSWATCHER_SLACK_WEBHOOK=https://hooks.slack.com/services/T.../B.../xxx
DNSWATCHER_MATTERMOST_WEBHOOK=https://mattermost.example.com/hooks/xxx
DNSWATCHER_NTFY_TOPIC=https://ntfy.sh/my-dns-alerts
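To make the value formats concrete, the sketch below parses a comma-separated target list and a Go-style duration such as 1h. It uses only the standard library rather than Viper, so it illustrates the formats, not the project's loader. The helper names are hypothetical, and the fallback-on-invalid behavior is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"time"
)

// splitTargets parses a comma-separated DNSWATCHER_TARGETS value, trimming
// whitespace and dropping empty entries.
func splitTargets(raw string) []string {
	var out []string
	for _, part := range strings.Split(raw, ",") {
		if name := strings.TrimSpace(part); name != "" {
			out = append(out, name)
		}
	}
	return out
}

// envDuration reads a duration such as "1h" or "12h" from the environment,
// falling back to def when the variable is unset or cannot be parsed.
func envDuration(key string, def time.Duration) time.Duration {
	if v, ok := os.LookupEnv(key); ok {
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return def
}

func main() {
	targets := splitTargets(os.Getenv("DNSWATCHER_TARGETS"))
	dnsEvery := envDuration("DNSWATCHER_DNS_INTERVAL", time.Hour)
	tlsEvery := envDuration("DNSWATCHER_TLS_INTERVAL", 12*time.Hour)
	fmt.Println(targets, dnsEvery, tlsEvery)
}
```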
DNS Resolution Strategy
dnswatcher never uses the system's configured recursive resolver. Instead, it performs full iterative resolution:
- Root servers: Starts from the IANA root nameserver list (hardcoded, with periodic refresh).
- TLD delegation: Queries root servers for the TLD NS records.
- Domain delegation: Queries TLD nameservers for the domain's NS records.
- Authoritative query: Queries all discovered authoritative nameservers directly for the requested records.
This approach ensures:
- Independence from any upstream resolver's cache or filtering.
- Ability to detect split-horizon or inconsistent responses across authoritative servers.
- Visibility into the full delegation chain.
For hostname monitoring, the resolver follows CNAME chains (with a depth limit to prevent loops) before collecting terminal A/AAAA records.
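The CNAME-following step with a depth limit can be sketched independently of the network layer. Here `lookupFunc` is a hypothetical stand-in for a query against one authoritative nameserver, and the limit of 8 hops in the demo is an assumed value; the source does not state the actual limit.

```go
package main

import (
	"errors"
	"fmt"
)

// lookupFunc (hypothetical) returns the values of one record type for a name.
type lookupFunc func(name, rtype string) []string

var errTooDeep = errors.New("CNAME chain exceeds depth limit")

// resolveAddrs follows CNAMEs for at most maxDepth hops, then returns the
// terminal A and AAAA values. A CNAME loop ends with errTooDeep.
func resolveAddrs(name string, lookup lookupFunc, maxDepth int) ([]string, error) {
	for hops := 0; hops <= maxDepth; hops++ {
		if targets := lookup(name, "CNAME"); len(targets) > 0 {
			name = targets[0] // follow the alias and try again
			continue
		}
		addrs := append([]string(nil), lookup(name, "A")...)
		addrs = append(addrs, lookup(name, "AAAA")...)
		return addrs, nil
	}
	return nil, errTooDeep
}

func main() {
	// In-memory zone data keyed by "name|type", for demonstration only.
	zone := map[string][]string{
		"www.example.com|CNAME": {"cdn.example.net"},
		"cdn.example.net|A":     {"192.0.2.10"},
		"cdn.example.net|AAAA":  {"2001:db8::10"},
		"loop.example.com|CNAME": {"loop.example.com"},
	}
	lookup := func(name, rtype string) []string { return zone[name+"|"+rtype] }

	addrs, err := resolveAddrs("www.example.com", lookup, 8)
	fmt.Println(addrs, err)
	_, err = resolveAddrs("loop.example.com", lookup, 8)
	fmt.Println(err)
}
```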
State File Format
The state file (DATA_DIR/state.json) contains the complete monitoring
snapshot. Hostname records are stored per authoritative nameserver,
not as a merged view, to enable inconsistency detection.
{
  "version": 1,
  "lastUpdated": "2026-02-19T12:00:00Z",
  "domains": {
    "example.com": {
      "nameservers": ["ns1.example.com.", "ns2.example.com."],
      "lastChecked": "2026-02-19T12:00:00Z"
    }
  },
  "hostnames": {
    "www.example.com": {
      "recordsByNameserver": {
        "ns1.example.com.": {
          "records": {
            "A": ["93.184.216.34"],
            "AAAA": ["2606:2800:220:1:248:1893:25c8:1946"]
          },
          "status": "ok",
          "lastChecked": "2026-02-19T12:00:00Z"
        },
        "ns2.example.com.": {
          "records": {
            "A": ["93.184.216.34"],
            "AAAA": ["2606:2800:220:1:248:1893:25c8:1946"]
          },
          "status": "ok",
          "lastChecked": "2026-02-19T12:00:00Z"
        }
      },
      "lastChecked": "2026-02-19T12:00:00Z"
    }
  },
  "ports": {
    "93.184.216.34:80": {
      "open": true,
      "hostname": "www.example.com",
      "lastChecked": "2026-02-19T12:00:00Z"
    },
    "93.184.216.34:443": {
      "open": true,
      "hostname": "www.example.com",
      "lastChecked": "2026-02-19T12:00:00Z"
    }
  },
  "certificates": {
    "93.184.216.34:443:www.example.com": {
      "commonName": "www.example.com",
      "issuer": "DigiCert TLS RSA SHA256 2020 CA1",
      "notAfter": "2027-01-15T23:59:59Z",
      "subjectAlternativeNames": ["www.example.com"],
      "status": "ok",
      "lastChecked": "2026-02-19T06:00:00Z"
    }
  }
}
The status field for each per-nameserver entry and certificate entry
tracks reachability:
| Status | Meaning |
|---|---|
| ok | Query succeeded, records are current |
| error | Query failed (timeout, SERVFAIL, network error) |
| nxdomain | Authoritative NXDOMAIN response |
| nodata | Authoritative empty response (NODATA) |
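For readers consuming the state file from Go, the domain and hostname portions of the format above map onto structs like the following. The JSON keys come from the example; the Go type names are hypothetical, and the ports and certificates sections are left out for brevity (unknown keys are ignored by `encoding/json`).

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// NameserverResult is one authoritative nameserver's view of a hostname.
type NameserverResult struct {
	Records     map[string][]string `json:"records"`
	Status      string              `json:"status"` // ok, error, nxdomain, nodata
	LastChecked time.Time           `json:"lastChecked"`
}

// HostnameState keeps results per nameserver, not as a merged view.
type HostnameState struct {
	RecordsByNameserver map[string]NameserverResult `json:"recordsByNameserver"`
	LastChecked         time.Time                   `json:"lastChecked"`
}

// DomainState holds the NS set observed for an apex domain.
type DomainState struct {
	Nameservers []string  `json:"nameservers"`
	LastChecked time.Time `json:"lastChecked"`
}

// State is a partial model of state.json.
type State struct {
	Version     int                      `json:"version"`
	LastUpdated time.Time                `json:"lastUpdated"`
	Domains     map[string]DomainState   `json:"domains"`
	Hostnames   map[string]HostnameState `json:"hostnames"`
}

// decodeState parses the raw bytes of a state file.
func decodeState(data []byte) (State, error) {
	var s State
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	raw := []byte(`{"version":1,"lastUpdated":"2026-02-19T12:00:00Z",
	  "hostnames":{"www.example.com":{"recordsByNameserver":{
	    "ns1.example.com.":{"records":{"A":["93.184.216.34"]},"status":"ok",
	      "lastChecked":"2026-02-19T12:00:00Z"}},
	    "lastChecked":"2026-02-19T12:00:00Z"}}}`)
	s, err := decodeState(raw)
	if err != nil {
		fmt.Println("decode:", err)
		return
	}
	for ns, res := range s.Hostnames["www.example.com"].RecordsByNameserver {
		fmt.Println(ns, res.Status, res.Records["A"])
	}
}
```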
Building
make build # Build binary to bin/dnswatcher
make test # Run tests with race detector
make lint # Run golangci-lint
make fmt # Format code
make check # Run all checks (format, lint, test, build)
make clean # Remove build artifacts
Build-Time Variables
Version and architecture are injected via -ldflags:
go build -ldflags "-X main.Version=$(git describe --tags --always) \
-X main.Buildarch=$(go env GOARCH)" ./cmd/dnswatcher
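The `-X` linker flag can only set package-level string variables, so the receiving side looks roughly like the sketch below. The variable names match the flags above; the default values and the `versionString` helper are assumptions, and the project may keep these variables in a different package than this illustration suggests.

```go
package main

import "fmt"

// Overridden at link time via -ldflags "-X main.Version=... -X main.Buildarch=...".
// The defaults are assumed placeholders for a plain `go build` or `go run`.
var (
	Version   = "dev"
	Buildarch = "unknown"
)

// versionString formats the build metadata for logs or a health endpoint.
func versionString() string {
	return fmt.Sprintf("dnswatcher %s (%s)", Version, Buildarch)
}

func main() {
	fmt.Println(versionString())
}
```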
Docker
docker build -t dnswatcher .
docker run -d \
-p 8080:8080 \
-v dnswatcher-data:/var/lib/dnswatcher \
-e DNSWATCHER_TARGETS=example.com,www.example.com \
-e DNSWATCHER_NTFY_TOPIC=https://ntfy.sh/my-alerts \
dnswatcher
Monitoring Lifecycle
- Startup: Load state from disk. If no state file exists, start with empty state (first check will establish baseline without triggering change notifications).
- Initial check: Immediately perform all DNS, port, and TLS checks on startup.
- Periodic checks:
  - DNS and port checks: every DNSWATCHER_DNS_INTERVAL (default 1h).
  - TLS checks: every DNSWATCHER_TLS_INTERVAL (default 12h).
- On change detection: Send notifications to all configured endpoints, update in-memory state, persist to disk.
- Shutdown: Persist final state to disk, complete in-flight notifications, stop gracefully.
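The "check immediately, then on an interval, until shutdown" part of this lifecycle can be sketched with a ticker and a context. The function names are hypothetical. The real scheduler is wired through uber/fx and runs separate DNS/port and TLS intervals, which this sketch leaves out.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runEvery calls check once immediately, then once per interval, and returns
// when ctx is cancelled.
func runEvery(ctx context.Context, interval time.Duration, check func()) {
	check() // initial check on startup
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return // graceful shutdown
		case <-ticker.C:
			check()
		}
	}
}

// runFor is a demo helper that runs the loop for a bounded total duration.
func runFor(total, interval time.Duration, check func()) {
	ctx, cancel := context.WithTimeout(context.Background(), total)
	defer cancel()
	runEvery(ctx, interval, check)
}

func main() {
	n := 0
	runFor(350*time.Millisecond, 100*time.Millisecond, func() {
		n++
		fmt.Println("check", n)
	})
	fmt.Println("stopped after", n, "checks")
}
```

In the daemon, the context would be cancelled by the shutdown hook instead of a timeout, after which final state is persisted.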
Planned Future Features (Post-1.0)
- DNSSEC validation: Validate the DNSSEC chain of trust during iterative resolution and report DNSSEC failures as notifications.
Project Structure
Follows the conventions defined in CONVENTIONS.md, adapted from the
upaas project template. Uses uber/fx
for dependency injection, go-chi for HTTP routing, slog for logging, and
Viper for configuration.