dnswatcher

⚠️ Pre-1.0 software. APIs, configuration, and behavior may change without notice.

dnswatcher is a production DNS and infrastructure monitoring daemon written in Go. It watches configured DNS domains and hostnames for changes, monitors TCP port availability, tracks TLS certificate expiry, and delivers real-time notifications via Slack, Mattermost, and/or ntfy webhooks.

It performs all DNS resolution itself via iterative (non-recursive) queries, tracing from root nameservers to authoritative servers directly—never relying on upstream recursive resolvers.

State is persisted to a local JSON file so that monitoring survives restarts without requiring an external database.


Features

DNS Domain Monitoring (Apex Domains)

  • Accepts a list of DNS domain names (apex domains, identified via the Public Suffix List).
  • Every hour by default (configurable via DNSWATCHER_DNS_INTERVAL), performs a full iterative trace from root servers to discover all authoritative nameservers (NS records) for each domain.
  • Queries every discovered authoritative nameserver independently.
  • Stores the NS record set as observed by the delegation chain.
  • Any change triggers a notification:
    • NS added to or removed from the delegation.
    • NS IP address changed (glue record change).

DNS Hostname Monitoring (Subdomains)

  • Accepts a list of DNS hostnames (subdomains, distinguished from apex domains via the Public Suffix List).
  • Every hour by default (configurable via DNSWATCHER_DNS_INTERVAL), performs a full iterative trace to discover the authoritative nameservers for the hostname's parent domain.
  • Queries each authoritative nameserver independently for all record types: A, AAAA, CNAME, MX, TXT, SRV, CAA, NS.
  • Stores results per nameserver. The state for a hostname is not a merged view — it is a map from nameserver to record set.
  • Any observable change in any nameserver's response triggers a notification. This includes:
    • Record change: A nameserver returns different records than it did on the previous check (additions, removals, value changes).
    • NS query failure: A nameserver that previously responded becomes unreachable (timeout, SERVFAIL, REFUSED, network error). This is distinct from "responded with no records."
    • NS recovery: A previously-unreachable nameserver starts responding again.
    • Inconsistency detected: Two nameservers that previously agreed now return different record sets for the same hostname.
    • Inconsistency resolved: Nameservers that previously disagreed are now back in agreement.
    • Empty response: A nameserver that previously returned records now returns an authoritative empty response (NODATA/NXDOMAIN).
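
The per-nameserver comparison described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the names RecordSet, ChangedNameservers, and Consistent are hypothetical, and it assumes that record order within a type is not significant. Nameservers with no previous entry are treated as baseline and are not reported as changed.

```go
package main

import (
	"reflect"
	"sort"
)

// RecordSet maps a record type ("A", "MX", ...) to the values one
// nameserver returned. Hypothetical type, for illustration only.
type RecordSet map[string][]string

// normalize returns a copy with values sorted and empty types dropped,
// so that ordering differences are not reported as changes.
func normalize(rs RecordSet) RecordSet {
	out := make(RecordSet, len(rs))
	for typ, vals := range rs {
		if len(vals) == 0 {
			continue
		}
		cp := append([]string(nil), vals...)
		sort.Strings(cp)
		out[typ] = cp
	}
	return out
}

// ChangedNameservers returns the nameservers whose record set differs
// between prev and curr. Nameservers absent from prev are skipped.
func ChangedNameservers(prev, curr map[string]RecordSet) []string {
	var changed []string
	for ns, now := range curr {
		before, seen := prev[ns]
		if !seen {
			continue
		}
		if !reflect.DeepEqual(normalize(before), normalize(now)) {
			changed = append(changed, ns)
		}
	}
	sort.Strings(changed)
	return changed
}

// Consistent reports whether every nameserver returned the same records.
func Consistent(byNS map[string]RecordSet) bool {
	var first RecordSet
	seen := false
	for _, rs := range byNS {
		n := normalize(rs)
		if !seen {
			first, seen = n, true
			continue
		}
		if !reflect.DeepEqual(first, n) {
			return false
		}
	}
	return true
}
```

Comparing Consistent on the previous and current maps distinguishes "inconsistency detected" (true, then false) from "inconsistency resolved" (false, then true).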

TCP Port Monitoring

  • For every configured domain and hostname, constructs a deduplicated list of all IPv4 and IPv6 addresses resolved via A, AAAA, and CNAME chain resolution across all authoritative nameservers.
  • Checks TCP connectivity on ports 80 and 443 for each IP address.
  • Re-checks all ports every hour by default, on the same schedule as the DNS checks (DNSWATCHER_DNS_INTERVAL).
  • Any change in port availability triggers a notification:
    • Port transitioned from open to closed (or vice versa).
    • New IP appeared (from DNS change) and its port state was recorded.
    • IP disappeared (from DNS change) — noted in the DNS change notification; port state for that IP is removed.

TLS Certificate Monitoring

  • Every 12 hours by default (DNSWATCHER_TLS_INTERVAL), for each IP address listening on port 443, connects via TLS using the correct SNI hostname.
  • Records the certificate's Subject CN, SANs, issuer, and expiry date.
  • Any of the following triggers a notification:
    • Certificate expires within 7 days by default (DNSWATCHER_TLS_EXPIRY_WARNING); the warning is repeated on each check until the certificate is renewed or has expired.
    • Certificate CN, issuer, or SANs changed (replacement detected, reports old and new values).
    • TLS connection failure to a previously-reachable IP:443 (handshake error, timeout, connection refused after previously succeeding).
    • TLS recovery: a previously-failing IP:443 now completes a handshake again.

Notifications

Every observable state change produces a notification. dnswatcher is designed as a real-time change feed — degradations, failures, recoveries, and routine changes are all reported equally.

Supported notification backends:

Backend      Configuration                              Payload Format
Slack        Incoming Webhook URL                       Attachments with color
Mattermost   Incoming Webhook URL                       Slack-compatible attachments
ntfy         Topic URL (e.g. https://ntfy.sh/mytopic)   Title + body + priority

All configured endpoints receive every notification. Notification content includes:

  • DNS record changes: Which hostname, which nameserver, what record type, old values, new values.
  • DNS NS changes: Which domain, which nameservers were added/removed.
  • NS query failures: Which nameserver failed, error type (timeout, SERVFAIL, REFUSED, network error), which hostname/domain affected.
  • NS recoveries: Which nameserver recovered, which hostname/domain.
  • NS inconsistencies: Which nameservers disagree, what each one returned, which hostname affected.
  • Port changes: Which IP:port, old state, new state, associated hostname.
  • TLS expiry warnings: Which certificate, days remaining, CN, issuer, associated hostname and IP.
  • TLS certificate changes: Old and new CN/issuer/SANs, associated hostname and IP.
  • TLS connection failures/recoveries: Which IP:port, error details, associated hostname.
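
The payload formats listed in the backend table can be sketched with the standard library. This is illustrative only: SlackBody and NtfyRequest are hypothetical helpers, and the exact fields dnswatcher sends are not specified here. The sketch relies on the publicly documented Slack attachment fields (color, title, text) and the ntfy Title and Priority request headers.

```go
package main

import (
	"encoding/json"
	"net/http"
	"strings"
)

// slackAttachment is a single Slack-style attachment. Mattermost
// accepts the same shape on its incoming webhooks.
type slackAttachment struct {
	Color string `json:"color"`
	Title string `json:"title"`
	Text  string `json:"text"`
}

type slackPayload struct {
	Attachments []slackAttachment `json:"attachments"`
}

// SlackBody encodes one notification as a Slack-compatible JSON body.
func SlackBody(title, text, color string) ([]byte, error) {
	return json.Marshal(slackPayload{
		Attachments: []slackAttachment{{Color: color, Title: title, Text: text}},
	})
}

// NtfyRequest builds a POST to an ntfy topic URL: the message goes in
// the body, title and priority travel as headers.
func NtfyRequest(topicURL, title, body, priority string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, topicURL, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Title", title)
	req.Header.Set("Priority", priority)
	return req, nil
}
```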

State Management

  • All monitoring state is kept in memory and persisted to a JSON file on disk (DATA_DIR/state.json, where DATA_DIR is set by DNSWATCHER_DATA_DIR).
  • State is loaded on startup to resume monitoring without triggering false-positive change notifications.
  • State is written atomically (write to temp file, then rename) to prevent corruption.

HTTP API

dnswatcher exposes a lightweight HTTP API for operational visibility:

Endpoint                 Description
GET /health              Health check (JSON)
GET /api/v1/status       Current monitoring state
GET /api/v1/domains      Configured domains and status
GET /api/v1/hostnames    Configured hostnames and status
GET /metrics             Prometheus metrics (optional)

Architecture

cmd/dnswatcher/main.go         Entry point (uber/fx bootstrap)

internal/
  config/config.go             Viper-based configuration
  globals/globals.go           Build-time variables (version, arch)
  logger/logger.go             slog structured logging (TTY detection)
  healthcheck/healthcheck.go   Health check service
  middleware/middleware.go     HTTP middleware (logging, CORS, metrics auth)
  handlers/handlers.go         HTTP request handlers
  server/
    server.go                  HTTP server lifecycle
    routes.go                  Route definitions
  state/state.go               JSON file state persistence
  resolver/resolver.go         Iterative DNS resolution engine
  portcheck/portcheck.go       TCP port connectivity checker
  tlscheck/tlscheck.go         TLS certificate inspector
  notify/notify.go             Notification service (Slack, Mattermost, ntfy)
  watcher/watcher.go           Main monitoring orchestrator and scheduler

Design Principles

  • No recursive resolvers: All DNS resolution is performed iteratively, tracing from root nameservers through the delegation chain to authoritative servers.
  • No external database: State is persisted as a single JSON file.
  • Dependency injection: All components are wired via uber/fx.
  • Structured logging: All logs use log/slog with JSON output in production (TTY detection for development).
  • Graceful shutdown: All background goroutines respect context cancellation and the fx lifecycle.

Configuration

Configuration is loaded via Viper with the following precedence (highest to lowest):

  1. Environment variables (prefixed with DNSWATCHER_)
  2. .env file (loaded via godotenv)
  3. Config file: /etc/dnswatcher/dnswatcher.yaml, ~/.config/dnswatcher/dnswatcher.yaml, or ./dnswatcher.yaml
  4. Defaults

Environment Variables

Variable                        Description                                           Default
PORT                            HTTP listen port                                      8080
DNSWATCHER_DEBUG                Enable debug logging                                  false
DNSWATCHER_DATA_DIR             Directory for state file                              ./data
DNSWATCHER_TARGETS              Comma-separated DNS names (auto-classified via PSL)   ""
DNSWATCHER_SLACK_WEBHOOK        Slack incoming webhook URL                            ""
DNSWATCHER_MATTERMOST_WEBHOOK   Mattermost incoming webhook URL                       ""
DNSWATCHER_NTFY_TOPIC           ntfy topic URL                                        ""
DNSWATCHER_DNS_INTERVAL         DNS check interval                                    1h
DNSWATCHER_TLS_INTERVAL         TLS check interval                                    12h
DNSWATCHER_TLS_EXPIRY_WARNING   Days before expiry to warn                            7
DNSWATCHER_SENTRY_DSN           Sentry DSN for error reporting                        ""
DNSWATCHER_MAINTENANCE_MODE     Enable maintenance mode                               false
DNSWATCHER_METRICS_USERNAME     Basic auth username for /metrics                      ""
DNSWATCHER_METRICS_PASSWORD     Basic auth password for /metrics                      ""

Example .env

PORT=8080
DNSWATCHER_DEBUG=false
DNSWATCHER_DATA_DIR=./data
DNSWATCHER_TARGETS=example.com,example.org,www.example.com,api.example.com,mail.example.org
DNSWATCHER_SLACK_WEBHOOK=https://hooks.slack.com/services/T.../B.../xxx
DNSWATCHER_MATTERMOST_WEBHOOK=https://mattermost.example.com/hooks/xxx
DNSWATCHER_NTFY_TOPIC=https://ntfy.sh/my-dns-alerts

DNS Resolution Strategy

dnswatcher never uses the system's configured recursive resolver. Instead, it performs full iterative resolution:

  1. Root servers: Starts from the IANA root nameserver list (hardcoded, with periodic refresh).
  2. TLD delegation: Queries root servers for the TLD NS records.
  3. Domain delegation: Queries TLD nameservers for the domain's NS records.
  4. Authoritative query: Queries all discovered authoritative nameservers directly for the requested records.

This approach ensures:

  • Independence from any upstream resolver's cache or filtering.
  • Ability to detect split-horizon or inconsistent responses across authoritative servers.
  • Visibility into the full delegation chain.

For hostname monitoring, the resolver follows CNAME chains (with a depth limit to prevent loops) before collecting terminal A/AAAA records.
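
The depth-limited CNAME walk can be illustrated independently of any DNS library. In this sketch FollowCNAME, LookupFunc, and ErrCNAMEDepth are hypothetical names, and the lookup callback stands in for a real authoritative query; the actual depth limit used by dnswatcher is not stated here.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrCNAMEDepth is returned when a chain needs more hops than allowed,
// which also covers CNAME loops.
var ErrCNAMEDepth = errors.New("CNAME chain exceeds depth limit")

// LookupFunc returns the CNAME target of name, or "" when name has no
// CNAME record and is therefore the terminal name.
type LookupFunc func(name string) (target string, err error)

// FollowCNAME follows at most maxHops CNAME hops starting at name and
// returns the terminal name plus every name visited, in order.
func FollowCNAME(name string, lookup LookupFunc, maxHops int) (string, []string, error) {
	chain := []string{name}
	for hops := 0; ; hops++ {
		target, err := lookup(name)
		if err != nil {
			return "", chain, fmt.Errorf("lookup %s: %w", name, err)
		}
		if target == "" {
			return name, chain, nil
		}
		if hops == maxHops {
			return "", chain, ErrCNAMEDepth
		}
		name = target
		chain = append(chain, name)
	}
}
```

A hop counter is used rather than a visited set because it bounds both loops and legitimately long chains with a single check.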


State File Format

The state file (DATA_DIR/state.json) contains the complete monitoring snapshot. Hostname records are stored per authoritative nameserver, not as a merged view, to enable inconsistency detection.

{
  "version": 1,
  "lastUpdated": "2026-02-19T12:00:00Z",
  "domains": {
    "example.com": {
      "nameservers": ["ns1.example.com.", "ns2.example.com."],
      "lastChecked": "2026-02-19T12:00:00Z"
    }
  },
  "hostnames": {
    "www.example.com": {
      "recordsByNameserver": {
        "ns1.example.com.": {
          "records": {
            "A": ["93.184.216.34"],
            "AAAA": ["2606:2800:220:1:248:1893:25c8:1946"]
          },
          "status": "ok",
          "lastChecked": "2026-02-19T12:00:00Z"
        },
        "ns2.example.com.": {
          "records": {
            "A": ["93.184.216.34"],
            "AAAA": ["2606:2800:220:1:248:1893:25c8:1946"]
          },
          "status": "ok",
          "lastChecked": "2026-02-19T12:00:00Z"
        }
      },
      "lastChecked": "2026-02-19T12:00:00Z"
    }
  },
  "ports": {
    "93.184.216.34:80": {
      "open": true,
      "hostname": "www.example.com",
      "lastChecked": "2026-02-19T12:00:00Z"
    },
    "93.184.216.34:443": {
      "open": true,
      "hostname": "www.example.com",
      "lastChecked": "2026-02-19T12:00:00Z"
    }
  },
  "certificates": {
    "93.184.216.34:443:www.example.com": {
      "commonName": "www.example.com",
      "issuer": "DigiCert TLS RSA SHA256 2020 CA1",
      "notAfter": "2027-01-15T23:59:59Z",
      "subjectAlternativeNames": ["www.example.com"],
      "status": "ok",
      "lastChecked": "2026-02-19T06:00:00Z"
    }
  }
}

The status field for each per-nameserver entry and certificate entry records the outcome of the most recent check:

Status      Meaning
ok          Query succeeded, records are current
error       Query failed (timeout, SERVFAIL, network error)
nxdomain    Authoritative NXDOMAIN response
nodata      Authoritative empty response (NODATA)
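
For readers who want to consume the state file from Go, the JSON shown above maps onto types like the following. The JSON keys are taken from the example; the Go type and function names are hypothetical and need not match internal/state.

```go
package main

import (
	"encoding/json"
	"time"
)

// State mirrors the top-level layout of state.json.
type State struct {
	Version      int                      `json:"version"`
	LastUpdated  time.Time                `json:"lastUpdated"`
	Domains      map[string]DomainState   `json:"domains"`
	Hostnames    map[string]HostnameState `json:"hostnames"`
	Ports        map[string]PortState     `json:"ports"`
	Certificates map[string]CertState     `json:"certificates"`
}

type DomainState struct {
	Nameservers []string  `json:"nameservers"`
	LastChecked time.Time `json:"lastChecked"`
}

// NameserverRecords is what one authoritative nameserver returned.
type NameserverRecords struct {
	Records     map[string][]string `json:"records"`
	Status      string              `json:"status"`
	LastChecked time.Time           `json:"lastChecked"`
}

type HostnameState struct {
	RecordsByNameserver map[string]NameserverRecords `json:"recordsByNameserver"`
	LastChecked         time.Time                    `json:"lastChecked"`
}

type PortState struct {
	Open        bool      `json:"open"`
	Hostname    string    `json:"hostname"`
	LastChecked time.Time `json:"lastChecked"`
}

type CertState struct {
	CommonName              string    `json:"commonName"`
	Issuer                  string    `json:"issuer"`
	NotAfter                time.Time `json:"notAfter"`
	SubjectAlternativeNames []string  `json:"subjectAlternativeNames"`
	Status                  string    `json:"status"`
	LastChecked             time.Time `json:"lastChecked"`
}

// ParseState decodes the contents of a state file.
func ParseState(data []byte) (*State, error) {
	var s State
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	return &s, nil
}
```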

Building

make build      # Build binary to bin/dnswatcher
make test       # Run tests with race detector
make lint       # Run golangci-lint
make fmt        # Format code
make check      # Run all checks (format, lint, test, build)
make clean      # Remove build artifacts

Build-Time Variables

Version and architecture are injected via -ldflags:

go build -ldflags "-X main.Version=$(git describe --tags --always) \
  -X main.Buildarch=$(go env GOARCH)" ./cmd/dnswatcher

Docker

docker build -t dnswatcher .
docker run -d \
  -p 8080:8080 \
  -v dnswatcher-data:/var/lib/dnswatcher \
  -e DNSWATCHER_TARGETS=example.com,www.example.com \
  -e DNSWATCHER_NTFY_TOPIC=https://ntfy.sh/my-alerts \
  dnswatcher

Monitoring Lifecycle

  1. Startup: Load state from disk. If no state file exists, start with empty state (first check will establish baseline without triggering change notifications).
  2. Initial check: Immediately perform all DNS, port, and TLS checks on startup.
  3. Periodic checks:
    • DNS and port checks: every DNSWATCHER_DNS_INTERVAL (default 1h).
    • TLS checks: every DNSWATCHER_TLS_INTERVAL (default 12h).
  4. On change detection: Send notifications to all configured endpoints, update in-memory state, persist to disk.
  5. Shutdown: Persist final state to disk, complete in-flight notifications, stop gracefully.

Planned Future Features (Post-1.0)

  • DNSSEC validation: Validate the DNSSEC chain of trust during iterative resolution and report DNSSEC failures as notifications.

Project Structure

The project follows the conventions defined in CONVENTIONS.md, adapted from the upaas project template. It uses uber/fx for dependency injection, go-chi for HTTP routing, slog for logging, and Viper for configuration.