Port state keys are ip:port with a single hostname field. When multiple hostnames resolve to the same IP (shared hosting, CDN), only one hostname was associated. This caused orphaned port state when that hostname removed the IP from DNS while the IP remained valid for other hostnames.

Changes:

- PortState.Hostname (string) → PortState.Hostnames ([]string)
- Custom UnmarshalJSON for backward compatibility with old state files that have single 'hostname' field (migrated to single-element slice)
- Refactored checkAllPorts to build IP:port→hostname associations first, then check each unique IP:port once with all associated hostnames
- Port state entries are cleaned up when no hostnames reference them
- Port change notifications now list all associated hostnames
- Added DeletePortState and GetAllPortKeys methods to state
- Updated README state file format documentation

Closes #55

# dnswatcher

dnswatcher is a pre-1.0 Go daemon by [@sneak](https://sneak.berlin) that monitors DNS records, TCP port availability, and TLS certificates, delivering real-time change notifications via Slack, Mattermost, and ntfy webhooks.

> ⚠️ Pre-1.0 software. APIs, configuration, and behavior may change without notice.

dnswatcher watches configured DNS domains and hostnames for changes, monitors TCP
port availability, tracks TLS certificate expiry, and delivers real-time
notifications via Slack, Mattermost, and/or ntfy webhooks.

It performs all DNS resolution itself via iterative (non-recursive) queries,
tracing from root nameservers to authoritative servers directly, never relying
on upstream recursive resolvers.

State is persisted to a local JSON file so that monitoring survives restarts
without requiring an external database.

---

## Features

### DNS Domain Monitoring (Apex Domains)

- Accepts a list of DNS domain names (apex domains, identified via the
  [Public Suffix List](https://publicsuffix.org/)).
- Every **1 hour**, performs a full iterative trace from root servers to
  discover all authoritative nameservers (NS records) for each domain.
- Queries **every** discovered authoritative nameserver independently.
- Stores the NS record set as observed by the delegation chain.
- Any change triggers a notification:
  - NS added to or removed from the delegation.
  - NS IP address changed (glue record change).

### DNS Hostname Monitoring (Subdomains)

- Accepts a list of DNS hostnames (subdomains, distinguished from apex
  domains via the Public Suffix List).
- Every **1 hour**, performs a full iterative trace to discover the
  authoritative nameservers for the hostname's parent domain.
- Queries **each** authoritative nameserver independently for **all**
  record types: A, AAAA, CNAME, MX, TXT, SRV, CAA, NS.
- Stores results **per nameserver**. The state for a hostname is not a
  merged view; it is a map from nameserver to record set.
- Any observable change in any nameserver's response triggers a
  notification. This includes:
  - **Record change**: A nameserver returns different records than it
    did on the previous check (additions, removals, value changes).
  - **NS query failure**: A nameserver that previously responded
    becomes unreachable (timeout, SERVFAIL, REFUSED, network error).
    This is distinct from "responded with no records."
  - **NS recovery**: A previously-unreachable nameserver starts
    responding again.
  - **Inconsistency detected**: Two nameservers that previously agreed
    now return different record sets for the same hostname.
  - **Inconsistency resolved**: Nameservers that previously disagreed
    are now back in agreement.
  - **Empty response**: A nameserver that previously returned records
    now returns an authoritative empty response (NODATA/NXDOMAIN).

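
To illustrate why per-nameserver storage makes inconsistency detection straightforward, here is a minimal sketch that compares the record sets returned by each nameserver regardless of ordering. The `RecordSet` type and the `fingerprint` and `consistent` helpers are hypothetical names chosen for this example; they are not taken from the dnswatcher source.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// RecordSet maps a record type ("A", "AAAA", ...) to its values.
// Hypothetical type for illustration only.
type RecordSet map[string][]string

// fingerprint returns an order-independent canonical string for a record set.
func fingerprint(rs RecordSet) string {
	types := make([]string, 0, len(rs))
	for t := range rs {
		types = append(types, t)
	}
	sort.Strings(types)

	var b strings.Builder
	for _, t := range types {
		vals := append([]string(nil), rs[t]...) // copy so the caller's slice is not reordered
		sort.Strings(vals)
		fmt.Fprintf(&b, "%s=%s;", t, strings.Join(vals, ","))
	}
	return b.String()
}

// consistent reports whether every nameserver returned the same record set.
func consistent(byNS map[string]RecordSet) bool {
	first := true
	var want string
	for _, rs := range byNS {
		fp := fingerprint(rs)
		if first {
			want, first = fp, false
			continue
		}
		if fp != want {
			return false
		}
	}
	return true
}

func main() {
	byNS := map[string]RecordSet{
		"ns1.example.com.": {"A": {"93.184.216.34", "93.184.216.35"}},
		"ns2.example.com.": {"A": {"93.184.216.35", "93.184.216.34"}},
	}
	fmt.Println("agree:", consistent(byNS)) // same values in a different order

	byNS["ns2.example.com."] = RecordSet{"A": {"203.0.113.7"}}
	fmt.Println("agree:", consistent(byNS))
}
```

Comparing the result of `consistent` between the previous and the current check would distinguish "inconsistency detected" from "inconsistency resolved".
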
### TCP Port Monitoring

- For every configured domain and hostname, constructs a deduplicated list
  of all IPv4 and IPv6 addresses resolved via A, AAAA, and CNAME chain
  resolution across all authoritative nameservers.
- Checks TCP connectivity on ports **80** and **443** for each IP address.
- Every **1 hour**, re-checks all ports.
- Any change in port availability triggers a notification:
  - Port transitioned from open to closed (or vice versa).
  - New IP appeared (from DNS change) and its port state was recorded.
  - IP disappeared (from DNS change): noted in the DNS change
    notification; port state for that IP is removed.

### TLS Certificate Monitoring

- Every **12 hours**, for each IP address listening on port 443, connects
  via TLS using the correct SNI hostname.
- Records the certificate's Subject CN, SANs, issuer, and expiry date.
- Any change triggers a notification:
  - Certificate is expiring within **7 days** (warning, repeated each
    check until renewed or expired).
  - Certificate CN, issuer, or SANs changed (replacement detected,
    reports old and new values).
  - TLS connection failure to a previously-reachable IP:443 (handshake
    error, timeout, connection refused after previously succeeding).
  - TLS recovery: a previously-failing IP:443 now completes a
    handshake again.

### Notifications

**Every observable state change produces a notification.** dnswatcher is
designed as a real-time change feed: degradations, failures, recoveries,
and routine changes are all reported equally.

Supported notification backends:

| Backend        | Configuration                              | Payload Format               |
|----------------|--------------------------------------------|------------------------------|
| **Slack**      | Incoming Webhook URL                       | Attachments with color       |
| **Mattermost** | Incoming Webhook URL                       | Slack-compatible attachments |
| **ntfy**       | Topic URL (e.g. `https://ntfy.sh/mytopic`) | Title + body + priority      |

All configured endpoints receive every notification. Notification content
includes:

- **DNS record changes**: Which hostname, which nameserver, what record
  type, old values, new values.
- **DNS NS changes**: Which domain, which nameservers were added/removed.
- **NS query failures**: Which nameserver failed, error type (timeout,
  SERVFAIL, REFUSED, network error), which hostname/domain affected.
- **NS recoveries**: Which nameserver recovered, which hostname/domain.
- **NS inconsistencies**: Which nameservers disagree, what each one
  returned, which hostname affected.
- **Port changes**: Which IP:port, old state, new state, all associated
  hostnames.
- **TLS expiry warnings**: Which certificate, days remaining, CN,
  issuer, associated hostname and IP.
- **TLS certificate changes**: Old and new CN/issuer/SANs, associated
  hostname and IP.
- **TLS connection failures/recoveries**: Which IP:port, error details,
  associated hostname.

### State Management

- All monitoring state is kept in memory and persisted to a JSON file on
  disk (`DATA_DIR/state.json`).
- State is loaded on startup to resume monitoring without triggering
  false-positive change notifications.
- State is written atomically (write to temp file, then rename) to prevent
  corruption.

### HTTP API

dnswatcher exposes a lightweight HTTP API for operational visibility:

| Endpoint                | Description                     |
|-------------------------|---------------------------------|
| `GET /health`           | Health check (JSON)             |
| `GET /api/v1/status`    | Current monitoring state        |
| `GET /api/v1/domains`   | Configured domains and status   |
| `GET /api/v1/hostnames` | Configured hostnames and status |
| `GET /metrics`          | Prometheus metrics (optional)   |

---

## Architecture

```
cmd/dnswatcher/main.go          Entry point (uber/fx bootstrap)

internal/
  config/config.go              Viper-based configuration
  globals/globals.go            Build-time variables (version, arch)
  logger/logger.go              slog structured logging (TTY detection)
  healthcheck/healthcheck.go    Health check service
  middleware/middleware.go      HTTP middleware (logging, CORS, metrics auth)
  handlers/handlers.go          HTTP request handlers
  server/
    server.go                   HTTP server lifecycle
    routes.go                   Route definitions
  state/state.go                JSON file state persistence
  resolver/resolver.go          Iterative DNS resolution engine
  portcheck/portcheck.go        TCP port connectivity checker
  tlscheck/tlscheck.go          TLS certificate inspector
  notify/notify.go              Notification service (Slack, Mattermost, ntfy)
  watcher/watcher.go            Main monitoring orchestrator and scheduler
```

### Design Principles

- **No recursive resolvers**: All DNS resolution is performed iteratively,
  tracing from root nameservers through the delegation chain to
  authoritative servers.
- **No external database**: State is persisted as a single JSON file.
- **Dependency injection**: All components are wired via
  [uber/fx](https://github.com/uber-go/fx).
- **Structured logging**: All logs use `log/slog` with JSON output in
  production (TTY detection for development).
- **Graceful shutdown**: All background goroutines respect context
  cancellation and the fx lifecycle.

---

## Configuration

Configuration is loaded via [Viper](https://github.com/spf13/viper) with
the following precedence (highest to lowest):

1. Environment variables (prefixed with `DNSWATCHER_`)
2. `.env` file (loaded via godotenv)
3. Config file: `/etc/dnswatcher/dnswatcher.yaml`,
   `~/.config/dnswatcher/dnswatcher.yaml`, or `./dnswatcher.yaml`
4. Defaults

### Environment Variables

| Variable                        | Description                                         | Default  |
|---------------------------------|-----------------------------------------------------|----------|
| `PORT`                          | HTTP listen port                                    | `8080`   |
| `DNSWATCHER_DEBUG`              | Enable debug logging                                | `false`  |
| `DNSWATCHER_DATA_DIR`           | Directory for state file                            | `./data` |
| `DNSWATCHER_TARGETS`            | Comma-separated DNS names (auto-classified via PSL) | `""`     |
| `DNSWATCHER_SLACK_WEBHOOK`      | Slack incoming webhook URL                          | `""`     |
| `DNSWATCHER_MATTERMOST_WEBHOOK` | Mattermost incoming webhook URL                     | `""`     |
| `DNSWATCHER_NTFY_TOPIC`         | ntfy topic URL                                      | `""`     |
| `DNSWATCHER_DNS_INTERVAL`       | DNS check interval                                  | `1h`     |
| `DNSWATCHER_TLS_INTERVAL`       | TLS check interval                                  | `12h`    |
| `DNSWATCHER_TLS_EXPIRY_WARNING` | Days before expiry to warn                          | `7`      |
| `DNSWATCHER_SENTRY_DSN`         | Sentry DSN for error reporting                      | `""`     |
| `DNSWATCHER_MAINTENANCE_MODE`   | Enable maintenance mode                             | `false`  |
| `DNSWATCHER_METRICS_USERNAME`   | Basic auth username for /metrics                    | `""`     |
| `DNSWATCHER_METRICS_PASSWORD`   | Basic auth password for /metrics                    | `""`     |

### Example `.env`

```sh
PORT=8080
DNSWATCHER_DEBUG=false
DNSWATCHER_DATA_DIR=./data
DNSWATCHER_TARGETS=example.com,example.org,www.example.com,api.example.com,mail.example.org
DNSWATCHER_SLACK_WEBHOOK=https://hooks.slack.com/services/T.../B.../xxx
DNSWATCHER_MATTERMOST_WEBHOOK=https://mattermost.example.com/hooks/xxx
DNSWATCHER_NTFY_TOPIC=https://ntfy.sh/my-dns-alerts
```

---

## DNS Resolution Strategy

dnswatcher never uses the system's configured recursive resolver. Instead,
it performs full iterative resolution:

1. **Root servers**: Starts from the IANA root nameserver list (hardcoded,
   with periodic refresh).
2. **TLD delegation**: Queries root servers for the TLD NS records.
3. **Domain delegation**: Queries TLD nameservers for the domain's NS
   records.
4. **Authoritative query**: Queries all discovered authoritative
   nameservers directly for the requested records.

This approach ensures:
- Independence from any upstream resolver's cache or filtering.
- Ability to detect split-horizon or inconsistent responses across
  authoritative servers.
- Visibility into the full delegation chain.

For hostname monitoring, the resolver follows CNAME chains (with a
depth limit to prevent loops) before collecting terminal A/AAAA records.

---

## State File Format

The state file (`DATA_DIR/state.json`) contains the complete monitoring
snapshot. Hostname records are stored **per authoritative nameserver**,
not as a merged view, to enable inconsistency detection.

```json
{
  "version": 1,
  "lastUpdated": "2026-02-19T12:00:00Z",
  "domains": {
    "example.com": {
      "nameservers": ["ns1.example.com.", "ns2.example.com."],
      "lastChecked": "2026-02-19T12:00:00Z"
    }
  },
  "hostnames": {
    "www.example.com": {
      "recordsByNameserver": {
        "ns1.example.com.": {
          "records": {
            "A": ["93.184.216.34"],
            "AAAA": ["2606:2800:220:1:248:1893:25c8:1946"]
          },
          "status": "ok",
          "lastChecked": "2026-02-19T12:00:00Z"
        },
        "ns2.example.com.": {
          "records": {
            "A": ["93.184.216.34"],
            "AAAA": ["2606:2800:220:1:248:1893:25c8:1946"]
          },
          "status": "ok",
          "lastChecked": "2026-02-19T12:00:00Z"
        }
      },
      "lastChecked": "2026-02-19T12:00:00Z"
    }
  },
  "ports": {
    "93.184.216.34:80": {
      "open": true,
      "hostnames": ["www.example.com"],
      "lastChecked": "2026-02-19T12:00:00Z"
    },
    "93.184.216.34:443": {
      "open": true,
      "hostnames": ["www.example.com"],
      "lastChecked": "2026-02-19T12:00:00Z"
    }
  },
  "certificates": {
    "93.184.216.34:443:www.example.com": {
      "commonName": "www.example.com",
      "issuer": "DigiCert TLS RSA SHA256 2020 CA1",
      "notAfter": "2027-01-15T23:59:59Z",
      "subjectAlternativeNames": ["www.example.com"],
      "status": "ok",
      "lastChecked": "2026-02-19T06:00:00Z"
    }
  }
}
```

The `status` field for each per-nameserver entry and certificate entry
tracks reachability:

| Status     | Meaning                                         |
|------------|-------------------------------------------------|
| `ok`       | Query succeeded, records are current            |
| `error`    | Query failed (timeout, SERVFAIL, network error) |
| `nxdomain` | Authoritative NXDOMAIN response                 |
| `nodata`   | Authoritative empty response (NODATA)           |

---

## Building

```sh
make build    # Build binary to bin/dnswatcher
make test     # Run tests with race detector
make lint     # Run golangci-lint
make fmt      # Format code
make check    # Run all checks (format, lint, test, build)
make clean    # Remove build artifacts
```

### Build-Time Variables

Version and architecture are injected via `-ldflags`:

```sh
go build -ldflags "-X main.Version=$(git describe --tags --always) \
  -X main.Buildarch=$(go env GOARCH)" ./cmd/dnswatcher
```

---

## Docker

```sh
docker build -t dnswatcher .
docker run -d \
  -p 8080:8080 \
  -v dnswatcher-data:/var/lib/dnswatcher \
  -e DNSWATCHER_TARGETS=example.com,www.example.com \
  -e DNSWATCHER_NTFY_TOPIC=https://ntfy.sh/my-alerts \
  dnswatcher
```

---

## Monitoring Lifecycle

1. **Startup**: Load state from disk. If no state file exists, start
   with empty state (first check will establish baseline without
   triggering change notifications).
2. **Initial check**: Immediately perform all DNS, port, and TLS checks
   on startup.
3. **Periodic checks** (DNS always runs first):
   - DNS checks: every `DNSWATCHER_DNS_INTERVAL` (default 1h). Also
     re-run before every TLS check cycle to ensure fresh IPs.
   - Port checks: every `DNSWATCHER_DNS_INTERVAL`, after DNS completes.
   - TLS checks: every `DNSWATCHER_TLS_INTERVAL` (default 12h), after
     DNS completes.
   - Port and TLS checks always use freshly resolved IP addresses from
     the DNS phase that immediately precedes them, never stale IPs
     from a previous cycle.
4. **On change detection**: Send notifications to all configured
   endpoints, update in-memory state, persist to disk.
5. **Shutdown**: Persist final state to disk, complete in-flight
   notifications, stop gracefully.

---

## Planned Future Features (Post-1.0)

- **DNSSEC validation**: Validate the DNSSEC chain of trust during
  iterative resolution and report DNSSEC failures as notifications.

---

## Project Structure

Follows the conventions defined in `REPO_POLICIES.md`, adapted from the
[upaas](https://git.eeqj.de/sneak/upaas) project template. Uses uber/fx
for dependency injection, go-chi for HTTP routing, slog for logging, and
Viper for configuration.

---

## License

License has not yet been chosen for this project. Pending decision by the
author (MIT, GPL, or WTFPL).

## Author

[@sneak](https://sneak.berlin)