6 Commits

Author SHA1 Message Date
user
713a2b7332 config: fail fast when DNSWATCHER_TARGETS is empty
All checks were successful
check / check (push) Successful in 45s
When DNSWATCHER_TARGETS is empty or unset (the default), dnswatcher now
exits with a clear error message instead of silently starting with
nothing to monitor.

Added an ErrNoTargets sentinel error, returned from config.New when both
the domains and hostnames lists are empty after target classification.
This makes the fx application fail at startup, preventing silent
misconfiguration.

Also extracted classifyAndValidateTargets and parseDurationOrDefault
helper functions to keep buildConfig within the funlen limit.

Closes #69
2026-03-01 16:22:27 -08:00
6ebc4ffa04 fix: use context.Background() for watcher goroutine lifetime (#63)
All checks were successful
check / check (push) Successful in 31s
## Summary

The `OnStart` hook previously derived the watcher's context from the fx startup context (`startCtx`) via `context.WithoutCancel()`. While `WithoutCancel` strips cancellation and deadline, using `context.Background()` makes the intent explicit: the watcher's monitoring loop must outlive the fx startup phase and is controlled solely by the `cancel` func called in `OnStop`.

## Changes

- Replace `context.WithCancel(context.WithoutCancel(startCtx))` with `context.WithCancel(context.Background())`
- Add explanatory comment documenting why the watcher context is not derived from the startup context
- Rename the unused `startCtx` parameter to `_`

Closes #53

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #63
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-02 00:39:08 +01:00
b20e75459f fix: track multiple hostnames per IP:port in port state (#65)
All checks were successful
check / check (push) Successful in 34s
## Summary

Port state keys are `ip:port` with a single `hostname` field. When multiple hostnames resolve to the same IP (shared hosting, CDN), only one of them was recorded. This caused orphaned port state when the recorded hostname stopped resolving to that IP while the IP remained valid for other hostnames.

## Changes

### State (`internal/state/state.go`)
- `PortState.Hostname` (string) → `PortState.Hostnames` ([]string)
- Custom `UnmarshalJSON` for backward compatibility: reads old single `hostname` field and migrates to a single-element `hostnames` slice
- Added `DeletePortState` and `GetAllPortKeys` methods for cleanup

### Watcher (`internal/watcher/watcher.go`)
- Refactored `checkAllPorts` into three phases:
  1. Build IP:port → hostname associations from current DNS data
  2. Check each unique IP:port once with all associated hostnames
  3. Clean up stale port state entries with no hostname references
- Port change notifications now list all associated hostnames (`Hosts:` instead of `Host:`)
- Added `buildPortAssociations`, `parsePortKey`, and `cleanupStalePorts` helper functions

### README
- Updated state file format example: `hostname` → `hostnames` (array)
- Updated notification description to reflect multiple hostnames

## Backward Compatibility

Existing state files with the old single `hostname` string are handled gracefully via custom JSON unmarshaling — they are read as single-element `hostnames` slices.

Closes #55

Co-authored-by: clawbot <clawbot@noreply.eeqj.de>
Reviewed-on: #65
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-02 00:32:27 +01:00
ee14bd01ae fix: enforce DNS-first ordering for port and TLS checks (#64)
All checks were successful
check / check (push) Successful in 8s
## Summary

DNS checks now always complete before port or TLS checks begin, ensuring those checks use freshly resolved IP addresses instead of potentially stale ones from a previous cycle.

## Problem

Port and TLS checks read IP addresses from state populated by the most recent DNS check. If DNS changes between cycles, port and TLS checks may target stale IPs. In particular, when the TLS ticker fired (every 12h by default), it ran `runTLSChecks` without refreshing DNS first. TLS checks therefore used whatever IPs the last DNS cycle had recorded instead of freshly resolved ones.

## Changes

- **Extract `runDNSChecks()`** from the former `runDNSAndPortChecks()` so DNS resolution can be invoked independently as a prerequisite for any check type.
- **TLS ticker now runs DNS first**: When the TLS ticker fires, DNS checks run before TLS checks, ensuring fresh IPs.
- **`RunOnce` uses explicit 3-phase ordering**: DNS → ports → TLS. Port checks must complete before TLS because TLS checks only target IPs where port 443 is open.
- **New test `TestDNSRunsBeforePortAndTLSChecks`**: Verifies that when DNS IPs change between cycles, port and TLS checks pick up the new IPs.
- **README updated**: Monitoring lifecycle section now documents the DNS-first ordering guarantee.

## Check ordering

| Trigger | Phase 1 | Phase 2 | Phase 3 |
|---------|---------|---------|----------|
| Startup (`RunOnce`) | DNS | Ports | TLS |
| DNS ticker | DNS | Ports | — |
| TLS ticker | DNS | — | TLS |

Closes #58

Co-authored-by: user <user@Mac.lan guest wan>
Reviewed-on: #64
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-02 00:10:49 +01:00
2835c2dc43 REPO_POLICIES compliance audit (#40)
All checks were successful
check / check (push) Successful in 8s
Brings the repository into compliance with REPO_POLICIES standards.

Closes [issue #39](#39).

## Changes

### Added files
- **REPO_POLICIES.md** — fetched from `sneak/prompts` (last_modified: 2026-02-22)
- **.editorconfig** — fetched from `sneak/prompts`, enforces 4-space indents (tabs for Makefile)
- **.dockerignore** — standard Go exclusions (.git/, bin/, *.md, LICENSE, .editorconfig, .gitignore)

### Makefile updates
- Added `fmt-check` target (read-only gofmt check)
- Added `hooks` target (installs pre-commit hook running `make check`)
- Added `docker` target (runs `docker build .`)
- Added `-timeout 30s` to both `test` and `check` targets
- Updated `.PHONY` list with all new targets

### Removed files
- **CLAUDE.md** — superseded by REPO_POLICIES.md
- **CONVENTIONS.md** — superseded by REPO_POLICIES.md

### README updates
- First line now includes project name, purpose, category (daemon), and author per REPO_POLICIES format
- Updated CONVENTIONS.md reference to REPO_POLICIES.md
- Added **License** section (pending author choice)
- Added **Author** section: [@sneak](https://sneak.berlin)

### Intentionally skipped
- **LICENSE file** — not created; license choice (MIT, GPL, or WTFPL) requires sneak's input

## Verification
- `docker build .` passes (all checks green: fmt, lint, tests, build)
- No changes to `.golangci.yml`, test assertions, or linter config

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #40
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-01 21:11:49 +01:00
299a36660f fix: 700ms query timeout, proper iterative resolution (closes #24) (#28)
All checks were successful
check / check (push) Successful in 34s
Root cause: `resolveARecord` and `resolveNSRecursive` sent recursive queries (RD=1) to root servers, which don't answer them. Each lookup could therefore wait through 5s timeouts × 2 retries × 3 servers, which hung the tests.

Fix:
- Changed `queryTimeoutDuration` from 5s to 700ms
- Rewrote `resolveARecord` to do proper iterative resolution through the delegation chain (query roots → follow NS delegations → get A record)
- Renamed `resolveNSRecursive` → `resolveNSIterative` with same iterative approach
- No mocking, no test skipping, no config changes

`make check` passes: all 29 resolver tests pass with real DNS in ~10s.

Co-authored-by: clawbot <clawbot@git.eeqj.de>
Reviewed-on: #28
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-01 21:10:38 +01:00
7 changed files with 461 additions and 96 deletions

@@ -110,8 +110,8 @@ includes:
- **NS recoveries**: Which nameserver recovered, which hostname/domain.
- **NS inconsistencies**: Which nameservers disagree, what each one
returned, which hostname affected.
- **Port changes**: Which IP:port, old state, new state, associated
hostname.
- **Port changes**: Which IP:port, old state, new state, all associated
hostnames.
- **TLS expiry warnings**: Which certificate, days remaining, CN,
issuer, associated hostname and IP.
- **TLS certificate changes**: Old and new CN/issuer/SANs, associated
@@ -290,12 +290,12 @@ not as a merged view, to enable inconsistency detection.
"ports": {
"93.184.216.34:80": {
"open": true,
"hostname": "www.example.com",
"hostnames": ["www.example.com"],
"lastChecked": "2026-02-19T12:00:00Z"
},
"93.184.216.34:443": {
"open": true,
"hostname": "www.example.com",
"hostnames": ["www.example.com"],
"lastChecked": "2026-02-19T12:00:00Z"
}
},
@@ -367,9 +367,15 @@ docker run -d \
triggering change notifications).
2. **Initial check**: Immediately perform all DNS, port, and TLS checks
on startup.
3. **Periodic checks**:
- DNS and port checks: every `DNSWATCHER_DNS_INTERVAL` (default 1h).
- TLS checks: every `DNSWATCHER_TLS_INTERVAL` (default 12h).
3. **Periodic checks** (DNS always runs first):
- DNS checks: every `DNSWATCHER_DNS_INTERVAL` (default 1h). Also
re-run before every TLS check cycle to ensure fresh IPs.
- Port checks: every `DNSWATCHER_DNS_INTERVAL`, after DNS completes.
- TLS checks: every `DNSWATCHER_TLS_INTERVAL` (default 12h), after
DNS completes.
- Port and TLS checks always use freshly resolved IP addresses from
the DNS phase that immediately precedes them — never stale IPs
from a previous cycle.
4. **On change detection**: Send notifications to all configured
endpoints, update in-memory state, persist to disk.
5. **Shutdown**: Persist final state to disk, complete in-flight

@@ -15,6 +15,12 @@ import (
"sneak.berlin/go/dnswatcher/internal/logger"
)
// ErrNoTargets is returned when DNSWATCHER_TARGETS is empty or unset.
var ErrNoTargets = errors.New(
"no targets configured: set DNSWATCHER_TARGETS to a comma-separated " +
"list of DNS names to monitor",
)
// Default configuration values.
const (
defaultPort = 8080
@@ -118,25 +124,9 @@ func buildConfig(
}
}
dnsInterval, err := time.ParseDuration(
viper.GetString("DNS_INTERVAL"),
)
domains, hostnames, err := classifyAndValidateTargets()
if err != nil {
dnsInterval = defaultDNSInterval
}
tlsInterval, err := time.ParseDuration(
viper.GetString("TLS_INTERVAL"),
)
if err != nil {
tlsInterval = defaultTLSInterval
}
domains, hostnames, err := ClassifyTargets(
parseCSV(viper.GetString("TARGETS")),
)
if err != nil {
return nil, fmt.Errorf("invalid targets configuration: %w", err)
return nil, err
}
cfg := &Config{
@@ -148,8 +138,8 @@ func buildConfig(
SlackWebhook: viper.GetString("SLACK_WEBHOOK"),
MattermostWebhook: viper.GetString("MATTERMOST_WEBHOOK"),
NtfyTopic: viper.GetString("NTFY_TOPIC"),
DNSInterval: dnsInterval,
TLSInterval: tlsInterval,
DNSInterval: parseDurationOrDefault("DNS_INTERVAL", defaultDNSInterval),
TLSInterval: parseDurationOrDefault("TLS_INTERVAL", defaultTLSInterval),
TLSExpiryWarning: viper.GetInt("TLS_EXPIRY_WARNING"),
SentryDSN: viper.GetString("SENTRY_DSN"),
MaintenanceMode: viper.GetBool("MAINTENANCE_MODE"),
@@ -162,6 +152,32 @@ func buildConfig(
return cfg, nil
}
func classifyAndValidateTargets() ([]string, []string, error) {
domains, hostnames, err := ClassifyTargets(
parseCSV(viper.GetString("TARGETS")),
)
if err != nil {
return nil, nil, fmt.Errorf(
"invalid targets configuration: %w", err,
)
}
if len(domains) == 0 && len(hostnames) == 0 {
return nil, nil, ErrNoTargets
}
return domains, hostnames, nil
}
func parseDurationOrDefault(key string, fallback time.Duration) time.Duration {
d, err := time.ParseDuration(viper.GetString(key))
if err != nil {
return fallback
}
return d
}
func parseCSV(input string) []string {
if input == "" {
return nil

@@ -0,0 +1,87 @@
package config_test
import (
"errors"
"testing"
"go.uber.org/fx"
"sneak.berlin/go/dnswatcher/internal/config"
"sneak.berlin/go/dnswatcher/internal/globals"
"sneak.berlin/go/dnswatcher/internal/logger"
)
func TestNewReturnsErrNoTargetsWhenEmpty(t *testing.T) {
// Cannot use t.Parallel() because t.Setenv modifies the process
// environment.
t.Setenv("DNSWATCHER_TARGETS", "")
t.Setenv("DNSWATCHER_DATA_DIR", t.TempDir())
var cfg *config.Config
app := fx.New(
fx.Provide(
func() *globals.Globals {
return &globals.Globals{
Appname: "dnswatcher-test-empty",
}
},
logger.New,
config.New,
),
fx.Populate(&cfg),
fx.NopLogger,
)
err := app.Err()
if err == nil {
t.Fatal(
"expected error when DNSWATCHER_TARGETS is empty, got nil",
)
}
if !errors.Is(err, config.ErrNoTargets) {
t.Errorf("expected ErrNoTargets, got: %v", err)
}
}
func TestNewSucceedsWithTargets(t *testing.T) {
// Cannot use t.Parallel() because t.Setenv modifies the process
// environment.
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("DNSWATCHER_DATA_DIR", t.TempDir())
// Prevent loading a local config file by changing to a temp dir.
t.Chdir(t.TempDir())
var cfg *config.Config
app := fx.New(
fx.Provide(
func() *globals.Globals {
return &globals.Globals{
Appname: "dnswatcher-test-ok",
}
},
logger.New,
config.New,
),
fx.Populate(&cfg),
fx.NopLogger,
)
err := app.Err()
if err != nil {
t.Fatalf(
"expected no error with valid targets, got: %v",
err,
)
}
if len(cfg.Domains) != 1 || cfg.Domains[0] != "example.com" {
t.Errorf(
"expected [example.com], got domains=%v",
cfg.Domains,
)
}
}

@@ -4,7 +4,6 @@ import (
"context"
"errors"
"fmt"
"math/rand"
"net"
"sort"
"strings"
@@ -42,22 +41,6 @@ func rootServerList() []string {
}
}
const maxRootServers = 3
// randomRootServers returns a shuffled subset of root servers.
func randomRootServers() []string {
all := rootServerList()
rand.Shuffle(len(all), func(i, j int) {
all[i], all[j] = all[j], all[i]
})
if len(all) > maxRootServers {
return all[:maxRootServers]
}
return all
}
func checkCtx(ctx context.Context) error {
err := ctx.Err()
if err != nil {
@@ -244,7 +227,7 @@ func (r *Resolver) followDelegation(
authNS := extractNSSet(resp.Ns)
if len(authNS) == 0 {
return r.resolveNSRecursive(ctx, domain)
return r.resolveNSIterative(ctx, domain)
}
glue := extractGlue(resp.Extra)
@@ -308,60 +291,84 @@ func (r *Resolver) resolveNSIPs(
return ips
}
// resolveNSRecursive queries for NS records using recursive
// resolution as a fallback for intercepted environments.
func (r *Resolver) resolveNSRecursive(
// resolveNSIterative queries for NS records using iterative
// resolution as a fallback when followDelegation finds no
// authoritative answer in the delegation chain.
func (r *Resolver) resolveNSIterative(
ctx context.Context,
domain string,
) ([]string, error) {
domain = dns.Fqdn(domain)
msg := new(dns.Msg)
msg.SetQuestion(domain, dns.TypeNS)
msg.RecursionDesired = true
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
for _, ip := range randomRootServers() {
domain = dns.Fqdn(domain)
servers := rootServerList()
for range maxDelegation {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
addr := net.JoinHostPort(ip, "53")
resp, _, err := r.client.ExchangeContext(ctx, msg, addr)
resp, err := r.queryServers(
ctx, servers, domain, dns.TypeNS,
)
if err != nil {
continue
return nil, err
}
nsNames := extractNSSet(resp.Answer)
if len(nsNames) > 0 {
return nsNames, nil
}
// Follow delegation.
authNS := extractNSSet(resp.Ns)
if len(authNS) == 0 {
break
}
glue := extractGlue(resp.Extra)
nextServers := glueIPs(authNS, glue)
if len(nextServers) == 0 {
break
}
servers = nextServers
}
return nil, ErrNoNameservers
}
// resolveARecord resolves a hostname to IPv4 addresses.
// resolveARecord resolves a hostname to IPv4 addresses using
// iterative resolution through the delegation chain.
func (r *Resolver) resolveARecord(
ctx context.Context,
hostname string,
) ([]string, error) {
hostname = dns.Fqdn(hostname)
msg := new(dns.Msg)
msg.SetQuestion(hostname, dns.TypeA)
msg.RecursionDesired = true
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
for _, ip := range randomRootServers() {
hostname = dns.Fqdn(hostname)
servers := rootServerList()
for range maxDelegation {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
addr := net.JoinHostPort(ip, "53")
resp, _, err := r.client.ExchangeContext(ctx, msg, addr)
resp, err := r.queryServers(
ctx, servers, hostname, dns.TypeA,
)
if err != nil {
continue
return nil, fmt.Errorf(
"resolving %s: %w", hostname, err,
)
}
// Check for A records in the answer section.
var ips []string
for _, rr := range resp.Answer {
@@ -373,6 +380,24 @@ func (r *Resolver) resolveARecord(
if len(ips) > 0 {
return ips, nil
}
// Follow delegation if present.
authNS := extractNSSet(resp.Ns)
if len(authNS) == 0 {
break
}
glue := extractGlue(resp.Extra)
nextServers := glueIPs(authNS, glue)
if len(nextServers) == 0 {
// Resolve NS IPs iteratively — but guard
// against infinite recursion by using only
// already-resolved servers.
break
}
servers = nextServers
}
return nil, fmt.Errorf(
@@ -402,7 +427,7 @@ func (r *Resolver) FindAuthoritativeNameservers(
candidate := strings.Join(labels[i:], ".") + "."
nsNames, err := r.followDelegation(
ctx, candidate, randomRootServers(),
ctx, candidate, rootServerList(),
)
if err == nil && len(nsNames) > 0 {
sort.Strings(nsNames)

@@ -57,10 +57,49 @@ type HostnameState struct {
// PortState holds the monitoring state for a port.
type PortState struct {
Open bool `json:"open"`
Hostname string `json:"hostname"`
Hostnames []string `json:"hostnames"`
LastChecked time.Time `json:"lastChecked"`
}
// UnmarshalJSON implements custom unmarshaling to handle both
// the old single-hostname format and the new multi-hostname
// format for backward compatibility with existing state files.
func (ps *PortState) UnmarshalJSON(data []byte) error {
// Use an alias to prevent infinite recursion.
type portStateAlias struct {
Open bool `json:"open"`
Hostnames []string `json:"hostnames"`
LastChecked time.Time `json:"lastChecked"`
}
var alias portStateAlias
err := json.Unmarshal(data, &alias)
if err != nil {
return fmt.Errorf("unmarshaling port state: %w", err)
}
ps.Open = alias.Open
ps.Hostnames = alias.Hostnames
ps.LastChecked = alias.LastChecked
// If Hostnames is empty, try reading the old single-hostname
// format for backward compatibility.
if len(ps.Hostnames) == 0 {
var old struct {
Hostname string `json:"hostname"`
}
// Best-effort: ignore errors since the main unmarshal
// already succeeded.
if json.Unmarshal(data, &old) == nil && old.Hostname != "" {
ps.Hostnames = []string{old.Hostname}
}
}
return nil
}
// CertificateState holds TLS certificate monitoring state.
type CertificateState struct {
CommonName string `json:"commonName"`
@@ -263,6 +302,27 @@ func (s *State) GetPortState(key string) (*PortState, bool) {
return ps, ok
}
// DeletePortState removes a port state entry.
func (s *State) DeletePortState(key string) {
s.mu.Lock()
defer s.mu.Unlock()
delete(s.snapshot.Ports, key)
}
// GetAllPortKeys returns all port state keys.
func (s *State) GetAllPortKeys() []string {
s.mu.RLock()
defer s.mu.RUnlock()
keys := make([]string, 0, len(s.snapshot.Ports))
for k := range s.snapshot.Ports {
keys = append(keys, k)
}
return keys
}
// SetCertificateState updates the state for a certificate.
func (s *State) SetCertificateState(
key string,

@@ -72,13 +72,15 @@ func New(
}
lifecycle.Append(fx.Hook{
OnStart: func(startCtx context.Context) error {
ctx, cancel := context.WithCancel(
context.WithoutCancel(startCtx),
)
OnStart: func(_ context.Context) error {
// Use context.Background() — the fx startup context
// expires after startup completes, so deriving from it
// would cancel the watcher immediately. The watcher's
// lifetime is controlled by w.cancel in OnStop.
ctx, cancel := context.WithCancel(context.Background())
w.cancel = cancel
go w.Run(ctx)
go w.Run(ctx) //nolint:contextcheck // intentionally not derived from startCtx
return nil
},
@@ -141,9 +143,16 @@ func (w *Watcher) Run(ctx context.Context) {
return
case <-dnsTicker.C:
w.runDNSAndPortChecks(ctx)
w.runDNSChecks(ctx)
w.checkAllPorts(ctx)
w.saveState()
case <-tlsTicker.C:
// Run DNS first so TLS checks use freshly
// resolved IP addresses, not stale ones from
// a previous cycle.
w.runDNSChecks(ctx)
w.runTLSChecks(ctx)
w.saveState()
}
@@ -151,10 +160,26 @@ func (w *Watcher) Run(ctx context.Context) {
}
// RunOnce performs a single complete monitoring cycle.
// DNS checks run first so that port and TLS checks use
// freshly resolved IP addresses. Port checks run before
// TLS because TLS checks only target IPs with an open
// port 443.
func (w *Watcher) RunOnce(ctx context.Context) {
w.detectFirstRun()
w.runDNSAndPortChecks(ctx)
// Phase 1: DNS resolution must complete first so that
// subsequent checks use fresh IP addresses.
w.runDNSChecks(ctx)
// Phase 2: Port checks populate port state that TLS
// checks depend on (TLS only targets IPs where port
// 443 is open).
w.checkAllPorts(ctx)
// Phase 3: TLS checks use fresh DNS IPs and current
// port state.
w.runTLSChecks(ctx)
w.saveState()
w.firstRun = false
}
@@ -171,7 +196,11 @@ func (w *Watcher) detectFirstRun() {
}
}
func (w *Watcher) runDNSAndPortChecks(ctx context.Context) {
// runDNSChecks performs DNS resolution for all configured domains
// and hostnames, updating state with freshly resolved records.
// This must complete before port or TLS checks run so those
// checks operate on current IP addresses.
func (w *Watcher) runDNSChecks(ctx context.Context) {
for _, domain := range w.config.Domains {
w.checkDomain(ctx, domain)
}
@@ -179,8 +208,6 @@ func (w *Watcher) runDNSAndPortChecks(ctx context.Context) {
for _, hostname := range w.config.Hostnames {
w.checkHostname(ctx, hostname)
}
w.checkAllPorts(ctx)
}
func (w *Watcher) checkDomain(
@@ -448,24 +475,94 @@ func (w *Watcher) detectInconsistencies(
}
func (w *Watcher) checkAllPorts(ctx context.Context) {
for _, hostname := range w.config.Hostnames {
w.checkPortsForHostname(ctx, hostname)
// Phase 1: Build current IP:port → hostname associations
// from fresh DNS data.
associations := w.buildPortAssociations()
// Phase 2: Check each unique IP:port and update state
// with the full set of associated hostnames.
for key, hostnames := range associations {
ip, port := parsePortKey(key)
if port == 0 {
continue
}
w.checkSinglePort(ctx, ip, port, hostnames)
}
for _, domain := range w.config.Domains {
w.checkPortsForHostname(ctx, domain)
}
// Phase 3: Remove port state entries that no longer have
// any hostname referencing them.
w.cleanupStalePorts(associations)
}
func (w *Watcher) checkPortsForHostname(
ctx context.Context,
hostname string,
) {
ips := w.collectIPs(hostname)
// buildPortAssociations constructs a map from IP:port keys to
// the sorted set of hostnames currently resolving to that IP.
func (w *Watcher) buildPortAssociations() map[string][]string {
assoc := make(map[string]map[string]bool)
for _, ip := range ips {
for _, port := range monitoredPorts {
w.checkSinglePort(ctx, ip, port, hostname)
allNames := make(
[]string, 0,
len(w.config.Hostnames)+len(w.config.Domains),
)
allNames = append(allNames, w.config.Hostnames...)
allNames = append(allNames, w.config.Domains...)
for _, name := range allNames {
ips := w.collectIPs(name)
for _, ip := range ips {
for _, port := range monitoredPorts {
key := fmt.Sprintf("%s:%d", ip, port)
if assoc[key] == nil {
assoc[key] = make(map[string]bool)
}
assoc[key][name] = true
}
}
}
result := make(map[string][]string, len(assoc))
for key, set := range assoc {
hostnames := make([]string, 0, len(set))
for h := range set {
hostnames = append(hostnames, h)
}
sort.Strings(hostnames)
result[key] = hostnames
}
return result
}
// parsePortKey splits an "ip:port" key into its components.
func parsePortKey(key string) (string, int) {
lastColon := strings.LastIndex(key, ":")
if lastColon < 0 {
return key, 0
}
ip := key[:lastColon]
var p int
_, err := fmt.Sscanf(key[lastColon+1:], "%d", &p)
if err != nil {
return ip, 0
}
return ip, p
}
// cleanupStalePorts removes port state entries that are no
// longer referenced by any hostname in the current DNS data.
func (w *Watcher) cleanupStalePorts(
currentAssociations map[string][]string,
) {
for _, key := range w.state.GetAllPortKeys() {
if _, exists := currentAssociations[key]; !exists {
w.state.DeletePortState(key)
}
}
}
@@ -502,7 +599,7 @@ func (w *Watcher) checkSinglePort(
ctx context.Context,
ip string,
port int,
hostname string,
hostnames []string,
) {
result, err := w.portCheck.CheckPort(ctx, ip, port)
if err != nil {
@@ -527,8 +624,8 @@ func (w *Watcher) checkSinglePort(
}
msg := fmt.Sprintf(
"Host: %s\nAddress: %s\nPort now %s",
hostname, key, stateStr,
"Hosts: %s\nAddress: %s\nPort now %s",
strings.Join(hostnames, ", "), key, stateStr,
)
w.notify.SendNotification(
@@ -541,7 +638,7 @@ func (w *Watcher) checkSinglePort(
w.state.SetPortState(key, &state.PortState{
Open: result.Open,
Hostname: hostname,
Hostnames: hostnames,
LastChecked: now,
})
}

@@ -682,6 +682,80 @@ func TestGracefulShutdown(t *testing.T) {
}
}
func setupHostnameIP(
deps *testDeps,
hostname, ip string,
) {
deps.resolver.allRecords[hostname] = map[string]map[string][]string{
"ns1.example.com.": {"A": {ip}},
}
deps.portChecker.results[ip+":80"] = true
deps.portChecker.results[ip+":443"] = true
deps.tlsChecker.certs[ip+":"+hostname] = &tlscheck.CertificateInfo{
CommonName: hostname,
Issuer: "DigiCert",
NotAfter: time.Now().Add(90 * 24 * time.Hour),
SubjectAlternativeNames: []string{hostname},
}
}
func updateHostnameIP(deps *testDeps, hostname, ip string) {
deps.resolver.mu.Lock()
deps.resolver.allRecords[hostname] = map[string]map[string][]string{
"ns1.example.com.": {"A": {ip}},
}
deps.resolver.mu.Unlock()
deps.portChecker.mu.Lock()
deps.portChecker.results[ip+":80"] = true
deps.portChecker.results[ip+":443"] = true
deps.portChecker.mu.Unlock()
deps.tlsChecker.mu.Lock()
deps.tlsChecker.certs[ip+":"+hostname] = &tlscheck.CertificateInfo{
CommonName: hostname,
Issuer: "DigiCert",
NotAfter: time.Now().Add(90 * 24 * time.Hour),
SubjectAlternativeNames: []string{hostname},
}
deps.tlsChecker.mu.Unlock()
}
func TestDNSRunsBeforePortAndTLSChecks(t *testing.T) {
t.Parallel()
cfg := defaultTestConfig(t)
cfg.Hostnames = []string{"www.example.com"}
w, deps := newTestWatcher(t, cfg)
setupHostnameIP(deps, "www.example.com", "10.0.0.1")
ctx := t.Context()
w.RunOnce(ctx)
snap := deps.state.GetSnapshot()
if _, ok := snap.Ports["10.0.0.1:80"]; !ok {
t.Fatal("expected port state for 10.0.0.1:80")
}
// DNS changes to a new IP; port and TLS must pick it up.
updateHostnameIP(deps, "www.example.com", "10.0.0.2")
w.RunOnce(ctx)
snap = deps.state.GetSnapshot()
if _, ok := snap.Ports["10.0.0.2:80"]; !ok {
t.Error("port check used stale DNS: missing 10.0.0.2:80")
}
certKey := "10.0.0.2:443:www.example.com"
if _, ok := snap.Certificates[certKey]; !ok {
t.Error("TLS check used stale DNS: missing " + certKey)
}
}
func TestNSFailureAndRecovery(t *testing.T) {
t.Parallel()