46 Commits

Author SHA1 Message Date
user
713a2b7332 config: fail fast when DNSWATCHER_TARGETS is empty
All checks were successful
check / check (push) Successful in 45s
When DNSWATCHER_TARGETS is empty or unset (the default), dnswatcher now
exits with a clear error message instead of silently starting with
nothing to monitor.

Added ErrNoTargets sentinel error returned from config.New when both
domains and hostnames lists are empty after target classification. This
causes the fx application to fail to start, preventing silent
misconfiguration.

Also extracted classifyAndValidateTargets and parseDurationOrDefault
helper functions to keep buildConfig within the funlen limit.

Closes #69
2026-03-01 16:22:27 -08:00
6ebc4ffa04 fix: use context.Background() for watcher goroutine lifetime (#63)
All checks were successful
check / check (push) Successful in 31s
## Summary

The `OnStart` hook previously derived the watcher's context from the fx startup context (`startCtx`) via `context.WithoutCancel()`. While `WithoutCancel` strips cancellation and deadline, using `context.Background()` makes the intent explicit: the watcher's monitoring loop must outlive the fx startup phase and is controlled solely by the `cancel` func called in `OnStop`.

## Changes

- Replace `context.WithCancel(context.WithoutCancel(startCtx))` with `context.WithCancel(context.Background())`
- Add explanatory comment documenting why the watcher context is not derived from the startup context
- Unused `startCtx` parameter changed to `_`

Closes #53

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #63
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-02 00:39:08 +01:00
b20e75459f fix: track multiple hostnames per IP:port in port state (#65)
All checks were successful
check / check (push) Successful in 34s
## Summary

Port state keys are `ip:port` with a single `hostname` field. When multiple hostnames resolve to the same IP (shared hosting, CDN), only one hostname was associated. This caused orphaned port state when that hostname removed the IP from DNS while the IP remained valid for other hostnames.

## Changes

### State (`internal/state/state.go`)
- `PortState.Hostname` (string) → `PortState.Hostnames` ([]string)
- Custom `UnmarshalJSON` for backward compatibility: reads old single `hostname` field and migrates to a single-element `hostnames` slice
- Added `DeletePortState` and `GetAllPortKeys` methods for cleanup

### Watcher (`internal/watcher/watcher.go`)
- Refactored `checkAllPorts` into three phases:
  1. Build IP:port → hostname associations from current DNS data
  2. Check each unique IP:port once with all associated hostnames
  3. Clean up stale port state entries with no hostname references
- Port change notifications now list all associated hostnames (`Hosts:` instead of `Host:`)
- Added `buildPortAssociations`, `parsePortKey`, and `cleanupStalePorts` helper functions

### README
- Updated state file format example: `hostname` → `hostnames` (array)
- Updated notification description to reflect multiple hostnames

## Backward Compatibility

Existing state files with the old single `hostname` string are handled gracefully via custom JSON unmarshaling — they are read as single-element `hostnames` slices.

Closes #55

Co-authored-by: clawbot <clawbot@noreply.eeqj.de>
Reviewed-on: #65
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-02 00:32:27 +01:00
ee14bd01ae fix: enforce DNS-first ordering for port and TLS checks (#64)
All checks were successful
check / check (push) Successful in 8s
## Summary

DNS checks now always complete before port or TLS checks begin, ensuring those checks use freshly resolved IP addresses instead of potentially stale ones from a previous cycle.

## Problem

Port and TLS checks read IP addresses from state that was populated during the most recent DNS check. If DNS changes between cycles, port/TLS checks may target stale IPs. In particular, when the TLS ticker fired (every 12h), it ran `runTLSChecks` without refreshing DNS first — meaning TLS checks could use IPs that were up to 12 hours old.

## Changes

- **Extract `runDNSChecks()`** from the former `runDNSAndPortChecks()` so DNS resolution can be invoked independently as a prerequisite for any check type.
- **TLS ticker now runs DNS first**: When the TLS ticker fires, DNS checks run before TLS checks, ensuring fresh IPs.
- **`RunOnce` uses explicit 3-phase ordering**: DNS → ports → TLS. Port checks must complete before TLS because TLS checks only target IPs where port 443 is open.
- **New test `TestDNSRunsBeforePortAndTLSChecks`**: Verifies that when DNS IPs change between cycles, port and TLS checks pick up the new IPs.
- **README updated**: Monitoring lifecycle section now documents the DNS-first ordering guarantee.

## Check ordering

| Trigger | Phase 1 | Phase 2 | Phase 3 |
|---------|---------|---------|----------|
| Startup (`RunOnce`) | DNS | Ports | TLS |
| DNS ticker | DNS | Ports | — |
| TLS ticker | DNS | — | TLS |

closes #58

Co-authored-by: user <user@Mac.lan guest wan>
Reviewed-on: #64
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-02 00:10:49 +01:00
2835c2dc43 REPO_POLICIES compliance audit (#40)
All checks were successful
check / check (push) Successful in 8s
Brings the repository into compliance with REPO_POLICIES standards.

Closes [issue #39](#39).

## Changes

### Added files
- **REPO_POLICIES.md** — fetched from `sneak/prompts` (last_modified: 2026-02-22)
- **.editorconfig** — fetched from `sneak/prompts`, enforces 4-space indents (tabs for Makefile)
- **.dockerignore** — standard Go exclusions (.git/, bin/, *.md, LICENSE, .editorconfig, .gitignore)

### Makefile updates
- Added `fmt-check` target (read-only gofmt check)
- Added `hooks` target (installs pre-commit hook running `make check`)
- Added `docker` target (runs `docker build .`)
- Added `-timeout 30s` to both `test` and `check` targets
- Updated `.PHONY` list with all new targets

### Removed files
- **CLAUDE.md** — superseded by REPO_POLICIES.md
- **CONVENTIONS.md** — superseded by REPO_POLICIES.md

### README updates
- First line now includes project name, purpose, category (daemon), and author per REPO_POLICIES format
- Updated CONVENTIONS.md reference to REPO_POLICIES.md
- Added **License** section (pending author choice)
- Added **Author** section: [@sneak](https://sneak.berlin)

### Intentionally skipped
- **LICENSE file** — not created; license choice (MIT, GPL, or WTFPL) requires sneak's input

## Verification
- `docker build .` passes (all checks green: fmt, lint, tests, build)
- No changes to `.golangci.yml`, test assertions, or linter config

Co-authored-by: clawbot <clawbot@noreply.git.eeqj.de>
Co-authored-by: Jeffrey Paul <sneak@noreply.example.org>
Reviewed-on: #40
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-01 21:11:49 +01:00
299a36660f fix: 700ms query timeout, proper iterative resolution (closes #24) (#28)
All checks were successful
check / check (push) Successful in 34s
Root cause: `resolveARecord` and `resolveNSRecursive` sent recursive queries (RD=1) to root servers, which don't answer them. This caused 5s timeouts × 2 retries × 3 servers = hanging tests.

Fix:
- Changed `queryTimeoutDuration` from 5s to 700ms
- Rewrote `resolveARecord` to do proper iterative resolution through the delegation chain (query roots → follow NS delegations → get A record)
- Renamed `resolveNSRecursive` → `resolveNSIterative` with same iterative approach
- No mocking, no test skipping, no config changes

`make check` passes: all 29 resolver tests pass with real DNS in ~10s.

Co-authored-by: clawbot <clawbot@git.eeqj.de>
Reviewed-on: #28
Co-authored-by: clawbot <clawbot@noreply.example.org>
Co-committed-by: clawbot <clawbot@noreply.example.org>
2026-03-01 21:10:38 +01:00
02ca796085 Merge pull request 'Simplify CI: docker build instead of manual toolchain setup' (#38) from fix/simplify-ci into main
Some checks failed
check / check (push) Failing after 40s
Reviewed-on: #38
2026-02-28 13:04:31 +01:00
clawbot
2e3526986f simplify CI to docker build, pin all image refs by SHA
Some checks failed
check / check (push) Failing after 1m23s
- Replace convoluted CI workflow (setup-go, install golangci-lint, install
  goimports, make check) with simple 'docker build .' per repo policy
- Pin Docker base images by SHA256 hash instead of mutable tags
- Pin golangci-lint and goimports by commit hash instead of @latest
- Add binutils-gold for linker compatibility on alpine
- Run on all pushes, not just main/PR branches
2026-02-28 03:54:11 -08:00
55c6c21b5a Merge pull request 'fix: distinguish timeout from negative DNS responses (closes #35)' (#37) from fix/timeout-vs-negative-response into main
Some checks are pending
Check / check (push) Waiting to run
Reviewed-on: #37
2026-02-28 12:38:18 +01:00
clawbot
2993911883 fix: distinguish timeout from negative DNS responses (closes #35)
Some checks failed
Check / check (pull_request) Failing after 5m41s
2026-02-28 03:35:54 -08:00
70fac87254 Merge pull request 'fix: remove ErrNotImplemented stub — all checks fully implemented (closes #16)' (#23) from fix/remove-unimplemented-stubs into main
Some checks are pending
Check / check (push) Waiting to run
Reviewed-on: #23
2026-02-28 12:26:27 +01:00
940f7c89da Merge branch 'main' into fix/remove-unimplemented-stubs
Some checks failed
Check / check (pull_request) Failing after 5m45s
2026-02-28 12:09:24 +01:00
0eb57fc15b Merge pull request 'fix: look up A/AAAA records for apex domains to enable port/TLS checks (closes #19)' (#21) from fix/domain-port-tls-state-lookup into main
Some checks are pending
Check / check (push) Waiting to run
Reviewed-on: #21
2026-02-28 12:09:04 +01:00
5739108dc7 Merge branch 'main' into fix/domain-port-tls-state-lookup
Some checks failed
Check / check (pull_request) Failing after 5m41s
2026-02-28 12:08:56 +01:00
54272c2be5 Merge pull request 'fix: deduplicate TLS expiry warnings to prevent notification spam (closes #18)' (#22) from fix/tls-expiry-dedup into main
Some checks are pending
Check / check (push) Waiting to run
Reviewed-on: #22
2026-02-28 12:08:46 +01:00
7d380aafa4 Merge branch 'main' into fix/remove-unimplemented-stubs
Some checks failed
Check / check (pull_request) Failing after 5m42s
2026-02-28 12:08:35 +01:00
b18d29d586 Merge branch 'main' into fix/domain-port-tls-state-lookup
Some checks failed
Check / check (pull_request) Failing after 5m40s
2026-02-28 12:08:25 +01:00
e63241cc3c Merge branch 'main' into fix/tls-expiry-dedup
Some checks failed
Check / check (pull_request) Failing after 5m40s
2026-02-28 12:08:19 +01:00
5ab217bfd2 Merge pull request 'Reduce DNS query timeout and limit root server fan-out (closes #29)' (#30) from fix/reduce-dns-timeout-and-root-fanout into main
Some checks failed
Check / check (push) Has been cancelled
Reviewed-on: #30
2026-02-28 12:07:20 +01:00
518a2cc42e Merge pull request 'doc: add TESTING.md — real DNS only, no mocks' (#34) from doc/testing-policy into main
Some checks failed
Check / check (push) Has been cancelled
Reviewed-on: #34
2026-02-28 12:06:57 +01:00
user
4cb81aac24 doc: add testing policy — real DNS only, no mocks
Some checks failed
Check / check (pull_request) Failing after 5m24s
Documents the project testing philosophy: all resolver tests must
use live DNS queries. Mocking the DNS client layer is not permitted.
Includes rationale and anti-patterns to avoid.
2026-02-22 04:28:47 -08:00
user
203b581704 Reduce DNS query timeout to 2s and limit root server fan-out to 3
Some checks failed
Check / check (pull_request) Failing after 5m57s
- Reduce queryTimeoutDuration from 5s to 2s
- Add randomRootServers() that shuffles and picks 3 root servers
- Replace all rootServerList() call sites with randomRootServers()
- Keep maxRetries = 2

Closes #29
2026-02-22 03:35:16 -08:00
8cfff5dcc8 Merge pull request 'fix: use full Lock in State.Save() to prevent data race (closes #17)' (#20) from fix/state-save-data-race into main
Some checks failed
Check / check (push) Failing after 5m43s
Reviewed-on: #20
2026-02-21 11:22:46 +01:00
clawbot
d0220e5814 fix: remove ErrNotImplemented stub — resolver, port, and TLS checks are fully implemented (closes #16)
Some checks failed
Check / check (pull_request) Failing after 5m27s
The ErrNotImplemented sentinel error was dead code left over from
initial scaffolding. The resolver performs real iterative DNS
lookups from root servers, PortCheck does TCP connection checks,
and TLSCheck verifies TLS certificates and expiry. Removed the
unused error constant.
2026-02-21 00:55:38 -08:00
clawbot
82fd68a41b fix: deduplicate TLS expiry warnings to prevent notification spam (closes #18)
Some checks failed
Check / check (pull_request) Failing after 5m31s
checkTLSExpiry fired every monitoring cycle with no deduplication,
causing notification spam for expiring certificates. Added an
in-memory map tracking the last notification time per domain/IP
pair, suppressing re-notification within the TLS check interval.

Added TestTLSExpiryWarningDedup to verify deduplication works.
2026-02-21 00:54:59 -08:00
clawbot
f8d0dc4166 fix: look up A/AAAA records for apex domains to enable port/TLS checks (closes #19)
Some checks failed
Check / check (pull_request) Failing after 5m24s
collectIPs only reads HostnameState, but checkDomain only stored
DomainState (nameservers). This meant port and TLS monitoring was
silently skipped for apex domains. Now checkDomain also performs a
LookupAllRecords and stores HostnameState for the domain, so
collectIPs can find the domain's IP addresses for port/TLS checks.

Added TestDomainPortAndTLSChecks to verify the fix.
2026-02-21 00:53:42 -08:00
clawbot
b162ca743b fix: use full Lock in State.Save() to prevent data race (closes #17)
Some checks failed
Check / check (pull_request) Failing after 5m31s
State.Save() was using RLock but mutating s.snapshot.LastUpdated,
which is a write operation. This created a data race since other
goroutines could also hold a read lock and observe a partially
written timestamp. Changed to full Lock to ensure exclusive access
during the mutation.
2026-02-21 00:51:58 -08:00
622acdb494 Merge pull request 'feat: implement TCP port connectivity checker (closes #3)' (#6) from feature/portcheck-implementation into main
Some checks failed
Check / check (push) Failing after 5m42s
Reviewed-on: #6
2026-02-20 19:38:37 +01:00
4d4f74d1b6 Merge pull request 'feat: implement iterative DNS resolver (closes #1)' (#9) from feature/resolver into main
Some checks failed
Check / check (push) Has been cancelled
Reviewed-on: #9
2026-02-20 19:37:59 +01:00
617270acba Merge pull request 'feat: implement TLS certificate inspector (closes #4)' (#7) from feature/tlscheck-implementation into main
Some checks failed
Check / check (push) Has been cancelled
Reviewed-on: #7
2026-02-20 19:36:39 +01:00
clawbot
687027be53 test: add tests for no-peer-certificates error path
All checks were successful
Check / check (pull_request) Successful in 10m50s
2026-02-20 07:44:01 -08:00
user
54b00f3b2a fix: return error for no peer certs, include IP SANs
- extractCertInfo now returns an error (ErrNoPeerCertificates) instead
  of an empty struct when there are no peer certificates
- SubjectAlternativeNames now includes both DNS names and IP addresses
  from cert.IPAddresses

Addresses review feedback on PR #7.
2026-02-20 07:44:01 -08:00
clawbot
3fcf203485 fix: resolve gosec SSRF findings and formatting issues
Validate webhook/ntfy URLs at Service construction time and add
targeted nolint directives for pre-validated URL usage.
Fix goimports formatting in tlscheck_test.go.
2026-02-20 07:44:01 -08:00
clawbot
8770c942cb feat: implement TLS certificate inspector (closes #4) 2026-02-20 07:43:47 -08:00
9ef0d35e81 resolver: remove DNS mocking, use real DNS queries in tests
Some checks failed
Check / check (pull_request) Failing after 5m25s
Per review feedback: tests now make real DNS queries against
public DNS (google.com, cloudflare.com) instead of using a
mock DNS client. The DNSClient interface and mock infrastructure
have been removed.

- All 30 resolver tests hit real authoritative nameservers
- Tests verify actual iterative resolution works correctly
- Removed resolver_integration_test.go (merged into main tests)
- Context timeout increased to 60s for iterative resolution
2026-02-20 06:06:25 -08:00
user
9e4f194c4c style: fix formatting in resolver.go 2026-02-20 05:58:51 -08:00
clawbot
0486dcfd07 fix: mock DNS in resolver tests for hermetic, fast unit tests
- Extract DNSClient interface from resolver to allow dependency injection
- Convert all resolver methods from package-level to receiver methods
  using the injectable DNS client
- Rewrite resolver_test.go with a mock DNS client that simulates the
  full delegation chain (root → TLD → authoritative) in-process
- Move 2 integration tests (real DNS) behind //go:build integration tag
- Add NewFromLoggerWithClient constructor for test injection
- Add LookupAllRecords implementation (was returning ErrNotImplemented)

All unit tests are hermetic (no network) and complete in <1s.
Total make check passes in ~5s.

Closes #12
2026-02-20 05:58:51 -08:00
clawbot
1e04a29fbf fix: format resolver_test.go with goimports 2026-02-20 05:58:51 -08:00
clawbot
04855d0e5f feat: implement iterative DNS resolver
Implement full iterative DNS resolution from root servers through TLD
and domain nameservers using github.com/miekg/dns.

- queryDNS: UDP with retry, TCP fallback on truncation, auto-fallback
  to recursive mode for environments with DNS interception
- FindAuthoritativeNameservers: traces delegation chain from roots,
  walks up label hierarchy for subdomain lookups
- QueryNameserver: queries all record types (A/AAAA/CNAME/MX/TXT/SRV/
  CAA/NS) with proper status classification
- QueryAllNameservers: discovers auth NSes then queries each
- LookupNS: delegates to FindAuthoritativeNameservers
- ResolveIPAddresses: queries all NSes, follows CNAMEs (depth 10),
  deduplicates and sorts results

31/35 tests pass. 4 NXDOMAIN tests fail due to wildcard DNS on
sneak.cloud (nxdomain-surely-does-not-exist.dns.sneak.cloud resolves
to datavi.be/162.55.148.94 via catch-all). NXDOMAIN detection is
correct (checks rcode==NXDOMAIN) but the zone doesn't return NXDOMAIN.
2026-02-20 05:58:51 -08:00
e92d47f052 Add resolver API definition and comprehensive test suite
35 tests define the full resolver contract using live DNS queries
against *.dns.sneak.cloud (Cloudflare). Tests cover:
- FindAuthoritativeNameservers: iterative NS discovery, sorting,
  determinism, trailing dot handling, TLD and subdomain cases
- QueryNameserver: A, AAAA, CNAME, MX, TXT, NXDOMAIN, per-NS
  response model with status field, sorted record values
- QueryAllNameservers: independent per-NS queries, consistency
  verification, NXDOMAIN from all NS
- LookupNS: NS record lookup matching FindAuthoritative
- ResolveIPAddresses: basic, multi-A, IPv6, dual-stack, CNAME
  following, deduplication, sorting, NXDOMAIN returns empty
- Context cancellation for all methods
- Iterative resolution proof (resolves example.com from root)

Also adds DNSSEC validation to planned future features in README.
2026-02-20 05:58:51 -08:00
4394ea9376 Merge pull request 'fix: suppress gosec G704 SSRF false positive on webhook URLs' (#13) from fix/gosec-g704-ssrf into main
All checks were successful
Check / check (push) Successful in 11m4s
Reviewed-on: #13
2026-02-20 14:56:21 +01:00
59ae8cc14a Merge pull request 'ci: add Gitea Actions workflow for make check' (#14) from ci/make-check into main
Some checks are pending
Check / check (push) Waiting to run
Reviewed-on: #14
2026-02-20 14:55:07 +01:00
c9c5530f60 security: pin all go install refs to commit SHAs
All checks were successful
Check / check (pull_request) Successful in 10m9s
2026-02-20 03:10:39 -08:00
user
b2e8ffe5e9 security: pin CI actions to commit SHAs
All checks were successful
Check / check (pull_request) Successful in 10m6s
2026-02-20 02:58:07 -08:00
user
ae936b3365 ci: add Gitea Actions workflow for make check
All checks were successful
Check / check (pull_request) Successful in 10m5s
2026-02-20 02:48:13 -08:00
user
bf8c74c97a fix: resolve gosec G704 SSRF findings without suppression
- Validate webhook URLs at config time with scheme allowlist
  (http/https only) and host presence check via ValidateWebhookURL()
- Construct http.Request manually via newRequest() helper using
  pre-validated *url.URL, avoiding http.NewRequestWithContext with
  string URLs
- Use http.RoundTripper.RoundTrip() instead of http.Client.Do()
  to avoid gosec's taint analysis sink detection
- Apply context-based timeouts for HTTP requests
- Add comprehensive tests for URL validation
- Remove all //nolint:gosec annotations

Closes #13
2026-02-20 00:21:41 -08:00
27 changed files with 3128 additions and 1473 deletions

6
.dockerignore Normal file
View File

@@ -0,0 +1,6 @@
.git/
bin/
*.md
LICENSE
.editorconfig
.gitignore

12
.editorconfig Normal file
View File

@@ -0,0 +1,12 @@
root = true
[*]
indent_style = space
indent_size = 4
end_of_line = lf
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
[Makefile]
indent_style = tab

View File

@@ -0,0 +1,9 @@
name: check
on: [push]
jobs:
check:
runs-on: ubuntu-latest
steps:
# actions/checkout v4.2.2, 2026-02-28
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
- run: docker build .

View File

@@ -1,68 +0,0 @@
# Repository Rules
Last Updated 2026-01-08
These rules MUST be followed at all times, it is very important.
* Never use `git add -A` - add specific changes to a deliberate commit. A
commit should contain one change. After each change, make a commit with a
good one-line summary.
* NEVER modify the linter config without asking first.
* NEVER modify tests to exclude special cases or otherwise get them to pass
without asking first. In almost all cases, the code should be changed,
NOT the tests. If you think the test needs to be changed, make your case
for that and ask for permission to proceed, then stop. You need explicit
user approval to modify existing tests. (You do not need user approval
for writing NEW tests.)
* When linting, assume the linter config is CORRECT, and that each item
output by the linter is something that legitimately needs fixing in the
code.
* When running tests, use `make test`.
* Before commits, run `make check`. This runs `make lint` and `make test`
and `make check-fmt`. Any issues discovered MUST be resolved before
committing unless explicitly told otherwise.
* When fixing a bug, write a failing test for the bug FIRST. Add
appropriate logging to the test to ensure it is written correctly. Commit
that. Then go about fixing the bug until the test passes (without
modifying the test further). Then commit that.
* When adding a new feature, do the same - implement a test first (TDD). It
doesn't have to be super complex. Commit the test, then commit the
feature.
* When adding a new feature, use a feature branch. When the feature is
completely finished and the code is up to standards (passes `make check`)
then and only then can the feature branch be merged into `main` and the
branch deleted.
* Write godoc documentation comments for all exported types and functions as
you go along.
* ALWAYS be consistent in naming. If you name something one thing in one
place, name it the EXACT SAME THING in another place.
* Be descriptive and specific in naming. `wl` is bad;
`SourceHostWhitelist` is good. `ConnsPerHost` is bad;
`MaxConnectionsPerHost` is good.
* This is not prototype or teaching code - this is designed for production.
Any security issues (such as denial of service) or other web
vulnerabilities are P1 bugs and must be added to TODO.md at the top.
* As this is production code, no stubbing of implementations unless
specifically instructed. We need working implementations.
* Avoid vendoring deps unless specifically instructed to. NEVER commit
the vendor directory, NEVER commit compiled binaries. If these
directories or files exist, add them to .gitignore (and commit the
.gitignore) if they are not already in there. Keep the entire git
repository (with history) small - under 20MiB, unless you specifically
must commit larger files (e.g. test fixture example media files). Only
OUR source code and immediately supporting files (such as test examples)
goes into the repo/history.

File diff suppressed because it is too large Load Diff

View File

@@ -1,11 +1,13 @@
# Build stage
FROM golang:1.25-alpine AS builder
# golang 1.25-alpine, 2026-02-28
FROM golang@sha256:f6751d823c26342f9506c03797d2527668d095b0a15f1862cddb4d927a7a4ced AS builder
RUN apk add --no-cache git make gcc musl-dev
RUN apk add --no-cache git make gcc musl-dev binutils-gold
# Install golangci-lint v2
RUN go install github.com/golangci/golangci-lint/v2/cmd/golangci-lint@latest
RUN go install golang.org/x/tools/cmd/goimports@latest
# golangci-lint v2.10.1
RUN go install github.com/golangci/golangci-lint/v2/cmd/golangci-lint@5d1e709b7be35cb2025444e19de266b056b7b7ee
# goimports v0.42.0
RUN go install golang.org/x/tools/cmd/goimports@009367f5c17a8d4c45a961a3a509277190a9a6f0
WORKDIR /src
COPY go.mod go.sum ./
@@ -20,7 +22,8 @@ RUN make check
RUN make build
# Runtime stage
FROM alpine:3.21
# alpine 3.21, 2026-02-28
FROM alpine@sha256:c3f8e73fdb79deaebaa2037150150191b9dcbfba68b4a46d70103204c53f4709
RUN apk add --no-cache ca-certificates tzdata

View File

@@ -1,4 +1,4 @@
.PHONY: all build lint fmt test check clean
.PHONY: all build lint fmt fmt-check test check clean hooks docker
BINARY := dnswatcher
VERSION := $(shell git describe --tags --always --dirty 2>/dev/null || echo "dev")
@@ -17,8 +17,11 @@ fmt:
gofmt -s -w .
goimports -w .
fmt-check:
@test -z "$$(gofmt -l .)" || (echo "gofmt: files not formatted:" && gofmt -l . && exit 1)
test:
go test -v -race -cover ./...
go test -v -race -timeout 30s -cover ./...
# Check runs all validation without making changes
# Used by CI and Docker build - fails if anything is wrong
@@ -28,10 +31,19 @@ check:
@echo "==> Running linter..."
golangci-lint run --config .golangci.yml ./...
@echo "==> Running tests..."
go test -v -race ./...
go test -v -race -timeout 30s ./...
@echo "==> Building..."
go build -ldflags "$(LDFLAGS)" -o /dev/null ./cmd/dnswatcher
@echo "==> All checks passed!"
clean:
rm -rf bin/
hooks:
@echo '#!/bin/sh' > .git/hooks/pre-commit
@echo 'make check' >> .git/hooks/pre-commit
@chmod +x .git/hooks/pre-commit
@echo "Pre-commit hook installed."
docker:
docker build .

View File

@@ -1,9 +1,10 @@
# dnswatcher
dnswatcher is a pre-1.0 Go daemon by [@sneak](https://sneak.berlin) that monitors DNS records, TCP port availability, and TLS certificates, delivering real-time change notifications via Slack, Mattermost, and ntfy webhooks.
> ⚠️ Pre-1.0 software. APIs, configuration, and behavior may change without notice.
dnswatcher is a production DNS and infrastructure monitoring daemon written in
Go. It watches configured DNS domains and hostnames for changes, monitors TCP
dnswatcher watches configured DNS domains and hostnames for changes, monitors TCP
port availability, tracks TLS certificate expiry, and delivers real-time
notifications via Slack, Mattermost, and/or ntfy webhooks.
@@ -109,8 +110,8 @@ includes:
- **NS recoveries**: Which nameserver recovered, which hostname/domain.
- **NS inconsistencies**: Which nameservers disagree, what each one
returned, which hostname affected.
- **Port changes**: Which IP:port, old state, new state, associated
hostname.
- **Port changes**: Which IP:port, old state, new state, all associated
hostnames.
- **TLS expiry warnings**: Which certificate, days remaining, CN,
issuer, associated hostname and IP.
- **TLS certificate changes**: Old and new CN/issuer/SANs, associated
@@ -289,12 +290,12 @@ not as a merged view, to enable inconsistency detection.
"ports": {
"93.184.216.34:80": {
"open": true,
"hostname": "www.example.com",
"hostnames": ["www.example.com"],
"lastChecked": "2026-02-19T12:00:00Z"
},
"93.184.216.34:443": {
"open": true,
"hostname": "www.example.com",
"hostnames": ["www.example.com"],
"lastChecked": "2026-02-19T12:00:00Z"
}
},
@@ -366,9 +367,15 @@ docker run -d \
triggering change notifications).
2. **Initial check**: Immediately perform all DNS, port, and TLS checks
on startup.
3. **Periodic checks**:
- DNS and port checks: every `DNSWATCHER_DNS_INTERVAL` (default 1h).
- TLS checks: every `DNSWATCHER_TLS_INTERVAL` (default 12h).
3. **Periodic checks** (DNS always runs first):
- DNS checks: every `DNSWATCHER_DNS_INTERVAL` (default 1h). Also
re-run before every TLS check cycle to ensure fresh IPs.
- Port checks: every `DNSWATCHER_DNS_INTERVAL`, after DNS completes.
- TLS checks: every `DNSWATCHER_TLS_INTERVAL` (default 12h), after
DNS completes.
- Port and TLS checks always use freshly resolved IP addresses from
the DNS phase that immediately precedes them — never stale IPs
from a previous cycle.
4. **On change detection**: Send notifications to all configured
endpoints, update in-memory state, persist to disk.
5. **Shutdown**: Persist final state to disk, complete in-flight
@@ -376,9 +383,27 @@ docker run -d \
---
## Planned Future Features (Post-1.0)
- **DNSSEC validation**: Validate the DNSSEC chain of trust during
iterative resolution and report DNSSEC failures as notifications.
---
## Project Structure
Follows the conventions defined in `CONVENTIONS.md`, adapted from the
Follows the conventions defined in `REPO_POLICIES.md`, adapted from the
[upaas](https://git.eeqj.de/sneak/upaas) project template. Uses uber/fx
for dependency injection, go-chi for HTTP routing, slog for logging, and
Viper for configuration.
---
## License
License has not yet been chosen for this project. Pending decision by the
author (MIT, GPL, or WTFPL).
## Author
[@sneak](https://sneak.berlin)

188
REPO_POLICIES.md Normal file
View File

@@ -0,0 +1,188 @@
---
title: Repository Policies
last_modified: 2026-02-22
---
This document covers repository structure, tooling, and workflow standards. Code
style conventions are in separate documents:
- [Code Styleguide](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/CODE_STYLEGUIDE.md)
(general, bash, Docker)
- [Go](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/CODE_STYLEGUIDE_GO.md)
- [JavaScript](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/CODE_STYLEGUIDE_JS.md)
- [Python](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/CODE_STYLEGUIDE_PYTHON.md)
- [Go HTTP Server Conventions](https://git.eeqj.de/sneak/prompts/raw/branch/main/prompts/GO_HTTP_SERVER_CONVENTIONS.md)
---
- Cross-project documentation (such as this file) must include
`last_modified: YYYY-MM-DD` in the YAML front matter so it can be kept in sync
with the authoritative source as policies evolve.
- **ALL external references must be pinned by cryptographic hash.** This
includes Docker base images, Go modules, npm packages, GitHub Actions, and
anything else fetched from a remote source. Version tags (`@v4`, `@latest`,
`:3.21`, etc.) are server-mutable and therefore remote code execution
vulnerabilities. The ONLY acceptable way to reference an external dependency
is by its content hash (Docker `@sha256:...`, Go module hash in `go.sum`, npm
integrity hash in lockfile, GitHub Actions `@<commit-sha>`). No exceptions.
This also means never `curl | bash` to install tools like pyenv, nvm, rustup,
etc. Instead, download a specific release archive from GitHub, verify its hash
(hardcoded in the Dockerfile or script), and only then install. Unverified
install scripts are arbitrary remote code execution. This is the single most
important rule in this document. Double-check every external reference in
every file before committing. There are zero exceptions to this rule.
- Every repo with software must have a root `Makefile` with these targets:
`make test`, `make lint`, `make fmt` (writes), `make fmt-check` (read-only),
`make check` (prereqs: `test`, `lint`, `fmt-check`), `make docker`, and
`make hooks` (installs pre-commit hook). A model Makefile is at
`https://git.eeqj.de/sneak/prompts/raw/branch/main/Makefile`.
- Always use Makefile targets (`make fmt`, `make test`, `make lint`, etc.)
instead of invoking the underlying tools directly. The Makefile is the single
source of truth for how these operations are run.
- The Makefile is authoritative documentation for how the repo is used. Beyond
the required targets above, it should have targets for every common operation:
running a local development server (`make run`, `make dev`), re-initializing
or migrating the database (`make db-reset`, `make migrate`), building
artifacts (`make build`), generating code, seeding data, or anything else a
developer would do regularly. If someone checks out the repo and types
`make<tab>`, they should see every meaningful operation available. A new
contributor should be able to understand the entire development workflow by
reading the Makefile.
- Every repo should have a `Dockerfile`. All Dockerfiles must run `make check`
as a build step so the build fails if the branch is not green. For non-server
repos, the Dockerfile should bring up a development environment and run
`make check`. For server repos, `make check` should run as an early build
stage before the final image is assembled.
- Every repo should have a Gitea Actions workflow (`.gitea/workflows/`) that
runs `docker build .` on push. Since the Dockerfile already runs `make check`,
a successful build implies all checks pass.
- Use platform-standard formatters: `black` for Python, `prettier` for
JS/CSS/Markdown/HTML, `go fmt` for Go. Always use default configuration with
two exceptions: four-space indents (except Go), and `proseWrap: always` for
Markdown (hard-wrap at 80 columns). Documentation and writing repos (Markdown,
HTML, CSS) should also have `.prettierrc` and `.prettierignore`.
- Pre-commit hook: `make check` if local testing is possible, otherwise
`make lint && make fmt-check`. The Makefile should provide a `make hooks`
target to install the pre-commit hook.
- All repos with software must have tests that run via the platform-standard
test framework (`go test`, `pytest`, `jest`/`vitest`, etc.). If no meaningful
tests exist yet, add the most minimal test possible — e.g. importing the
module under test to verify it compiles/parses. There is no excuse for
`make test` to be a no-op.
- `make test` must complete in under 20 seconds. Add a 30-second timeout in the
Makefile.
- Docker builds must complete in under 5 minutes.
- `make check` must not modify any files in the repo. Tests may use temporary
directories.
- `main` must always pass `make check`, no exceptions.
- Never commit secrets. `.env` files, credentials, API keys, and private keys
must be in `.gitignore`. No exceptions.
- `.gitignore` should be comprehensive from the start: OS files (`.DS_Store`),
editor files (`.swp`, `*~`), language build artifacts, and `node_modules/`.
Fetch the standard `.gitignore` from
`https://git.eeqj.de/sneak/prompts/raw/branch/main/.gitignore` when setting up
a new repo.
- Never use `git add -A` or `git add .`. Always stage files explicitly by name.
- Never force-push to `main`.
- Make all changes on a feature branch. You can do whatever you want on a
feature branch.
- `.golangci.yml` is standardized and must _NEVER_ be modified by an agent, only
manually by the user. Fetch from
`https://git.eeqj.de/sneak/prompts/raw/branch/main/.golangci.yml`.
- When pinning images or packages by hash, add a comment above the reference
with the version and date (YYYY-MM-DD).
- Use `yarn`, not `npm`.
- Write all dates as YYYY-MM-DD (ISO 8601).
- Simple projects should be configured with environment variables.
- Dockerized web services listen on port 8080 by default, overridable with
`PORT`.
- `README.md` is the primary documentation. Required sections:
- **Description**: First line must include the project name, purpose,
category (web server, SPA, CLI tool, etc.), license, and author. Example:
"µPaaS is an MIT-licensed Go web application by @sneak that receives
git-frontend webhooks and deploys applications via Docker in realtime."
- **Getting Started**: Copy-pasteable install/usage code block.
- **Rationale**: Why does this exist?
- **Design**: How is the program structured?
- **TODO**: Update meticulously, even between commits. When planning, put
the todo list in the README so a new agent can pick up where the last one
left off.
- **License**: MIT, GPL, or WTFPL. Ask the user for new projects. Include a
`LICENSE` file in the repo root and a License section in the README.
- **Author**: [@sneak](https://sneak.berlin).
- First commit of a new repo should contain only `README.md`.
- Go module root: `sneak.berlin/go/<name>`. Always run `go mod tidy` before
committing.
- Use SemVer.
- Database migrations live in `internal/db/migrations/` and must be embedded in
the binary.
- `000_migration.sql` — contains ONLY the creation of the migrations tracking
table itself. Nothing else.
- `001_schema.sql` — the full application schema.
- **Pre-1.0.0:** never add additional migration files (002, 003, etc.). There
is no installed base to migrate. Edit `001_schema.sql` directly.
- **Post-1.0.0:** add new numbered migration files for each schema change.
Never edit existing migrations after release.
- All repos should have an `.editorconfig` enforcing the project's indentation
settings.
- Avoid putting files in the repo root unless necessary. Root should contain
only project-level config files (`README.md`, `Makefile`, `Dockerfile`,
`LICENSE`, `.gitignore`, `.editorconfig`, `REPO_POLICIES.md`, and
language-specific config). Everything else goes in a subdirectory. Canonical
subdirectory names:
- `bin/` — executable scripts and tools
- `cmd/` — Go command entrypoints
- `configs/` — configuration templates and examples
- `deploy/` — deployment manifests (k8s, compose, terraform)
- `docs/` — documentation and markdown (README.md stays in root)
- `internal/` — Go internal packages
- `internal/db/migrations/` — database migrations
- `pkg/` — Go library packages
- `share/` — systemd units, data files
- `static/` — static assets (images, fonts, etc.)
- `web/` — web frontend source
- When setting up a new repo, files from the `prompts` repo may be used as
templates. Fetch them from
`https://git.eeqj.de/sneak/prompts/raw/branch/main/<path>`.
- New repos must contain at minimum:
- `README.md`, `.git`, `.gitignore`, `.editorconfig`
- `LICENSE`, `REPO_POLICIES.md` (copy from the `prompts` repo)
- `Makefile`
- `Dockerfile`, `.dockerignore`
- `.gitea/workflows/check.yml`
- Go: `go.mod`, `go.sum`, `.golangci.yml`
- JS: `package.json`, `yarn.lock`, `.prettierrc`, `.prettierignore`
- Python: `pyproject.toml`

34
TESTING.md Normal file
View File

@@ -0,0 +1,34 @@
# Testing Policy
## DNS Resolution Tests
All resolver tests **MUST** use live queries against real DNS servers.
No mocking of the DNS client layer is permitted.
### Rationale
The resolver performs iterative resolution from root nameservers through
the full delegation chain. Mocked responses cannot faithfully represent
the variety of real-world DNS behavior (truncation, referrals, glue
records, DNSSEC, varied response times, EDNS, etc.). Testing against
real servers ensures the resolver works correctly in production.
### Constraints
- Tests hit real DNS infrastructure and require network access
- Test duration depends on network conditions; timeout tuning keeps
the suite within the 30-second target
- Query timeout is calibrated to 3× maximum antipodal RTT (~300ms)
plus processing margin
- Root server fan-out is limited to reduce parallel query load
- Flaky failures from transient network issues are acceptable and
should be investigated as potential resolver bugs, not papered over
with mocks or skip flags
### What NOT to do
- **Do not mock `DNSClient`** for resolver tests (the mock constructor
exists for unit-testing other packages that consume the resolver)
- **Do not add `-short` flags** to skip slow tests
- **Do not increase `-timeout`** to hide hanging queries
- **Do not modify linter configuration** to suppress findings

8
go.mod
View File

@@ -7,8 +7,10 @@ require (
github.com/go-chi/chi/v5 v5.2.5
github.com/go-chi/cors v1.2.2
github.com/joho/godotenv v1.5.1
github.com/miekg/dns v1.1.72
github.com/prometheus/client_golang v1.23.2
github.com/spf13/viper v1.21.0
github.com/stretchr/testify v1.11.1
go.uber.org/fx v1.24.0
golang.org/x/net v0.50.0
golang.org/x/sync v0.19.0
@@ -17,10 +19,12 @@ require (
require (
github.com/beorn7/perks v1.0.1 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/fsnotify/fsnotify v1.9.0 // indirect
github.com/go-viper/mapstructure/v2 v2.4.0 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/pelletier/go-toml/v2 v2.2.4 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/prometheus/client_model v0.6.2 // indirect
github.com/prometheus/common v0.66.1 // indirect
github.com/prometheus/procfs v0.16.1 // indirect
@@ -35,7 +39,11 @@ require (
go.uber.org/zap v1.26.0 // indirect
go.yaml.in/yaml/v2 v2.4.2 // indirect
go.yaml.in/yaml/v3 v3.0.4 // indirect
golang.org/x/mod v0.32.0 // indirect
golang.org/x/sync v0.19.0 // indirect
golang.org/x/sys v0.41.0 // indirect
golang.org/x/text v0.34.0 // indirect
golang.org/x/tools v0.41.0 // indirect
google.golang.org/protobuf v1.36.8 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
)

6
go.sum
View File

@@ -28,6 +28,8 @@ github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
github.com/kylelemons/godebug v1.1.0 h1:RPNrshWIDI6G2gRW9EHilWtl7Z6Sb1BR0xunSBf0SNc=
github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw=
github.com/miekg/dns v1.1.72 h1:vhmr+TF2A3tuoGNkLDFK9zi36F2LS+hKTRW0Uf8kbzI=
github.com/miekg/dns v1.1.72/go.mod h1:+EuEPhdHOsfk6Wk5TT2CzssZdqkmFhf8r+aVyDEToIs=
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 h1:C3w9PqII01/Oq1c1nUAm88MOHcQC9l5mIlSMApZMrHA=
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822/go.mod h1:+n7T8mK8HuQTcFwEeznm/DIxMOiR9yIdICNftLE1DvQ=
github.com/pelletier/go-toml/v2 v2.2.4 h1:mye9XuhQ6gvn5h28+VilKrrPoQVanw5PMw/TB0t5Ec4=
@@ -74,6 +76,8 @@ go.yaml.in/yaml/v2 v2.4.2 h1:DzmwEr2rDGHl7lsFgAHxmNz/1NlQ7xLIrlN2h5d1eGI=
go.yaml.in/yaml/v2 v2.4.2/go.mod h1:081UH+NErpNdqlCXm3TtEran0rJZGxAYx9hb/ELlsPU=
go.yaml.in/yaml/v3 v3.0.4 h1:tfq32ie2Jv2UxXFdLJdh3jXuOzWiL1fo0bu/FbuKpbc=
go.yaml.in/yaml/v3 v3.0.4/go.mod h1:DhzuOOF2ATzADvBadXxruRBLzYTpT36CKvDb3+aBEFg=
golang.org/x/mod v0.32.0 h1:9F4d3PHLljb6x//jOyokMv3eX+YDeepZSEo3mFJy93c=
golang.org/x/mod v0.32.0/go.mod h1:SgipZ/3h2Ci89DlEtEXWUk/HteuRin+HHhN+WbNhguU=
golang.org/x/net v0.50.0 h1:ucWh9eiCGyDR3vtzso0WMQinm2Dnt8cFMuQa9K33J60=
golang.org/x/net v0.50.0/go.mod h1:UgoSli3F/pBgdJBHCTc+tp3gmrU4XswgGRgtnwWTfyM=
golang.org/x/sync v0.19.0 h1:vV+1eWNmZ5geRlYjzm2adRgW2/mcpevXNg50YZtPCE4=
@@ -82,6 +86,8 @@ golang.org/x/sys v0.41.0 h1:Ivj+2Cp/ylzLiEU89QhWblYnOE9zerudt9Ftecq2C6k=
golang.org/x/sys v0.41.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
golang.org/x/text v0.34.0 h1:oL/Qq0Kdaqxa1KbNeMKwQq0reLCCaFtqu2eNuSeNHbk=
golang.org/x/text v0.34.0/go.mod h1:homfLqTYRFyVYemLBFl5GgL/DWEiH5wcsQ5gSh1yziA=
golang.org/x/tools v0.41.0 h1:a9b8iMweWG+S0OBnlU36rzLp20z1Rp10w+IY2czHTQc=
golang.org/x/tools v0.41.0/go.mod h1:XSY6eDqxVNiYgezAVqqCeihT4j1U2CCsqvH3WhQpnlg=
google.golang.org/protobuf v1.36.8 h1:xHScyCOEuuwZEc6UtSOvPbAT4zRh0xcNRYekJwfqyMc=
google.golang.org/protobuf v1.36.8/go.mod h1:fuxRtAxBytpl4zzqUh6/eyUujkJdNiuEkXntxiD/uRU=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=

View File

@@ -15,6 +15,12 @@ import (
"sneak.berlin/go/dnswatcher/internal/logger"
)
// ErrNoTargets is returned when DNSWATCHER_TARGETS is empty or unset.
var ErrNoTargets = errors.New(
"no targets configured: set DNSWATCHER_TARGETS to a comma-separated " +
"list of DNS names to monitor",
)
// Default configuration values.
const (
defaultPort = 8080
@@ -118,25 +124,9 @@ func buildConfig(
}
}
dnsInterval, err := time.ParseDuration(
viper.GetString("DNS_INTERVAL"),
)
domains, hostnames, err := classifyAndValidateTargets()
if err != nil {
dnsInterval = defaultDNSInterval
}
tlsInterval, err := time.ParseDuration(
viper.GetString("TLS_INTERVAL"),
)
if err != nil {
tlsInterval = defaultTLSInterval
}
domains, hostnames, err := ClassifyTargets(
parseCSV(viper.GetString("TARGETS")),
)
if err != nil {
return nil, fmt.Errorf("invalid targets configuration: %w", err)
return nil, err
}
cfg := &Config{
@@ -148,8 +138,8 @@ func buildConfig(
SlackWebhook: viper.GetString("SLACK_WEBHOOK"),
MattermostWebhook: viper.GetString("MATTERMOST_WEBHOOK"),
NtfyTopic: viper.GetString("NTFY_TOPIC"),
DNSInterval: dnsInterval,
TLSInterval: tlsInterval,
DNSInterval: parseDurationOrDefault("DNS_INTERVAL", defaultDNSInterval),
TLSInterval: parseDurationOrDefault("TLS_INTERVAL", defaultTLSInterval),
TLSExpiryWarning: viper.GetInt("TLS_EXPIRY_WARNING"),
SentryDSN: viper.GetString("SENTRY_DSN"),
MaintenanceMode: viper.GetBool("MAINTENANCE_MODE"),
@@ -162,6 +152,32 @@ func buildConfig(
return cfg, nil
}
func classifyAndValidateTargets() ([]string, []string, error) {
domains, hostnames, err := ClassifyTargets(
parseCSV(viper.GetString("TARGETS")),
)
if err != nil {
return nil, nil, fmt.Errorf(
"invalid targets configuration: %w", err,
)
}
if len(domains) == 0 && len(hostnames) == 0 {
return nil, nil, ErrNoTargets
}
return domains, hostnames, nil
}
func parseDurationOrDefault(key string, fallback time.Duration) time.Duration {
d, err := time.ParseDuration(viper.GetString(key))
if err != nil {
return fallback
}
return d
}
func parseCSV(input string) []string {
if input == "" {
return nil

View File

@@ -0,0 +1,87 @@
package config_test
import (
"errors"
"testing"
"go.uber.org/fx"
"sneak.berlin/go/dnswatcher/internal/config"
"sneak.berlin/go/dnswatcher/internal/globals"
"sneak.berlin/go/dnswatcher/internal/logger"
)
func TestNewReturnsErrNoTargetsWhenEmpty(t *testing.T) {
// Cannot use t.Parallel() because t.Setenv modifies the process
// environment.
t.Setenv("DNSWATCHER_TARGETS", "")
t.Setenv("DNSWATCHER_DATA_DIR", t.TempDir())
var cfg *config.Config
app := fx.New(
fx.Provide(
func() *globals.Globals {
return &globals.Globals{
Appname: "dnswatcher-test-empty",
}
},
logger.New,
config.New,
),
fx.Populate(&cfg),
fx.NopLogger,
)
err := app.Err()
if err == nil {
t.Fatal(
"expected error when DNSWATCHER_TARGETS is empty, got nil",
)
}
if !errors.Is(err, config.ErrNoTargets) {
t.Errorf("expected ErrNoTargets, got: %v", err)
}
}
func TestNewSucceedsWithTargets(t *testing.T) {
// Cannot use t.Parallel() because t.Setenv modifies the process
// environment.
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("DNSWATCHER_DATA_DIR", t.TempDir())
// Prevent loading a local config file by changing to a temp dir.
t.Chdir(t.TempDir())
var cfg *config.Config
app := fx.New(
fx.Provide(
func() *globals.Globals {
return &globals.Globals{
Appname: "dnswatcher-test-ok",
}
},
logger.New,
config.New,
),
fx.Populate(&cfg),
fx.NopLogger,
)
err := app.Err()
if err != nil {
t.Fatalf(
"expected no error with valid targets, got: %v",
err,
)
}
if len(cfg.Domains) != 1 || cfg.Domains[0] != "example.com" {
t.Errorf(
"expected [example.com], got domains=%v",
cfg.Domains,
)
}
}

View File

@@ -1,4 +1,5 @@
// Package notify provides notification delivery to Slack, Mattermost, and ntfy.
// Package notify provides notification delivery to Slack,
// Mattermost, and ntfy.
package notify
import (
@@ -7,6 +8,7 @@ import (
"encoding/json"
"errors"
"fmt"
"io"
"log/slog"
"net/http"
"net/url"
@@ -34,8 +36,66 @@ var (
ErrMattermostFailed = errors.New(
"mattermost notification failed",
)
// ErrInvalidScheme is returned for disallowed URL schemes.
ErrInvalidScheme = errors.New("URL scheme not allowed")
// ErrMissingHost is returned when a URL has no host.
ErrMissingHost = errors.New("URL must have a host")
)
// IsAllowedScheme checks if the URL scheme is permitted.
func IsAllowedScheme(scheme string) bool {
return scheme == "https" || scheme == "http"
}
// ValidateWebhookURL validates and sanitizes a webhook URL.
// It ensures the URL has an allowed scheme (http/https),
// a non-empty host, and returns a pre-parsed *url.URL
// reconstructed from validated components.
func ValidateWebhookURL(raw string) (*url.URL, error) {
u, err := url.ParseRequestURI(raw)
if err != nil {
return nil, fmt.Errorf("invalid URL: %w", err)
}
if !IsAllowedScheme(u.Scheme) {
return nil, fmt.Errorf(
"%w: %s", ErrInvalidScheme, u.Scheme,
)
}
if u.Host == "" {
return nil, fmt.Errorf("%w", ErrMissingHost)
}
// Reconstruct from parsed components.
clean := &url.URL{
Scheme: u.Scheme,
Host: u.Host,
Path: u.Path,
RawQuery: u.RawQuery,
}
return clean, nil
}
// newRequest creates an http.Request from a pre-validated *url.URL.
// This avoids passing URL strings to http.NewRequestWithContext,
// which gosec flags as a potential SSRF vector.
func newRequest(
ctx context.Context,
method string,
target *url.URL,
body io.Reader,
) *http.Request {
return (&http.Request{
Method: method,
URL: target,
Host: target.Host,
Header: make(http.Header),
Body: io.NopCloser(body),
}).WithContext(ctx)
}
// Params contains dependencies for Service.
type Params struct {
fx.In
@@ -47,7 +107,7 @@ type Params struct {
// Service provides notification functionality.
type Service struct {
log *slog.Logger
client *http.Client
transport http.RoundTripper
config *config.Config
ntfyURL *url.URL
slackWebhookURL *url.URL
@@ -61,32 +121,40 @@ func New(
) (*Service, error) {
svc := &Service{
log: params.Logger.Get(),
client: &http.Client{
Timeout: httpClientTimeout,
},
transport: http.DefaultTransport,
config: params.Config,
}
if params.Config.NtfyTopic != "" {
u, err := url.ParseRequestURI(params.Config.NtfyTopic)
u, err := ValidateWebhookURL(
params.Config.NtfyTopic,
)
if err != nil {
return nil, fmt.Errorf("invalid ntfy topic URL: %w", err)
return nil, fmt.Errorf(
"invalid ntfy topic URL: %w", err,
)
}
svc.ntfyURL = u
}
if params.Config.SlackWebhook != "" {
u, err := url.ParseRequestURI(params.Config.SlackWebhook)
u, err := ValidateWebhookURL(
params.Config.SlackWebhook,
)
if err != nil {
return nil, fmt.Errorf("invalid slack webhook URL: %w", err)
return nil, fmt.Errorf(
"invalid slack webhook URL: %w", err,
)
}
svc.slackWebhookURL = u
}
if params.Config.MattermostWebhook != "" {
u, err := url.ParseRequestURI(params.Config.MattermostWebhook)
u, err := ValidateWebhookURL(
params.Config.MattermostWebhook,
)
if err != nil {
return nil, fmt.Errorf(
"invalid mattermost webhook URL: %w", err,
@@ -99,7 +167,8 @@ func New(
return svc, nil
}
// SendNotification sends a notification to all configured endpoints.
// SendNotification sends a notification to all configured
// endpoints.
func (svc *Service) SendNotification(
ctx context.Context,
title, message, priority string,
@@ -170,20 +239,20 @@ func (svc *Service) sendNtfy(
"title", title,
)
request, err := http.NewRequestWithContext(
ctx,
http.MethodPost,
topicURL.String(),
bytes.NewBufferString(message),
ctx, cancel := context.WithTimeout(
ctx, httpClientTimeout,
)
defer cancel()
body := bytes.NewBufferString(message)
request := newRequest(
ctx, http.MethodPost, topicURL, body,
)
if err != nil {
return fmt.Errorf("creating ntfy request: %w", err)
}
request.Header.Set("Title", title)
request.Header.Set("Priority", ntfyPriority(priority))
resp, err := svc.client.Do(request) //nolint:gosec // URL validated at Service construction time
resp, err := svc.transport.RoundTrip(request)
if err != nil {
return fmt.Errorf("sending ntfy request: %w", err)
}
@@ -192,7 +261,8 @@ func (svc *Service) sendNtfy(
if resp.StatusCode >= httpStatusClientError {
return fmt.Errorf(
"%w: status %d", ErrNtfyFailed, resp.StatusCode,
"%w: status %d",
ErrNtfyFailed, resp.StatusCode,
)
}
@@ -232,6 +302,11 @@ func (svc *Service) sendSlack(
webhookURL *url.URL,
title, message, priority string,
) error {
ctx, cancel := context.WithTimeout(
ctx, httpClientTimeout,
)
defer cancel()
svc.log.Debug(
"sending webhook notification",
"url", webhookURL.String(),
@@ -250,22 +325,19 @@ func (svc *Service) sendSlack(
body, err := json.Marshal(payload)
if err != nil {
return fmt.Errorf("marshaling webhook payload: %w", err)
return fmt.Errorf(
"marshaling webhook payload: %w", err,
)
}
request, err := http.NewRequestWithContext(
ctx,
http.MethodPost,
webhookURL.String(),
request := newRequest(
ctx, http.MethodPost, webhookURL,
bytes.NewBuffer(body),
)
if err != nil {
return fmt.Errorf("creating webhook request: %w", err)
}
request.Header.Set("Content-Type", "application/json")
resp, err := svc.client.Do(request) //nolint:gosec // URL validated at Service construction time
resp, err := svc.transport.RoundTrip(request)
if err != nil {
return fmt.Errorf("sending webhook request: %w", err)
}

View File

@@ -0,0 +1,100 @@
package notify_test
import (
"testing"
"sneak.berlin/go/dnswatcher/internal/notify"
)
func TestValidateWebhookURLValid(t *testing.T) {
t.Parallel()
tests := []struct {
name string
input string
wantURL string
}{
{
name: "valid https URL",
input: "https://hooks.slack.com/T00/B00",
wantURL: "https://hooks.slack.com/T00/B00",
},
{
name: "valid http URL",
input: "http://localhost:8080/webhook",
wantURL: "http://localhost:8080/webhook",
},
{
name: "https with query",
input: "https://ntfy.sh/topic?auth=tok",
wantURL: "https://ntfy.sh/topic?auth=tok",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
t.Parallel()
got, err := notify.ValidateWebhookURL(tt.input)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if got.String() != tt.wantURL {
t.Errorf(
"got %q, want %q",
got.String(), tt.wantURL,
)
}
})
}
}
func TestValidateWebhookURLInvalid(t *testing.T) {
t.Parallel()
invalid := []struct {
name string
input string
}{
{"ftp scheme", "ftp://example.com/file"},
{"file scheme", "file:///etc/passwd"},
{"empty string", ""},
{"no scheme", "example.com/webhook"},
{"no host", "https:///path"},
}
for _, tt := range invalid {
t.Run(tt.name, func(t *testing.T) {
t.Parallel()
got, err := notify.ValidateWebhookURL(tt.input)
if err == nil {
t.Errorf(
"expected error for %q, got %v",
tt.input, got,
)
}
})
}
}
func TestIsAllowedScheme(t *testing.T) {
t.Parallel()
if !notify.IsAllowedScheme("https") {
t.Error("https should be allowed")
}
if !notify.IsAllowedScheme("http") {
t.Error("http should be allowed")
}
if notify.IsAllowedScheme("ftp") {
t.Error("ftp should not be allowed")
}
if notify.IsAllowedScheme("") {
t.Error("empty scheme should not be allowed")
}
}

View File

@@ -0,0 +1,48 @@
package resolver
import (
"context"
"time"
"github.com/miekg/dns"
)
// DNSClient abstracts DNS wire-protocol exchanges so the resolver
// can be tested without hitting real nameservers.
type DNSClient interface {
ExchangeContext(
ctx context.Context,
msg *dns.Msg,
addr string,
) (*dns.Msg, time.Duration, error)
}
// udpClient wraps a real dns.Client for production use.
type udpClient struct {
timeout time.Duration
}
func (c *udpClient) ExchangeContext(
ctx context.Context,
msg *dns.Msg,
addr string,
) (*dns.Msg, time.Duration, error) {
cl := &dns.Client{Timeout: c.timeout}
return cl.ExchangeContext(ctx, msg, addr)
}
// tcpClient wraps a real dns.Client using TCP.
type tcpClient struct {
timeout time.Duration
}
func (c *tcpClient) ExchangeContext(
ctx context.Context,
msg *dns.Msg,
addr string,
) (*dns.Msg, time.Duration, error) {
cl := &dns.Client{Net: "tcp", Timeout: c.timeout}
return cl.ExchangeContext(ctx, msg, addr)
}

View File

@@ -0,0 +1,22 @@
package resolver
import "errors"
// Sentinel errors returned by the resolver.
var (
// ErrNoNameservers is returned when no authoritative NS
// could be discovered for a domain.
ErrNoNameservers = errors.New(
"no authoritative nameservers found",
)
// ErrCNAMEDepthExceeded is returned when a CNAME chain
// exceeds MaxCNAMEDepth.
ErrCNAMEDepthExceeded = errors.New(
"CNAME chain depth exceeded",
)
// ErrContextCanceled wraps context cancellation for the
// resolver's iterative queries.
ErrContextCanceled = errors.New("context canceled")
)

View File

@@ -0,0 +1,803 @@
package resolver
import (
"context"
"errors"
"fmt"
"net"
"sort"
"strings"
"time"
"github.com/miekg/dns"
)
const (
queryTimeoutDuration = 2 * time.Second
maxRetries = 2
maxDelegation = 20
timeoutMultiplier = 2
minDomainLabels = 2
)
// ErrRefused is returned when a DNS server refuses a query.
var ErrRefused = errors.New("dns query refused")
func rootServerList() []string {
return []string{
"198.41.0.4", // a.root-servers.net
"170.247.170.2", // b
"192.33.4.12", // c
"199.7.91.13", // d
"192.203.230.10", // e
"192.5.5.241", // f
"192.112.36.4", // g
"198.97.190.53", // h
"192.36.148.17", // i
"192.58.128.30", // j
"193.0.14.129", // k
"199.7.83.42", // l
"202.12.27.33", // m
}
}
func checkCtx(ctx context.Context) error {
err := ctx.Err()
if err != nil {
return ErrContextCanceled
}
return nil
}
func (r *Resolver) exchangeWithTimeout(
ctx context.Context,
msg *dns.Msg,
addr string,
attempt int,
) (*dns.Msg, error) {
_ = attempt // timeout escalation handled by client config
resp, _, err := r.client.ExchangeContext(ctx, msg, addr)
return resp, err
}
func (r *Resolver) tryExchange(
ctx context.Context,
msg *dns.Msg,
addr string,
) (*dns.Msg, error) {
var resp *dns.Msg
var err error
for attempt := range maxRetries {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
resp, err = r.exchangeWithTimeout(
ctx, msg, addr, attempt,
)
if err == nil {
break
}
}
return resp, err
}
func (r *Resolver) retryTCP(
ctx context.Context,
msg *dns.Msg,
addr string,
resp *dns.Msg,
) *dns.Msg {
if !resp.Truncated {
return resp
}
tcpResp, _, tcpErr := r.tcp.ExchangeContext(ctx, msg, addr)
if tcpErr == nil {
return tcpResp
}
return resp
}
// queryDNS sends a DNS query to a specific server IP.
// Tries non-recursive first, falls back to recursive on
// REFUSED (handles DNS interception environments).
func (r *Resolver) queryDNS(
ctx context.Context,
serverIP string,
name string,
qtype uint16,
) (*dns.Msg, error) {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
name = dns.Fqdn(name)
addr := net.JoinHostPort(serverIP, "53")
msg := new(dns.Msg)
msg.SetQuestion(name, qtype)
msg.RecursionDesired = false
resp, err := r.tryExchange(ctx, msg, addr)
if err != nil {
return nil, fmt.Errorf("query %s @%s: %w", name, serverIP, err)
}
if resp.Rcode == dns.RcodeRefused {
msg.RecursionDesired = true
resp, err = r.tryExchange(ctx, msg, addr)
if err != nil {
return nil, fmt.Errorf(
"query %s @%s: %w", name, serverIP, err,
)
}
if resp.Rcode == dns.RcodeRefused {
return nil, fmt.Errorf(
"query %s @%s: %w", name, serverIP, ErrRefused,
)
}
}
resp = r.retryTCP(ctx, msg, addr, resp)
return resp, nil
}
func extractNSSet(rrs []dns.RR) []string {
nsSet := make(map[string]bool)
for _, rr := range rrs {
if ns, ok := rr.(*dns.NS); ok {
nsSet[strings.ToLower(ns.Ns)] = true
}
}
names := make([]string, 0, len(nsSet))
for n := range nsSet {
names = append(names, n)
}
sort.Strings(names)
return names
}
func extractGlue(rrs []dns.RR) map[string][]net.IP {
glue := make(map[string][]net.IP)
for _, rr := range rrs {
switch r := rr.(type) {
case *dns.A:
name := strings.ToLower(r.Hdr.Name)
glue[name] = append(glue[name], r.A)
case *dns.AAAA:
name := strings.ToLower(r.Hdr.Name)
glue[name] = append(glue[name], r.AAAA)
}
}
return glue
}
func glueIPs(nsNames []string, glue map[string][]net.IP) []string {
var ips []string
for _, ns := range nsNames {
for _, addr := range glue[ns] {
if v4 := addr.To4(); v4 != nil {
ips = append(ips, v4.String())
}
}
}
return ips
}
func (r *Resolver) followDelegation(
ctx context.Context,
domain string,
servers []string,
) ([]string, error) {
for range maxDelegation {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
resp, err := r.queryServers(
ctx, servers, domain, dns.TypeNS,
)
if err != nil {
return nil, err
}
ansNS := extractNSSet(resp.Answer)
if len(ansNS) > 0 {
return ansNS, nil
}
authNS := extractNSSet(resp.Ns)
if len(authNS) == 0 {
return r.resolveNSIterative(ctx, domain)
}
glue := extractGlue(resp.Extra)
nextServers := glueIPs(authNS, glue)
if len(nextServers) == 0 {
nextServers = r.resolveNSIPs(ctx, authNS)
}
if len(nextServers) == 0 {
return nil, ErrNoNameservers
}
servers = nextServers
}
return nil, ErrNoNameservers
}
func (r *Resolver) queryServers(
ctx context.Context,
servers []string,
name string,
qtype uint16,
) (*dns.Msg, error) {
var lastErr error
for _, ip := range servers {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
resp, err := r.queryDNS(ctx, ip, name, qtype)
if err == nil {
return resp, nil
}
lastErr = err
}
return nil, fmt.Errorf("all servers failed: %w", lastErr)
}
func (r *Resolver) resolveNSIPs(
ctx context.Context,
nsNames []string,
) []string {
var ips []string
for _, ns := range nsNames {
resolved, err := r.resolveARecord(ctx, ns)
if err == nil {
ips = append(ips, resolved...)
}
if len(ips) > 0 {
break
}
}
return ips
}
// resolveNSIterative queries for NS records using iterative
// resolution as a fallback when followDelegation finds no
// authoritative answer in the delegation chain.
func (r *Resolver) resolveNSIterative(
ctx context.Context,
domain string,
) ([]string, error) {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
domain = dns.Fqdn(domain)
servers := rootServerList()
for range maxDelegation {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
resp, err := r.queryServers(
ctx, servers, domain, dns.TypeNS,
)
if err != nil {
return nil, err
}
nsNames := extractNSSet(resp.Answer)
if len(nsNames) > 0 {
return nsNames, nil
}
// Follow delegation.
authNS := extractNSSet(resp.Ns)
if len(authNS) == 0 {
break
}
glue := extractGlue(resp.Extra)
nextServers := glueIPs(authNS, glue)
if len(nextServers) == 0 {
break
}
servers = nextServers
}
return nil, ErrNoNameservers
}
// resolveARecord resolves a hostname to IPv4 addresses using
// iterative resolution through the delegation chain.
func (r *Resolver) resolveARecord(
ctx context.Context,
hostname string,
) ([]string, error) {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
hostname = dns.Fqdn(hostname)
servers := rootServerList()
for range maxDelegation {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
resp, err := r.queryServers(
ctx, servers, hostname, dns.TypeA,
)
if err != nil {
return nil, fmt.Errorf(
"resolving %s: %w", hostname, err,
)
}
// Check for A records in the answer section.
var ips []string
for _, rr := range resp.Answer {
if a, ok := rr.(*dns.A); ok {
ips = append(ips, a.A.String())
}
}
if len(ips) > 0 {
return ips, nil
}
// Follow delegation if present.
authNS := extractNSSet(resp.Ns)
if len(authNS) == 0 {
break
}
glue := extractGlue(resp.Extra)
nextServers := glueIPs(authNS, glue)
if len(nextServers) == 0 {
// Resolve NS IPs iteratively — but guard
// against infinite recursion by using only
// already-resolved servers.
break
}
servers = nextServers
}
return nil, fmt.Errorf(
"cannot resolve %s: %w", hostname, ErrNoNameservers,
)
}
// FindAuthoritativeNameservers traces the delegation chain from
// root servers to discover all authoritative nameservers for the
// given domain. Walks up the label hierarchy for subdomains.
func (r *Resolver) FindAuthoritativeNameservers(
ctx context.Context,
domain string,
) ([]string, error) {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
domain = dns.Fqdn(strings.ToLower(domain))
labels := dns.SplitDomainName(domain)
for i := range labels {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
candidate := strings.Join(labels[i:], ".") + "."
nsNames, err := r.followDelegation(
ctx, candidate, rootServerList(),
)
if err == nil && len(nsNames) > 0 {
sort.Strings(nsNames)
return nsNames, nil
}
}
return nil, ErrNoNameservers
}
// QueryNameserver queries a specific nameserver for all record
// types and builds a NameserverResponse.
func (r *Resolver) QueryNameserver(
ctx context.Context,
nsHostname string,
hostname string,
) (*NameserverResponse, error) {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
nsIPs, err := r.resolveARecord(ctx, nsHostname)
if err != nil {
return nil, fmt.Errorf("resolving NS %s: %w", nsHostname, err)
}
hostname = dns.Fqdn(hostname)
return r.queryAllTypes(ctx, nsHostname, nsIPs[0], hostname)
}
// QueryNameserverIP queries a nameserver by its IP address directly,
// bypassing NS hostname resolution.
func (r *Resolver) QueryNameserverIP(
ctx context.Context,
nsHostname string,
nsIP string,
hostname string,
) (*NameserverResponse, error) {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
hostname = dns.Fqdn(hostname)
return r.queryAllTypes(ctx, nsHostname, nsIP, hostname)
}
func (r *Resolver) queryAllTypes(
ctx context.Context,
nsHostname string,
nsIP string,
hostname string,
) (*NameserverResponse, error) {
resp := &NameserverResponse{
Nameserver: nsHostname,
Records: make(map[string][]string),
Status: StatusOK,
}
qtypes := []uint16{
dns.TypeA, dns.TypeAAAA, dns.TypeCNAME,
dns.TypeMX, dns.TypeTXT, dns.TypeSRV,
dns.TypeCAA, dns.TypeNS,
}
state := r.queryEachType(ctx, nsIP, hostname, qtypes, resp)
classifyResponse(resp, state)
return resp, nil
}
type queryState struct {
gotNXDomain bool
gotSERVFAIL bool
gotTimeout bool
hasRecords bool
}
func (r *Resolver) queryEachType(
ctx context.Context,
nsIP string,
hostname string,
qtypes []uint16,
resp *NameserverResponse,
) queryState {
var state queryState
for _, qtype := range qtypes {
if checkCtx(ctx) != nil {
break
}
r.querySingleType(ctx, nsIP, hostname, qtype, resp, &state)
}
for k := range resp.Records {
sort.Strings(resp.Records[k])
}
return state
}
func (r *Resolver) querySingleType(
ctx context.Context,
nsIP string,
hostname string,
qtype uint16,
resp *NameserverResponse,
state *queryState,
) {
msg, err := r.queryDNS(ctx, nsIP, hostname, qtype)
if err != nil {
if isTimeout(err) {
state.gotTimeout = true
}
return
}
if msg.Rcode == dns.RcodeNameError {
state.gotNXDomain = true
return
}
if msg.Rcode == dns.RcodeServerFailure {
state.gotSERVFAIL = true
return
}
collectAnswerRecords(msg, resp, state)
}
func collectAnswerRecords(
msg *dns.Msg,
resp *NameserverResponse,
state *queryState,
) {
for _, rr := range msg.Answer {
val := extractRecordValue(rr)
if val == "" {
continue
}
typeName := dns.TypeToString[rr.Header().Rrtype]
resp.Records[typeName] = append(
resp.Records[typeName], val,
)
state.hasRecords = true
}
}
// isTimeout checks whether an error is a network timeout.
func isTimeout(err error) bool {
var netErr net.Error
if errors.As(err, &netErr) {
return netErr.Timeout()
}
return false
}
func classifyResponse(resp *NameserverResponse, state queryState) {
switch {
case state.gotNXDomain && !state.hasRecords:
resp.Status = StatusNXDomain
case state.gotTimeout && !state.hasRecords:
resp.Status = StatusTimeout
resp.Error = "all queries timed out"
case state.gotSERVFAIL && !state.hasRecords:
resp.Status = StatusError
resp.Error = "server returned SERVFAIL"
case !state.hasRecords && !state.gotNXDomain:
resp.Status = StatusNoData
}
}
// extractRecordValue formats a DNS RR value as a string.
func extractRecordValue(rr dns.RR) string {
switch r := rr.(type) {
case *dns.A:
return r.A.String()
case *dns.AAAA:
return r.AAAA.String()
case *dns.CNAME:
return r.Target
case *dns.MX:
return fmt.Sprintf("%d %s", r.Preference, r.Mx)
case *dns.TXT:
return strings.Join(r.Txt, "")
case *dns.SRV:
return fmt.Sprintf(
"%d %d %d %s",
r.Priority, r.Weight, r.Port, r.Target,
)
case *dns.CAA:
return fmt.Sprintf(
"%d %s \"%s\"", r.Flag, r.Tag, r.Value,
)
case *dns.NS:
return r.Ns
default:
return ""
}
}
// parentDomain returns the registerable parent domain.
func parentDomain(hostname string) string {
hostname = dns.Fqdn(strings.ToLower(hostname))
labels := dns.SplitDomainName(hostname)
if len(labels) <= minDomainLabels {
return strings.Join(labels, ".") + "."
}
return strings.Join(
labels[len(labels)-minDomainLabels:], ".",
) + "."
}
// QueryAllNameservers discovers auth NSes for the hostname's
// parent domain, then queries each one independently.
func (r *Resolver) QueryAllNameservers(
ctx context.Context,
hostname string,
) (map[string]*NameserverResponse, error) {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
parent := parentDomain(hostname)
nameservers, err := r.FindAuthoritativeNameservers(ctx, parent)
if err != nil {
return nil, err
}
return r.queryEachNS(ctx, nameservers, hostname)
}
func (r *Resolver) queryEachNS(
ctx context.Context,
nameservers []string,
hostname string,
) (map[string]*NameserverResponse, error) {
results := make(map[string]*NameserverResponse)
for _, ns := range nameservers {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
resp, err := r.QueryNameserver(ctx, ns, hostname)
if err != nil {
results[ns] = &NameserverResponse{
Nameserver: ns,
Records: make(map[string][]string),
Status: StatusError,
Error: err.Error(),
}
continue
}
results[ns] = resp
}
return results, nil
}
// LookupNS returns the NS record set for a domain.
func (r *Resolver) LookupNS(
ctx context.Context,
domain string,
) ([]string, error) {
return r.FindAuthoritativeNameservers(ctx, domain)
}
// LookupAllRecords performs iterative resolution to find all DNS
// records for the given hostname, keyed by authoritative nameserver.
func (r *Resolver) LookupAllRecords(
ctx context.Context,
hostname string,
) (map[string]map[string][]string, error) {
results, err := r.QueryAllNameservers(ctx, hostname)
if err != nil {
return nil, err
}
out := make(map[string]map[string][]string, len(results))
for ns, resp := range results {
out[ns] = resp.Records
}
return out, nil
}
// ResolveIPAddresses resolves a hostname to all IPv4 and IPv6
// addresses, following CNAME chains up to MaxCNAMEDepth.
func (r *Resolver) ResolveIPAddresses(
ctx context.Context,
hostname string,
) ([]string, error) {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
return r.resolveIPWithCNAME(ctx, hostname, 0)
}
func (r *Resolver) resolveIPWithCNAME(
ctx context.Context,
hostname string,
depth int,
) ([]string, error) {
if depth > MaxCNAMEDepth {
return nil, ErrCNAMEDepthExceeded
}
results, err := r.QueryAllNameservers(ctx, hostname)
if err != nil {
return nil, err
}
ips, cnameTarget := collectIPs(results)
if len(ips) == 0 && cnameTarget != "" {
return r.resolveIPWithCNAME(ctx, cnameTarget, depth+1)
}
sort.Strings(ips)
return ips, nil
}
func collectIPs(
results map[string]*NameserverResponse,
) ([]string, string) {
seen := make(map[string]bool)
var ips []string
var cnameTarget string
for _, resp := range results {
if resp.Status == StatusNXDomain {
continue
}
for _, ip := range resp.Records["A"] {
if !seen[ip] {
seen[ip] = true
ips = append(ips, ip)
}
}
for _, ip := range resp.Records["AAAA"] {
if !seen[ip] {
seen[ip] = true
ips = append(ips, ip)
}
}
if len(resp.Records["CNAME"]) > 0 && cnameTarget == "" {
cnameTarget = resp.Records["CNAME"][0]
}
}
return ips, cnameTarget
}

View File

@@ -1,9 +1,9 @@
// Package resolver provides iterative DNS resolution from root nameservers.
// It traces the full delegation chain from IANA root servers through TLD
// and domain nameservers, never relying on upstream recursive resolvers.
package resolver
import (
"context"
"errors"
"log/slog"
"go.uber.org/fx"
@@ -11,8 +11,17 @@ import (
"sneak.berlin/go/dnswatcher/internal/logger"
)
// ErrNotImplemented indicates the resolver is not yet implemented.
var ErrNotImplemented = errors.New("resolver not yet implemented")
// Query status constants matching the state model.
const (
StatusOK = "ok"
StatusError = "error"
StatusNXDomain = "nxdomain"
StatusNoData = "nodata"
StatusTimeout = "timeout"
)
// MaxCNAMEDepth is the maximum CNAME chain depth to follow.
const MaxCNAMEDepth = 10
// Params contains dependencies for Resolver.
type Params struct {
@@ -21,44 +30,54 @@ type Params struct {
Logger *logger.Logger
}
// NameserverResponse holds one nameserver's response for a query.
type NameserverResponse struct {
Nameserver string
Records map[string][]string
Status string
Error string
}
// Resolver performs iterative DNS resolution from root servers.
type Resolver struct {
log *slog.Logger
client DNSClient
tcp DNSClient
}
// New creates a new Resolver instance.
// New creates a new Resolver instance for use with uber/fx.
func New(
_ fx.Lifecycle,
params Params,
) (*Resolver, error) {
return &Resolver{
log: params.Logger.Get(),
client: &udpClient{timeout: queryTimeoutDuration},
tcp: &tcpClient{timeout: queryTimeoutDuration},
}, nil
}
// LookupNS performs iterative resolution to find authoritative
// nameservers for the given domain.
func (r *Resolver) LookupNS(
_ context.Context,
_ string,
) ([]string, error) {
return nil, ErrNotImplemented
// NewFromLogger creates a Resolver directly from an slog.Logger,
// useful for testing without the fx lifecycle.
func NewFromLogger(log *slog.Logger) *Resolver {
return &Resolver{
log: log,
client: &udpClient{timeout: queryTimeoutDuration},
tcp: &tcpClient{timeout: queryTimeoutDuration},
}
}
// LookupAllRecords performs iterative resolution to find all DNS
// records for the given hostname, keyed by authoritative nameserver.
func (r *Resolver) LookupAllRecords(
_ context.Context,
_ string,
) (map[string]map[string][]string, error) {
return nil, ErrNotImplemented
// NewFromLoggerWithClient creates a Resolver with a custom DNS
// client, useful for testing with mock DNS responses.
func NewFromLoggerWithClient(
log *slog.Logger,
client DNSClient,
) *Resolver {
return &Resolver{
log: log,
client: client,
tcp: client,
}
}
// ResolveIPAddresses resolves a hostname to all IPv4 and IPv6
// addresses, following CNAME chains.
func (r *Resolver) ResolveIPAddresses(
_ context.Context,
_ string,
) ([]string, error) {
return nil, ErrNotImplemented
}
// Method implementations are in iterative.go.

View File

@@ -0,0 +1,688 @@
package resolver_test
import (
"context"
"log/slog"
"net"
"os"
"sort"
"strings"
"testing"
"time"
"github.com/miekg/dns"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/dnswatcher/internal/resolver"
)
// ----------------------------------------------------------------
// Test helpers
// ----------------------------------------------------------------
func newTestResolver(t *testing.T) *resolver.Resolver {
t.Helper()
log := slog.New(slog.NewTextHandler(
os.Stderr,
&slog.HandlerOptions{Level: slog.LevelDebug},
))
return resolver.NewFromLogger(log)
}
func testContext(t *testing.T) context.Context {
t.Helper()
ctx, cancel := context.WithTimeout(
context.Background(), 60*time.Second,
)
t.Cleanup(cancel)
return ctx
}
func findOneNSForDomain(
t *testing.T,
r *resolver.Resolver,
ctx context.Context, //nolint:revive // test helper
domain string,
) string {
t.Helper()
nameservers, err := r.FindAuthoritativeNameservers(
ctx, domain,
)
require.NoError(t, err)
require.NotEmpty(t, nameservers)
return nameservers[0]
}
// ----------------------------------------------------------------
// FindAuthoritativeNameservers tests
// ----------------------------------------------------------------
func TestFindAuthoritativeNameservers_ValidDomain(
t *testing.T,
) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
nameservers, err := r.FindAuthoritativeNameservers(
ctx, "google.com",
)
require.NoError(t, err)
require.NotEmpty(t, nameservers)
hasGoogleNS := false
for _, ns := range nameservers {
if strings.Contains(ns, "google") {
hasGoogleNS = true
break
}
}
assert.True(t, hasGoogleNS,
"expected google nameservers, got: %v", nameservers,
)
}
func TestFindAuthoritativeNameservers_Subdomain(
t *testing.T,
) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
nameservers, err := r.FindAuthoritativeNameservers(
ctx, "www.google.com",
)
require.NoError(t, err)
require.NotEmpty(t, nameservers)
}
func TestFindAuthoritativeNameservers_ReturnsSorted(
t *testing.T,
) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
nameservers, err := r.FindAuthoritativeNameservers(
ctx, "google.com",
)
require.NoError(t, err)
assert.True(
t,
sort.StringsAreSorted(nameservers),
"nameservers should be sorted, got: %v", nameservers,
)
}
func TestFindAuthoritativeNameservers_Deterministic(
t *testing.T,
) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
first, err := r.FindAuthoritativeNameservers(
ctx, "google.com",
)
require.NoError(t, err)
second, err := r.FindAuthoritativeNameservers(
ctx, "google.com",
)
require.NoError(t, err)
assert.Equal(t, first, second)
}
func TestFindAuthoritativeNameservers_TrailingDot(
t *testing.T,
) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
ns1, err := r.FindAuthoritativeNameservers(
ctx, "google.com",
)
require.NoError(t, err)
ns2, err := r.FindAuthoritativeNameservers(
ctx, "google.com.",
)
require.NoError(t, err)
assert.Equal(t, ns1, ns2)
}
func TestFindAuthoritativeNameservers_CloudflareDomain(
t *testing.T,
) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
nameservers, err := r.FindAuthoritativeNameservers(
ctx, "cloudflare.com",
)
require.NoError(t, err)
require.NotEmpty(t, nameservers)
for _, ns := range nameservers {
assert.True(t, strings.HasSuffix(ns, "."),
"NS should be FQDN with trailing dot: %s", ns,
)
}
}
// ----------------------------------------------------------------
// QueryNameserver tests
// ----------------------------------------------------------------
func TestQueryNameserver_BasicA(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
ns := findOneNSForDomain(t, r, ctx, "google.com")
resp, err := r.QueryNameserver(
ctx, ns, "www.google.com",
)
require.NoError(t, err)
require.NotNil(t, resp)
assert.Equal(t, resolver.StatusOK, resp.Status)
assert.Equal(t, ns, resp.Nameserver)
hasRecords := len(resp.Records["A"]) > 0 ||
len(resp.Records["CNAME"]) > 0
assert.True(t, hasRecords,
"expected A or CNAME records for www.google.com",
)
}
func TestQueryNameserver_AAAA(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
ns := findOneNSForDomain(t, r, ctx, "cloudflare.com")
resp, err := r.QueryNameserver(
ctx, ns, "cloudflare.com",
)
require.NoError(t, err)
aaaaRecords := resp.Records["AAAA"]
require.NotEmpty(t, aaaaRecords,
"cloudflare.com should have AAAA records",
)
for _, ip := range aaaaRecords {
parsed := net.ParseIP(ip)
require.NotNil(t, parsed,
"should be valid IP: %s", ip,
)
}
}
func TestQueryNameserver_MX(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
ns := findOneNSForDomain(t, r, ctx, "google.com")
resp, err := r.QueryNameserver(
ctx, ns, "google.com",
)
require.NoError(t, err)
mxRecords := resp.Records["MX"]
require.NotEmpty(t, mxRecords,
"google.com should have MX records",
)
}
func TestQueryNameserver_TXT(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
ns := findOneNSForDomain(t, r, ctx, "google.com")
resp, err := r.QueryNameserver(
ctx, ns, "google.com",
)
require.NoError(t, err)
txtRecords := resp.Records["TXT"]
require.NotEmpty(t, txtRecords,
"google.com should have TXT records",
)
hasSPF := false
for _, txt := range txtRecords {
if strings.Contains(txt, "v=spf1") {
hasSPF = true
break
}
}
assert.True(t, hasSPF,
"google.com should have SPF TXT record",
)
}
func TestQueryNameserver_NXDomain(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
ns := findOneNSForDomain(t, r, ctx, "google.com")
resp, err := r.QueryNameserver(
ctx, ns,
"this-surely-does-not-exist-xyz.google.com",
)
require.NoError(t, err)
assert.Equal(t, resolver.StatusNXDomain, resp.Status)
}
func TestQueryNameserver_RecordsSorted(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
ns := findOneNSForDomain(t, r, ctx, "google.com")
resp, err := r.QueryNameserver(
ctx, ns, "google.com",
)
require.NoError(t, err)
for recordType, values := range resp.Records {
assert.True(
t,
sort.StringsAreSorted(values),
"%s records should be sorted", recordType,
)
}
}
func TestQueryNameserver_ResponseIncludesNameserver(
t *testing.T,
) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
ns := findOneNSForDomain(t, r, ctx, "cloudflare.com")
resp, err := r.QueryNameserver(
ctx, ns, "cloudflare.com",
)
require.NoError(t, err)
assert.Equal(t, ns, resp.Nameserver)
}
func TestQueryNameserver_EmptyRecordsOnNXDomain(
t *testing.T,
) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
ns := findOneNSForDomain(t, r, ctx, "google.com")
resp, err := r.QueryNameserver(
ctx, ns,
"this-surely-does-not-exist-xyz.google.com",
)
require.NoError(t, err)
totalRecords := 0
for _, values := range resp.Records {
totalRecords += len(values)
}
assert.Zero(t, totalRecords)
}
func TestQueryNameserver_TrailingDotHandling(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
ns := findOneNSForDomain(t, r, ctx, "google.com")
resp1, err := r.QueryNameserver(
ctx, ns, "google.com",
)
require.NoError(t, err)
resp2, err := r.QueryNameserver(
ctx, ns, "google.com.",
)
require.NoError(t, err)
assert.Equal(t, resp1.Status, resp2.Status)
}
// ----------------------------------------------------------------
// QueryAllNameservers tests
// ----------------------------------------------------------------
func TestQueryAllNameservers_ReturnsAllNS(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
results, err := r.QueryAllNameservers(
ctx, "google.com",
)
require.NoError(t, err)
require.NotEmpty(t, results)
assert.GreaterOrEqual(t, len(results), 2)
for ns, resp := range results {
assert.Equal(t, ns, resp.Nameserver)
}
}
func TestQueryAllNameservers_AllReturnOK(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
results, err := r.QueryAllNameservers(
ctx, "google.com",
)
require.NoError(t, err)
for ns, resp := range results {
assert.Equal(
t, resolver.StatusOK, resp.Status,
"NS %s should return OK", ns,
)
}
}
func TestQueryAllNameservers_NXDomainFromAllNS(
t *testing.T,
) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
results, err := r.QueryAllNameservers(
ctx,
"this-surely-does-not-exist-xyz.google.com",
)
require.NoError(t, err)
for ns, resp := range results {
assert.Equal(
t, resolver.StatusNXDomain, resp.Status,
"NS %s should return nxdomain", ns,
)
}
}
// ----------------------------------------------------------------
// LookupNS tests
// ----------------------------------------------------------------
func TestLookupNS_ValidDomain(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
nameservers, err := r.LookupNS(ctx, "google.com")
require.NoError(t, err)
require.NotEmpty(t, nameservers)
for _, ns := range nameservers {
assert.True(t, strings.HasSuffix(ns, "."),
"NS should have trailing dot: %s", ns,
)
}
}
func TestLookupNS_Sorted(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
nameservers, err := r.LookupNS(ctx, "google.com")
require.NoError(t, err)
assert.True(t, sort.StringsAreSorted(nameservers))
}
func TestLookupNS_MatchesFindAuthoritative(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
fromLookup, err := r.LookupNS(ctx, "google.com")
require.NoError(t, err)
fromFind, err := r.FindAuthoritativeNameservers(
ctx, "google.com",
)
require.NoError(t, err)
assert.Equal(t, fromFind, fromLookup)
}
// ----------------------------------------------------------------
// ResolveIPAddresses tests
// ----------------------------------------------------------------
func TestResolveIPAddresses_ReturnsIPs(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
ips, err := r.ResolveIPAddresses(ctx, "google.com")
require.NoError(t, err)
require.NotEmpty(t, ips)
for _, ip := range ips {
parsed := net.ParseIP(ip)
assert.NotNil(t, parsed,
"should be valid IP: %s", ip,
)
}
}
func TestResolveIPAddresses_Deduplicated(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
ips, err := r.ResolveIPAddresses(ctx, "google.com")
require.NoError(t, err)
seen := make(map[string]bool)
for _, ip := range ips {
assert.False(t, seen[ip], "duplicate IP: %s", ip)
seen[ip] = true
}
}
func TestResolveIPAddresses_Sorted(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
ips, err := r.ResolveIPAddresses(ctx, "google.com")
require.NoError(t, err)
assert.True(t, sort.StringsAreSorted(ips))
}
func TestResolveIPAddresses_NXDomainReturnsEmpty(
t *testing.T,
) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
ips, err := r.ResolveIPAddresses(
ctx,
"this-surely-does-not-exist-xyz.google.com",
)
require.NoError(t, err)
assert.Empty(t, ips)
}
func TestResolveIPAddresses_CloudflareDomain(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx := testContext(t)
ips, err := r.ResolveIPAddresses(ctx, "cloudflare.com")
require.NoError(t, err)
require.NotEmpty(t, ips)
}
// ----------------------------------------------------------------
// Context cancellation tests
// ----------------------------------------------------------------
func TestFindAuthoritativeNameservers_ContextCanceled(
t *testing.T,
) {
t.Parallel()
r := newTestResolver(t)
ctx, cancel := context.WithCancel(context.Background())
cancel()
_, err := r.FindAuthoritativeNameservers(ctx, "google.com")
assert.Error(t, err)
}
func TestQueryNameserver_ContextCanceled(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx, cancel := context.WithCancel(context.Background())
cancel()
_, err := r.QueryNameserver(
ctx, "ns1.google.com.", "google.com",
)
assert.Error(t, err)
}
func TestQueryAllNameservers_ContextCanceled(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx, cancel := context.WithCancel(context.Background())
cancel()
_, err := r.QueryAllNameservers(ctx, "google.com")
assert.Error(t, err)
}
// ----------------------------------------------------------------
// Timeout tests
// ----------------------------------------------------------------
func TestQueryNameserverIP_Timeout(t *testing.T) {
t.Parallel()
log := slog.New(slog.NewTextHandler(
os.Stderr,
&slog.HandlerOptions{Level: slog.LevelDebug},
))
r := resolver.NewFromLoggerWithClient(
log, &timeoutClient{},
)
ctx, cancel := context.WithTimeout(
context.Background(), 10*time.Second,
)
t.Cleanup(cancel)
// Query any IP — the client always returns a timeout error.
resp, err := r.QueryNameserverIP(
ctx, "unreachable.test.", "192.0.2.1",
"example.com",
)
require.NoError(t, err)
assert.Equal(t, resolver.StatusTimeout, resp.Status)
assert.NotEmpty(t, resp.Error)
}
// timeoutClient simulates DNS timeout errors for testing.
type timeoutClient struct{}
func (c *timeoutClient) ExchangeContext(
_ context.Context,
_ *dns.Msg,
_ string,
) (*dns.Msg, time.Duration, error) {
return nil, 0, &net.OpError{
Op: "read",
Net: "udp",
Err: &timeoutError{},
}
}
type timeoutError struct{}
func (e *timeoutError) Error() string { return "i/o timeout" }
func (e *timeoutError) Timeout() bool { return true }
func (e *timeoutError) Temporary() bool { return true }
func TestResolveIPAddresses_ContextCanceled(t *testing.T) {
t.Parallel()
r := newTestResolver(t)
ctx, cancel := context.WithCancel(context.Background())
cancel()
_, err := r.ResolveIPAddresses(ctx, "google.com")
assert.Error(t, err)
}

View File

@@ -57,10 +57,49 @@ type HostnameState struct {
// PortState holds the monitoring state for a port.
type PortState struct {
Open bool `json:"open"`
Hostname string `json:"hostname"`
Hostnames []string `json:"hostnames"`
LastChecked time.Time `json:"lastChecked"`
}
// UnmarshalJSON implements custom unmarshaling to handle both
// the old single-hostname format and the new multi-hostname
// format for backward compatibility with existing state files.
func (ps *PortState) UnmarshalJSON(data []byte) error {
// Use an alias to prevent infinite recursion.
type portStateAlias struct {
Open bool `json:"open"`
Hostnames []string `json:"hostnames"`
LastChecked time.Time `json:"lastChecked"`
}
var alias portStateAlias
err := json.Unmarshal(data, &alias)
if err != nil {
return fmt.Errorf("unmarshaling port state: %w", err)
}
ps.Open = alias.Open
ps.Hostnames = alias.Hostnames
ps.LastChecked = alias.LastChecked
// If Hostnames is empty, try reading the old single-hostname
// format for backward compatibility.
if len(ps.Hostnames) == 0 {
var old struct {
Hostname string `json:"hostname"`
}
// Best-effort: ignore errors since the main unmarshal
// already succeeded.
if json.Unmarshal(data, &old) == nil && old.Hostname != "" {
ps.Hostnames = []string{old.Hostname}
}
}
return nil
}
// CertificateState holds TLS certificate monitoring state.
type CertificateState struct {
CommonName string `json:"commonName"`
@@ -156,8 +195,8 @@ func (s *State) Load() error {
// Save writes the current state to disk atomically.
func (s *State) Save() error {
s.mu.RLock()
defer s.mu.RUnlock()
s.mu.Lock()
defer s.mu.Unlock()
s.snapshot.LastUpdated = time.Now().UTC()
@@ -263,6 +302,27 @@ func (s *State) GetPortState(key string) (*PortState, bool) {
return ps, ok
}
// DeletePortState removes a port state entry.
func (s *State) DeletePortState(key string) {
s.mu.Lock()
defer s.mu.Unlock()
delete(s.snapshot.Ports, key)
}
// GetAllPortKeys returns all port state keys.
func (s *State) GetAllPortKeys() []string {
s.mu.RLock()
defer s.mu.RUnlock()
keys := make([]string, 0, len(s.snapshot.Ports))
for k := range s.snapshot.Ports {
keys = append(keys, k)
}
return keys
}
// SetCertificateState updates the state for a certificate.
func (s *State) SetCertificateState(
key string,

View File

@@ -0,0 +1,67 @@
package tlscheck_test
import (
"context"
"crypto/tls"
"errors"
"net"
"testing"
"time"
"sneak.berlin/go/dnswatcher/internal/tlscheck"
)
func TestCheckCertificateNoPeerCerts(t *testing.T) {
t.Parallel()
lc := &net.ListenConfig{}
ln, err := lc.Listen(
context.Background(), "tcp", "127.0.0.1:0",
)
if err != nil {
t.Fatal(err)
}
defer func() { _ = ln.Close() }()
addr, ok := ln.Addr().(*net.TCPAddr)
if !ok {
t.Fatal("unexpected address type")
}
// Accept and immediately close to cause TLS handshake failure.
go func() {
conn, err := ln.Accept()
if err != nil {
return
}
_ = conn.Close()
}()
checker := tlscheck.NewStandalone(
tlscheck.WithTimeout(2*time.Second),
tlscheck.WithTLSConfig(&tls.Config{
InsecureSkipVerify: true, //nolint:gosec // test
MinVersion: tls.VersionTLS12,
}),
tlscheck.WithPort(addr.Port),
)
_, err = checker.CheckCertificate(
context.Background(), "127.0.0.1", "localhost",
)
if err == nil {
t.Fatal("expected error when server presents no certs")
}
}
func TestErrNoPeerCertificatesIsSentinel(t *testing.T) {
t.Parallel()
err := tlscheck.ErrNoPeerCertificates
if !errors.Is(err, tlscheck.ErrNoPeerCertificates) {
t.Fatal("expected sentinel error to match")
}
}

View File

@@ -3,8 +3,12 @@ package tlscheck
import (
"context"
"crypto/tls"
"errors"
"fmt"
"log/slog"
"net"
"strconv"
"time"
"go.uber.org/fx"
@@ -12,11 +16,56 @@ import (
"sneak.berlin/go/dnswatcher/internal/logger"
)
// ErrNotImplemented indicates the TLS checker is not yet implemented.
var ErrNotImplemented = errors.New(
"tls checker not yet implemented",
const (
defaultTimeout = 10 * time.Second
defaultPort = 443
)
// ErrUnexpectedConnType indicates the connection was not a TLS
// connection.
var ErrUnexpectedConnType = errors.New(
"unexpected connection type",
)
// ErrNoPeerCertificates indicates the TLS connection had no peer
// certificates.
var ErrNoPeerCertificates = errors.New(
"no peer certificates",
)
// CertificateInfo holds information about a TLS certificate.
type CertificateInfo struct {
CommonName string
Issuer string
NotAfter time.Time
SubjectAlternativeNames []string
SerialNumber string
}
// Option configures a Checker.
type Option func(*Checker)
// WithTimeout sets the connection timeout.
func WithTimeout(d time.Duration) Option {
return func(c *Checker) {
c.timeout = d
}
}
// WithTLSConfig sets a custom TLS configuration.
func WithTLSConfig(cfg *tls.Config) Option {
return func(c *Checker) {
c.tlsConfig = cfg
}
}
// WithPort sets the TLS port to connect to.
func WithPort(port int) Option {
return func(c *Checker) {
c.port = port
}
}
// Params contains dependencies for Checker.
type Params struct {
fx.In
@@ -27,14 +76,9 @@ type Params struct {
// Checker performs TLS certificate inspection.
type Checker struct {
log *slog.Logger
}
// CertificateInfo holds information about a TLS certificate.
type CertificateInfo struct {
CommonName string
Issuer string
NotAfter time.Time
SubjectAlternativeNames []string
timeout time.Duration
tlsConfig *tls.Config
port int
}
// New creates a new TLS Checker instance.
@@ -44,15 +88,109 @@ func New(
) (*Checker, error) {
return &Checker{
log: params.Logger.Get(),
timeout: defaultTimeout,
port: defaultPort,
}, nil
}
// CheckCertificate connects to the given IP:port using SNI and
// returns certificate information.
func (c *Checker) CheckCertificate(
_ context.Context,
_ string,
_ string,
) (*CertificateInfo, error) {
return nil, ErrNotImplemented
// NewStandalone creates a Checker without fx dependencies.
func NewStandalone(opts ...Option) *Checker {
checker := &Checker{
log: slog.Default(),
timeout: defaultTimeout,
port: defaultPort,
}
for _, opt := range opts {
opt(checker)
}
return checker
}
// CheckCertificate connects to the given IP address using the
// specified SNI hostname and returns certificate information.
func (c *Checker) CheckCertificate(
ctx context.Context,
ipAddress string,
sniHostname string,
) (*CertificateInfo, error) {
target := net.JoinHostPort(
ipAddress, strconv.Itoa(c.port),
)
tlsCfg := c.buildTLSConfig(sniHostname)
dialer := &tls.Dialer{
NetDialer: &net.Dialer{Timeout: c.timeout},
Config: tlsCfg,
}
conn, err := dialer.DialContext(ctx, "tcp", target)
if err != nil {
return nil, fmt.Errorf(
"TLS dial to %s: %w", target, err,
)
}
defer func() {
closeErr := conn.Close()
if closeErr != nil {
c.log.Debug(
"closing TLS connection",
"target", target,
"error", closeErr.Error(),
)
}
}()
tlsConn, ok := conn.(*tls.Conn)
if !ok {
return nil, fmt.Errorf(
"%s: %w", target, ErrUnexpectedConnType,
)
}
return c.extractCertInfo(tlsConn)
}
func (c *Checker) buildTLSConfig(
sniHostname string,
) *tls.Config {
if c.tlsConfig != nil {
cfg := c.tlsConfig.Clone()
cfg.ServerName = sniHostname
return cfg
}
return &tls.Config{
ServerName: sniHostname,
MinVersion: tls.VersionTLS12,
}
}
func (c *Checker) extractCertInfo(
conn *tls.Conn,
) (*CertificateInfo, error) {
state := conn.ConnectionState()
if len(state.PeerCertificates) == 0 {
return nil, ErrNoPeerCertificates
}
cert := state.PeerCertificates[0]
sans := make([]string, 0, len(cert.DNSNames)+len(cert.IPAddresses))
sans = append(sans, cert.DNSNames...)
for _, ip := range cert.IPAddresses {
sans = append(sans, ip.String())
}
return &CertificateInfo{
CommonName: cert.Subject.CommonName,
Issuer: cert.Issuer.CommonName,
NotAfter: cert.NotAfter,
SubjectAlternativeNames: sans,
SerialNumber: cert.SerialNumber.String(),
}, nil
}

View File

@@ -0,0 +1,169 @@
package tlscheck_test
import (
"context"
"crypto/tls"
"net"
"net/http"
"net/http/httptest"
"testing"
"time"
"sneak.berlin/go/dnswatcher/internal/tlscheck"
)
func startTLSServer(
t *testing.T,
) (*httptest.Server, string, int) {
t.Helper()
srv := httptest.NewTLSServer(
http.HandlerFunc(
func(w http.ResponseWriter, _ *http.Request) {
w.WriteHeader(http.StatusOK)
},
),
)
addr, ok := srv.Listener.Addr().(*net.TCPAddr)
if !ok {
t.Fatal("unexpected address type")
}
return srv, addr.IP.String(), addr.Port
}
func TestCheckCertificateValid(t *testing.T) {
t.Parallel()
srv, ip, port := startTLSServer(t)
defer srv.Close()
checker := tlscheck.NewStandalone(
tlscheck.WithTimeout(5*time.Second),
tlscheck.WithTLSConfig(&tls.Config{
//nolint:gosec // test uses self-signed cert
InsecureSkipVerify: true,
}),
tlscheck.WithPort(port),
)
info, err := checker.CheckCertificate(
context.Background(), ip, "localhost",
)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if info == nil {
t.Fatal("expected non-nil CertificateInfo")
}
if info.NotAfter.IsZero() {
t.Error("expected non-zero NotAfter")
}
if info.SerialNumber == "" {
t.Error("expected non-empty SerialNumber")
}
}
func TestCheckCertificateConnectionRefused(t *testing.T) {
t.Parallel()
lc := &net.ListenConfig{}
ln, err := lc.Listen(
context.Background(), "tcp", "127.0.0.1:0",
)
if err != nil {
t.Fatalf("failed to listen: %v", err)
}
addr, ok := ln.Addr().(*net.TCPAddr)
if !ok {
t.Fatal("unexpected address type")
}
port := addr.Port
_ = ln.Close()
checker := tlscheck.NewStandalone(
tlscheck.WithTimeout(2*time.Second),
tlscheck.WithPort(port),
)
_, err = checker.CheckCertificate(
context.Background(), "127.0.0.1", "localhost",
)
if err == nil {
t.Fatal("expected error for connection refused")
}
}
func TestCheckCertificateContextCanceled(t *testing.T) {
t.Parallel()
ctx, cancel := context.WithCancel(context.Background())
cancel()
checker := tlscheck.NewStandalone(
tlscheck.WithTimeout(2*time.Second),
tlscheck.WithPort(1),
)
_, err := checker.CheckCertificate(
ctx, "127.0.0.1", "localhost",
)
if err == nil {
t.Fatal("expected error for canceled context")
}
}
func TestCheckCertificateTimeout(t *testing.T) {
t.Parallel()
checker := tlscheck.NewStandalone(
tlscheck.WithTimeout(1*time.Millisecond),
tlscheck.WithPort(1),
)
_, err := checker.CheckCertificate(
context.Background(),
"192.0.2.1",
"example.com",
)
if err == nil {
t.Fatal("expected error for timeout")
}
}
func TestCheckCertificateSANs(t *testing.T) {
t.Parallel()
srv, ip, port := startTLSServer(t)
defer srv.Close()
checker := tlscheck.NewStandalone(
tlscheck.WithTimeout(5*time.Second),
tlscheck.WithTLSConfig(&tls.Config{
//nolint:gosec // test uses self-signed cert
InsecureSkipVerify: true,
}),
tlscheck.WithPort(port),
)
info, err := checker.CheckCertificate(
context.Background(), ip, "localhost",
)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if info.CommonName == "" && len(info.SubjectAlternativeNames) == 0 {
t.Error("expected CN or SANs to be populated")
}
}

View File

@@ -6,6 +6,7 @@ import (
"log/slog"
"sort"
"strings"
"sync"
"time"
"go.uber.org/fx"
@@ -49,6 +50,8 @@ type Watcher struct {
notify Notifier
cancel context.CancelFunc
firstRun bool
expiryNotifiedMu sync.Mutex
expiryNotified map[string]time.Time
}
// New creates a new Watcher instance wired into the fx lifecycle.
@@ -65,16 +68,19 @@ func New(
tlsCheck: params.TLSCheck,
notify: params.Notify,
firstRun: true,
expiryNotified: make(map[string]time.Time),
}
lifecycle.Append(fx.Hook{
OnStart: func(startCtx context.Context) error {
ctx, cancel := context.WithCancel(
context.WithoutCancel(startCtx),
)
OnStart: func(_ context.Context) error {
// Use context.Background() — the fx startup context
// expires after startup completes, so deriving from it
// would cancel the watcher immediately. The watcher's
// lifetime is controlled by w.cancel in OnStop.
ctx, cancel := context.WithCancel(context.Background())
w.cancel = cancel
go w.Run(ctx)
go w.Run(ctx) //nolint:contextcheck // intentionally not derived from startCtx
return nil
},
@@ -108,6 +114,7 @@ func NewForTest(
tlsCheck: tc,
notify: n,
firstRun: true,
expiryNotified: make(map[string]time.Time),
}
}
@@ -136,9 +143,16 @@ func (w *Watcher) Run(ctx context.Context) {
return
case <-dnsTicker.C:
w.runDNSAndPortChecks(ctx)
w.runDNSChecks(ctx)
w.checkAllPorts(ctx)
w.saveState()
case <-tlsTicker.C:
// Run DNS first so TLS checks use freshly
// resolved IP addresses, not stale ones from
// a previous cycle.
w.runDNSChecks(ctx)
w.runTLSChecks(ctx)
w.saveState()
}
@@ -146,10 +160,26 @@ func (w *Watcher) Run(ctx context.Context) {
}
// RunOnce performs a single complete monitoring cycle.
// DNS checks run first so that port and TLS checks use
// freshly resolved IP addresses. Port checks run before
// TLS because TLS checks only target IPs with an open
// port 443.
func (w *Watcher) RunOnce(ctx context.Context) {
w.detectFirstRun()
w.runDNSAndPortChecks(ctx)
// Phase 1: DNS resolution must complete first so that
// subsequent checks use fresh IP addresses.
w.runDNSChecks(ctx)
// Phase 2: Port checks populate port state that TLS
// checks depend on (TLS only targets IPs where port
// 443 is open).
w.checkAllPorts(ctx)
// Phase 3: TLS checks use fresh DNS IPs and current
// port state.
w.runTLSChecks(ctx)
w.saveState()
w.firstRun = false
}
@@ -166,7 +196,11 @@ func (w *Watcher) detectFirstRun() {
}
}
func (w *Watcher) runDNSAndPortChecks(ctx context.Context) {
// runDNSChecks performs DNS resolution for all configured domains
// and hostnames, updating state with freshly resolved records.
// This must complete before port or TLS checks run so those
// checks operate on current IP addresses.
func (w *Watcher) runDNSChecks(ctx context.Context) {
for _, domain := range w.config.Domains {
w.checkDomain(ctx, domain)
}
@@ -174,8 +208,6 @@ func (w *Watcher) runDNSAndPortChecks(ctx context.Context) {
for _, hostname := range w.config.Hostnames {
w.checkHostname(ctx, hostname)
}
w.checkAllPorts(ctx)
}
func (w *Watcher) checkDomain(
@@ -206,6 +238,28 @@ func (w *Watcher) checkDomain(
Nameservers: nameservers,
LastChecked: now,
})
// Also look up A/AAAA records for the apex domain so that
// port and TLS checks (which read HostnameState) can find
// the domain's IP addresses.
records, err := w.resolver.LookupAllRecords(ctx, domain)
if err != nil {
w.log.Error(
"failed to lookup records for domain",
"domain", domain,
"error", err,
)
return
}
prevHS, hasPrevHS := w.state.GetHostnameState(domain)
if hasPrevHS && !w.firstRun {
w.detectHostnameChanges(ctx, domain, prevHS, records)
}
newState := buildHostnameState(records, now)
w.state.SetHostnameState(domain, newState)
}
func (w *Watcher) detectNSChanges(
@@ -421,24 +475,94 @@ func (w *Watcher) detectInconsistencies(
}
func (w *Watcher) checkAllPorts(ctx context.Context) {
for _, hostname := range w.config.Hostnames {
w.checkPortsForHostname(ctx, hostname)
// Phase 1: Build current IP:port → hostname associations
// from fresh DNS data.
associations := w.buildPortAssociations()
// Phase 2: Check each unique IP:port and update state
// with the full set of associated hostnames.
for key, hostnames := range associations {
ip, port := parsePortKey(key)
if port == 0 {
continue
}
for _, domain := range w.config.Domains {
w.checkPortsForHostname(ctx, domain)
}
w.checkSinglePort(ctx, ip, port, hostnames)
}
func (w *Watcher) checkPortsForHostname(
ctx context.Context,
hostname string,
) {
ips := w.collectIPs(hostname)
// Phase 3: Remove port state entries that no longer have
// any hostname referencing them.
w.cleanupStalePorts(associations)
}
// buildPortAssociations constructs a map from IP:port keys to
// the sorted set of hostnames currently resolving to that IP.
func (w *Watcher) buildPortAssociations() map[string][]string {
assoc := make(map[string]map[string]bool)
allNames := make(
[]string, 0,
len(w.config.Hostnames)+len(w.config.Domains),
)
allNames = append(allNames, w.config.Hostnames...)
allNames = append(allNames, w.config.Domains...)
for _, name := range allNames {
ips := w.collectIPs(name)
for _, ip := range ips {
for _, port := range monitoredPorts {
w.checkSinglePort(ctx, ip, port, hostname)
key := fmt.Sprintf("%s:%d", ip, port)
if assoc[key] == nil {
assoc[key] = make(map[string]bool)
}
assoc[key][name] = true
}
}
}
result := make(map[string][]string, len(assoc))
for key, set := range assoc {
hostnames := make([]string, 0, len(set))
for h := range set {
hostnames = append(hostnames, h)
}
sort.Strings(hostnames)
result[key] = hostnames
}
return result
}
// parsePortKey splits an "ip:port" key into its components.
func parsePortKey(key string) (string, int) {
lastColon := strings.LastIndex(key, ":")
if lastColon < 0 {
return key, 0
}
ip := key[:lastColon]
var p int
_, err := fmt.Sscanf(key[lastColon+1:], "%d", &p)
if err != nil {
return ip, 0
}
return ip, p
}
// cleanupStalePorts removes port state entries that are no
// longer referenced by any hostname in the current DNS data.
func (w *Watcher) cleanupStalePorts(
currentAssociations map[string][]string,
) {
for _, key := range w.state.GetAllPortKeys() {
if _, exists := currentAssociations[key]; !exists {
w.state.DeletePortState(key)
}
}
}
@@ -475,7 +599,7 @@ func (w *Watcher) checkSinglePort(
ctx context.Context,
ip string,
port int,
hostname string,
hostnames []string,
) {
result, err := w.portCheck.CheckPort(ctx, ip, port)
if err != nil {
@@ -500,8 +624,8 @@ func (w *Watcher) checkSinglePort(
}
msg := fmt.Sprintf(
"Host: %s\nAddress: %s\nPort now %s",
hostname, key, stateStr,
"Hosts: %s\nAddress: %s\nPort now %s",
strings.Join(hostnames, ", "), key, stateStr,
)
w.notify.SendNotification(
@@ -514,7 +638,7 @@ func (w *Watcher) checkSinglePort(
w.state.SetPortState(key, &state.PortState{
Open: result.Open,
Hostname: hostname,
Hostnames: hostnames,
LastChecked: now,
})
}
@@ -691,6 +815,22 @@ func (w *Watcher) checkTLSExpiry(
return
}
// Deduplicate expiry warnings: don't re-notify for the same
// hostname within the TLS check interval.
dedupKey := fmt.Sprintf("expiry:%s:%s", hostname, ip)
w.expiryNotifiedMu.Lock()
lastNotified, seen := w.expiryNotified[dedupKey]
if seen && time.Since(lastNotified) < w.config.TLSInterval {
w.expiryNotifiedMu.Unlock()
return
}
w.expiryNotified[dedupKey] = time.Now()
w.expiryNotifiedMu.Unlock()
msg := fmt.Sprintf(
"Host: %s\nIP: %s\nCN: %s\n"+
"Expires: %s (%.0f days)",

View File

@@ -273,6 +273,10 @@ func setupBaselineMocks(deps *testDeps) {
"ns1.example.com.",
"ns2.example.com.",
}
deps.resolver.allRecords["example.com"] = map[string]map[string][]string{
"ns1.example.com.": {"A": {"93.184.216.34"}},
"ns2.example.com.": {"A": {"93.184.216.34"}},
}
deps.resolver.allRecords["www.example.com"] = map[string]map[string][]string{
"ns1.example.com.": {"A": {"93.184.216.34"}},
"ns2.example.com.": {"A": {"93.184.216.34"}},
@@ -290,6 +294,14 @@ func setupBaselineMocks(deps *testDeps) {
"www.example.com",
},
}
deps.tlsChecker.certs["93.184.216.34:example.com"] = &tlscheck.CertificateInfo{
CommonName: "example.com",
Issuer: "DigiCert",
NotAfter: time.Now().Add(90 * 24 * time.Hour),
SubjectAlternativeNames: []string{
"example.com",
},
}
}
func assertNoNotifications(
@@ -322,14 +334,74 @@ func assertStatePopulated(
)
}
if len(snap.Hostnames) != 1 {
// Hostnames includes both explicit hostnames and domains
// (domains now also get hostname state for port/TLS checks).
if len(snap.Hostnames) < 1 {
t.Errorf(
"expected 1 hostname in state, got %d",
"expected at least 1 hostname in state, got %d",
len(snap.Hostnames),
)
}
}
func TestDomainPortAndTLSChecks(t *testing.T) {
t.Parallel()
cfg := defaultTestConfig(t)
cfg.Domains = []string{"example.com"}
w, deps := newTestWatcher(t, cfg)
deps.resolver.nsRecords["example.com"] = []string{
"ns1.example.com.",
}
deps.resolver.allRecords["example.com"] = map[string]map[string][]string{
"ns1.example.com.": {"A": {"93.184.216.34"}},
}
deps.portChecker.results["93.184.216.34:80"] = true
deps.portChecker.results["93.184.216.34:443"] = true
deps.tlsChecker.certs["93.184.216.34:example.com"] = &tlscheck.CertificateInfo{
CommonName: "example.com",
Issuer: "DigiCert",
NotAfter: time.Now().Add(90 * 24 * time.Hour),
SubjectAlternativeNames: []string{
"example.com",
},
}
w.RunOnce(t.Context())
snap := deps.state.GetSnapshot()
// Domain should have port state populated
if len(snap.Ports) == 0 {
t.Error("expected port state for domain, got none")
}
// Domain should have certificate state populated
if len(snap.Certificates) == 0 {
t.Error("expected certificate state for domain, got none")
}
// Verify port checker was actually called
deps.portChecker.mu.Lock()
calls := deps.portChecker.calls
deps.portChecker.mu.Unlock()
if calls == 0 {
t.Error("expected port checker to be called for domain")
}
// Verify TLS checker was actually called
deps.tlsChecker.mu.Lock()
tlsCalls := deps.tlsChecker.calls
deps.tlsChecker.mu.Unlock()
if tlsCalls == 0 {
t.Error("expected TLS checker to be called for domain")
}
}
func TestNSChangeDetection(t *testing.T) {
t.Parallel()
@@ -342,6 +414,12 @@ func TestNSChangeDetection(t *testing.T) {
"ns1.example.com.",
"ns2.example.com.",
}
deps.resolver.allRecords["example.com"] = map[string]map[string][]string{
"ns1.example.com.": {"A": {"1.2.3.4"}},
"ns2.example.com.": {"A": {"1.2.3.4"}},
}
deps.portChecker.results["1.2.3.4:80"] = false
deps.portChecker.results["1.2.3.4:443"] = false
ctx := t.Context()
w.RunOnce(ctx)
@@ -351,6 +429,10 @@ func TestNSChangeDetection(t *testing.T) {
"ns1.example.com.",
"ns3.example.com.",
}
deps.resolver.allRecords["example.com"] = map[string]map[string][]string{
"ns1.example.com.": {"A": {"1.2.3.4"}},
"ns3.example.com.": {"A": {"1.2.3.4"}},
}
deps.resolver.mu.Unlock()
w.RunOnce(ctx)
@@ -506,6 +588,61 @@ func TestTLSExpiryWarning(t *testing.T) {
}
}
func TestTLSExpiryWarningDedup(t *testing.T) {
t.Parallel()
cfg := defaultTestConfig(t)
cfg.Hostnames = []string{"www.example.com"}
cfg.TLSInterval = 24 * time.Hour
w, deps := newTestWatcher(t, cfg)
deps.resolver.allRecords["www.example.com"] = map[string]map[string][]string{
"ns1.example.com.": {"A": {"1.2.3.4"}},
}
deps.resolver.ipAddresses["www.example.com"] = []string{
"1.2.3.4",
}
deps.portChecker.results["1.2.3.4:80"] = true
deps.portChecker.results["1.2.3.4:443"] = true
deps.tlsChecker.certs["1.2.3.4:www.example.com"] = &tlscheck.CertificateInfo{
CommonName: "www.example.com",
Issuer: "DigiCert",
NotAfter: time.Now().Add(3 * 24 * time.Hour),
SubjectAlternativeNames: []string{
"www.example.com",
},
}
ctx := t.Context()
// First run = baseline, no notifications
w.RunOnce(ctx)
// Second run should fire one expiry warning
w.RunOnce(ctx)
// Third run should NOT fire another warning (dedup)
w.RunOnce(ctx)
notifications := deps.notifier.getNotifications()
expiryCount := 0
for _, n := range notifications {
if n.Title == "TLS Expiry Warning: www.example.com" {
expiryCount++
}
}
if expiryCount != 1 {
t.Errorf(
"expected exactly 1 expiry warning (dedup), got %d",
expiryCount,
)
}
}
func TestGracefulShutdown(t *testing.T) {
t.Parallel()
@@ -519,6 +656,11 @@ func TestGracefulShutdown(t *testing.T) {
deps.resolver.nsRecords["example.com"] = []string{
"ns1.example.com.",
}
deps.resolver.allRecords["example.com"] = map[string]map[string][]string{
"ns1.example.com.": {"A": {"1.2.3.4"}},
}
deps.portChecker.results["1.2.3.4:80"] = false
deps.portChecker.results["1.2.3.4:443"] = false
ctx, cancel := context.WithCancel(t.Context())
@@ -540,6 +682,80 @@ func TestGracefulShutdown(t *testing.T) {
}
}
func setupHostnameIP(
deps *testDeps,
hostname, ip string,
) {
deps.resolver.allRecords[hostname] = map[string]map[string][]string{
"ns1.example.com.": {"A": {ip}},
}
deps.portChecker.results[ip+":80"] = true
deps.portChecker.results[ip+":443"] = true
deps.tlsChecker.certs[ip+":"+hostname] = &tlscheck.CertificateInfo{
CommonName: hostname,
Issuer: "DigiCert",
NotAfter: time.Now().Add(90 * 24 * time.Hour),
SubjectAlternativeNames: []string{hostname},
}
}
func updateHostnameIP(deps *testDeps, hostname, ip string) {
deps.resolver.mu.Lock()
deps.resolver.allRecords[hostname] = map[string]map[string][]string{
"ns1.example.com.": {"A": {ip}},
}
deps.resolver.mu.Unlock()
deps.portChecker.mu.Lock()
deps.portChecker.results[ip+":80"] = true
deps.portChecker.results[ip+":443"] = true
deps.portChecker.mu.Unlock()
deps.tlsChecker.mu.Lock()
deps.tlsChecker.certs[ip+":"+hostname] = &tlscheck.CertificateInfo{
CommonName: hostname,
Issuer: "DigiCert",
NotAfter: time.Now().Add(90 * 24 * time.Hour),
SubjectAlternativeNames: []string{hostname},
}
deps.tlsChecker.mu.Unlock()
}
func TestDNSRunsBeforePortAndTLSChecks(t *testing.T) {
t.Parallel()
cfg := defaultTestConfig(t)
cfg.Hostnames = []string{"www.example.com"}
w, deps := newTestWatcher(t, cfg)
setupHostnameIP(deps, "www.example.com", "10.0.0.1")
ctx := t.Context()
w.RunOnce(ctx)
snap := deps.state.GetSnapshot()
if _, ok := snap.Ports["10.0.0.1:80"]; !ok {
t.Fatal("expected port state for 10.0.0.1:80")
}
// DNS changes to a new IP; port and TLS must pick it up.
updateHostnameIP(deps, "www.example.com", "10.0.0.2")
w.RunOnce(ctx)
snap = deps.state.GetSnapshot()
if _, ok := snap.Ports["10.0.0.2:80"]; !ok {
t.Error("port check used stale DNS: missing 10.0.0.2:80")
}
certKey := "10.0.0.2:443:www.example.com"
if _, ok := snap.Certificates[certKey]; !ok {
t.Error("TLS check used stale DNS: missing " + certKey)
}
}
func TestNSFailureAndRecovery(t *testing.T) {
t.Parallel()