4 Commits

34 changed files with 255 additions and 5236 deletions


@@ -1,6 +1,6 @@
.git/
bin/
*.md
LICENSE
.editorconfig
.gitignore
.git
bin
data
.env
.DS_Store
*.exe


@@ -8,5 +8,8 @@ charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
[*.go]
indent_style = tab
[Makefile]
indent_style = tab

LICENSE (new file, 21 lines)

@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 sneak

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@@ -1,8 +1,9 @@
.PHONY: all build lint fmt fmt-check test check clean hooks docker
.PHONY: all build lint fmt fmt-check test check clean docker hooks
BINARY := dnswatcher
VERSION := $(shell git describe --tags --always --dirty 2>/dev/null || echo "dev")
LDFLAGS := -X main.Version=$(VERSION)
BUILDARCH := $(shell go env GOARCH)
LDFLAGS := -X main.Version=$(VERSION) -X main.Buildarch=$(BUILDARCH)
all: check build
@@ -17,32 +18,25 @@ fmt:
goimports -w .
fmt-check:
@test -z "$$(gofmt -l .)" || (echo "gofmt: files not formatted:" && gofmt -l . && exit 1)
@test -z "$$(gofmt -l .)" || (echo "Files not formatted:" && gofmt -l . && exit 1)
test:
go test -v -race -timeout 30s -cover ./...
go test -v -race -cover -timeout 30s ./...
# Check runs all validation without making changes
# Used by CI and Docker build - fails if anything is wrong
check:
@echo "==> Checking formatting..."
@test -z "$$(gofmt -l .)" || (echo "Files not formatted:" && gofmt -l . && exit 1)
@echo "==> Running linter..."
golangci-lint run --config .golangci.yml ./...
@echo "==> Running tests..."
go test -v -race -timeout 30s ./...
check: fmt-check lint test
@echo "==> Building..."
go build -ldflags "$(LDFLAGS)" -o /dev/null ./cmd/dnswatcher
@echo "==> All checks passed!"
clean:
rm -rf bin/
docker:
docker build .
hooks:
@echo '#!/bin/sh' > .git/hooks/pre-commit
@echo 'make check' >> .git/hooks/pre-commit
@printf '#!/bin/sh\nset -e\nmake check\n' > .git/hooks/pre-commit
@chmod +x .git/hooks/pre-commit
@echo "Pre-commit hook installed."
docker:
docker build .
clean:
rm -rf bin/


@@ -1,10 +1,9 @@
# dnswatcher
dnswatcher is a pre-1.0 Go daemon by [@sneak](https://sneak.berlin) that monitors DNS records, TCP port availability, and TLS certificates, delivering real-time change notifications via Slack, Mattermost, and ntfy webhooks.
> ⚠️ Pre-1.0 software. APIs, configuration, and behavior may change without notice.
dnswatcher watches configured DNS domains and hostnames for changes, monitors TCP
dnswatcher is a production DNS and infrastructure monitoring daemon written in
Go. It watches configured DNS domains and hostnames for changes, monitors TCP
port availability, tracks TLS certificate expiry, and delivers real-time
notifications via Slack, Mattermost, and/or ntfy webhooks.
@@ -52,6 +51,10 @@ without requiring an external database.
responding again.
- **Inconsistency detected**: Two nameservers that previously agreed
now return different record sets for the same hostname.
- **Inconsistency resolved**: Nameservers that previously disagreed
are now back in agreement.
- **Empty response**: A nameserver that previously returned records
now returns an authoritative empty response (NODATA/NXDOMAIN).
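
The inconsistency events above come down to comparing per-nameserver record sets between two scans. A minimal sketch of that comparison, assuming records are plain strings and order does not matter; `recordSetKey` and `nameserversAgree` are hypothetical names, not functions from the dnswatcher source:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// recordSetKey returns an order-independent fingerprint of a record set.
// Hypothetical helper for illustration only.
func recordSetKey(records []string) string {
	sorted := append([]string(nil), records...) // copy so the caller's slice is untouched
	sort.Strings(sorted)
	return strings.Join(sorted, "\n")
}

// nameserversAgree reports whether every nameserver returned the same
// set of records, ignoring order. An empty map counts as agreement.
func nameserversAgree(byNameserver map[string][]string) bool {
	first := true
	var want string
	for _, records := range byNameserver {
		key := recordSetKey(records)
		if first {
			want, first = key, false
			continue
		}
		if key != want {
			return false
		}
	}
	return true
}

func main() {
	previous := map[string][]string{
		"ns1.example.com.": {"93.184.216.34"},
		"ns2.example.com.": {"93.184.216.34"},
	}
	current := map[string][]string{
		"ns1.example.com.": {"93.184.216.34"},
		"ns2.example.com.": {"203.0.113.7"},
	}

	wasOK, isOK := nameserversAgree(previous), nameserversAgree(current)
	switch {
	case wasOK && !isOK:
		fmt.Println("inconsistency detected")
	case !wasOK && isOK:
		fmt.Println("inconsistency resolved")
	default:
		fmt.Println("no change in agreement")
	}
}
```

Comparing the agreement state of the previous and current scan, rather than only the current one, is what lets a single check produce both the "detected" and the "resolved" notification.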
### TCP Port Monitoring
@@ -106,8 +109,8 @@ includes:
- **NS recoveries**: Which nameserver recovered, which hostname/domain.
- **NS inconsistencies**: Which nameservers disagree, what each one
returned, which hostname affected.
- **Port changes**: Which IP:port, old state, new state, all associated
hostnames.
- **Port changes**: Which IP:port, old state, new state, associated
hostname.
- **TLS expiry warnings**: Which certificate, days remaining, CN,
issuer, associated hostname and IP.
- **TLS certificate changes**: Old and new CN/issuer/SANs, associated
@@ -124,42 +127,16 @@ includes:
- State is written atomically (write to temp file, then rename) to prevent
corruption.
### Web Dashboard
dnswatcher includes an unauthenticated, read-only web dashboard at the
root URL (`/`). It displays:
- **Summary counts** for monitored domains, hostnames, ports, and
certificates.
- **Domains** with their discovered nameservers.
- **Hostnames** with per-nameserver DNS records and status.
- **Ports** with open/closed state and associated hostnames.
- **TLS certificates** with CN, issuer, expiry, and status.
- **Recent alerts** (last 100 notifications sent since the process
started), displayed in reverse chronological order.
Every data point shows its age (e.g. "5m ago") so you can tell at a
glance how fresh the information is. The page auto-refreshes every 30
seconds.
The dashboard intentionally does not expose any configuration details
such as webhook URLs, notification endpoints, or API tokens.
All assets (CSS) are embedded in the binary and served from the
application itself. The dashboard makes zero external HTTP requests —
no CDN dependencies or third-party resources are loaded at runtime.
### HTTP API
dnswatcher exposes a lightweight HTTP API for operational visibility:
| Endpoint | Description |
|---------------------------------------|--------------------------------|
| `GET /` | Web dashboard (HTML) |
| `GET /s/...` | Static assets (embedded CSS) |
| `GET /.well-known/healthcheck` | Health check (JSON) |
| `GET /health` | Health check (JSON, legacy) |
| `GET /health` | Health check (JSON) |
| `GET /api/v1/status` | Current monitoring state |
| `GET /api/v1/domains` | Configured domains and status |
| `GET /api/v1/hostnames` | Configured hostnames and status|
| `GET /metrics` | Prometheus metrics (optional) |
---
@@ -171,7 +148,7 @@ cmd/dnswatcher/main.go Entry point (uber/fx bootstrap)
internal/
config/config.go Viper-based configuration
globals/globals.go Build-time variables (version)
globals/globals.go Build-time variables (version, arch)
logger/logger.go slog structured logging (TTY detection)
healthcheck/healthcheck.go Health check service
middleware/middleware.go HTTP middleware (logging, CORS, metrics auth)
@@ -231,13 +208,6 @@ the following precedence (highest to lowest):
| `DNSWATCHER_MAINTENANCE_MODE` | Enable maintenance mode | `false` |
| `DNSWATCHER_METRICS_USERNAME` | Basic auth username for /metrics | `""` |
| `DNSWATCHER_METRICS_PASSWORD` | Basic auth password for /metrics | `""` |
| `DNSWATCHER_SEND_TEST_NOTIFICATION` | Send a test notification after first scan completes | `false` |
**`DNSWATCHER_TARGETS` is required.** dnswatcher will refuse to start if no
monitoring targets are configured. A monitoring daemon with nothing to monitor
is a misconfiguration, so dnswatcher fails fast with a clear error message
rather than running silently. Set `DNSWATCHER_TARGETS` to a comma-separated
list of DNS names before starting.
### Example `.env`
@@ -249,7 +219,6 @@ DNSWATCHER_TARGETS=example.com,example.org,www.example.com,api.example.com,mail.
DNSWATCHER_SLACK_WEBHOOK=https://hooks.slack.com/services/T.../B.../xxx
DNSWATCHER_MATTERMOST_WEBHOOK=https://mattermost.example.com/hooks/xxx
DNSWATCHER_NTFY_TOPIC=https://ntfy.sh/my-dns-alerts
DNSWATCHER_SEND_TEST_NOTIFICATION=true
```
---
@@ -320,12 +289,12 @@ not as a merged view, to enable inconsistency detection.
"ports": {
"93.184.216.34:80": {
"open": true,
"hostnames": ["www.example.com"],
"hostname": "www.example.com",
"lastChecked": "2026-02-19T12:00:00Z"
},
"93.184.216.34:443": {
"open": true,
"hostnames": ["www.example.com"],
"hostname": "www.example.com",
"lastChecked": "2026-02-19T12:00:00Z"
}
},
@@ -349,6 +318,8 @@ tracks reachability:
|-------------|-------------------------------------------------|
| `ok` | Query succeeded, records are current |
| `error` | Query failed (timeout, SERVFAIL, network error) |
| `nxdomain` | Authoritative NXDOMAIN response |
| `nodata` | Authoritative empty response (NODATA) |
---
@@ -356,19 +327,23 @@ tracks reachability:
```sh
make build # Build binary to bin/dnswatcher
make test # Run tests with race detector
make test # Run tests with race detector and 30s timeout
make lint # Run golangci-lint
make fmt # Format code
make check # Run all checks (format, lint, test, build)
make fmt # Format code (writes)
make fmt-check # Read-only format check
make check # Run all checks (fmt-check, lint, test, build)
make docker # Build Docker image
make hooks # Install pre-commit hook
make clean # Remove build artifacts
```
### Build-Time Variables
Version is injected via `-ldflags`:
Version and architecture are injected via `-ldflags`:
```sh
go build -ldflags "-X main.Version=$(git describe --tags --always)" ./cmd/dnswatcher
go build -ldflags "-X main.Version=$(git describe --tags --always) \
-X main.Buildarch=$(go env GOARCH)" ./cmd/dnswatcher
```
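
On the Go side, `-X` can only set package-level string variables, which is why `Version` and `Buildarch` are declared as plain strings in package `main`. A minimal sketch of how such variables might be consumed; `versionString` and the `"unknown"` fallback are assumptions for illustration, while `"dev"` mirrors the Makefile fallback:

```go
package main

import "fmt"

// Set at link time via -ldflags "-X main.Version=... -X main.Buildarch=...".
// Both stay empty when the binary is built without those flags.
var (
	Version   string
	Buildarch string
)

// versionString formats the build metadata, substituting placeholders
// for values that were not injected. Hypothetical helper.
func versionString(version, arch string) string {
	if version == "" {
		version = "dev"
	}
	if arch == "" {
		arch = "unknown"
	}
	return fmt.Sprintf("dnswatcher %s (%s)", version, arch)
}

func main() {
	fmt.Println(versionString(Version, Buildarch))
}
```

Run with a plain `go run`, this prints the placeholder values; built through the `go build -ldflags` command above, it prints the injected tag and architecture.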
---
@@ -382,7 +357,6 @@ docker run -d \
-v dnswatcher-data:/var/lib/dnswatcher \
-e DNSWATCHER_TARGETS=example.com,www.example.com \
-e DNSWATCHER_NTFY_TOPIC=https://ntfy.sh/my-alerts \
-e DNSWATCHER_SEND_TEST_NOTIFICATION=true \
dnswatcher
```
@@ -395,15 +369,9 @@ docker run -d \
triggering change notifications).
2. **Initial check**: Immediately perform all DNS, port, and TLS checks
on startup.
3. **Periodic checks** (DNS always runs first):
- DNS checks: every `DNSWATCHER_DNS_INTERVAL` (default 1h). Also
re-run before every TLS check cycle to ensure fresh IPs.
- Port checks: every `DNSWATCHER_DNS_INTERVAL`, after DNS completes.
- TLS checks: every `DNSWATCHER_TLS_INTERVAL` (default 12h), after
DNS completes.
- Port and TLS checks always use freshly resolved IP addresses from
the DNS phase that immediately precedes them — never stale IPs
from a previous cycle.
3. **Periodic checks**:
- DNS and port checks: every `DNSWATCHER_DNS_INTERVAL` (default 1h).
- TLS checks: every `DNSWATCHER_TLS_INTERVAL` (default 12h).
4. **On change detection**: Send notifications to all configured
endpoints, update in-memory state, persist to disk.
5. **Shutdown**: Persist final state to disk, complete in-flight
@@ -429,8 +397,7 @@ Viper for configuration.
## License
License has not yet been chosen for this project. Pending decision by the
author (MIT, GPL, or WTFPL).
MIT — see [LICENSE](LICENSE).
## Author


@@ -27,11 +27,13 @@ import (
var (
Appname = "dnswatcher"
Version string
Buildarch string
)
func main() {
globals.SetAppname(Appname)
globals.SetVersion(Version)
globals.SetBuildarch(Buildarch)
fx.New(
fx.Provide(


@@ -23,11 +23,6 @@ const (
defaultTLSExpiryWarning = 7
)
// ErrNoTargets is returned when no monitoring targets are configured.
var ErrNoTargets = errors.New(
"no monitoring targets configured: set DNSWATCHER_TARGETS environment variable",
)
// Params contains dependencies for Config.
type Params struct {
fx.In
@@ -53,7 +48,6 @@ type Config struct {
MaintenanceMode bool
MetricsUsername string
MetricsPassword string
SendTestNotification bool
params *Params
log *slog.Logger
}
@@ -106,7 +100,6 @@ func setupViper(name string) {
viper.SetDefault("MAINTENANCE_MODE", false)
viper.SetDefault("METRICS_USERNAME", "")
viper.SetDefault("METRICS_PASSWORD", "")
viper.SetDefault("SEND_TEST_NOTIFICATION", false)
}
func buildConfig(
@@ -139,9 +132,11 @@ func buildConfig(
tlsInterval = defaultTLSInterval
}
domains, hostnames, err := parseAndValidateTargets()
domains, hostnames, err := ClassifyTargets(
parseCSV(viper.GetString("TARGETS")),
)
if err != nil {
return nil, err
return nil, fmt.Errorf("invalid targets configuration: %w", err)
}
cfg := &Config{
@@ -160,7 +155,6 @@ func buildConfig(
MaintenanceMode: viper.GetBool("MAINTENANCE_MODE"),
MetricsUsername: viper.GetString("METRICS_USERNAME"),
MetricsPassword: viper.GetString("METRICS_PASSWORD"),
SendTestNotification: viper.GetBool("SEND_TEST_NOTIFICATION"),
params: params,
log: log,
}
@@ -168,23 +162,6 @@ func buildConfig(
return cfg, nil
}
func parseAndValidateTargets() ([]string, []string, error) {
domains, hostnames, err := ClassifyTargets(
parseCSV(viper.GetString("TARGETS")),
)
if err != nil {
return nil, nil, fmt.Errorf(
"invalid targets configuration: %w", err,
)
}
if len(domains) == 0 && len(hostnames) == 0 {
return nil, nil, ErrNoTargets
}
return domains, hostnames, nil
}
func parseCSV(input string) []string {
if input == "" {
return nil


@@ -1,262 +0,0 @@
package config_test
import (
"testing"
"time"
"github.com/spf13/viper"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"sneak.berlin/go/dnswatcher/internal/config"
"sneak.berlin/go/dnswatcher/internal/globals"
"sneak.berlin/go/dnswatcher/internal/logger"
)
// newTestParams creates config.Params suitable for testing
// without requiring the fx dependency injection framework.
func newTestParams(t *testing.T) config.Params {
t.Helper()
g := &globals.Globals{
Appname: "dnswatcher",
Version: "test",
}
l, err := logger.New(nil, logger.Params{Globals: g})
require.NoError(t, err, "failed to create logger")
return config.Params{
Globals: g,
Logger: l,
}
}
// These tests exercise viper global state and MUST NOT use
// t.Parallel(). Each test resets viper for isolation.
func TestNew_DefaultValues(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com,www.example.com")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 8080, cfg.Port)
assert.False(t, cfg.Debug)
assert.Equal(t, "./data", cfg.DataDir)
assert.Equal(t, time.Hour, cfg.DNSInterval)
assert.Equal(t, 12*time.Hour, cfg.TLSInterval)
assert.Equal(t, 7, cfg.TLSExpiryWarning)
assert.False(t, cfg.MaintenanceMode)
assert.Empty(t, cfg.SlackWebhook)
assert.Empty(t, cfg.MattermostWebhook)
assert.Empty(t, cfg.NtfyTopic)
assert.Empty(t, cfg.SentryDSN)
assert.Empty(t, cfg.MetricsUsername)
assert.Empty(t, cfg.MetricsPassword)
assert.False(t, cfg.SendTestNotification)
}
func TestNew_EnvironmentOverrides(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("PORT", "9090")
t.Setenv("DNSWATCHER_DEBUG", "true")
t.Setenv("DNSWATCHER_DATA_DIR", "/tmp/test-data")
t.Setenv("DNSWATCHER_DNS_INTERVAL", "30m")
t.Setenv("DNSWATCHER_TLS_INTERVAL", "6h")
t.Setenv("DNSWATCHER_TLS_EXPIRY_WARNING", "14")
t.Setenv("DNSWATCHER_SLACK_WEBHOOK", "https://hooks.slack.com/t")
t.Setenv("DNSWATCHER_MATTERMOST_WEBHOOK", "https://mm.test/hooks/t")
t.Setenv("DNSWATCHER_NTFY_TOPIC", "https://ntfy.sh/test")
t.Setenv("DNSWATCHER_SENTRY_DSN", "https://sentry.test/1")
t.Setenv("DNSWATCHER_MAINTENANCE_MODE", "true")
t.Setenv("DNSWATCHER_METRICS_USERNAME", "admin")
t.Setenv("DNSWATCHER_METRICS_PASSWORD", "secret")
t.Setenv("DNSWATCHER_SEND_TEST_NOTIFICATION", "true")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 9090, cfg.Port)
assert.True(t, cfg.Debug)
assert.Equal(t, "/tmp/test-data", cfg.DataDir)
assert.Equal(t, 30*time.Minute, cfg.DNSInterval)
assert.Equal(t, 6*time.Hour, cfg.TLSInterval)
assert.Equal(t, 14, cfg.TLSExpiryWarning)
assert.Equal(t, "https://hooks.slack.com/t", cfg.SlackWebhook)
assert.Equal(t, "https://mm.test/hooks/t", cfg.MattermostWebhook)
assert.Equal(t, "https://ntfy.sh/test", cfg.NtfyTopic)
assert.Equal(t, "https://sentry.test/1", cfg.SentryDSN)
assert.True(t, cfg.MaintenanceMode)
assert.Equal(t, "admin", cfg.MetricsUsername)
assert.Equal(t, "secret", cfg.MetricsPassword)
assert.True(t, cfg.SendTestNotification)
}
func TestNew_NoTargetsError(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "")
_, err := config.New(nil, newTestParams(t))
require.Error(t, err)
assert.ErrorIs(t, err, config.ErrNoTargets)
}
func TestNew_OnlyEmptyCSVSegments(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", " , , ")
_, err := config.New(nil, newTestParams(t))
require.Error(t, err)
assert.ErrorIs(t, err, config.ErrNoTargets)
}
func TestNew_InvalidDNSInterval_FallsBackToDefault(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("DNSWATCHER_DNS_INTERVAL", "banana")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, time.Hour, cfg.DNSInterval,
"invalid DNS interval should fall back to 1h default")
}
func TestNew_InvalidTLSInterval_FallsBackToDefault(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("DNSWATCHER_TLS_INTERVAL", "notaduration")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 12*time.Hour, cfg.TLSInterval,
"invalid TLS interval should fall back to 12h default")
}
func TestNew_BothIntervalsInvalid(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("DNSWATCHER_DNS_INTERVAL", "xyz")
t.Setenv("DNSWATCHER_TLS_INTERVAL", "abc")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, time.Hour, cfg.DNSInterval)
assert.Equal(t, 12*time.Hour, cfg.TLSInterval)
}
func TestNew_DebugEnablesDebugLogging(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("DNSWATCHER_DEBUG", "true")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.True(t, cfg.Debug)
}
func TestNew_PortEnvNotPrefixed(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("PORT", "3000")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 3000, cfg.Port,
"PORT env should work without DNSWATCHER_ prefix")
}
func TestNew_TargetClassification(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS",
"example.com,www.example.com,api.example.com,example.org")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
// example.com and example.org are apex domains
assert.Len(t, cfg.Domains, 2)
// www.example.com and api.example.com are hostnames
assert.Len(t, cfg.Hostnames, 2)
}
func TestNew_InvalidTargetPublicSuffix(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "co.uk")
_, err := config.New(nil, newTestParams(t))
require.Error(t, err, "public suffix should be rejected")
}
func TestNew_EmptyAppnameDefaultsToDnswatcher(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
g := &globals.Globals{Appname: "", Version: "test"}
l, err := logger.New(nil, logger.Params{Globals: g})
require.NoError(t, err)
cfg, err := config.New(
nil, config.Params{Globals: g, Logger: l},
)
require.NoError(t, err)
assert.Equal(t, 8080, cfg.Port,
"defaults should load when appname is empty")
}
func TestNew_TargetsWithWhitespace(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", " example.com , www.example.com ")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 2, len(cfg.Domains)+len(cfg.Hostnames),
"whitespace around targets should be trimmed")
}
func TestNew_TargetsWithTrailingComma(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com,www.example.com,")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 2, len(cfg.Domains)+len(cfg.Hostnames),
"trailing comma should be ignored")
}
func TestNew_CustomDNSIntervalDuration(t *testing.T) {
viper.Reset()
t.Setenv("DNSWATCHER_TARGETS", "example.com")
t.Setenv("DNSWATCHER_DNS_INTERVAL", "5s")
cfg, err := config.New(nil, newTestParams(t))
require.NoError(t, err)
assert.Equal(t, 5*time.Second, cfg.DNSInterval)
}
func TestStatePath(t *testing.T) {
t.Parallel()
tests := []struct {
name string
dataDir string
want string
}{
{"default", "./data", "./data/state.json"},
{"absolute", "/var/lib/dw", "/var/lib/dw/state.json"},
{"nested", "/opt/app/data", "/opt/app/data/state.json"},
{"empty", "", "/state.json"},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
t.Parallel()
cfg := &config.Config{DataDir: tt.dataDir}
assert.Equal(t, tt.want, cfg.StatePath())
})
}
}


@@ -1,6 +0,0 @@
package config

// ParseCSVForTest exports parseCSV for use in external tests.
func ParseCSVForTest(input string) []string {
	return parseCSV(input)
}


@@ -1,44 +0,0 @@
package config_test

import (
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"

	"sneak.berlin/go/dnswatcher/internal/config"
)

func TestParseCSV(t *testing.T) {
	t.Parallel()

	tests := []struct {
		name  string
		input string
		want  []string
	}{
		{"empty string", "", nil},
		{"single value", "a", []string{"a"}},
		{"multiple values", "a,b,c", []string{"a", "b", "c"}},
		{"whitespace trimmed", " a , b ", []string{"a", "b"}},
		{"trailing comma", "a,b,", []string{"a", "b"}},
		{"leading comma", ",a,b", []string{"a", "b"}},
		{"consecutive commas", "a,,b", []string{"a", "b"}},
		{"all empty segments", ",,,", nil},
		{"whitespace only", " , , ", nil},
		{"tabs", "\ta\t,\tb\t", []string{"a", "b"}},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			t.Parallel()

			got := config.ParseCSVForTest(tt.input)
			require.Len(t, got, len(tt.want))

			for i, w := range tt.want {
				assert.Equal(t, w, got[i])
			}
		})
	}
}
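
Reviewer note: the table above pins down the behavior expected of `parseCSV`, whose body is only partly visible in this diff. A sketch of an implementation that would satisfy every case in the table is shown here; it is reconstructed from the test expectations and may differ from the function in the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCSV splits a comma-separated string, trims surrounding
// whitespace from each segment, and drops empty segments. It returns
// nil when nothing remains. Reconstructed from the test table, not
// copied from the dnswatcher source.
func parseCSV(input string) []string {
	if input == "" {
		return nil
	}

	var out []string
	for _, part := range strings.Split(input, ",") {
		if trimmed := strings.TrimSpace(part); trimmed != "" {
			out = append(out, trimmed)
		}
	}
	return out
}

func main() {
	fmt.Printf("%q\n", parseCSV(" example.com , www.example.com, "))
	fmt.Println(parseCSV(" , , ") == nil)
}
```

Leaving `out` as a nil slice until the first append is what makes the "all empty segments" and "whitespace only" cases come back as `nil` rather than as an empty, non-nil slice.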


@@ -15,12 +15,14 @@ var (
mu sync.RWMutex
appname string
version string
buildarch string
)
// Globals holds build-time variables for dependency injection.
type Globals struct {
Appname string
Version string
Buildarch string
}
// New creates a new Globals instance from package-level variables.
@@ -31,6 +33,7 @@ func New(_ fx.Lifecycle) (*Globals, error) {
return &Globals{
Appname: appname,
Version: version,
Buildarch: buildarch,
}, nil
}
@@ -49,3 +52,11 @@ func SetVersion(ver string) {
version = ver
}
// SetBuildarch sets the build architecture.
func SetBuildarch(arch string) {
mu.Lock()
defer mu.Unlock()
buildarch = arch
}


@@ -1,151 +0,0 @@
package handlers
import (
"embed"
"fmt"
"html/template"
"math"
"net/http"
"strings"
"time"
"sneak.berlin/go/dnswatcher/internal/notify"
"sneak.berlin/go/dnswatcher/internal/state"
)
//go:embed templates/dashboard.html
var dashboardFS embed.FS
// Time unit constants for relative time calculations.
const (
secondsPerMinute = 60
minutesPerHour = 60
hoursPerDay = 24
)
// newDashboardTemplate parses the embedded dashboard HTML
// template with helper functions.
func newDashboardTemplate() *template.Template {
funcs := template.FuncMap{
"relTime": relTime,
"joinStrings": joinStrings,
"formatRecords": formatRecords,
"expiryDays": expiryDays,
}
return template.Must(
template.New("dashboard.html").
Funcs(funcs).
ParseFS(dashboardFS, "templates/dashboard.html"),
)
}
// dashboardData is the data passed to the dashboard template.
type dashboardData struct {
Snapshot state.Snapshot
Alerts []notify.AlertEntry
StateAge string
GeneratedAt string
}
// HandleDashboard returns the dashboard page handler.
func (h *Handlers) HandleDashboard() http.HandlerFunc {
tmpl := newDashboardTemplate()
return func(
writer http.ResponseWriter,
_ *http.Request,
) {
snap := h.state.GetSnapshot()
alerts := h.notifyHistory.Recent()
data := dashboardData{
Snapshot: snap,
Alerts: alerts,
StateAge: relTime(snap.LastUpdated),
GeneratedAt: time.Now().UTC().Format("2006-01-02 15:04:05"),
}
writer.Header().Set(
"Content-Type", "text/html; charset=utf-8",
)
err := tmpl.Execute(writer, data)
if err != nil {
h.log.Error(
"dashboard template error",
"error", err,
)
}
}
}
// relTime returns a human-readable relative time string such
// as "5m ago" or "2h 15m ago", or "never" for zero times.
func relTime(t time.Time) string {
if t.IsZero() {
return "never"
}
d := time.Since(t)
if d < 0 {
return "just now"
}
seconds := int(math.Round(d.Seconds()))
if seconds < secondsPerMinute {
return fmt.Sprintf("%ds ago", seconds)
}
minutes := seconds / secondsPerMinute
if minutes < minutesPerHour {
return fmt.Sprintf("%dm ago", minutes)
}
hours := minutes / minutesPerHour
if hours < hoursPerDay {
return fmt.Sprintf(
"%dh %dm ago", hours, minutes%minutesPerHour,
)
}
days := hours / hoursPerDay
return fmt.Sprintf(
"%dd %dh ago", days, hours%hoursPerDay,
)
}
// joinStrings joins a string slice with a separator.
func joinStrings(items []string, sep string) string {
return strings.Join(items, sep)
}
// formatRecords formats a map of record type → values into a
// compact display string.
func formatRecords(records map[string][]string) string {
if len(records) == 0 {
return "-"
}
var parts []string
for rtype, values := range records {
for _, v := range values {
parts = append(parts, rtype+": "+v)
}
}
return strings.Join(parts, ", ")
}
// expiryDays returns the number of days until the given time,
// rounded down. Returns 0 if already expired.
func expiryDays(t time.Time) int {
d := time.Until(t).Hours() / hoursPerDay
if d < 0 {
return 0
}
return int(d)
}


@@ -1,80 +0,0 @@
package handlers_test
import (
"testing"
"time"
"sneak.berlin/go/dnswatcher/internal/handlers"
)
func TestRelTime(t *testing.T) {
t.Parallel()
tests := []struct {
name string
dur time.Duration
want string
}{
{"zero", 0, "never"},
{"seconds", 30 * time.Second, "30s ago"},
{"minutes", 5 * time.Minute, "5m ago"},
{"hours", 2*time.Hour + 15*time.Minute, "2h 15m ago"},
{"days", 48*time.Hour + 3*time.Hour, "2d 3h ago"},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
t.Parallel()
var input time.Time
if tt.dur > 0 {
input = time.Now().Add(-tt.dur)
}
got := handlers.RelTime(input)
if got != tt.want {
t.Errorf(
"RelTime(%v) = %q, want %q",
tt.dur, got, tt.want,
)
}
})
}
}
func TestExpiryDays(t *testing.T) {
t.Parallel()
// 10 days from now.
future := time.Now().Add(10 * 24 * time.Hour)
days := handlers.ExpiryDays(future)
if days < 9 || days > 10 {
t.Errorf("expected ~10 days, got %d", days)
}
// Already expired.
past := time.Now().Add(-24 * time.Hour)
days = handlers.ExpiryDays(past)
if days != 0 {
t.Errorf("expected 0 for expired, got %d", days)
}
}
func TestFormatRecords(t *testing.T) {
t.Parallel()
got := handlers.FormatRecords(nil)
if got != "-" {
t.Errorf("expected -, got %q", got)
}
got = handlers.FormatRecords(map[string][]string{
"A": {"1.2.3.4"},
})
if got != "A: 1.2.3.4" {
t.Errorf("unexpected format: %q", got)
}
}


@@ -1,18 +0,0 @@
package handlers

import "time"

// RelTime exports relTime for testing.
func RelTime(t time.Time) string {
	return relTime(t)
}

// ExpiryDays exports expiryDays for testing.
func ExpiryDays(t time.Time) int {
	return expiryDays(t)
}

// FormatRecords exports formatRecords for testing.
func FormatRecords(records map[string][]string) string {
	return formatRecords(records)
}


@@ -11,8 +11,6 @@ import (
"sneak.berlin/go/dnswatcher/internal/globals"
"sneak.berlin/go/dnswatcher/internal/healthcheck"
"sneak.berlin/go/dnswatcher/internal/logger"
"sneak.berlin/go/dnswatcher/internal/notify"
"sneak.berlin/go/dnswatcher/internal/state"
)
// Params contains dependencies for Handlers.
@@ -22,8 +20,6 @@ type Params struct {
Logger *logger.Logger
Globals *globals.Globals
Healthcheck *healthcheck.Healthcheck
State *state.State
Notify *notify.Service
}
// Handlers provides HTTP request handlers.
@@ -32,8 +28,6 @@ type Handlers struct {
params *Params
globals *globals.Globals
hc *healthcheck.Healthcheck
state *state.State
notifyHistory *notify.AlertHistory
}
// New creates a new Handlers instance.
@@ -43,8 +37,6 @@ func New(_ fx.Lifecycle, params Params) (*Handlers, error) {
params: &params,
globals: params.Globals,
hc: params.Healthcheck,
state: params.State,
notifyHistory: params.Notify.History(),
}, nil
}


@@ -2,217 +2,22 @@ package handlers
import (
"net/http"
"sort"
"time"
"sneak.berlin/go/dnswatcher/internal/state"
)
// statusDomainInfo holds status information for a monitored domain.
type statusDomainInfo struct {
Nameservers []string `json:"nameservers"`
LastChecked time.Time `json:"lastChecked"`
}
// statusHostnameNSInfo holds per-nameserver status for a hostname.
type statusHostnameNSInfo struct {
Records map[string][]string `json:"records"`
Status string `json:"status"`
LastChecked time.Time `json:"lastChecked"`
}
// statusHostnameInfo holds status information for a monitored hostname.
type statusHostnameInfo struct {
Nameservers map[string]*statusHostnameNSInfo `json:"nameservers"`
LastChecked time.Time `json:"lastChecked"`
}
// statusPortInfo holds status information for a monitored port.
type statusPortInfo struct {
Open bool `json:"open"`
Hostnames []string `json:"hostnames"`
LastChecked time.Time `json:"lastChecked"`
}
// statusCertificateInfo holds status information for a TLS certificate.
type statusCertificateInfo struct {
CommonName string `json:"commonName"`
Issuer string `json:"issuer"`
NotAfter time.Time `json:"notAfter"`
SubjectAlternativeNames []string `json:"subjectAlternativeNames"`
Status string `json:"status"`
LastChecked time.Time `json:"lastChecked"`
}
// statusCounts holds summary counts of monitored resources.
type statusCounts struct {
Domains int `json:"domains"`
Hostnames int `json:"hostnames"`
Ports int `json:"ports"`
PortsOpen int `json:"portsOpen"`
Certificates int `json:"certificates"`
CertsOK int `json:"certificatesOk"`
CertsError int `json:"certificatesError"`
}
// statusResponse is the full /api/v1/status response.
type statusResponse struct {
Status string `json:"status"`
LastUpdated time.Time `json:"lastUpdated"`
Counts statusCounts `json:"counts"`
Domains map[string]*statusDomainInfo `json:"domains"`
Hostnames map[string]*statusHostnameInfo `json:"hostnames"`
Ports map[string]*statusPortInfo `json:"ports"`
Certificates map[string]*statusCertificateInfo `json:"certificates"`
}
// HandleStatus returns the monitoring status handler.
func (h *Handlers) HandleStatus() http.HandlerFunc {
type response struct {
Status string `json:"status"`
}
return func(
writer http.ResponseWriter,
request *http.Request,
) {
snap := h.state.GetSnapshot()
resp := buildStatusResponse(snap)
h.respondJSON(
writer, request,
resp,
&response{Status: "ok"},
http.StatusOK,
)
}
}
// buildStatusResponse constructs the full status response from
// the current monitoring snapshot.
func buildStatusResponse(
snap state.Snapshot,
) *statusResponse {
resp := &statusResponse{
Status: "ok",
LastUpdated: snap.LastUpdated,
Domains: make(map[string]*statusDomainInfo),
Hostnames: make(map[string]*statusHostnameInfo),
Ports: make(map[string]*statusPortInfo),
Certificates: make(map[string]*statusCertificateInfo),
}
buildDomains(snap, resp)
buildHostnames(snap, resp)
buildPorts(snap, resp)
buildCertificates(snap, resp)
buildCounts(resp)
return resp
}
func buildDomains(
snap state.Snapshot,
resp *statusResponse,
) {
for name, ds := range snap.Domains {
ns := make([]string, len(ds.Nameservers))
copy(ns, ds.Nameservers)
sort.Strings(ns)
resp.Domains[name] = &statusDomainInfo{
Nameservers: ns,
LastChecked: ds.LastChecked,
}
}
}
func buildHostnames(
snap state.Snapshot,
resp *statusResponse,
) {
for name, hs := range snap.Hostnames {
info := &statusHostnameInfo{
Nameservers: make(map[string]*statusHostnameNSInfo),
LastChecked: hs.LastChecked,
}
for ns, nsState := range hs.RecordsByNameserver {
recs := make(map[string][]string, len(nsState.Records))
for rtype, vals := range nsState.Records {
copied := make([]string, len(vals))
copy(copied, vals)
recs[rtype] = copied
}
info.Nameservers[ns] = &statusHostnameNSInfo{
Records: recs,
Status: nsState.Status,
LastChecked: nsState.LastChecked,
}
}
resp.Hostnames[name] = info
}
}
func buildPorts(
snap state.Snapshot,
resp *statusResponse,
) {
for key, ps := range snap.Ports {
hostnames := make([]string, len(ps.Hostnames))
copy(hostnames, ps.Hostnames)
sort.Strings(hostnames)
resp.Ports[key] = &statusPortInfo{
Open: ps.Open,
Hostnames: hostnames,
LastChecked: ps.LastChecked,
}
}
}
func buildCertificates(
snap state.Snapshot,
resp *statusResponse,
) {
for key, cs := range snap.Certificates {
sans := make([]string, len(cs.SubjectAlternativeNames))
copy(sans, cs.SubjectAlternativeNames)
resp.Certificates[key] = &statusCertificateInfo{
CommonName: cs.CommonName,
Issuer: cs.Issuer,
NotAfter: cs.NotAfter,
SubjectAlternativeNames: sans,
Status: cs.Status,
LastChecked: cs.LastChecked,
}
}
}
func buildCounts(resp *statusResponse) {
var portsOpen, certsOK, certsError int
for _, ps := range resp.Ports {
if ps.Open {
portsOpen++
}
}
for _, cs := range resp.Certificates {
switch cs.Status {
case "ok":
certsOK++
case "error":
certsError++
}
}
resp.Counts = statusCounts{
Domains: len(resp.Domains),
Hostnames: len(resp.Hostnames),
Ports: len(resp.Ports),
PortsOpen: portsOpen,
Certificates: len(resp.Certificates),
CertsOK: certsOK,
CertsError: certsError,
}
}
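
The builders above copy each slice out of the snapshot before sorting or storing it. A minimal sketch of that pattern follows, assuming a hypothetical `sortedCopy` helper that is not part of the package. Sorting a copy means the slice held by the snapshot keeps its original order, whatever else may be reading it.

```go
package main

import "sort"

// sortedCopy returns a sorted copy of in and leaves in untouched.
// Hypothetical helper illustrating the copy-then-sort pattern used
// by buildDomains and buildPorts.
func sortedCopy(in []string) []string {
	out := make([]string, len(in))
	copy(out, in)
	sort.Strings(out)
	return out
}
```

Calling `sortedCopy([]string{"ns2", "ns1"})` yields a new slice starting with `ns1`, while the input still starts with `ns2`.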

View File

@@ -1,370 +0,0 @@
<!doctype html>
<html lang="en" class="bg-slate-950">
<head>
<meta charset="utf-8" />
<meta http-equiv="refresh" content="30" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>dnswatcher</title>
<link rel="stylesheet" href="/s/css/tailwind.min.css" />
</head>
<body
class="bg-surface-950 text-slate-300 font-mono text-sm min-h-screen antialiased"
>
<div class="max-w-6xl mx-auto px-4 py-8">
{{/* ---- Header ---- */}}
<div class="mb-8">
<h1 class="text-2xl font-bold text-teal-400 tracking-tight">
dnswatcher
</h1>
<p class="text-xs text-slate-500 mt-1">
state updated {{ .StateAge }} &middot; page generated
{{ .GeneratedAt }} UTC &middot; auto-refresh 30s
</p>
</div>
{{/* ---- Summary bar ---- */}}
<div
class="grid grid-cols-2 sm:grid-cols-4 gap-3 mb-8"
>
<div class="bg-surface-800 border border-slate-700/50 rounded-lg p-4">
<div class="text-xs text-slate-500 uppercase tracking-wider">
Domains
</div>
<div class="text-2xl font-bold text-teal-400 mt-1">
{{ len .Snapshot.Domains }}
</div>
</div>
<div class="bg-surface-800 border border-slate-700/50 rounded-lg p-4">
<div class="text-xs text-slate-500 uppercase tracking-wider">
Hostnames
</div>
<div class="text-2xl font-bold text-teal-400 mt-1">
{{ len .Snapshot.Hostnames }}
</div>
</div>
<div class="bg-surface-800 border border-slate-700/50 rounded-lg p-4">
<div class="text-xs text-slate-500 uppercase tracking-wider">
Ports
</div>
<div class="text-2xl font-bold text-teal-400 mt-1">
{{ len .Snapshot.Ports }}
</div>
</div>
<div class="bg-surface-800 border border-slate-700/50 rounded-lg p-4">
<div class="text-xs text-slate-500 uppercase tracking-wider">
Certificates
</div>
<div class="text-2xl font-bold text-teal-400 mt-1">
{{ len .Snapshot.Certificates }}
</div>
</div>
</div>
{{/* ---- Domains ---- */}}
<section class="mb-8">
<h2
class="text-sm font-semibold text-teal-300 uppercase tracking-wider mb-3 border-b border-slate-700/50 pb-2"
>
Domains
</h2>
{{ if .Snapshot.Domains }}
<div class="overflow-x-auto">
<table class="w-full text-left text-xs">
<thead>
<tr class="text-slate-500 uppercase tracking-wider">
<th class="py-2 px-3">Domain</th>
<th class="py-2 px-3">Nameservers</th>
<th class="py-2 px-3">Checked</th>
</tr>
</thead>
<tbody class="divide-y divide-slate-800">
{{ range $name, $ds := .Snapshot.Domains }}
<tr class="hover:bg-surface-800/50">
<td class="py-2 px-3 text-slate-200 font-medium">
{{ $name }}
</td>
<td class="py-2 px-3 text-slate-400 break-all">
{{ joinStrings $ds.Nameservers ", " }}
</td>
<td class="py-2 px-3 text-slate-500 whitespace-nowrap">
{{ relTime $ds.LastChecked }}
</td>
</tr>
{{ end }}
</tbody>
</table>
</div>
{{ else }}
<p class="text-slate-600 italic text-xs">
No domains configured.
</p>
{{ end }}
</section>
{{/* ---- Hostnames ---- */}}
<section class="mb-8">
<h2
class="text-sm font-semibold text-teal-300 uppercase tracking-wider mb-3 border-b border-slate-700/50 pb-2"
>
Hostnames
</h2>
{{ if .Snapshot.Hostnames }}
<div class="overflow-x-auto">
<table class="w-full text-left text-xs">
<thead>
<tr class="text-slate-500 uppercase tracking-wider">
<th class="py-2 px-3">Hostname</th>
<th class="py-2 px-3">NS</th>
<th class="py-2 px-3">Status</th>
<th class="py-2 px-3">Records</th>
<th class="py-2 px-3">Checked</th>
</tr>
</thead>
<tbody class="divide-y divide-slate-800">
{{ range $name, $hs := .Snapshot.Hostnames }}
{{ range $ns, $nsr := $hs.RecordsByNameserver }}
<tr class="hover:bg-surface-800/50">
<td class="py-2 px-3 text-slate-200 font-medium">
{{ $name }}
</td>
<td class="py-2 px-3 text-slate-400 break-all">
{{ $ns }}
</td>
<td class="py-2 px-3">
{{ if eq $nsr.Status "ok" }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-teal-900/50 text-teal-400 border border-teal-700/30"
>ok</span
>
{{ else }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-red-900/50 text-red-400 border border-red-700/30"
>{{ $nsr.Status }}</span
>
{{ end }}
</td>
<td
class="py-2 px-3 text-slate-400 break-all max-w-xs"
>
{{ formatRecords $nsr.Records }}
</td>
<td class="py-2 px-3 text-slate-500 whitespace-nowrap">
{{ relTime $nsr.LastChecked }}
</td>
</tr>
{{ end }}
{{ end }}
</tbody>
</table>
</div>
{{ else }}
<p class="text-slate-600 italic text-xs">
No hostnames configured.
</p>
{{ end }}
</section>
{{/* ---- Ports ---- */}}
<section class="mb-8">
<h2
class="text-sm font-semibold text-teal-300 uppercase tracking-wider mb-3 border-b border-slate-700/50 pb-2"
>
Ports
</h2>
{{ if .Snapshot.Ports }}
<div class="overflow-x-auto">
<table class="w-full text-left text-xs">
<thead>
<tr class="text-slate-500 uppercase tracking-wider">
<th class="py-2 px-3">Address</th>
<th class="py-2 px-3">State</th>
<th class="py-2 px-3">Hostnames</th>
<th class="py-2 px-3">Checked</th>
</tr>
</thead>
<tbody class="divide-y divide-slate-800">
{{ range $key, $ps := .Snapshot.Ports }}
<tr class="hover:bg-surface-800/50">
<td class="py-2 px-3 text-slate-200 font-medium">
{{ $key }}
</td>
<td class="py-2 px-3">
{{ if $ps.Open }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-teal-900/50 text-teal-400 border border-teal-700/30"
>open</span
>
{{ else }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-red-900/50 text-red-400 border border-red-700/30"
>closed</span
>
{{ end }}
</td>
<td class="py-2 px-3 text-slate-400 break-all">
{{ joinStrings $ps.Hostnames ", " }}
</td>
<td class="py-2 px-3 text-slate-500 whitespace-nowrap">
{{ relTime $ps.LastChecked }}
</td>
</tr>
{{ end }}
</tbody>
</table>
</div>
{{ else }}
<p class="text-slate-600 italic text-xs">
No port data yet.
</p>
{{ end }}
</section>
{{/* ---- Certificates ---- */}}
<section class="mb-8">
<h2
class="text-sm font-semibold text-teal-300 uppercase tracking-wider mb-3 border-b border-slate-700/50 pb-2"
>
Certificates
</h2>
{{ if .Snapshot.Certificates }}
<div class="overflow-x-auto">
<table class="w-full text-left text-xs">
<thead>
<tr class="text-slate-500 uppercase tracking-wider">
<th class="py-2 px-3">Endpoint</th>
<th class="py-2 px-3">Status</th>
<th class="py-2 px-3">CN</th>
<th class="py-2 px-3">Issuer</th>
<th class="py-2 px-3">Expires</th>
<th class="py-2 px-3">Checked</th>
</tr>
</thead>
<tbody class="divide-y divide-slate-800">
{{ range $key, $cs := .Snapshot.Certificates }}
<tr class="hover:bg-surface-800/50">
<td class="py-2 px-3 text-slate-400 break-all">
{{ $key }}
</td>
<td class="py-2 px-3">
{{ if eq $cs.Status "ok" }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-teal-900/50 text-teal-400 border border-teal-700/30"
>ok</span
>
{{ else }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-red-900/50 text-red-400 border border-red-700/30"
>{{ $cs.Status }}</span
>
{{ end }}
</td>
<td class="py-2 px-3 text-slate-200">
{{ $cs.CommonName }}
</td>
<td class="py-2 px-3 text-slate-400 break-all">
{{ $cs.Issuer }}
</td>
<td class="py-2 px-3 whitespace-nowrap">
{{ if not $cs.NotAfter.IsZero }}
{{ $days := expiryDays $cs.NotAfter }}
{{ if lt $days 7 }}
<span class="text-red-400 font-medium"
>{{ $cs.NotAfter.Format "2006-01-02" }}
({{ $days }}d)</span
>
{{ else if lt $days 30 }}
<span class="text-amber-400"
>{{ $cs.NotAfter.Format "2006-01-02" }}
({{ $days }}d)</span
>
{{ else }}
<span class="text-slate-400"
>{{ $cs.NotAfter.Format "2006-01-02" }}
({{ $days }}d)</span
>
{{ end }}
{{ end }}
</td>
<td class="py-2 px-3 text-slate-500 whitespace-nowrap">
{{ relTime $cs.LastChecked }}
</td>
</tr>
{{ end }}
</tbody>
</table>
</div>
{{ else }}
<p class="text-slate-600 italic text-xs">
No certificate data yet.
</p>
{{ end }}
</section>
{{/* ---- Recent Alerts ---- */}}
<section class="mb-8">
<h2
class="text-sm font-semibold text-teal-300 uppercase tracking-wider mb-3 border-b border-slate-700/50 pb-2"
>
Recent Alerts ({{ len .Alerts }})
</h2>
{{ if .Alerts }}
<div class="space-y-2">
{{ range .Alerts }}
<div
class="bg-surface-800 border rounded-lg px-4 py-3 {{ if eq .Priority "error" }}border-red-700/40{{ else if eq .Priority "warning" }}border-amber-700/40{{ else if eq .Priority "success" }}border-teal-700/40{{ else }}border-blue-700/40{{ end }}"
>
<div class="flex items-center gap-3 mb-1">
{{ if eq .Priority "error" }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-red-900/50 text-red-400 border border-red-700/30"
>error</span
>
{{ else if eq .Priority "warning" }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-amber-900/50 text-amber-400 border border-amber-700/30"
>warning</span
>
{{ else if eq .Priority "success" }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-teal-900/50 text-teal-400 border border-teal-700/30"
>success</span
>
{{ else }}
<span
class="inline-block px-1.5 py-0.5 rounded text-[10px] font-bold uppercase bg-blue-900/50 text-blue-400 border border-blue-700/30"
>info</span
>
{{ end }}
<span class="text-slate-200 text-xs font-medium">
{{ .Title }}
</span>
<span class="text-slate-600 text-[11px] ml-auto whitespace-nowrap">
{{ .Timestamp.Format "2006-01-02 15:04:05" }} UTC
({{ relTime .Timestamp }})
</span>
</div>
<p
class="text-slate-400 text-xs whitespace-pre-line pl-0.5"
>
{{ .Message }}
</p>
</div>
{{ end }}
</div>
{{ else }}
<p class="text-slate-600 italic text-xs">
No alerts recorded since last restart.
</p>
{{ end }}
</section>
{{/* ---- Footer ---- */}}
<div
class="text-[11px] text-slate-700 border-t border-slate-800 pt-4 mt-8"
>
dnswatcher &middot; monitoring {{ len .Snapshot.Domains }} domains +
{{ len .Snapshot.Hostnames }} hostnames
</div>
</div>
</body>
</html>

View File

@@ -78,5 +78,6 @@ func (l *Logger) Identify() {
l.log.Info("starting",
"appname", l.params.Globals.Appname,
"version", l.params.Globals.Version,
"buildarch", l.params.Globals.Buildarch,
)
}

File diff suppressed because it is too large

View File

@@ -1,105 +0,0 @@
package notify
import (
"context"
"io"
"log/slog"
"net/http"
"net/url"
"time"
)
// NtfyPriority exports ntfyPriority for testing.
func NtfyPriority(priority string) string {
return ntfyPriority(priority)
}
// SlackColor exports slackColor for testing.
func SlackColor(priority string) string {
return slackColor(priority)
}
// NewRequestForTest exports newRequest for testing.
func NewRequestForTest(
ctx context.Context,
method string,
target *url.URL,
body io.Reader,
) *http.Request {
return newRequest(ctx, method, target, body)
}
// NewTestService creates a Service suitable for unit testing.
// It discards log output and uses the given transport.
func NewTestService(transport http.RoundTripper) *Service {
return &Service{
log: slog.New(slog.DiscardHandler),
transport: transport,
history: NewAlertHistory(),
}
}
// SetNtfyURL sets the ntfy URL on a Service for testing.
func (svc *Service) SetNtfyURL(u *url.URL) {
svc.ntfyURL = u
}
// SetSlackWebhookURL sets the Slack webhook URL on a
// Service for testing.
func (svc *Service) SetSlackWebhookURL(u *url.URL) {
svc.slackWebhookURL = u
}
// SetMattermostWebhookURL sets the Mattermost webhook URL on
// a Service for testing.
func (svc *Service) SetMattermostWebhookURL(u *url.URL) {
svc.mattermostWebhookURL = u
}
// SendNtfy exports sendNtfy for testing.
func (svc *Service) SendNtfy(
ctx context.Context,
topicURL *url.URL,
title, message, priority string,
) error {
return svc.sendNtfy(ctx, topicURL, title, message, priority)
}
// SendSlack exports sendSlack for testing.
func (svc *Service) SendSlack(
ctx context.Context,
webhookURL *url.URL,
title, message, priority string,
) error {
return svc.sendSlack(
ctx, webhookURL, title, message, priority,
)
}
// SetRetryConfig overrides the retry configuration for
// testing.
func (svc *Service) SetRetryConfig(cfg RetryConfig) {
svc.retryConfig = cfg
}
// SetSleepFunc overrides the sleep function so tests can
// eliminate real delays.
func (svc *Service) SetSleepFunc(
fn func(time.Duration) <-chan time.Time,
) {
svc.sleepFn = fn
}
// DeliverWithRetry exports deliverWithRetry for testing.
func (svc *Service) DeliverWithRetry(
ctx context.Context,
endpoint string,
fn func(context.Context) error,
) error {
return svc.deliverWithRetry(ctx, endpoint, fn)
}
// BackoffDuration exports RetryConfig.backoff for testing.
func (rc RetryConfig) BackoffDuration(attempt int) time.Duration {
return rc.defaults().backoff(attempt)
}

View File

@@ -1,62 +0,0 @@
package notify
import (
"sync"
"time"
)
// maxAlertHistory is the maximum number of alerts to retain.
const maxAlertHistory = 100
// AlertEntry represents a single notification that was sent.
type AlertEntry struct {
Timestamp time.Time
Title string
Message string
Priority string
}
// AlertHistory is a thread-safe ring buffer that stores
// the most recent alerts.
type AlertHistory struct {
mu sync.RWMutex
entries [maxAlertHistory]AlertEntry
count int
index int
}
// NewAlertHistory creates a new empty AlertHistory.
func NewAlertHistory() *AlertHistory {
return &AlertHistory{}
}
// Add records a new alert entry in the ring buffer.
func (h *AlertHistory) Add(entry AlertEntry) {
h.mu.Lock()
defer h.mu.Unlock()
h.entries[h.index] = entry
h.index = (h.index + 1) % maxAlertHistory
if h.count < maxAlertHistory {
h.count++
}
}
// Recent returns the stored alerts in reverse chronological
// order (newest first). Returns at most maxAlertHistory entries.
func (h *AlertHistory) Recent() []AlertEntry {
h.mu.RLock()
defer h.mu.RUnlock()
result := make([]AlertEntry, h.count)
for i := range h.count {
// Walk backwards from the most recent entry.
idx := (h.index - 1 - i + maxAlertHistory) %
maxAlertHistory
result[i] = h.entries[idx]
}
return result
}
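
The index arithmetic in `Recent` is easiest to follow on a small buffer. The following is a minimal sketch, assuming a capacity of 3 instead of 100 and plain strings instead of `AlertEntry`. The `ring` type and its method names are illustrative and not part of the package, and the mutex is left out for brevity.

```go
package main

const ringCap = 3 // illustrative; the original uses maxAlertHistory = 100

// ring is a fixed-size ring buffer of strings (hypothetical type).
type ring struct {
	entries [ringCap]string
	count   int // number of valid entries, at most ringCap
	index   int // slot the next add will write to
}

// add overwrites the oldest slot once the buffer is full.
func (r *ring) add(s string) {
	r.entries[r.index] = s
	r.index = (r.index + 1) % ringCap
	if r.count < ringCap {
		r.count++
	}
}

// recent walks backwards from the most recently written slot.
// Adding ringCap before the modulo keeps the operand non-negative.
func (r *ring) recent() []string {
	out := make([]string, r.count)
	for i := 0; i < r.count; i++ {
		out[i] = r.entries[(r.index-1-i+ringCap)%ringCap]
	}
	return out
}
```

After adding `a`, `b`, `c`, `d` in that order, `recent` returns `d`, `c`, `b`: the fourth write wrapped around and overwrote `a`.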

View File

@@ -1,88 +0,0 @@
package notify_test
import (
"testing"
"time"
"sneak.berlin/go/dnswatcher/internal/notify"
)
func TestAlertHistoryEmpty(t *testing.T) {
t.Parallel()
h := notify.NewAlertHistory()
entries := h.Recent()
if len(entries) != 0 {
t.Fatalf("expected 0 entries, got %d", len(entries))
}
}
func TestAlertHistoryAddAndRecent(t *testing.T) {
t.Parallel()
h := notify.NewAlertHistory()
now := time.Now().UTC()
h.Add(notify.AlertEntry{
Timestamp: now.Add(-2 * time.Minute),
Title: "first",
Message: "msg1",
Priority: "info",
})
h.Add(notify.AlertEntry{
Timestamp: now.Add(-1 * time.Minute),
Title: "second",
Message: "msg2",
Priority: "warning",
})
entries := h.Recent()
if len(entries) != 2 {
t.Fatalf("expected 2 entries, got %d", len(entries))
}
// Newest first.
if entries[0].Title != "second" {
t.Errorf(
"expected newest first, got %q", entries[0].Title,
)
}
if entries[1].Title != "first" {
t.Errorf(
"expected oldest second, got %q", entries[1].Title,
)
}
}
func TestAlertHistoryOverflow(t *testing.T) {
t.Parallel()
h := notify.NewAlertHistory()
const totalEntries = 110
// Fill beyond capacity.
for i := range totalEntries {
h.Add(notify.AlertEntry{
Timestamp: time.Now().UTC(),
Title: "alert",
Message: "msg",
Priority: string(rune('0' + i%10)),
})
}
entries := h.Recent()
const maxHistory = 100
if len(entries) != maxHistory {
t.Fatalf(
"expected %d entries, got %d",
maxHistory, len(entries),
)
}
}

View File

@@ -112,9 +112,6 @@ type Service struct {
ntfyURL *url.URL
slackWebhookURL *url.URL
mattermostWebhookURL *url.URL
history *AlertHistory
retryConfig RetryConfig
sleepFn func(time.Duration) <-chan time.Time
}
// New creates a new notify Service.
@@ -126,7 +123,6 @@ func New(
log: params.Logger.Get(),
transport: http.DefaultTransport,
config: params.Config,
history: NewAlertHistory(),
}
if params.Config.NtfyTopic != "" {
@@ -171,117 +167,65 @@ func New(
return svc, nil
}
// History returns the alert history for reading recent alerts.
func (svc *Service) History() *AlertHistory {
return svc.history
}
// SendNotification sends a notification to all configured
// endpoints and records it in the alert history.
// endpoints.
func (svc *Service) SendNotification(
ctx context.Context,
title, message, priority string,
) {
svc.history.Add(AlertEntry{
Timestamp: time.Now().UTC(),
Title: title,
Message: message,
Priority: priority,
})
svc.dispatchNtfy(ctx, title, message, priority)
svc.dispatchSlack(ctx, title, message, priority)
svc.dispatchMattermost(ctx, title, message, priority)
}
func (svc *Service) dispatchNtfy(
ctx context.Context,
title, message, priority string,
) {
if svc.ntfyURL == nil {
return
}
if svc.ntfyURL != nil {
go func() {
notifyCtx := context.WithoutCancel(ctx)
err := svc.deliverWithRetry(
notifyCtx, "ntfy",
func(c context.Context) error {
return svc.sendNtfy(
c, svc.ntfyURL,
err := svc.sendNtfy(
notifyCtx,
svc.ntfyURL,
title, message, priority,
)
},
)
if err != nil {
svc.log.Error(
"failed to send ntfy notification "+
"after retries",
"failed to send ntfy notification",
"error", err,
)
}
}()
}
func (svc *Service) dispatchSlack(
ctx context.Context,
title, message, priority string,
) {
if svc.slackWebhookURL == nil {
return
}
if svc.slackWebhookURL != nil {
go func() {
notifyCtx := context.WithoutCancel(ctx)
err := svc.deliverWithRetry(
notifyCtx, "slack",
func(c context.Context) error {
return svc.sendSlack(
c, svc.slackWebhookURL,
err := svc.sendSlack(
notifyCtx,
svc.slackWebhookURL,
title, message, priority,
)
},
)
if err != nil {
svc.log.Error(
"failed to send slack notification "+
"after retries",
"failed to send slack notification",
"error", err,
)
}
}()
}
func (svc *Service) dispatchMattermost(
ctx context.Context,
title, message, priority string,
) {
if svc.mattermostWebhookURL == nil {
return
}
if svc.mattermostWebhookURL != nil {
go func() {
notifyCtx := context.WithoutCancel(ctx)
err := svc.deliverWithRetry(
notifyCtx, "mattermost",
func(c context.Context) error {
return svc.sendSlack(
c, svc.mattermostWebhookURL,
err := svc.sendSlack(
notifyCtx,
svc.mattermostWebhookURL,
title, message, priority,
)
},
)
if err != nil {
svc.log.Error(
"failed to send mattermost notification "+
"after retries",
"failed to send mattermost notification",
"error", err,
)
}
}()
}
}
func (svc *Service) sendNtfy(

View File

@@ -1,139 +0,0 @@
package notify
import (
"context"
"math"
"math/rand/v2"
"time"
)
// Retry defaults.
const (
// DefaultMaxRetries is the number of additional attempts
// after the first failure.
DefaultMaxRetries = 3
// DefaultBaseDelay is the initial delay before the first
// retry attempt.
DefaultBaseDelay = 1 * time.Second
// DefaultMaxDelay caps the computed backoff delay.
DefaultMaxDelay = 10 * time.Second
// backoffMultiplier is the exponential growth factor.
backoffMultiplier = 2
// jitterFraction controls the ±random spread applied
// to each delay (0.25 = ±25%).
jitterFraction = 0.25
)
// RetryConfig holds tuning knobs for the retry loop.
// Zero or negative values fall back to the package defaults above.
type RetryConfig struct {
MaxRetries int
BaseDelay time.Duration
MaxDelay time.Duration
}
// defaults returns a copy with non-positive fields replaced by
// package defaults.
func (rc RetryConfig) defaults() RetryConfig {
if rc.MaxRetries <= 0 {
rc.MaxRetries = DefaultMaxRetries
}
if rc.BaseDelay <= 0 {
rc.BaseDelay = DefaultBaseDelay
}
if rc.MaxDelay <= 0 {
rc.MaxDelay = DefaultMaxDelay
}
return rc
}
// backoff computes the delay for attempt n (0-indexed) with
// jitter. The raw delay is BaseDelay * 2^n, capped at
// MaxDelay, then randomised by ±jitterFraction.
func (rc RetryConfig) backoff(attempt int) time.Duration {
raw := float64(rc.BaseDelay) *
math.Pow(backoffMultiplier, float64(attempt))
if raw > float64(rc.MaxDelay) {
raw = float64(rc.MaxDelay)
}
// Apply jitter: uniform in [raw*(1-j), raw*(1+j)].
lo := raw * (1 - jitterFraction)
hi := raw * (1 + jitterFraction)
jittered := lo + rand.Float64()*(hi-lo) //nolint:gosec // jitter does not need crypto/rand
return time.Duration(jittered)
}
// deliverWithRetry calls fn, retrying on error with
// exponential backoff. It logs each failed attempt that will
// be retried and returns the last error once all attempts are
// exhausted, or ctx.Err() if ctx is cancelled while waiting.
func (svc *Service) deliverWithRetry(
ctx context.Context,
endpoint string,
fn func(context.Context) error,
) error {
cfg := svc.retryConfig.defaults()
var lastErr error
// attempt 0 is the initial call; attempts 1..MaxRetries
// are retries.
for attempt := range cfg.MaxRetries + 1 {
lastErr = fn(ctx)
if lastErr == nil {
if attempt > 0 {
svc.log.Info(
"notification delivered after retry",
"endpoint", endpoint,
"attempt", attempt+1,
)
}
return nil
}
// Last attempt — don't sleep, just return.
if attempt == cfg.MaxRetries {
break
}
delay := cfg.backoff(attempt)
svc.log.Warn(
"notification delivery failed, retrying",
"endpoint", endpoint,
"attempt", attempt+1,
"maxAttempts", cfg.MaxRetries+1,
"retryIn", delay,
"error", lastErr,
)
select {
case <-ctx.Done():
return ctx.Err()
case <-svc.sleepFunc(delay):
}
}
return lastErr
}
// sleepFunc returns a channel that receives the current time after d.
// It is a field-level indirection so tests can override it.
func (svc *Service) sleepFunc(d time.Duration) <-chan time.Time {
if svc.sleepFn != nil {
return svc.sleepFn(d)
}
return time.After(d)
}

View File

@@ -1,493 +0,0 @@
package notify_test
import (
"context"
"errors"
"net/http"
"net/http/httptest"
"net/url"
"sync"
"sync/atomic"
"testing"
"time"
"sneak.berlin/go/dnswatcher/internal/notify"
)
// Static test errors (err113).
var (
errTransient = errors.New("transient failure")
errPermanent = errors.New("permanent failure")
errFail = errors.New("fail")
)
// instantSleep returns a buffered channel that already holds a
// value, so receives never block and tests run without real delays.
func instantSleep(_ time.Duration) <-chan time.Time {
ch := make(chan time.Time, 1)
ch <- time.Now()
return ch
}
// ── backoff calculation ───────────────────────────────────
func TestBackoffDurationIncreases(t *testing.T) {
t.Parallel()
cfg := notify.RetryConfig{
MaxRetries: 5,
BaseDelay: 1 * time.Second,
MaxDelay: 30 * time.Second,
}
prev := time.Duration(0)
// With jitter the exact value varies, but the trend
// should be increasing for the first few attempts.
for attempt := range 4 {
d := cfg.BackoffDuration(attempt)
if d <= 0 {
t.Fatalf(
"attempt %d: backoff must be positive, got %v",
attempt, d,
)
}
// Allow jitter to occasionally flatten a step, but
// the midpoint (no-jitter) should be strictly higher.
midpoint := cfg.BaseDelay * (1 << attempt)
if attempt > 0 && midpoint <= prev {
t.Fatalf(
"midpoint should grow: attempt %d midpoint=%v prev=%v",
attempt, midpoint, prev,
)
}
prev = midpoint
}
}
func TestBackoffDurationCappedAtMax(t *testing.T) {
t.Parallel()
cfg := notify.RetryConfig{
MaxRetries: 5,
BaseDelay: 1 * time.Second,
MaxDelay: 5 * time.Second,
}
// Attempt 10 would be 1024s without capping.
d := cfg.BackoffDuration(10)
// With ±25% jitter on a 5s cap: max is 6.25s.
const maxWithJitter = 5*time.Second +
5*time.Second/4 +
time.Millisecond // rounding margin
if d > maxWithJitter {
t.Errorf(
"backoff %v exceeds max+jitter %v",
d, maxWithJitter,
)
}
}
// ── deliverWithRetry ──────────────────────────────────────
func TestDeliverWithRetrySucceedsFirstAttempt(t *testing.T) {
t.Parallel()
svc := notify.NewTestService(http.DefaultTransport)
svc.SetSleepFunc(instantSleep)
var calls atomic.Int32
err := svc.DeliverWithRetry(
context.Background(), "test",
func(_ context.Context) error {
calls.Add(1)
return nil
},
)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if calls.Load() != 1 {
t.Errorf("expected 1 call, got %d", calls.Load())
}
}
func TestDeliverWithRetryRetriesOnFailure(t *testing.T) {
t.Parallel()
svc := notify.NewTestService(http.DefaultTransport)
svc.SetSleepFunc(instantSleep)
svc.SetRetryConfig(notify.RetryConfig{
MaxRetries: 3,
BaseDelay: time.Millisecond,
MaxDelay: 10 * time.Millisecond,
})
var calls atomic.Int32
// Fail twice, then succeed on the third attempt.
err := svc.DeliverWithRetry(
context.Background(), "test",
func(_ context.Context) error {
n := calls.Add(1)
if n <= 2 {
return errTransient
}
return nil
},
)
if err != nil {
t.Fatalf("expected success after retries: %v", err)
}
if calls.Load() != 3 {
t.Errorf("expected 3 calls, got %d", calls.Load())
}
}
func TestDeliverWithRetryExhaustsAttempts(t *testing.T) {
t.Parallel()
svc := notify.NewTestService(http.DefaultTransport)
svc.SetSleepFunc(instantSleep)
svc.SetRetryConfig(notify.RetryConfig{
MaxRetries: 2,
BaseDelay: time.Millisecond,
MaxDelay: 10 * time.Millisecond,
})
var calls atomic.Int32
err := svc.DeliverWithRetry(
context.Background(), "test",
func(_ context.Context) error {
calls.Add(1)
return errPermanent
},
)
if err == nil {
t.Fatal("expected error when all retries exhausted")
}
if !errors.Is(err, errPermanent) {
t.Errorf("expected permanent failure, got: %v", err)
}
// 1 initial + 2 retries = 3 total.
if calls.Load() != 3 {
t.Errorf("expected 3 calls, got %d", calls.Load())
}
}
func TestDeliverWithRetryRespectsContextCancellation(
t *testing.T,
) {
t.Parallel()
svc := notify.NewTestService(http.DefaultTransport)
svc.SetRetryConfig(notify.RetryConfig{
MaxRetries: 5,
BaseDelay: time.Millisecond,
MaxDelay: 10 * time.Millisecond,
})
// Use a blocking sleep so the context cancellation is
// the only way out.
svc.SetSleepFunc(func(_ time.Duration) <-chan time.Time {
return make(chan time.Time) // never fires
})
ctx, cancel := context.WithCancel(context.Background())
done := make(chan error, 1)
go func() {
done <- svc.DeliverWithRetry(
ctx, "test",
func(_ context.Context) error {
return errFail
},
)
}()
// Wait for the first failure + retry sleep to be
// entered, then cancel.
time.Sleep(50 * time.Millisecond)
cancel()
select {
case err := <-done:
if !errors.Is(err, context.Canceled) {
t.Errorf(
"expected context.Canceled, got: %v", err,
)
}
case <-time.After(2 * time.Second):
t.Fatal("deliverWithRetry did not return after cancel")
}
}
// ── integration: SendNotification with retry ──────────────
func TestSendNotificationRetriesTransientFailure(
t *testing.T,
) {
t.Parallel()
var (
mu sync.Mutex
attempts int
)
srv := httptest.NewServer(
http.HandlerFunc(
func(w http.ResponseWriter, _ *http.Request) {
mu.Lock()
attempts++
n := attempts
mu.Unlock()
if n <= 2 {
w.WriteHeader(
http.StatusInternalServerError,
)
return
}
w.WriteHeader(http.StatusOK)
}),
)
defer srv.Close()
svc := newRetryTestService(srv.URL, "ntfy")
svc.SendNotification(
context.Background(),
"Retry Test", "body", "warning",
)
waitForCondition(t, func() bool {
mu.Lock()
defer mu.Unlock()
return attempts >= 3
})
}
// newRetryTestService creates a test service with instant
// sleep and low retry delays for the named endpoint.
func newRetryTestService(
rawURL, endpoint string,
) *notify.Service {
svc := notify.NewTestService(http.DefaultTransport)
svc.SetSleepFunc(instantSleep)
svc.SetRetryConfig(notify.RetryConfig{
MaxRetries: 3,
BaseDelay: time.Millisecond,
MaxDelay: 10 * time.Millisecond,
})
u, _ := url.Parse(rawURL)
switch endpoint {
case "ntfy":
svc.SetNtfyURL(u)
case "slack":
svc.SetSlackWebhookURL(u)
case "mattermost":
svc.SetMattermostWebhookURL(u)
}
return svc
}
func TestSendNotificationAllEndpointsRetrySetup(
t *testing.T,
) {
t.Parallel()
result := newEndpointRetryResult()
ntfySrv, slackSrv, mmSrv := newRetryServers(result)
defer ntfySrv.Close()
defer slackSrv.Close()
defer mmSrv.Close()
svc := buildAllEndpointRetryService(
ntfySrv.URL, slackSrv.URL, mmSrv.URL,
)
svc.SendNotification(
context.Background(),
"Multi-Retry", "testing", "error",
)
assertAllEndpointsRetried(t, result)
}
// endpointRetryResult tracks per-endpoint retry state.
type endpointRetryResult struct {
mu sync.Mutex
ntfyAttempts int
slackAttempts int
mmAttempts int
ntfyOK bool
slackOK bool
mmOK bool
}
func newEndpointRetryResult() *endpointRetryResult {
return &endpointRetryResult{}
}
func newRetryServers(
r *endpointRetryResult,
) (*httptest.Server, *httptest.Server, *httptest.Server) {
mk := func(
attempts *int, ok *bool,
) *httptest.Server {
return httptest.NewServer(
http.HandlerFunc(
func(w http.ResponseWriter, _ *http.Request) {
r.mu.Lock()
*attempts++
n := *attempts
r.mu.Unlock()
if n == 1 {
w.WriteHeader(
http.StatusServiceUnavailable,
)
return
}
r.mu.Lock()
*ok = true
r.mu.Unlock()
w.WriteHeader(http.StatusOK)
}),
)
}
return mk(&r.ntfyAttempts, &r.ntfyOK),
mk(&r.slackAttempts, &r.slackOK),
mk(&r.mmAttempts, &r.mmOK)
}
func buildAllEndpointRetryService(
ntfyURL, slackURL, mmURL string,
) *notify.Service {
svc := notify.NewTestService(http.DefaultTransport)
svc.SetSleepFunc(instantSleep)
svc.SetRetryConfig(notify.RetryConfig{
MaxRetries: 3,
BaseDelay: time.Millisecond,
MaxDelay: 10 * time.Millisecond,
})
nu, _ := url.Parse(ntfyURL)
su, _ := url.Parse(slackURL)
mu, _ := url.Parse(mmURL)
svc.SetNtfyURL(nu)
svc.SetSlackWebhookURL(su)
svc.SetMattermostWebhookURL(mu)
return svc
}
func assertAllEndpointsRetried(
t *testing.T,
r *endpointRetryResult,
) {
t.Helper()
waitForCondition(t, func() bool {
r.mu.Lock()
defer r.mu.Unlock()
return r.ntfyOK && r.slackOK && r.mmOK
})
r.mu.Lock()
defer r.mu.Unlock()
if r.ntfyAttempts < 2 {
t.Errorf(
"ntfy: expected >= 2 attempts, got %d",
r.ntfyAttempts,
)
}
if r.slackAttempts < 2 {
t.Errorf(
"slack: expected >= 2 attempts, got %d",
r.slackAttempts,
)
}
if r.mmAttempts < 2 {
t.Errorf(
"mattermost: expected >= 2 attempts, got %d",
r.mmAttempts,
)
}
}
func TestSendNotificationPermanentFailureLogsError(
t *testing.T,
) {
t.Parallel()
var (
mu sync.Mutex
attempts int
)
srv := httptest.NewServer(
http.HandlerFunc(
func(w http.ResponseWriter, _ *http.Request) {
mu.Lock()
attempts++
mu.Unlock()
w.WriteHeader(
http.StatusInternalServerError,
)
}),
)
defer srv.Close()
svc := newRetryTestService(srv.URL, "slack")
svc.SetRetryConfig(notify.RetryConfig{
MaxRetries: 2,
BaseDelay: time.Millisecond,
MaxDelay: 10 * time.Millisecond,
})
svc.SendNotification(
context.Background(),
"Permanent Fail", "body", "error",
)
// 1 initial + 2 retries = 3 total.
waitForCondition(t, func() bool {
mu.Lock()
defer mu.Unlock()
return attempts >= 3
})
}

View File

@@ -4,6 +4,7 @@ import (
"context"
"errors"
"fmt"
"math/rand"
"net"
"sort"
"strings"
@@ -41,6 +42,22 @@ func rootServerList() []string {
}
}
const maxRootServers = 3
// randomRootServers returns a shuffled subset of root servers.
func randomRootServers() []string {
all := rootServerList()
rand.Shuffle(len(all), func(i, j int) {
all[i], all[j] = all[j], all[i]
})
if len(all) > maxRootServers {
return all[:maxRootServers]
}
return all
}
func checkCtx(ctx context.Context) error {
err := ctx.Err()
if err != nil {
@@ -227,7 +244,7 @@ func (r *Resolver) followDelegation(
authNS := extractNSSet(resp.Ns)
if len(authNS) == 0 {
return r.resolveNSIterative(ctx, domain)
return r.resolveNSRecursive(ctx, domain)
}
glue := extractGlue(resp.Extra)
@@ -291,84 +308,60 @@ func (r *Resolver) resolveNSIPs(
return ips
}
// resolveNSIterative queries for NS records using iterative
// resolution as a fallback when followDelegation finds no
// authoritative answer in the delegation chain.
func (r *Resolver) resolveNSIterative(
// resolveNSRecursive queries for NS records using recursive
// resolution as a fallback for intercepted environments.
func (r *Resolver) resolveNSRecursive(
ctx context.Context,
domain string,
) ([]string, error) {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
domain = dns.Fqdn(domain)
servers := rootServerList()
msg := new(dns.Msg)
msg.SetQuestion(domain, dns.TypeNS)
msg.RecursionDesired = true
for range maxDelegation {
for _, ip := range randomRootServers() {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
resp, err := r.queryServers(
ctx, servers, domain, dns.TypeNS,
)
addr := net.JoinHostPort(ip, "53")
resp, _, err := r.client.ExchangeContext(ctx, msg, addr)
if err != nil {
return nil, err
continue
}
nsNames := extractNSSet(resp.Answer)
if len(nsNames) > 0 {
return nsNames, nil
}
// Follow delegation.
authNS := extractNSSet(resp.Ns)
if len(authNS) == 0 {
break
}
glue := extractGlue(resp.Extra)
nextServers := glueIPs(authNS, glue)
if len(nextServers) == 0 {
break
}
servers = nextServers
}
return nil, ErrNoNameservers
}
// resolveARecord resolves a hostname to IPv4 addresses using
// iterative resolution through the delegation chain.
// resolveARecord resolves a hostname to IPv4 addresses.
func (r *Resolver) resolveARecord(
ctx context.Context,
hostname string,
) ([]string, error) {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
hostname = dns.Fqdn(hostname)
servers := rootServerList()
msg := new(dns.Msg)
msg.SetQuestion(hostname, dns.TypeA)
msg.RecursionDesired = true
for range maxDelegation {
for _, ip := range randomRootServers() {
if checkCtx(ctx) != nil {
return nil, ErrContextCanceled
}
resp, err := r.queryServers(
ctx, servers, hostname, dns.TypeA,
)
addr := net.JoinHostPort(ip, "53")
resp, _, err := r.client.ExchangeContext(ctx, msg, addr)
if err != nil {
return nil, fmt.Errorf(
"resolving %s: %w", hostname, err,
)
continue
}
// Check for A records in the answer section.
var ips []string
for _, rr := range resp.Answer {
@@ -380,24 +373,6 @@ func (r *Resolver) resolveARecord(
if len(ips) > 0 {
return ips, nil
}
// Follow delegation if present.
authNS := extractNSSet(resp.Ns)
if len(authNS) == 0 {
break
}
glue := extractGlue(resp.Extra)
nextServers := glueIPs(authNS, glue)
if len(nextServers) == 0 {
// Resolve NS IPs iteratively — but guard
// against infinite recursion by using only
// already-resolved servers.
break
}
servers = nextServers
}
return nil, fmt.Errorf(
@@ -427,7 +402,7 @@ func (r *Resolver) FindAuthoritativeNameservers(
candidate := strings.Join(labels[i:], ".") + "."
nsNames, err := r.followDelegation(
ctx, candidate, rootServerList(),
ctx, candidate, randomRootServers(),
)
if err == nil && len(nsNames) > 0 {
sort.Strings(nsNames)
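
Review note: `randomRootServers` in this file shuffles the root server list and keeps at most `maxRootServers` entries. The sketch below shows the same shuffle-and-truncate idea in isolation. It assumes nothing about the repository beyond what the hunk shows: `pickServers` is a hypothetical name, the addresses are documentation placeholders (TEST-NET-1) rather than real root servers, and the copy step is an extra precaution the diff does not need if `rootServerList` returns a fresh slice on every call.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickServers returns up to limit entries of servers in random order.
// It copies the input first so the caller's slice is never reordered.
// Hypothetical helper for illustration; not part of the repository.
func pickServers(servers []string, limit int) []string {
	shuffled := make([]string, len(servers))
	copy(shuffled, servers)

	rand.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})

	// Truncate only when the list is longer than the requested limit.
	if limit >= 0 && len(shuffled) > limit {
		return shuffled[:limit]
	}

	return shuffled
}

func main() {
	// Placeholder addresses, not real root server IPs.
	servers := []string{"192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"}

	// Prints three of the four addresses in an order that varies per run.
	fmt.Println(pickServers(servers, 3))
}
```

Querying a random subset spreads load across servers, at the cost of making a failing run harder to reproduce unless the chosen servers are logged.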

View File

@@ -1,14 +1,11 @@
package server
import (
"net/http"
"time"
"github.com/go-chi/chi/v5"
chimw "github.com/go-chi/chi/v5/middleware"
"github.com/prometheus/client_golang/prometheus/promhttp"
"sneak.berlin/go/dnswatcher/static"
)
// requestTimeout is the maximum duration for handling a request.
@@ -25,25 +22,7 @@ func (s *Server) SetupRoutes() {
s.router.Use(s.mw.CORS())
s.router.Use(chimw.Timeout(requestTimeout))
// Dashboard (read-only web UI)
s.router.Get("/", s.handlers.HandleDashboard())
// Static assets (embedded CSS/JS)
s.router.Mount(
"/s",
http.StripPrefix(
"/s",
http.FileServer(http.FS(static.Static)),
),
)
// Health check (standard well-known path)
s.router.Get(
"/.well-known/healthcheck",
s.handlers.HandleHealthCheck(),
)
// Legacy health check (keep for backward compatibility)
// Health check
s.router.Get("/health", s.handlers.HandleHealthCheck())
// API v1 routes

View File

@@ -57,47 +57,8 @@ type HostnameState struct {
// PortState holds the monitoring state for a port.
type PortState struct {
Open bool `json:"open"`
Hostnames []string `json:"hostnames"`
LastChecked time.Time `json:"lastChecked"`
}
// UnmarshalJSON implements custom unmarshaling to handle both
// the old single-hostname format and the new multi-hostname
// format for backward compatibility with existing state files.
func (ps *PortState) UnmarshalJSON(data []byte) error {
// Use an alias to prevent infinite recursion.
type portStateAlias struct {
Open bool `json:"open"`
Hostnames []string `json:"hostnames"`
LastChecked time.Time `json:"lastChecked"`
}
var alias portStateAlias
err := json.Unmarshal(data, &alias)
if err != nil {
return fmt.Errorf("unmarshaling port state: %w", err)
}
ps.Open = alias.Open
ps.Hostnames = alias.Hostnames
ps.LastChecked = alias.LastChecked
// If Hostnames is empty, try reading the old single-hostname
// format for backward compatibility.
if len(ps.Hostnames) == 0 {
var old struct {
Hostname string `json:"hostname"`
}
// Best-effort: ignore errors since the main unmarshal
// already succeeded.
if json.Unmarshal(data, &old) == nil && old.Hostname != "" {
ps.Hostnames = []string{old.Hostname}
}
}
return nil
LastChecked time.Time `json:"lastChecked"`
}
// CertificateState holds TLS certificate monitoring state.
@@ -302,27 +263,6 @@ func (s *State) GetPortState(key string) (*PortState, bool) {
return ps, ok
}
// DeletePortState removes a port state entry.
func (s *State) DeletePortState(key string) {
s.mu.Lock()
defer s.mu.Unlock()
delete(s.snapshot.Ports, key)
}
// GetAllPortKeys returns all port state keys.
func (s *State) GetAllPortKeys() []string {
s.mu.RLock()
defer s.mu.RUnlock()
keys := make([]string, 0, len(s.snapshot.Ports))
for k := range s.snapshot.Ports {
keys = append(keys, k)
}
return keys
}
// SetCertificateState updates the state for a certificate.
func (s *State) SetCertificateState(
key string,
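
Review note: the `UnmarshalJSON` method shown in the first hunk of this file accepts both a single-hostname and a multi-hostname state format. A compact way to express the same fallback, sketched under assumptions: the `portState` type, its reduced field set, and `decodePortState` are hypothetical stand-ins, and this version decodes once into a temporary struct that carries both keys instead of decoding twice as the hunk does.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// portState is a reduced, hypothetical stand-in for the state type.
type portState struct {
	Open      bool     `json:"open"`
	Hostnames []string `json:"hostnames"`
}

// UnmarshalJSON accepts the multi-hostname format and falls back to
// the legacy single "hostname" key when "hostnames" is absent or empty.
func (ps *portState) UnmarshalJSON(data []byte) error {
	// A plain struct (no methods) avoids recursing into UnmarshalJSON.
	var raw struct {
		Open      bool     `json:"open"`
		Hostnames []string `json:"hostnames"`
		Hostname  string   `json:"hostname"`
	}

	if err := json.Unmarshal(data, &raw); err != nil {
		return fmt.Errorf("unmarshaling port state: %w", err)
	}

	ps.Open = raw.Open
	ps.Hostnames = raw.Hostnames

	if len(ps.Hostnames) == 0 && raw.Hostname != "" {
		ps.Hostnames = []string{raw.Hostname}
	}

	return nil
}

// decodePortState is a small convenience wrapper for the example.
func decodePortState(data string) (portState, error) {
	var ps portState
	err := json.Unmarshal([]byte(data), &ps)

	return ps, err
}

func main() {
	legacy, _ := decodePortState(`{"open":true,"hostname":"www.example.com"}`)
	fmt.Println(legacy.Hostnames) // [www.example.com]

	current, _ := decodePortState(`{"open":true,"hostnames":["a.example","b.example"]}`)
	fmt.Println(current.Hostnames) // [a.example b.example]
}
```

Whichever format the state file ends up using, existing files written in the other format need a migration path like this, or they need to be discarded on upgrade.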

File diff suppressed because it is too large

View File

@@ -20,19 +20,3 @@ func NewForTest() *State {
config: &config.Config{DataDir: ""},
}
}
// NewForTestWithDataDir creates a State backed by the given directory
// for tests that need file persistence.
func NewForTestWithDataDir(dataDir string) *State {
return &State{
log: slog.Default(),
snapshot: &Snapshot{
Version: stateVersion,
Domains: make(map[string]*DomainState),
Hostnames: make(map[string]*HostnameState),
Ports: make(map[string]*PortState),
Certificates: make(map[string]*CertificateState),
},
config: &config.Config{DataDir: dataDir},
}
}

View File

@@ -72,15 +72,13 @@ func New(
}
lifecycle.Append(fx.Hook{
OnStart: func(_ context.Context) error {
// Use context.Background() — the fx startup context
// expires after startup completes, so deriving from it
// would cancel the watcher immediately. The watcher's
// lifetime is controlled by w.cancel in OnStop.
ctx, cancel := context.WithCancel(context.Background())
OnStart: func(startCtx context.Context) error {
ctx, cancel := context.WithCancel(
context.WithoutCancel(startCtx),
)
w.cancel = cancel
go w.Run(ctx) //nolint:contextcheck // intentionally not derived from startCtx
go w.Run(ctx)
return nil
},
@@ -129,7 +127,6 @@ func (w *Watcher) Run(ctx context.Context) {
)
w.RunOnce(ctx)
w.maybeSendTestNotification(ctx)
dnsTicker := time.NewTicker(w.config.DNSInterval)
tlsTicker := time.NewTicker(w.config.TLSInterval)
@@ -144,16 +141,9 @@ func (w *Watcher) Run(ctx context.Context) {
return
case <-dnsTicker.C:
w.runDNSChecks(ctx)
w.checkAllPorts(ctx)
w.runDNSAndPortChecks(ctx)
w.saveState()
case <-tlsTicker.C:
// Run DNS first so TLS checks use freshly
// resolved IP addresses, not stale ones from
// a previous cycle.
w.runDNSChecks(ctx)
w.runTLSChecks(ctx)
w.saveState()
}
@@ -161,26 +151,10 @@ func (w *Watcher) Run(ctx context.Context) {
}
// RunOnce performs a single complete monitoring cycle.
// DNS checks run first so that port and TLS checks use
// freshly resolved IP addresses. Port checks run before
// TLS because TLS checks only target IPs with an open
// port 443.
func (w *Watcher) RunOnce(ctx context.Context) {
w.detectFirstRun()
// Phase 1: DNS resolution must complete first so that
// subsequent checks use fresh IP addresses.
w.runDNSChecks(ctx)
// Phase 2: Port checks populate port state that TLS
// checks depend on (TLS only targets IPs where port
// 443 is open).
w.checkAllPorts(ctx)
// Phase 3: TLS checks use fresh DNS IPs and current
// port state.
w.runDNSAndPortChecks(ctx)
w.runTLSChecks(ctx)
w.saveState()
w.firstRun = false
}
@@ -197,11 +171,7 @@ func (w *Watcher) detectFirstRun() {
}
}
// runDNSChecks performs DNS resolution for all configured domains
// and hostnames, updating state with freshly resolved records.
// This must complete before port or TLS checks run so those
// checks operate on current IP addresses.
func (w *Watcher) runDNSChecks(ctx context.Context) {
func (w *Watcher) runDNSAndPortChecks(ctx context.Context) {
for _, domain := range w.config.Domains {
w.checkDomain(ctx, domain)
}
@@ -209,6 +179,8 @@ func (w *Watcher) runDNSChecks(ctx context.Context) {
for _, hostname := range w.config.Hostnames {
w.checkHostname(ctx, hostname)
}
w.checkAllPorts(ctx)
}
func (w *Watcher) checkDomain(
@@ -476,94 +448,24 @@ func (w *Watcher) detectInconsistencies(
}
func (w *Watcher) checkAllPorts(ctx context.Context) {
// Phase 1: Build current IP:port → hostname associations
// from fresh DNS data.
associations := w.buildPortAssociations()
// Phase 2: Check each unique IP:port and update state
// with the full set of associated hostnames.
for key, hostnames := range associations {
ip, port := parsePortKey(key)
if port == 0 {
continue
for _, hostname := range w.config.Hostnames {
w.checkPortsForHostname(ctx, hostname)
}
w.checkSinglePort(ctx, ip, port, hostnames)
for _, domain := range w.config.Domains {
w.checkPortsForHostname(ctx, domain)
}
// Phase 3: Remove port state entries that no longer have
// any hostname referencing them.
w.cleanupStalePorts(associations)
}
// buildPortAssociations constructs a map from IP:port keys to
// the sorted set of hostnames currently resolving to that IP.
func (w *Watcher) buildPortAssociations() map[string][]string {
assoc := make(map[string]map[string]bool)
func (w *Watcher) checkPortsForHostname(
ctx context.Context,
hostname string,
) {
ips := w.collectIPs(hostname)
allNames := make(
[]string, 0,
len(w.config.Hostnames)+len(w.config.Domains),
)
allNames = append(allNames, w.config.Hostnames...)
allNames = append(allNames, w.config.Domains...)
for _, name := range allNames {
ips := w.collectIPs(name)
for _, ip := range ips {
for _, port := range monitoredPorts {
key := fmt.Sprintf("%s:%d", ip, port)
if assoc[key] == nil {
assoc[key] = make(map[string]bool)
}
assoc[key][name] = true
}
}
}
result := make(map[string][]string, len(assoc))
for key, set := range assoc {
hostnames := make([]string, 0, len(set))
for h := range set {
hostnames = append(hostnames, h)
}
sort.Strings(hostnames)
result[key] = hostnames
}
return result
}
// parsePortKey splits an "ip:port" key into its components.
func parsePortKey(key string) (string, int) {
lastColon := strings.LastIndex(key, ":")
if lastColon < 0 {
return key, 0
}
ip := key[:lastColon]
var p int
_, err := fmt.Sscanf(key[lastColon+1:], "%d", &p)
if err != nil {
return ip, 0
}
return ip, p
}
// cleanupStalePorts removes port state entries that are no
// longer referenced by any hostname in the current DNS data.
func (w *Watcher) cleanupStalePorts(
currentAssociations map[string][]string,
) {
for _, key := range w.state.GetAllPortKeys() {
if _, exists := currentAssociations[key]; !exists {
w.state.DeletePortState(key)
w.checkSinglePort(ctx, ip, port, hostname)
}
}
}
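
Review note: `buildPortAssociations` in this hunk maps each `ip:port` key to the sorted set of hostnames that resolve to that IP. The sketch below shows that grouping on its own. It is an illustration under assumptions, not the repository's code: `buildAssociations` and its `resolved` input map are hypothetical, and keys are built with `net.JoinHostPort` instead of `fmt.Sprintf("%s:%d", ...)` so that IPv6 addresses are bracketed and can be split again with `net.SplitHostPort`.

```go
package main

import (
	"fmt"
	"net"
	"sort"
	"strconv"
)

// buildAssociations maps each "ip:port" key to the sorted, de-duplicated
// hostnames resolving to that IP. The resolved argument (hostname to IPs)
// is a hypothetical input standing in for the watcher's DNS state.
func buildAssociations(
	resolved map[string][]string,
	ports []int,
) map[string][]string {
	sets := make(map[string]map[string]struct{})

	for name, ips := range resolved {
		for _, ip := range ips {
			for _, port := range ports {
				key := net.JoinHostPort(ip, strconv.Itoa(port))
				if sets[key] == nil {
					sets[key] = make(map[string]struct{})
				}

				sets[key][name] = struct{}{}
			}
		}
	}

	result := make(map[string][]string, len(sets))

	for key, set := range sets {
		names := make([]string, 0, len(set))
		for name := range set {
			names = append(names, name)
		}

		sort.Strings(names)
		result[key] = names
	}

	return result
}

func main() {
	resolved := map[string][]string{
		"www.example.com": {"192.0.2.10"},
		"example.com":     {"192.0.2.10", "2001:db8::1"},
	}

	assoc := buildAssociations(resolved, []int{80, 443})

	fmt.Println(assoc["192.0.2.10:443"])   // [example.com www.example.com]
	fmt.Println(assoc["[2001:db8::1]:80"]) // [example.com]
}
```

Grouping by endpoint means each `ip:port` is probed once per cycle even when several hostnames share the address, whereas a per-hostname loop probes shared endpoints repeatedly and records only the last hostname checked.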
@@ -600,7 +502,7 @@ func (w *Watcher) checkSinglePort(
ctx context.Context,
ip string,
port int,
hostnames []string,
hostname string,
) {
result, err := w.portCheck.CheckPort(ctx, ip, port)
if err != nil {
@@ -625,8 +527,8 @@ func (w *Watcher) checkSinglePort(
}
msg := fmt.Sprintf(
"Hosts: %s\nAddress: %s\nPort now %s",
strings.Join(hostnames, ", "), key, stateStr,
"Host: %s\nAddress: %s\nPort now %s",
hostname, key, stateStr,
)
w.notify.SendNotification(
@@ -639,7 +541,7 @@ func (w *Watcher) checkSinglePort(
w.state.SetPortState(key, &state.PortState{
Open: result.Open,
Hostnames: hostnames,
Hostname: hostname,
LastChecked: now,
})
}
@@ -855,38 +757,6 @@ func (w *Watcher) saveState() {
}
}
// maybeSendTestNotification sends a startup status notification
// after the first full scan completes, if SEND_TEST_NOTIFICATION
// is enabled. The message is clearly informational ("all ok")
// and not an error or anomaly alert.
func (w *Watcher) maybeSendTestNotification(ctx context.Context) {
if !w.config.SendTestNotification {
return
}
snap := w.state.GetSnapshot()
msg := fmt.Sprintf(
"dnswatcher has started and completed its initial scan.\n"+
"Monitoring %d domain(s) and %d hostname(s).\n"+
"Tracking %d port endpoint(s) and %d TLS certificate(s).\n"+
"All notification channels are working.",
len(snap.Domains),
len(snap.Hostnames),
len(snap.Ports),
len(snap.Certificates),
)
w.log.Info("sending startup test notification")
w.notify.SendNotification(
ctx,
"✅ dnswatcher startup complete",
msg,
"success",
)
}
// --- Utility functions ---
func toSet(items []string) map[string]bool {

View File

@@ -682,191 +682,6 @@ func TestGracefulShutdown(t *testing.T) {
}
}
func setupHostnameIP(
deps *testDeps,
hostname, ip string,
) {
deps.resolver.allRecords[hostname] = map[string]map[string][]string{
"ns1.example.com.": {"A": {ip}},
}
deps.portChecker.results[ip+":80"] = true
deps.portChecker.results[ip+":443"] = true
deps.tlsChecker.certs[ip+":"+hostname] = &tlscheck.CertificateInfo{
CommonName: hostname,
Issuer: "DigiCert",
NotAfter: time.Now().Add(90 * 24 * time.Hour),
SubjectAlternativeNames: []string{hostname},
}
}
func updateHostnameIP(deps *testDeps, hostname, ip string) {
deps.resolver.mu.Lock()
deps.resolver.allRecords[hostname] = map[string]map[string][]string{
"ns1.example.com.": {"A": {ip}},
}
deps.resolver.mu.Unlock()
deps.portChecker.mu.Lock()
deps.portChecker.results[ip+":80"] = true
deps.portChecker.results[ip+":443"] = true
deps.portChecker.mu.Unlock()
deps.tlsChecker.mu.Lock()
deps.tlsChecker.certs[ip+":"+hostname] = &tlscheck.CertificateInfo{
CommonName: hostname,
Issuer: "DigiCert",
NotAfter: time.Now().Add(90 * 24 * time.Hour),
SubjectAlternativeNames: []string{hostname},
}
deps.tlsChecker.mu.Unlock()
}
func TestDNSRunsBeforePortAndTLSChecks(t *testing.T) {
t.Parallel()
cfg := defaultTestConfig(t)
cfg.Hostnames = []string{"www.example.com"}
w, deps := newTestWatcher(t, cfg)
setupHostnameIP(deps, "www.example.com", "10.0.0.1")
ctx := t.Context()
w.RunOnce(ctx)
snap := deps.state.GetSnapshot()
if _, ok := snap.Ports["10.0.0.1:80"]; !ok {
t.Fatal("expected port state for 10.0.0.1:80")
}
// DNS changes to a new IP; port and TLS must pick it up.
updateHostnameIP(deps, "www.example.com", "10.0.0.2")
w.RunOnce(ctx)
snap = deps.state.GetSnapshot()
if _, ok := snap.Ports["10.0.0.2:80"]; !ok {
t.Error("port check used stale DNS: missing 10.0.0.2:80")
}
certKey := "10.0.0.2:443:www.example.com"
if _, ok := snap.Certificates[certKey]; !ok {
t.Error("TLS check used stale DNS: missing " + certKey)
}
}
func TestSendTestNotification_Enabled(t *testing.T) {
t.Parallel()
cfg := defaultTestConfig(t)
cfg.Domains = []string{"example.com"}
cfg.Hostnames = []string{"www.example.com"}
cfg.SendTestNotification = true
w, deps := newTestWatcher(t, cfg)
setupBaselineMocks(deps)
w.RunOnce(t.Context())
// RunOnce does not send the test notification — it is
// sent by Run after RunOnce completes. Call the exported
// RunOnce then check that no test notification was sent
// (only Run triggers it). We test the full path via Run.
notifications := deps.notifier.getNotifications()
if len(notifications) != 0 {
t.Errorf(
"RunOnce should not send test notification, got %d",
len(notifications),
)
}
}
func TestSendTestNotification_ViaRun(t *testing.T) {
t.Parallel()
cfg := defaultTestConfig(t)
cfg.Domains = []string{"example.com"}
cfg.Hostnames = []string{"www.example.com"}
cfg.SendTestNotification = true
cfg.DNSInterval = 24 * time.Hour
cfg.TLSInterval = 24 * time.Hour
w, deps := newTestWatcher(t, cfg)
setupBaselineMocks(deps)
ctx, cancel := context.WithCancel(t.Context())
done := make(chan struct{})
go func() {
w.Run(ctx)
close(done)
}()
// Wait for the initial scan and test notification.
time.Sleep(500 * time.Millisecond)
cancel()
<-done
notifications := deps.notifier.getNotifications()
found := false
for _, n := range notifications {
if n.Priority == "success" &&
n.Title == "✅ dnswatcher startup complete" {
found = true
}
}
if !found {
t.Errorf(
"expected startup test notification, got: %v",
notifications,
)
}
}
func TestSendTestNotification_Disabled(t *testing.T) {
t.Parallel()
cfg := defaultTestConfig(t)
cfg.Domains = []string{"example.com"}
cfg.Hostnames = []string{"www.example.com"}
cfg.SendTestNotification = false
cfg.DNSInterval = 24 * time.Hour
cfg.TLSInterval = 24 * time.Hour
w, deps := newTestWatcher(t, cfg)
setupBaselineMocks(deps)
ctx, cancel := context.WithCancel(t.Context())
done := make(chan struct{})
go func() {
w.Run(ctx)
close(done)
}()
time.Sleep(500 * time.Millisecond)
cancel()
<-done
notifications := deps.notifier.getNotifications()
for _, n := range notifications {
if n.Title == "✅ dnswatcher startup complete" {
t.Error(
"test notification should not be sent when disabled",
)
}
}
}
func TestNSFailureAndRecovery(t *testing.T) {
t.Parallel()

File diff suppressed because one or more lines are too long

View File

@@ -1,10 +0,0 @@
// Package static provides embedded static assets.
package static
import "embed"
// Static contains the embedded static assets (CSS, JS) served
// at the /s/ URL prefix.
//
//go:embed css
var Static embed.FS