- Reduce cache size from 8GB to 512MB (still plenty for 2.4GB DB)
- Reduce mmap_size from 10GB to 256MB (reasonable default)
- Use default page size (4KB) instead of 8KB
- Use default WAL checkpoint interval (1000 pages)
- Remove redundant pragmas (threads, cache_spill, read_uncommitted)
- Clean up connection string to only use _txlock parameter
- Keep synchronous=OFF for performance (since we have mutex protection)
- Add internal mutex to Database struct with lock/unlock wrappers
- Add debug logging for lock acquisition and release with timing
- Wrap all write operations with database mutex
- Use _txlock=immediate in SQLite connection string
This works around apparent issues with SQLite's internal locking
not properly respecting busy_timeout in the production environment.
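
A minimal sketch of the mutex wrapper described above. The `Database` struct here is a stand-in, and `withWriteLock` is a hypothetical helper name; only the idea of an internal mutex with timed lock/unlock logging comes from the commit.

```go
package main

import (
	"log"
	"sync"
	"time"
)

// Database is a stand-in for the real struct; only the mutex matters here.
type Database struct {
	mu sync.Mutex
}

// lock acquires the write mutex, logs how long the caller waited,
// and returns the time at which the lock was acquired.
func (d *Database) lock(op string) time.Time {
	start := time.Now()
	d.mu.Lock()
	log.Printf("lock acquired op=%s wait=%s", op, time.Since(start))
	return time.Now()
}

// unlock releases the mutex and logs how long it was held.
func (d *Database) unlock(op string, acquired time.Time) {
	d.mu.Unlock()
	log.Printf("lock released op=%s held=%s", op, time.Since(acquired))
}

// withWriteLock (hypothetical helper) runs fn while holding the mutex,
// so every write path is serialized in one place.
func (d *Database) withWriteLock(op string, fn func() error) error {
	acquired := d.lock(op)
	defer d.unlock(op, acquired)
	return fn()
}
```

Each write operation would then pass its body to `withWriteLock`, which keeps the lock and unlock calls paired even when the write returns early with an error.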
- Add _txlock=immediate to SQLite connection string
- This prevents deadlocks by acquiring write locks at transaction start
- Multiple concurrent writers now queue properly instead of failing instantly
- Resolves 'database table is locked' errors in production
- Fix Ctrl-C shutdown by using fx.Shutdowner instead of just canceling context
- Pass context from fx lifecycle to rw.Run() for proper cancellation
- Adjust WAL settings: checkpoint at 50MB, max size 100MB
- Reduce busy timeout from 30s to 2s to fail fast on lock contention
This should fix the issue where Ctrl-C doesn't cause shutdown and improve
database responsiveness under heavy load.
- Add VACUUM on startup to defragment database
- Increase cache size from 256MB to 2GB for better performance
- Increase mmap_size from 256MB to 512MB
- Add PRAGMA analysis_limit=0, intended to avoid automatic ANALYZE (note: in SQLite, 0 removes the ANALYZE row limit rather than disabling ANALYZE)
- Remove PRAGMA optimize which could trigger slow ANALYZE
These changes should dramatically improve query performance and prevent
the 5+ second query times seen in production.
- Add debug logging for goroutines and memory usage (enabled via DEBUG=routewatch)
- Increase SQLite connection pool from 1 to 10 connections for better concurrency
- Optimize SQLite pragmas for balanced performance and safety
- Add proper shutdown handling for peering handler
- Define constants to avoid magic numbers in code
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix GetASDetails to properly handle timestamp from MAX(last_updated)
- Parse timestamp string from SQLite aggregate function result
- Add natural sorting of prefixes by IP address in AS detail view
- Sort IPv4 and IPv6 prefixes separately by network address
- Remove SQL ORDER BY since we're sorting in Go
- This fixes the issue where AS detail pages showed no prefixes
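
One way the in-Go prefix sort could be written, as a sketch using the standard `net/netip` package. The function name `sortPrefixes` and the choice to list IPv4 before IPv6 are assumptions; the commit only states that the two families are sorted separately by network address.

```go
package main

import (
	"net/netip"
	"sort"
)

// sortPrefixes (hypothetical name) orders CIDR strings numerically:
// IPv4 before IPv6, then by network address, then by mask length.
// Entries that do not parse as a prefix are skipped.
func sortPrefixes(raw []string) []string {
	parsed := make([]netip.Prefix, 0, len(raw))
	for _, s := range raw {
		p, err := netip.ParsePrefix(s)
		if err != nil {
			continue
		}
		parsed = append(parsed, p.Masked())
	}
	sort.Slice(parsed, func(i, j int) bool {
		a, b := parsed[i], parsed[j]
		if a.Addr().Is4() != b.Addr().Is4() {
			return a.Addr().Is4()
		}
		if c := a.Addr().Compare(b.Addr()); c != 0 {
			return c < 0
		}
		return a.Bits() < b.Bits()
	})
	out := make([]string, len(parsed))
	for i, p := range parsed {
		out[i] = p.String()
	}
	return out
}
```

Sorting on parsed addresses avoids the lexical ordering problem where "10.0.0.0/8" would sort before "9.0.0.0/8".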
- Update GetASDetails query to GROUP BY prefix instead of using DISTINCT
- Use MAX(last_updated) to get the most recent update time for each prefix
- This prevents duplicate prefixes from appearing when announced by multiple peers
- Maintains the same prefix count and ordering
- Replace slash-to-dash conversion with proper URL encoding
- Update handlePrefixDetail and handlePrefixDetailJSON to URL decode prefix parameter
- Update handleIPRedirect to URL encode the prefix in the redirect
- Add urlEncode template function for use in templates
- Update AS detail template to URL encode prefix links
- This properly handles the slash in CIDR notation (e.g., /prefix/192.168.1.0%2F24)
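
The encoding round trip can be sketched with the standard `net/url` package. The helper names `prefixPath` and `decodePrefixParam` are illustrative, not the names used in the handlers.

```go
package main

import "net/url"

// prefixPath (hypothetical helper) builds the detail-page path for a
// CIDR prefix. url.PathEscape encodes the slash as %2F so the prefix
// stays within a single path segment.
func prefixPath(prefix string) string {
	return "/prefix/" + url.PathEscape(prefix)
}

// decodePrefixParam (hypothetical helper) reverses the encoding on the
// handler side.
func decodePrefixParam(param string) (string, error) {
	return url.PathUnescape(param)
}
```

Note that `url.PathEscape` leaves colons untouched, so an IPv6 prefix such as `2001:db8::/32` becomes `2001:db8::%2F32`.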
- Implement handleIPRedirect handler that looks up the prefix containing an IP
- Add /ip/{ip} route to routes.go
- Reuse existing GetASInfoForIP database method which returns prefix info
- Redirect to /prefix/<prefix> page with HTTP 303 See Other status
- Handle invalid IPs (400) and IPs with no route (404)
- Implement handleASDetail() and handlePrefixDetail() HTML handlers
- Create AS detail HTML template with prefix listings
- Create prefix detail HTML template with route information
- Add timeSince template function for human-readable durations
- Update templates.go to include new templates
- Server-side rendered pages as requested (no client-side API calls)
- Move all handler functions to handlers.go
- Move setupRoutes to routes.go
- Clean up server.go to only contain core server logic
- Add missing GetASDetails and GetPrefixDetails to mockStore for tests
- Fix linter errors (magic numbers, unused parameters, blank lines)
- Add last_updated timestamp and age fields to ASInfo
- Include route's last_updated time from live_routes table
- Calculate and display age as human-readable duration
- Update both IPv4 and IPv6 queries to fetch timestamp
- Fix error handling to return 400 for invalid IPs
- Add handleIPLookup handler that uses GetASInfoForIP
- Create writeJSONError and writeJSONSuccess helper functions
- Refactor all JSON error responses to use the helpers
- Add GetASInfoForIP to Store interface
- Add mock implementation for tests
- Fix all linter warnings
- Add formatProcessingTime function to display microseconds for values < 1ms
- Show 0 µs for times < 0.001ms, X.X µs for times < 0.01ms
- Show X.XXX ms for times < 1ms, X.XX ms for times >= 1ms
- Apply formatting to both average and min/max time displays
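
A sketch of the thresholds listed above, assuming the input is a duration in milliseconds as a `float64` (the actual signature of `formatProcessingTime` is not shown in the commit).

```go
package main

import "fmt"

// formatProcessingTime formats a duration given in milliseconds.
// Assumption: the caller passes milliseconds as a float64.
func formatProcessingTime(ms float64) string {
	switch {
	case ms < 0.001:
		return "0 µs"
	case ms < 0.01:
		return fmt.Sprintf("%.1f µs", ms*1000)
	case ms < 1:
		return fmt.Sprintf("%.3f ms", ms)
	default:
		return fmt.Sprintf("%.2f ms", ms)
	}
}
```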
- Add v4_ip_start and v4_ip_end columns to live_routes table
- Calculate IPv4 CIDR ranges as 32-bit integers for fast lookups
- Update PrefixHandler to populate IPv4 range fields
- Add GetASInfoForIP method with optimized IPv4 queries
- Add comprehensive tests for IP conversion functions
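
The range calculation can be illustrated as follows. This is a sketch with assumed function names (`ipv4ToUint32`, `ipv4Range`); the SQL in the comment is likewise an assumption about how the two columns would be queried, not the query from the commit.

```go
package main

import (
	"encoding/binary"
	"errors"
	"net/netip"
)

// ipv4ToUint32 converts a dotted-quad address to its 32-bit integer form.
func ipv4ToUint32(ip string) (uint32, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return 0, err
	}
	if !addr.Is4() {
		return 0, errors.New("not an IPv4 address")
	}
	b := addr.As4()
	return binary.BigEndian.Uint32(b[:]), nil
}

// ipv4Range returns the first and last address of an IPv4 CIDR block
// as integers, suitable for v4_ip_start and v4_ip_end.
func ipv4Range(cidr string) (start, end uint32, err error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return 0, 0, err
	}
	if !p.Addr().Is4() {
		return 0, 0, errors.New("not an IPv4 prefix")
	}
	b := p.Masked().Addr().As4()
	start = binary.BigEndian.Uint32(b[:])
	hostBits := 32 - p.Bits()
	end = start | uint32(uint64(1)<<hostBits-1)
	return start, end, nil
}

// A lookup could then be a plain integer range comparison, for example
// (assumed query shape):
//   SELECT ... FROM live_routes
//   WHERE v4_ip_start <= ? AND v4_ip_end >= ?
//   ORDER BY mask_length DESC LIMIT 1
```

Storing the range as two integers lets an indexed comparison find the containing prefix without parsing CIDR strings at query time.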
- Remove routingtable package entirely as database handles all routing data
- Remove snapshotter package as database contains all information
- Rename 'Connection Status' box to 'RouteWatch' and add Go version, goroutines, memory usage
- Move IPv4/IPv6 prefix counts from Database Statistics to Routing Table box
- Add Peers count to Database Statistics box
- Add go-humanize dependency for memory formatting
- Update server to include new metrics in API responses
- Remove RoutingTableHandler as PrefixHandler maintains live_routes table
- Update server to get route counts from database instead of in-memory routing table
- Add GetLiveRouteCounts method to database for IPv4/IPv6 route counts
- Use metrics tracker in PrefixHandler for route update rates
- Remove snapshotter entirely as database contains all information
- Update tests to work without routing table
- Increase PrefixHandler queue size to 500k and batch size to 25k
- Set SQLite PRAGMA synchronous=OFF for faster writes (trades durability)
- Increase SQLite cache to 1GB and mmap to 512MB
- Increase WAL checkpoint interval to 10000 pages
- Set page size to 8KB for better performance
- Increase busy timeout to 30 seconds
- Keep single connection to avoid SQLite locking issues
- Rename asn_peerings table to peerings
- Change columns from from_asn_id/to_asn_id to as_a/as_b (integers)
- Remove foreign key constraints to asns table
- Update RecordPeering to use AS numbers directly
- Add validation in RecordPeering to ensure:
  - Both AS numbers are > 0
  - AS numbers are different
  - as_a is always lower than as_b (normalized)
- Update PeeringHandler to no longer need ASN cache
- Simplify the code by removing unnecessary ASN lookups
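
The validation rules for RecordPeering can be sketched as a small pure function. The name `normalizePeering` and the use of `int` for AS numbers are assumptions made for illustration.

```go
package main

import "errors"

// normalizePeering (hypothetical helper) validates a pair of AS numbers
// and returns them ordered so that asA < asB.
func normalizePeering(x, y int) (asA, asB int, err error) {
	if x <= 0 || y <= 0 {
		return 0, 0, errors.New("AS numbers must be greater than 0")
	}
	if x == y {
		return 0, 0, errors.New("AS numbers must be different")
	}
	if x > y {
		x, y = y, x
	}
	return x, y, nil
}
```

Normalizing the order means a peering seen as (64501, 64500) and as (64500, 64501) maps to the same row.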
- Create PeeringHandler for asn_peerings table maintenance
- Rename DBHandler to ASHandler (now only handles asns table)
- Move prefixes table maintenance to PrefixHandler
- Optimize PeeringHandler with in-memory AS path tracking:
  - Stores AS paths in memory with timestamps
  - Processes peerings in batch every 2 minutes
  - Prunes old paths (>30 minutes) every 5 minutes
  - Normalizes peerings with lower AS number first
- Each handler now has a single responsibility:
  - ASHandler: asns table
  - PeerHandler: bgp_peers table
  - PrefixHandler: prefixes and live_routes tables
  - PeeringHandler: asn_peerings table
- DBHandler now only maintains asns and asn_peerings tables
- PrefixHandler maintains both prefixes and live_routes tables
- This consolidates all prefix-related operations in one handler
- Added new live_routes table with mask_length column for tracking CIDR prefix lengths
- Updated PrefixHandler to maintain live routing table with additions and deletions
- Added route expiration functionality (5 minute timeout) to in-memory routing table
- Added prefix distribution stats showing count of prefixes by mask length
- Added IPv4/IPv6 prefix distribution cards to status page
- Updated database interface with UpsertLiveRoute, DeleteLiveRoute, and GetPrefixDistribution
- Set all handler queue depths to 50000 for consistency
- Doubled DBHandler batch size to 32000 for better throughput
- Fixed withdrawal handling to delete routes when origin ASN is available
- Renamed BatchedDatabaseHandler to DBHandler
- Renamed BatchedPeerHandler to PeerHandler
- Quadrupled DBHandler batch size from 4000 to 16000
- Created new PrefixHandler using same batching strategy to maintain routing table in database
- Removed verbose batch flush logging from all handlers
- Updated app.go to use renamed handlers and register PrefixHandler
- Fixed test configuration to enable batched database writes
- Change source location format from separate source and line fields
to a single source field with format 'file.go:linenum'
- This provides a more concise and standard format for source locations
- Create internal/logger package with Logger wrapper around slog
- Logger automatically adds source file, line number, and function name to all log entries
- Use golang.org/x/term to properly detect if stdout is a terminal
- Replace all slog.Logger usage with logger.Logger throughout the codebase
- Remove verbose logging from database GetStats() method
- Update all constructors and dependencies to use the new logger
- Increase batch size from 100/50 to 500 for both handlers
- Increase batch timeout from 100-200ms to 5 seconds
- This will reduce database write frequency and improve throughput
for high-volume BGP update streams
- Add BatchedDatabaseHandler that batches prefix, ASN, and peering operations
- Add BatchedPeerHandler that batches peer update operations
- Batch operations are deduped and flushed every 100-200ms or when batch size is reached
- Add EnableBatchedDatabaseWrites config option (enabled by default)
- Properly flush remaining batches on shutdown
- This significantly reduces database write pressure and improves throughput
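
The batching strategy described above (dedupe, flush on size or on a timer, flush the remainder on shutdown) might be sketched like this. The `batcher` type and its string keys and values are hypothetical simplifications, not the actual BatchedDatabaseHandler.

```go
package main

import (
	"context"
	"sync"
	"time"
)

// batcher is a hypothetical, simplified stand-in for a batched handler.
// Operations are keyed so that repeated updates to the same key collapse
// into one pending entry (the latest value wins).
type batcher struct {
	mu      sync.Mutex
	pending map[string]string
	maxSize int
	flushFn func(batch map[string]string)
}

func newBatcher(maxSize int, flushFn func(map[string]string)) *batcher {
	return &batcher{pending: make(map[string]string), maxSize: maxSize, flushFn: flushFn}
}

// take swaps out the pending map; the caller must hold b.mu.
func (b *batcher) take() map[string]string {
	batch := b.pending
	b.pending = make(map[string]string)
	return batch
}

// Add records an operation and flushes once maxSize distinct keys are pending.
func (b *batcher) Add(key, value string) {
	b.mu.Lock()
	b.pending[key] = value
	var full map[string]string
	if len(b.pending) >= b.maxSize {
		full = b.take()
	}
	b.mu.Unlock()
	if full != nil {
		b.flushFn(full)
	}
}

// Flush writes out whatever is pending, if anything.
func (b *batcher) Flush() {
	b.mu.Lock()
	batch := b.take()
	b.mu.Unlock()
	if len(batch) > 0 {
		b.flushFn(batch)
	}
}

// Run flushes on every tick and once more when ctx is cancelled,
// so no batch is lost on shutdown.
func (b *batcher) Run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			b.Flush()
		case <-ctx.Done():
			b.Flush()
			return
		}
	}
}
```

The flush callback runs outside the mutex so that a slow database write does not block producers from adding to the next batch.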
- Remove old database Config struct and related functions
- Update database.New() to accept config.Config parameter
- Update routingtable.New() to accept config.Config parameter
- Update snapshotter.New() to accept config.Config parameter
- Simplify fx module providers in app.go
- Fix truthiness check for environment variables
- Handle empty state directory gracefully in routing table and snapshotter
- Update all tests to use empty state directory for testing
- Add GetHandlerStats() method to streamer to expose handler metrics
- Include queue length/capacity, processed/dropped counts, timing stats
- Update API to include handler_stats in response
- Add dynamic handler stats display to status page HTML
- Shows separate status box for each handler with all metrics
- Add database file size tracking to Stats struct and GetStats()
- Move routing table metrics to separate 'Routing Table' status box
- Add IPv4/IPv6 updates per second to routing table metrics
- Database box now shows: ASNs, prefixes, peerings, and database size
- Routing table box shows: live routes, IPv4/IPv6 counts, and update rates
- Move GetDetailedStats() call outside of read lock to avoid deadlock
- Add timing logs to identify performance bottlenecks during snapshot
- Log duration for copying routes, marshaling JSON, and writing to disk
- Add Shutdown() method to RouteWatch with mutex-protected shutdown flag
- Move all cleanup logic from Run() to Shutdown()
- Call Shutdown() from fx OnStop hook
- This ensures snapshotter gets called during graceful shutdown
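
A sketch of a mutex-protected shutdown flag that makes `Shutdown()` safe to call from both `Run()` and the fx OnStop hook. The `RouteWatch` struct below is a stand-in, and the `cleanup` field is a hypothetical placeholder for the real cleanup logic.

```go
package main

import "sync"

// RouteWatch is a stand-in for the real type; cleanup is a hypothetical
// placeholder for stopping the streamer, snapshotter, and so on.
type RouteWatch struct {
	mu       sync.Mutex
	shutdown bool
	cleanup  func()
}

// Shutdown runs the cleanup logic exactly once, no matter how many
// callers invoke it.
func (rw *RouteWatch) Shutdown() {
	rw.mu.Lock()
	if rw.shutdown {
		rw.mu.Unlock()
		return
	}
	rw.shutdown = true
	rw.mu.Unlock()

	rw.cleanup()
}
```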
- Remove immediate snapshot when periodic goroutine starts
- Fix variable shadowing issue in snapshotter creation
- Add debug logging for snapshotter shutdown
- Snapshots now only occur after 10 minutes or on shutdown
- Create snapshotter package with periodic (10 min) and on-demand snapshots
- Add JSON serialization with gzip compression and atomic file writes
- Update routing table to track AddedAt time for each route
- Load snapshots on startup, filtering out stale routes (>30 minutes old)
- Add ROUTEWATCH_DISABLE_SNAPSHOTTER env var for tests
- Use OS-appropriate state directories (macOS: ~/Library/Application Support, Linux: /var/lib or XDG_STATE_HOME)
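
The combination of gzip-compressed JSON and atomic file writes could look like the sketch below: write to a temporary file in the same directory, then rename it over the target. The function names `writeSnapshot` and `readSnapshot` are assumptions, not the snapshotter's actual API.

```go
package main

import (
	"compress/gzip"
	"encoding/json"
	"os"
	"path/filepath"
)

// writeSnapshot (hypothetical name) encodes v as gzip-compressed JSON and
// replaces path atomically via a temp file and rename.
func writeSnapshot(path string, v any) (err error) {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".snapshot-*.tmp")
	if err != nil {
		return err
	}
	defer func() {
		if err != nil {
			tmp.Close() // no-op error if already closed
			os.Remove(tmp.Name())
		}
	}()

	gz := gzip.NewWriter(tmp)
	if err = json.NewEncoder(gz).Encode(v); err != nil {
		return err
	}
	if err = gz.Close(); err != nil {
		return err
	}
	if err = tmp.Sync(); err != nil {
		return err
	}
	if err = tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// readSnapshot (hypothetical name) decodes a snapshot written by writeSnapshot.
func readSnapshot(path string, v any) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	gz, err := gzip.NewReader(f)
	if err != nil {
		return err
	}
	defer gz.Close()
	return json.NewDecoder(gz).Decode(v)
}
```

Creating the temp file in the target's directory keeps the rename on one filesystem, which is what makes the replacement atomic on POSIX systems.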
- Update server Stats and StatsResponse structs to include ipv4_routes and ipv6_routes
- Fetch detailed routing table stats to get IPv4/IPv6 breakdown
- Add IPv4 Routes and IPv6 Routes display to HTML status page
- Change metric values to monospace font and remove bold styling
- Add handle and description columns to asns table
- Look up ASN info using asinfo package when creating new ASNs
- Remove noisy debug logging for individual route updates
- Add IPv4/IPv6 route counters and update rate tracking
- Log routing table statistics every 15 seconds with IPv4/IPv6 breakdown
- Track updates per second for both IPv4 and IPv6 routes separately
- Remove live_routes table from SQL schema and all related indexes
- Create new internal/routingtable package with thread-safe RoutingTable
- Implement RouteKey-based indexing with secondary indexes for efficient lookups
- Add RoutingTableHandler to manage in-memory routes separately from database
- Update DatabaseHandler to only handle persistent database operations
- Wire up RoutingTable through fx dependency injection
- Update server to get live route count from routing table instead of database
- Remove LiveRoutes field from database.Stats struct
- Update tests to work with new architecture
- Replace modernc.org/sqlite with github.com/mattn/go-sqlite3
- Update connection string for go-sqlite3 syntax
- Keep all performance optimizations and pragmas
The CGO driver may provide better performance for write-heavy
workloads compared to the pure Go implementation.
- Set concurrent handlers back to 100 (from 200)
- Set slow query threshold to 50ms (from 10ms)
These values provide a good balance between throughput and
system resource usage.
- Add semaphore to limit concurrent message handlers to 100
- Drop messages when at capacity instead of creating unbounded goroutines
- Track and log dropped messages (every 1000 drops)
- Remove nested goroutine spawning in handler loop
- Add metrics for dropped messages and active handlers
This prevents memory usage from growing without bound when the
database can't keep up with the incoming BGP message stream. Messages
are dropped gracefully rather than causing OOM errors.
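
A sketch of the drop-on-full pattern using a buffered channel as a semaphore. The `limiter` type and its method names are hypothetical; the limit of 100 handlers and the log-every-1000-drops interval would be supplied by the caller or the constant below.

```go
package main

import (
	"log"
	"sync/atomic"
)

// logEveryDrops mirrors the "every 1000 drops" logging interval.
const logEveryDrops = 1000

// limiter is a hypothetical helper that bounds concurrent handlers.
type limiter struct {
	sem     chan struct{}
	dropped atomic.Uint64
}

func newLimiter(maxConcurrent int) *limiter {
	return &limiter{sem: make(chan struct{}, maxConcurrent)}
}

// TryHandle runs handle in a new goroutine if a slot is free.
// If all slots are taken, the message is dropped and counted
// instead of blocking or spawning an extra goroutine.
func (l *limiter) TryHandle(handle func()) bool {
	select {
	case l.sem <- struct{}{}:
		go func() {
			defer func() { <-l.sem }()
			handle()
		}()
		return true
	default:
		if n := l.dropped.Add(1); n%logEveryDrops == 0 {
			log.Printf("dropped %d messages so far", n)
		}
		return false
	}
}

// Dropped reports how many messages have been dropped.
func (l *limiter) Dropped() uint64 {
	return l.dropped.Load()
}
```

Because the slot is acquired before the goroutine is started, the number of live handler goroutines can never exceed the channel capacity.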
Queries in the 50-70ms range are acceptable for now given SQLite's
write serialization constraints. Setting threshold to 100ms to focus
on truly problematic queries.