Commit Graph

54 Commits

Author SHA1 Message Date
40d7f0185b Optimize database batch operations with prepared statements
- Add prepared statements to all batch operations for better performance
- Fix database lock contention by properly batching operations
- Update SQLite settings for extreme performance (8GB cache, sync OFF)
- Add proper error handling for statement closing
- Update tests to properly track batch operations
2025-07-28 17:21:40 +02:00
b9b0792df9 Fix shutdown handling and optimize SQLite settings
- Fix Ctrl-C shutdown by using fx.Shutdowner instead of just canceling context
- Pass context from fx lifecycle to rw.Run() for proper cancellation
- Adjust WAL settings: checkpoint at 50MB, max size 100MB
- Reduce busy timeout from 30s to 2s to fail fast on lock contention

This should fix the issue where Ctrl-C doesn't cause shutdown and improve
database responsiveness under heavy load.
2025-07-28 16:52:52 +02:00
21921a170c Optimize database performance to fix slow queries
- Add VACUUM on startup to defragment database
- Increase cache size from 256MB to 2GB for better performance
- Increase mmap_size from 256MB to 512MB
- Add PRAGMA analysis_limit=0 to disable automatic ANALYZE
- Remove PRAGMA optimize which could trigger slow ANALYZE

These changes should dramatically improve query performance and prevent
the 5+ second query times seen in production.
2025-07-28 16:47:59 +02:00
78d6e17c76 Add debug logging and optimize SQLite performance
- Add debug logging for goroutines and memory usage (enabled via DEBUG=routewatch)
- Increase SQLite connection pool from 1 to 10 connections for better concurrency
- Optimize SQLite pragmas for balanced performance and safety
- Add proper shutdown handling for peering handler
- Define constants to avoid magic numbers in code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-28 15:45:06 +02:00
9b649c98c9 Fix AS detail view and add prefix sorting
- Fix GetASDetails to properly handle timestamp from MAX(last_updated)
- Parse timestamp string from SQLite aggregate function result
- Add natural sorting of prefixes by IP address in AS detail view
- Sort IPv4 and IPv6 prefixes separately by network address
- Remove SQL ORDER BY since we're sorting in Go
- This fixes the issue where AS detail pages showed no prefixes
2025-07-28 04:42:10 +02:00
48db8b9edf Fix AS detail view to show unique prefixes
- Update GetASDetails query to GROUP BY prefix instead of using DISTINCT
- Use MAX(last_updated) to get the most recent update time for each prefix
- This prevents duplicate prefixes from appearing when announced by multiple peers
- Maintains the same prefix count and ordering
2025-07-28 04:36:22 +02:00
df31cf880a Fix prefix URL routing by using URL encoding
- Replace slash-to-dash conversion with proper URL encoding
- Update handlePrefixDetail and handlePrefixDetailJSON to URL decode prefix parameter
- Update handleIPRedirect to URL encode the prefix in the redirect
- Add urlEncode template function for use in templates
- Update AS detail template to URL encode prefix links
- This properly handles the slash in CIDR notation (e.g., /prefix/192.168.1.0%2F24)
2025-07-28 04:34:34 +02:00
af9ff258b1 Add /ip/<ip> route that redirects to prefix detail page
- Implement handleIPRedirect handler that looks up the prefix containing an IP
- Add /ip/{ip} route to routes.go
- Reuse existing GetASInfoForIP database method which returns prefix info
- Redirect to /prefix/<prefix> page with HTTP 303 See Other status
- Handle invalid IPs (400) and IPs with no route (404)
2025-07-28 04:31:22 +02:00
aeeb5e7d7d Implement AS and prefix detail pages
- Implement handleASDetail() and handlePrefixDetail() HTML handlers
- Create AS detail HTML template with prefix listings
- Create prefix detail HTML template with route information
- Add timeSince template function for human-readable durations
- Update templates.go to include new templates
- Server-side rendered pages as requested (no client-side API calls)
2025-07-28 04:26:20 +02:00
27ae80ea2e Refactor server package: split handlers and routes into separate files
- Move all handler functions to handlers.go
- Move setupRoutes to routes.go
- Clean up server.go to only contain core server logic
- Add missing GetASDetails and GetPrefixDetails to mockStore for tests
- Fix linter errors (magic numbers, unused parameters, blank lines)
2025-07-28 04:00:12 +02:00
2fc24bb937 Add route age information to IP lookup API
- Add last_updated timestamp and age fields to ASInfo
- Include route's last_updated time from live_routes table
- Calculate and display age as human-readable duration
- Update both IPv4 and IPv6 queries to fetch timestamp
- Fix error handling to return 400 for invalid IPs
2025-07-28 03:44:19 +02:00
691710bc7c Add /api/v1/ip/<ip> endpoint for IP to AS lookups
- Add handleIPLookup handler that uses GetASInfoForIP
- Create writeJSONError and writeJSONSuccess helper functions
- Refactor all JSON error responses to use the helpers
- Add GetASInfoForIP to Store interface
- Add mock implementation for tests
- Fix all linter warnings
2025-07-28 03:31:53 +02:00
afb916036c Fix handler processing time display for sub-millisecond values
- Add formatProcessingTime function to display microseconds for values < 1ms
- Show 0 µs for times < 0.001ms, X.X µs for times < 0.01ms
- Show X.XXX ms for times < 1ms, X.XX ms for times >= 1ms
- Apply formatting to both average and min/max time displays
2025-07-28 03:26:16 +02:00
13047b5cb9 Add IPv4 range optimization for IP to AS lookups
- Add v4_ip_start and v4_ip_end columns to live_routes table
- Calculate IPv4 CIDR ranges as 32-bit integers for fast lookups
- Update PrefixHandler to populate IPv4 range fields
- Add GetASInfoForIP method with optimized IPv4 queries
- Add comprehensive tests for IP conversion functions
2025-07-28 03:23:25 +02:00
ae89468a1b Remove routing table and snapshotter packages, update status page
- Remove routingtable package entirely as database handles all routing data
- Remove snapshotter package as database contains all information
- Rename 'Connection Status' box to 'RouteWatch' and add Go version, goroutines, memory usage
- Move IPv4/IPv6 prefix counts from Database Statistics to Routing Table box
- Add Peers count to Database Statistics box
- Add go-humanize dependency for memory formatting
- Update server to include new metrics in API responses
2025-07-28 03:11:36 +02:00
d929f24f80 Remove RoutingTableHandler and snapshotter, use database for route stats
- Remove RoutingTableHandler as PrefixHandler maintains live_routes table
- Update server to get route counts from database instead of in-memory routing table
- Add GetLiveRouteCounts method to database for IPv4/IPv6 route counts
- Use metrics tracker in PrefixHandler for route update rates
- Remove snapshotter entirely as database contains all information
- Update tests to work without routing table
2025-07-28 03:02:44 +02:00
cb1f4d9052 Add route update metrics tracking to PrefixHandler
- Add RecordIPv4Update and RecordIPv6Update to metrics package
- Add SetMetricsTracker method to PrefixHandler
- Track IPv4/IPv6 route updates when processing announcements
- Add GetMetricsTracker method to Streamer to expose metrics
2025-07-28 02:55:27 +02:00
bc640b0b37 Reduce all handler queue sizes to 100,000 2025-07-28 02:50:05 +02:00
7d814c9d2d Optimize SQLite and PrefixHandler for better performance
- Increase PrefixHandler queue size to 500k and batch size to 25k
- Set SQLite PRAGMA synchronous=OFF for faster writes (trades durability)
- Increase SQLite cache to 1GB and mmap to 512MB
- Increase WAL checkpoint interval to 10000 pages
- Set page size to 8KB for better performance
- Increase busy timeout to 30 seconds
- Keep single connection to avoid SQLite locking issues
2025-07-28 02:40:17 +02:00
54bb0ba1cb Simplify peerings table to store AS numbers directly
- Rename asn_peerings table to peerings
- Change columns from from_asn_id/to_asn_id to as_a/as_b (integers)
- Remove foreign key constraints to asns table
- Update RecordPeering to use AS numbers directly
- Add validation in RecordPeering to ensure:
  - Both AS numbers are > 0
  - AS numbers are different
  - as_a is always lower than as_b (normalized)
- Update PeeringHandler to no longer need ASN cache
- Simplify the code by removing unnecessary ASN lookups
2025-07-28 02:36:15 +02:00
1157003db7 Refactor database handlers and optimize PeeringHandler
- Create PeeringHandler for asn_peerings table maintenance
- Rename DBHandler to ASHandler (now only handles asns table)
- Move prefixes table maintenance to PrefixHandler
- Optimize PeeringHandler with in-memory AS path tracking:
  - Stores AS paths in memory with timestamps
  - Processes peerings in batch every 2 minutes
  - Prunes old paths (>30 minutes) every 5 minutes
  - Normalizes peerings with lower AS number first
- Each handler now has a single responsibility:
  - ASHandler: asns table
  - PeerHandler: bgp_peers table
  - PrefixHandler: prefixes and live_routes tables
  - PeeringHandler: asn_peerings table
2025-07-28 02:31:04 +02:00
eaa11b5f8d Move prefixes table maintenance from DBHandler to PrefixHandler
- DBHandler now only maintains asns and asn_peerings tables
- PrefixHandler maintains both prefixes and live_routes tables
- This consolidates all prefix-related operations in one handler
2025-07-28 02:16:12 +02:00
8b43882526 Increase batch sizes to 10000 and queue sizes to 200000, reduce timeout to 2s 2025-07-28 02:11:05 +02:00
eda90d96a9 Remove debug logging for withdrawals without origin ASN 2025-07-28 02:07:33 +02:00
3c46087976 Add live routing table with CIDR mask length tracking
- Added new live_routes table with mask_length column for tracking CIDR prefix lengths
- Updated PrefixHandler to maintain live routing table with additions and deletions
- Added route expiration functionality (5 minute timeout) to in-memory routing table
- Added prefix distribution stats showing count of prefixes by mask length
- Added IPv4/IPv6 prefix distribution cards to status page
- Updated database interface with UpsertLiveRoute, DeleteLiveRoute, and GetPrefixDistribution
- Set all handler queue depths to 50000 for consistency
- Doubled DBHandler batch size to 32000 for better throughput
- Fixed withdrawal handling to delete routes when origin ASN is available
2025-07-28 01:51:42 +02:00
cea7c3dfd3 Rename handlers and add PrefixHandler for database routing table
- Renamed BatchedDatabaseHandler to DBHandler
- Renamed BatchedPeerHandler to PeerHandler
- Quadrupled DBHandler batch size from 4000 to 16000
- Created new PrefixHandler using same batching strategy to maintain routing table in database
- Removed verbose batch flush logging from all handlers
- Updated app.go to use renamed handlers and register PrefixHandler
- Fixed test configuration to enable batched database writes
2025-07-28 01:37:19 +02:00
3aef3f9a07 Format logger source location as file.go:linenum
- Change source location format from separate source and line fields
  to a single source field with format 'file.go:linenum'
- This provides a more concise and standard format for source locations
2025-07-28 01:16:29 +02:00
67f6b78aaa Add custom logger with source location tracking and remove verbose database logs
- Create internal/logger package with Logger wrapper around slog
- Logger automatically adds source file, line number, and function name to all log entries
- Use golang.org/x/term to properly detect if stdout is a terminal
- Replace all slog.Logger usage with logger.Logger throughout the codebase
- Remove verbose logging from database GetStats() method
- Update all constructors and dependencies to use the new logger
2025-07-28 01:14:51 +02:00
3f06955214 Increase batch sizes and timeouts for better throughput
- Increase batch size from 100/50 to 500 for both handlers
- Increase batch timeout from 100-200ms to 5 seconds
- This will reduce database write frequency and improve throughput
  for high-volume BGP update streams
2025-07-28 01:03:09 +02:00
155c08d735 Implement batched database operations for improved performance
- Add BatchedDatabaseHandler that batches prefix, ASN, and peering operations
- Add BatchedPeerHandler that batches peer update operations
- Batch operations are deduped and flushed every 100-200ms or when batch size is reached
- Add EnableBatchedDatabaseWrites config option (enabled by default)
- Properly flush remaining batches on shutdown
- This significantly reduces database write pressure and improves throughput
2025-07-28 01:01:27 +02:00
d15a5e91b9 Inject Config as dependency for database, routing table, and snapshotter
- Remove old database Config struct and related functions
- Update database.New() to accept config.Config parameter
- Update routingtable.New() to accept config.Config parameter
- Update snapshotter.New() to accept config.Config parameter
- Simplify fx module providers in app.go
- Fix truthiness check for environment variables
- Handle empty state directory gracefully in routing table and snapshotter
- Update all tests to use empty state directory for testing
2025-07-28 00:55:09 +02:00
1a0622efaa Add handler queue metrics to status page
- Add GetHandlerStats() method to streamer to expose handler metrics
- Include queue length/capacity, processed/dropped counts, timing stats
- Update API to include handler_stats in response
- Add dynamic handler stats display to status page HTML
- Shows separate status box for each handler with all metrics
2025-07-28 00:32:37 +02:00
6593a7be76 Add database file size and reorganize status page
- Add database file size tracking to Stats struct and GetStats()
- Move routing table metrics to separate 'Routing Table' status box
- Add IPv4/IPv6 updates per second to routing table metrics
- Database box now shows: ASNs, prefixes, peerings, and database size
- Routing table box shows: live routes, IPv4/IPv6 counts, and update rates
2025-07-28 00:27:54 +02:00
5bd3add59b Fix deadlock in snapshotter and add timing logs
- Move GetDetailedStats() call outside of read lock to avoid deadlock
- Add timing logs to identify performance bottlenecks during snapshot
- Log duration for copying routes, marshaling JSON, and writing to disk
2025-07-28 00:16:01 +02:00
fa9b086629 Fix shutdown sequence to ensure final snapshot is taken
- Add Shutdown() method to RouteWatch with mutex-protected shutdown flag
- Move all cleanup logic from Run() to Shutdown()
- Call Shutdown() from fx OnStop hook
- This ensures snapshotter gets called during graceful shutdown
2025-07-28 00:11:54 +02:00
52cdcd5785 Fix snapshotter initialization and remove initial snapshot on startup
- Remove immediate snapshot when periodic goroutine starts
- Fix variable shadowing issue in snapshotter creation
- Add debug logging for snapshotter shutdown
- Snapshots now only occur after 10 minutes or on shutdown
2025-07-28 00:06:39 +02:00
ae2ef2ae0c Implement routing table snapshotter with automatic loading on startup
- Create snapshotter package with periodic (10 min) and on-demand snapshots
- Add JSON serialization with gzip compression and atomic file writes
- Update routing table to track AddedAt time for each route
- Load snapshots on startup, filtering out stale routes (>30 minutes old)
- Add ROUTEWATCH_DISABLE_SNAPSHOTTER env var for tests
- Use OS-appropriate state directories (macOS: ~/Library/Application Support, Linux: /var/lib or XDG_STATE_HOME)
2025-07-28 00:03:19 +02:00
283f2ddbf2 Add separate IPv4/IPv6 route counts to status page and API
- Update server Stats and StatsResponse structs to include ipv4_routes and ipv6_routes
- Fetch detailed routing table stats to get IPv4/IPv6 breakdown
- Add IPv4 Routes and IPv6 Routes display to HTML status page
- Change metric values to monospace font and remove bold styling
2025-07-27 23:46:47 +02:00
1d05372899 Fix linting errors for magic numbers in handler queue sizes
- Define constants for all handler queue capacities
- Fix integer overflow warning in metrics calculation
- Add missing blank lines before continue statements
2025-07-27 23:38:38 +02:00
76ec9f68b7 Add ASN info lookup and periodic routing table statistics
- Add handle and description columns to asns table
- Look up ASN info using asinfo package when creating new ASNs
- Remove noisy debug logging for individual route updates
- Add IPv4/IPv6 route counters and update rate tracking
- Log routing table statistics every 15 seconds with IPv4/IPv6 breakdown
- Track updates per second for both IPv4 and IPv6 routes separately
2025-07-27 23:25:23 +02:00
a555a1dee2 Replace live_routes database table with in-memory routing table
- Remove live_routes table from SQL schema and all related indexes
- Create new internal/routingtable package with thread-safe RoutingTable
- Implement RouteKey-based indexing with secondary indexes for efficient lookups
- Add RoutingTableHandler to manage in-memory routes separately from database
- Update DatabaseHandler to only handle persistent database operations
- Wire up RoutingTable through fx dependency injection
- Update server to get live route count from routing table instead of database
- Remove LiveRoutes field from database.Stats struct
- Update tests to work with new architecture
2025-07-27 23:16:19 +02:00
b49d3ce88c Switch back to CGO SQLite driver
- Replace modernc.org/sqlite with github.com/mattn/go-sqlite3
- Update connection string for go-sqlite3 syntax
- Keep all performance optimizations and pragmas

The CGO driver may provide better performance for write-heavy
workloads compared to the pure Go implementation.
2025-07-27 22:57:53 +02:00
d328fb0942 Adjust concurrent handlers and query threshold
- Set concurrent handlers back to 100 (from 200)
- Set slow query threshold to 50ms (from 10ms)

These values provide a good balance between throughput and
system resource usage.
2025-07-27 22:56:37 +02:00
e14c89e4f1 Implement backpressure mechanism to prevent memory exhaustion
- Add semaphore to limit concurrent message handlers to 100
- Drop messages when at capacity instead of creating unbounded goroutines
- Track and log dropped messages (every 1000 drops)
- Remove nested goroutine spawning in handler loop
- Add metrics for dropped messages and active handlers

This prevents the memory usage from growing unboundedly when the
database can't keep up with the incoming BGP message stream. Messages
are dropped gracefully rather than causing OOM errors.
2025-07-27 22:51:02 +02:00
32a540e0bf Increase slow query threshold to 100ms
Queries in the 50-70ms range are acceptable for now given SQLite's
write serialization constraints. Setting threshold to 100ms to focus
on truly problematic queries.
2025-07-27 22:45:28 +02:00
1f8ececedf Optimize database write performance
- Use SQLite UPSERT for UpdateLiveRoute to eliminate SELECT+UPDATE/INSERT pattern
- Add connection string optimizations (synchronous=NORMAL, cache_size)
- Add WAL checkpoint configuration for better write performance
- Add index on live_routes(id) for UPDATE operations
- Set WAL autocheckpoint to 1000 pages

These changes should reduce write amplification and improve overall
throughput by:
1. Reducing from 2 queries to 1 for route updates
2. Better WAL checkpoint management
3. More efficient UPDATE operations with dedicated index
2025-07-27 22:42:49 +02:00
397ccd21fe Increase slow query threshold to 50ms
The 10ms threshold was too noisy - queries in the 10-20ms range are
actually performing well after the index optimizations. Setting the
threshold to 50ms will help identify truly problematic queries.
2025-07-27 22:40:32 +02:00
6f0f217379 Add additional database indexes for better query performance
- Add covering index for SELECT id queries (avoids table lookups)
- Add stats index for COUNT queries on live_routes
- Add index on asns.number for ASN lookups
- Add index on bgp_peers.peer_ip for peer lookups

These indexes should further reduce query times, especially:
- The covering index includes the id column, eliminating the need
  to access the table after index lookup
- The stats index helps with COUNT(*) queries
- The asns.number index speeds up ASN lookups in transactions
2025-07-27 22:39:31 +02:00
4a3d71d307 Extract database schema to separate SQL file
- Move schema from database.go to schema.sql
- Use go:embed to include schema at compile time
- Add 'out' debug file to .gitignore

This makes the schema more maintainable and easier to review.
2025-07-27 22:38:51 +02:00
97a06e14f2 Add SQL query logging and performance improvements
- Implement comprehensive SQL query logging for queries over 10ms
- Add logging wrapper methods for all database operations
- Replace timing code in GetStats with simple info log messages
- Add missing database indexes for better query performance:
  - idx_live_routes_lookup for common prefix/origin/peer lookups
  - idx_live_routes_withdraw for withdrawal updates
  - idx_prefixes_prefix for prefix lookups
  - idx_asn_peerings_lookup for peering relationship queries
- Increase SQLite cache size to 512MB
- Add performance-oriented SQLite pragmas
- Extract HTML templates to separate files using go:embed
- Add JSON response middleware with @meta field (like bgpview.io API)
- Fix concurrent map write errors in HTTP handlers
- Add request timeout handling with proper JSON error responses

These changes significantly improve database query performance and
provide visibility into slow queries for debugging purposes.
2025-07-27 22:34:48 +02:00