Reduce DNS query timeout and limit root server fan-out #29
Problem
The resolver uses a 5s query timeout (`queryTimeoutDuration`), which is far too high. Measured median RTT to root servers from GU is ~19ms. A 5s timeout with 2 retries means a single failed hop burns 10s+.

Additionally, `rootServerList()` returns all 13 root servers, but several code paths try them sequentially (e.g. `resolveNSRecursive` tries the first 3, `resolveARecord` tries the first 3), while `followDelegation` → `queryServers` tries ALL servers in the list if earlier ones fail.

Changes needed
Reduce `queryTimeoutDuration` from 5s to 2s. This is still >100x the median RTT to the roots and leaves plenty of headroom for slower auth/TLD servers worldwide.

Limit root server queries to 3 NSes. The root is run correctly: if 3 of 13 root servers are unreachable, something is wrong with our network, not the root. Shuffle the list and pick 3 to avoid always hitting the same ones.

Keep `maxRetries = 2`. That gives 3 attempts per server, which is fine with the lower timeout.

Impact
This should significantly reduce test suite time (currently 39s in `internal/resolver` alone, mostly from timeout-triggered retries). More importantly, production queries will fail fast instead of hanging for 10s on a single bad hop.