Release branch v2.0.0#266
Open
cuonglm wants to merge 141 commits into
This commit reverts the changes from v1.4.5 to v1.4.7 to prepare for the v2.0.0 branch code. The changes included in these releases are already in the v2.0.0 branch.

Details:
- Revert "feat: add --rfc1918 flag for explicit LAN client support" This reverts commit 0e3f764.
- Revert "Upgrade quic-go to v0.54.0" This reverts commit e52402e.
- Revert "docs: add known issues documentation for Darwin 15.5 upgrade issue" This reverts commit 2133f31.
- Revert "start mobile library with provision id and custom hostname." This reverts commit a198a5c.
- Revert "Add OPNsense new lease file" This reverts commit 7af29cf.
- Revert ".github/workflows: bump go version to 1.24.x" This reverts commit ce1a165.
- Revert "fix: ensure upstream health checks can handle large DNS responses" This reverts commit fd48e6d.
- Revert "refactor(prog): move network monitoring outside listener loop" This reverts commit d71d134.
- Revert "fix: correct Windows API constants to fix domain join detection" This reverts commit 21855df.
- Revert "refactor: move network monitoring to separate goroutine" This reverts commit 66e2d3a.
- Revert "refactor: extract empty string filtering to reusable function" This reverts commit 36a7423.
- Revert "cmd/cli: ignore empty positional argument for start command" This reverts commit e616091.
- Revert "Avoiding Windows runners file locking issue" This reverts commit 0948161.
- Revert "refactor: split selfUpgradeCheck into version check and upgrade execution" This reverts commit ce29b5d.
- Revert "internal/router: support Ubios 4.3+" This reverts commit de24fa2.
- Revert "internal/router: support Merlin Guest Network Pro VLAN" This reverts commit 6663925.
Decouple the logging setup required for interactive and daemon runs, so that logging for the ctrld binary and the ctrld packages can be set up more easily. This is the first step toward replacing the rs/zerolog library with a different logging library.
Add a logger field to the "prog" struct and use it inside the struct's methods instead of always accessing the global mainLog variable. This at least ensures more consistent use of the logger during ctrld prog runtime, and also makes future refactoring (such as replacing the logger library) easier.
Make nameserver resolution functions more consistent and accessible:
- Rename currentNameserversFromResolvconf to CurrentNameserversFromResolvconf
- Move function to public API for better reusability
- Update all internal references to use the new public API
- Add comprehensive godoc comments for nameserver functions
- Improve code organization by centralizing DNS resolution logic

This change makes the nameserver resolution functionality more maintainable and easier to use across different parts of the codebase.
- Add timeouts and proper cleanup in Test_osResolver_Singleflight:
  * Implement context timeout
  * Add proper PacketConn cleanup
  * Fix race conditions in error handling
  * Improve atomic value reporting
- Enhance Test_osResolver_HotCache:
  * Add proper timeout context
  * Implement more reliable cache verification
  * Fix potential resource leaks
  * Add deterministic polling intervals
- Add thread safety to Test_Edns0_CacheReply:
  * Implement proper timeout context
  * Add proper resource cleanup
  * Fix concurrent operations handling

The changes improve overall test suite reliability by addressing resource management, timeout handling, and thread safety concerns across multiple DNS resolver test cases.
Move client information related functions from client_info_*.go to desktop_*.go files to better organize platform-specific code and separate desktop functionality from shared code. No functional changes.
Improve documentation for Test_prog_parseResolvConfNameservers to clarify that the old implementation was removed as part of code deduplication effort. The code for handling resolv.conf was unified into the resolvconffile package to provide a consistent interface across the codebase. This change provides better context for future developers about why the refactoring was done and what benefits it brings.
Add context parameter to validInterfacesMap for better error handling and logging. Move Windows-specific network adapter validation logic to the ctrld package.

Key changes include:
- Add context parameter to validInterfacesMap across all platforms
- Move Windows validInterfaces to ctrld.ValidInterfaces
- Improve error handling for virtual interface detection on Linux
- Update all callers to pass appropriate context

This change improves error reporting and makes the interface validation code more maintainable across different platforms.
Move getDNS type definition from dns.go to os_linux.go where it is used. Remove the now-empty dns.go file. This change improves code organization by keeping platform-specific types with their implementations.
Break down the large DNS handling function into smaller, focused functions with clear responsibilities:
- Extract handleDNSQuery from serveDNS handler function
- Create dedicated startListeners function for listener management
- Add standardQueryRequest struct to encapsulate query parameters
- Split special domain handling into separate function
- Add descriptive comments for each new function
- Improve variable names for better clarity (e.g., startTime vs t)

This refactoring improves code maintainability and readability without changing the core DNS proxy functionality.
Look for any additional dnsmasq configuration files under /tmp/etc and handle them like the default one.
This change improves compatibility with newer UniFi OS versions while maintaining backward compatibility with UniFi OS 4.2 and earlier. The refactoring also reduces code duplication and improves maintainability by centralizing dnsmasq configuration path logic.
upstreamConfigFor() used strings.Contains(":") to decide whether to
append ":53", which always evaluates true for IPv6 addresses. This left
bare addresses like "2a0d:6fc0:9b0:3600::1" without brackets or port,
causing net.Dial to reject with "too many colons in address".
Use net.JoinHostPort() which handles IPv6 bracketing automatically,
producing "[2a0d:6fc0:9b0:3600::1]:53".
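The difference between the two checks can be shown with a minimal sketch. The helper names naiveAddr and fixedAddr are hypothetical and only illustrate the behavior described above; they are not the actual upstreamConfigFor() code, and the sketch assumes the input is a bare host with no port.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// naiveAddr (hypothetical) mirrors the buggy check: any colon is taken to
// mean "a port is already present", which is always true for IPv6 literals.
func naiveAddr(host string) string {
	if strings.Contains(host, ":") {
		return host
	}
	return host + ":53"
}

// fixedAddr (hypothetical) always joins host and port with net.JoinHostPort,
// which wraps IPv6 literals in brackets. It assumes host carries no port.
func fixedAddr(host string) string {
	return net.JoinHostPort(host, "53")
}

func main() {
	for _, h := range []string{"192.0.2.1", "2a0d:6fc0:9b0:3600::1"} {
		fmt.Printf("%s -> naive=%s fixed=%s\n", h, naiveAddr(h), fixedAddr(h))
	}
}
```

With the naive check, the IPv6 address comes back unchanged and cannot be dialed; with net.JoinHostPort it becomes a valid host:port pair.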
- Update comment in ensurePFAnchorReference: pfctl -sn returns rdr-anchor only (nat-anchor not used by ctrld)
- Update nat-anchor table entry in pf-dns-intercept.md
- Add pf nuances 10-16 from investigation: cross-AF redirect, block return, sendmsg EINVAL, nat-on-lo0, raw sockets, DIOCNATLOOK, and the pragmatic IPv6 block solution
When port 53 is taken (e.g. by mDNSResponder), ctrld failed with 'could not find available listen ip and port' instead of falling back to port 5354. Root cause: tryUpdateListenerConfig() checked the dnsIntercept bool, which is derived in prog.run() AFTER listener config is resolved. Fix: check interceptMode string directly (CLI flag + config fallback) in a new tryUpdateListenerConfigIntercept() that tries 127.0.0.1:53 then 127.0.0.1:5354. Also updates buildPFAnchorRules() to use the actual listener IP/port from config instead of hardcoded 127.0.0.1:53, so pf rules redirect to wherever ctrld is actually listening.
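The fallback order can be sketched as follows. This is an illustration under assumptions, not the body of tryUpdateListenerConfigIntercept(): pickListenAddr and udpBindable are hypothetical helpers, and the bind probe is injected so the selection logic can be exercised without touching real ports.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// pickListenAddr (hypothetical) returns the first candidate that canBind
// accepts, in order, or an error when none is available.
func pickListenAddr(candidates []string, canBind func(addr string) bool) (string, error) {
	for _, addr := range candidates {
		if canBind(addr) {
			return addr, nil
		}
	}
	return "", errors.New("could not find available listen ip and port")
}

// udpBindable (hypothetical) probes an address by briefly binding a UDP socket.
func udpBindable(addr string) bool {
	pc, err := net.ListenPacket("udp", addr)
	if err != nil {
		return false
	}
	pc.Close()
	return true
}

func main() {
	// Same order as in the fix: port 53 first, then 5354.
	candidates := []string{"127.0.0.1:53", "127.0.0.1:5354"}
	addr, err := pickListenAddr(candidates, udpBindable)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("listening on", addr)
}
```

Whatever address is chosen would then be the one handed to the pf rule builder, so the redirect target and the listener stay in sync.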
Pass a quic.Config with KeepAlivePeriod (15s) to DoQ dial calls instead of nil, so pooled connections send periodic QUIC PINGs to stay alive and detect dead paths proactively. Also add IdleTimeoutError to the DoQ retry conditions alongside io.EOF, so stale pooled connections trigger a transparent retry instead of propagating as a query failure.
Replace conn.OpenStream (non-blocking) with conn.OpenStreamSync so that the resolver waits for the server's MAX_STREAMS credit replenishment frame instead of immediately failing when the stream limit is temporarily exhausted. Also retry on StreamLimitReachedError as defense-in-depth for servers that are slow or fail to send MAX_STREAMS updates.
SetSelfIP unconditionally accessed t.dhcp, but t.dhcp is only initialized when DHCP discovery is enabled. A network change event can fire SetSelfIP regardless of the discovery configuration, causing a nil pointer dereference. Guard the t.dhcp access with a nil check so the self IP is still updated on the Table even when DHCP discovery is disabled.
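A minimal sketch of the guard, assuming simplified stand-ins for the real types (the field names and the shape of dhcp and Table here are hypothetical):

```go
package main

import "fmt"

// dhcp and Table are simplified, hypothetical stand-ins for the real types.
type dhcp struct {
	selfIP string
}

type Table struct {
	selfIP string
	dhcp   *dhcp // nil when DHCP discovery is disabled
}

// SetSelfIP always records the self IP on the Table and only touches the
// DHCP discoverer when it has been initialized.
func (t *Table) SetSelfIP(ip string) {
	t.selfIP = ip
	if t.dhcp != nil {
		t.dhcp.selfIP = ip
	}
}

func main() {
	t := &Table{} // DHCP discovery disabled: dhcp is nil
	t.SetSelfIP("192.0.2.10")
	fmt.Println("self IP:", t.selfIP)
}
```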
README.md: fix Go version requirement (1.23 -> 1.24), update OS support architectures (add arm64/mipsle/mips64 for Linux, arm64 for Windows/FreeBSD, remove windows/arm), fix broken PowerShell install path, demote H1 section headings to H2.
When multiple network changes fire in quick succession (e.g., VPN disconnect + interface swap), the second handleRecovery() call cancels the first but inherits stale DoH transports, causing DNS blackouts of up to 30 seconds.

Three changes to reduce worst-case recovery from ~30s to <3s:
1. ForceReBootstrap() on recovery entry: closes dead connections and creates fresh transports synchronously before probing, replacing the lazy ReBootstrap() flag that left stale connections for probes to hit.
2. Debounce handleRecovery() for network changes (500ms window): only the recovery flow is debounced; all other state updates (IP, pf anchor, VPN DNS, tunnel checks) still run immediately on every event. This eliminates the cancel-and-restart race without missing state.
3. Combined effect: ForceReBootstrap closes old in-flight connections (closeTransports) and builds new ones (SetupTransport) atomically, so recovery probes never inherit dead connections from a prior recovery attempt.
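The debounce idea for handleRecovery() can be sketched with a resettable timer. The debouncer type and demoBurst helper are hypothetical, and the demo uses a 50ms window instead of 500ms only to keep it quick; the real implementation may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// debouncer (hypothetical) runs fn once after the last trigger in a burst.
type debouncer struct {
	mu     sync.Mutex
	window time.Duration
	timer  *time.Timer
	fn     func()
}

func newDebouncer(window time.Duration, fn func()) *debouncer {
	return &debouncer{window: window, fn: fn}
}

// trigger cancels any pending run and schedules a new one, so only the
// final event in a quick succession starts the debounced function.
func (d *debouncer) trigger() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.timer != nil {
		d.timer.Stop()
	}
	d.timer = time.AfterFunc(d.window, d.fn)
}

// demoBurst fires n triggers back to back and reports how many times the
// debounced function ran once the window has passed.
func demoBurst(n int) int {
	var runs atomic.Int32
	d := newDebouncer(50*time.Millisecond, func() { runs.Add(1) })
	for i := 0; i < n; i++ {
		d.trigger()
	}
	time.Sleep(250 * time.Millisecond)
	return int(runs.Load())
}

func main() {
	fmt.Println("recovery runs after a burst of 5 events:", demoBurst(5))
}
```

Only the recovery call would sit behind such a debouncer; the other per-event state updates stay outside it and run on every event.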
Add file-backed persistence to the internal logWriter so runtime logs survive service restarts. When internal logging is enabled (CD mode, no explicit log_path), writes are teed to both the existing in-memory ring buffer and a rotated file on disk (ctrld.log in the home directory). File rotation: 5MB max with 1 backup (ctrld.log.1), so max ~10MB on disk. Log view/send now reads from the persisted files (including backup) to provide complete history across restarts. Live tail continues to use the in-memory subscriber mechanism unchanged. Activation: same conditions as existing internal logging — CD mode only, no log_path configured. No new config options or dependencies.
When third-party VPN software (e.g., OpenVPN) installs WFP block filters via block-outside-dns, all DNS traffic to non-tunnel interfaces is blocked, including DNS to 127.0.0.1 (ctrld's NRPT target). This breaks DNS mode interception because the NRPT catch-all rule routes queries to loopback, but WFP blocks the connection before it reaches ctrld's listener.

Fix: after exhausting all NRPT recovery attempts, activate a minimal WFP session with "hard permit" filters (FWPM_FILTER_FLAG_CLEAR_ACTION_RIGHT) for DNS to localhost in a max-priority sublayer (weight 0xFFFF). This overrides the VPN's block for loopback DNS only, while preserving the VPN's DNS leak protection for all other (non-loopback) DNS traffic.

The loopback protect is:
- Only activated when NRPT probes fail (not preemptively)
- Harmless when no conflicting WFP blocks exist (permit-only, no blocks)
- Persistent until ctrld shutdown (survives VPN reconnect cycles)
- Cleaned up by the existing cleanupWFPFilters path on shutdown
When WFP loopback protect is active, the upstream.os healthcheck will always fail because an external WFP block filter is interfering with plain DNS. This demotes those expected failures to debug level and returns errOsHealthcheckSuppressed so the recovery loop treats them as non-fatal, eliminating the log spam described in #526.
Go's default is already TLS 1.2+ (since Go 1.18), but making this explicit satisfies RFC 7858/9250 recommendations and makes the security intent clear for auditors.
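Making the minimum version explicit amounts to setting one field on the TLS configuration. The constructor name newUpstreamTLSConfig and the server name used here are hypothetical; only tls.Config.MinVersion and tls.VersionTLS12 come from the standard library.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// newUpstreamTLSConfig (hypothetical) builds a client TLS config with the
// minimum protocol version stated explicitly rather than left to the default.
func newUpstreamTLSConfig(serverName string) *tls.Config {
	return &tls.Config{
		ServerName: serverName,
		MinVersion: tls.VersionTLS12, // explicit floor: TLS 1.2
	}
}

func main() {
	cfg := newUpstreamTLSConfig("dns.example")
	fmt.Printf("min TLS version: 0x%04x\n", cfg.MinVersion)
}
```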
Current code writes to a predictable path, which on systems without `fs.protected_symlinks` (e.g. embedded routers) could allow a local attacker with API compromise to perform symlink attacks.
Currently there is no limit on PIN attempts, allowing unlimited brute force if an attacker gains socket access. While the socket is root-only by default, rate limiting is cheap defense-in-depth.
DoQ responses are length-prefixed per RFC 9250. The resolver previously assumed the stream always contained at least two bytes and unpacked from buf[2:], which could panic on truncated or malicious replies. Validate the prefix against the bytes read, return a clear error, and retire the connection from the pool on framing failure. Unpack only the slice declared by the prefix so a short read cannot be misinterpreted as a full message. Add regression coverage with a small test server that returns malformed raw payloads (empty, one byte, prefix-only, prefix larger than payload).
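The prefix validation can be sketched as a small helper. extractMessage and errBadFrame are hypothetical names, and rejecting a zero-length message is an assumption of this sketch rather than something stated above; the real resolver code may differ.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errBadFrame (hypothetical) marks a response whose 2-octet length prefix
// does not match the bytes actually read.
var errBadFrame = errors.New("doq: malformed length-prefixed response")

// extractMessage (hypothetical) validates the length prefix and returns only
// the slice it declares, so a short read is never treated as a full message.
func extractMessage(buf []byte) ([]byte, error) {
	if len(buf) < 2 {
		return nil, fmt.Errorf("%w: got %d bytes, need at least 2", errBadFrame, len(buf))
	}
	n := int(binary.BigEndian.Uint16(buf[:2]))
	if n == 0 {
		// Assumption of this sketch: an empty message is treated as malformed.
		return nil, fmt.Errorf("%w: zero-length message", errBadFrame)
	}
	if n > len(buf)-2 {
		return nil, fmt.Errorf("%w: prefix declares %d bytes, payload has %d", errBadFrame, n, len(buf)-2)
	}
	return buf[2 : 2+n], nil
}

func main() {
	inputs := [][]byte{{}, {0x00}, {0x00, 0x05}, {0x00, 0x05, 0x01}, {0x00, 0x02, 0xAB, 0xCD}}
	for _, in := range inputs {
		msg, err := extractMessage(in)
		fmt.Printf("in=%v msg=%v err=%v\n", in, msg, err)
	}
}
```

On an error from such a helper, the caller would return it and retire the connection from the pool instead of unpacking.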
DoQ pools now keep a single quic.Transport and UDP socket for all dials, so parallel dial and reconnect churn no longer allocate a new socket per attempt or leak the winner's UDP conn when the caller owns the packet conn. quicParallelDialer accepts an optional transport: when set, dials use Transport.DialEarly on that socket; when nil, behavior matches the old per-dial ListenUDP path (losers close their sockets). Per RFC 9250 §4.2, close the query stream's send side before reading the response so strict upstreams see STREAM FIN before answering. CloseIdleConnections closes the shared transport and underlying UDP conn so checked-out connections and the OS socket are torn down. Add a FIN-strict test server, coverage for bootstrap vs parallel-dial paths, and a Linux-only FD churn regression test.
Stick to go1.25 for now, since using go1.26 causes a runtime panic when building for arm platforms.
For GO-2026-5026 security fix.
The test:windows CI job intermittently failed to clean up .testbin with
"Access to the path '...cmd_cli.test.exe' is denied". This was previously
attributed to Windows Defender scanning the large unsigned test binaries,
and mitigated with Defender exclusions and cleanup retries. That was
treating a symptom.
Root cause: performUpgrade() self-upgrades by running
exec.Command(os.Executable(), "upgrade", "prod", "-vv") as a detached,
windowless child. In the real ctrld binary this re-execs ctrld and is
correct. Under `go test`, os.Executable() is the test binary itself, and
`go test` stops flag parsing at the first positional arg ("upgrade") and
ignores the rest -- so the child silently re-runs the entire test suite.
That child hits the upgrade tests again and spawns more detached children,
recursively: a fork bomb of hidden processes that pins the runner's
CPU/memory and keeps the test binary's image file locked. Windows refuses
to delete the image of a running process, hence the "Access is denied"
during after_script. Whether any children are still alive when cleanup
runs is a timing race, which is why the failure was flaky.
Two tests reached this path: Test_performUpgrade (directly) and
Test_selfUpgradeCheck (via selfUpgradeCheck -> performUpgrade on the
"upgrade allowed" case).
Fix:
- prog.go: extract the command construction into a package-level
newUpgradeCmd var. Production behavior is unchanged.
- main_test.go: stub newUpgradeCmd once in TestMain so the whole test
binary self-execs with `-test.run=^$` (matches no tests, exits
immediately) instead of re-running the suite. This covers every test
that reaches performUpgrade, present and future, while still exercising
the cmd.Start() success path.
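The shape of the fix can be sketched as below. The variable name newUpgradeCmd and the arguments come from the description above; the function signature and the stubUpgradeCmd helper are assumptions made for illustration, and the command is only constructed here, never started.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// newUpgradeCmd builds the self-upgrade command. Keeping it in a
// package-level variable lets tests swap it out. The signature is assumed.
var newUpgradeCmd = func() (*exec.Cmd, error) {
	exe, err := os.Executable()
	if err != nil {
		return nil, err
	}
	return exec.Command(exe, "upgrade", "prod", "-vv"), nil
}

// stubUpgradeCmd (hypothetical) is what a TestMain could install: the binary
// still re-executes itself, but with -test.run=^$ it matches no tests and
// exits immediately. It returns a function that restores the original.
func stubUpgradeCmd() (restore func()) {
	orig := newUpgradeCmd
	newUpgradeCmd = func() (*exec.Cmd, error) {
		exe, err := os.Executable()
		if err != nil {
			return nil, err
		}
		return exec.Command(exe, "-test.run=^$"), nil
	}
	return func() { newUpgradeCmd = orig }
}

func main() {
	cmd, err := newUpgradeCmd()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("production args:", cmd.Args[1:])

	restore := stubUpgradeCmd()
	defer restore()
	cmd, err = newUpgradeCmd()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("stubbed args:", cmd.Args[1:])
}
```

Installing the stub once in TestMain, rather than per test, is what keeps any future test that reaches performUpgrade from re-running the suite.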
For fixing CVE-2026-40898.
Major Release
This release contains new features, improvements and bug fixes.
Added
Improvements
commands.go (1,397 lines) into 13 focused command files, improving maintainability and testability
Fixes
Breaking Changes
If you were using ctrld with any of these router platforms, you will need to use alternative deployment methods. See the migration guide for details.
Note: All other functionality remains backward compatible. Existing configuration files and CLI commands continue to work without changes.