feat(igc): enable tx checksum offload#694
Conversation
- Add ActiveChecksumContext enum to Tx to track and reuse context descriptors - Add igc_tx_ctx_setup() to program the Advanced TX Context Descriptor for IPv4, TCP/IPv4, and UDP/IPv4 checksum offload - Update igc_send() to prepend a context descriptor when offload context changes - Advertise CSUM_IPv4, CSUM_TCPv4, and CSUM_UDPv4 capabilities - Enable TCP checksum offload in if_net and update the capability comment Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR adds IPv4/TCPv4/UDPv4 TX checksum offload support for the Intel IGC (I225/I226) driver by introducing advanced TX context descriptors and advertising the corresponding network capabilities so the stack can request hardware checksum insertion.
Changes:
- Add tracking for the “active” TX checksum context and emit an Advanced TX Context Descriptor when the offload context changes.
- Advertise
CSUM_IPv4,CSUM_TCPv4, andCSUM_UDPv4capabilities in the IGC driver and enable TCP checksum offload in the network interface capabilities mapping. - Extend IGC descriptor definitions/structures to support Advanced Context Descriptors.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| awkernel_lib/src/net/if_net.rs | Enables TCP checksum offload reporting to smoltcp when the device advertises CSUM_TCPv4. |
| awkernel_drivers/src/pcie/intel/igc/igc_defines.rs | Adds constants needed to build Advanced TX Context Descriptors for checksum offload. |
| awkernel_drivers/src/pcie/intel/igc/igc_base.rs | Extends the advanced TX descriptor union to include the context descriptor view. |
| awkernel_drivers/src/pcie/intel/igc.rs | Implements context descriptor setup, context reuse tracking, capability advertisement, and send-path changes to prepend context descriptors. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Validated on real I225 hardware (UDP echo + TCP both work, valid on-wire checksums confirmed via tcpdump on the peer). - IP header length in the advanced context descriptor was programmed in 32-bit words instead of bytes (`header_len()` returns IHL). With the wrong length the NIC mis-placed the checksums and IP packets never egressed correctly; shift IHL left by 2 to get bytes (matches the igb driver). - Seed the IPv4 pseudo-header checksum into the L4 checksum field before DMA. smoltcp leaves it zero when TX checksum is offloaded, so without the seed the NIC computed segment-only checksums and every TCP/UDP checksum was wrong on the wire. Add `igc_pseudo_cksum()` and write it at the L4 csum offset. - Restore `active_checksum_context` when `igc_send()` bails out due to insufficient descriptors, so the software context cannot desync from what the hardware actually saw (addresses the review's state-desync concern). - Reclaim context descriptors in `igc_txeof()`: they carry no DD writeback, so the per-descriptor DD scan would stall at the first one and eventually wedge TX once the ring filled. Treat a non-DD descriptor whose successor is done as a consumed context descriptor. - Derive `Copy, Clone` for `ActiveChecksumContext` to support the snapshot. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Factor the L4 checksum seed `(usize, u16)` into a `L4CksumSeed` type alias so `igc_tx_ctx_setup`'s return type is no longer flagged by clippy::type_complexity. No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| let mut vlan_macip_lens = 0u32; | ||
| let mut offload = false; | ||
|
|
||
| vlan_macip_lens |= (core::mem::size_of::<EtherHeader>() as u32) << IGC_ADVTXD_MACLEN_SHIFT; |
There was a problem hiding this comment.
Inline 802.1Q VLAN tags pass through extract_headers() and are still classified as IPv4/TCP/UDP, but MACLEN is hardcoded to size_of::<EtherHeader>() (14) and l4_off uses ETHER_HDR_LEN. For a frame carrying an 802.1Q tag the real L2 header is 18 bytes, so the NIC computes the L3/L4 checksum over the wrong range and the pseudo-header seed lands at the wrong offset.
Suggest deriving the L2 length from ext.ether_vlan.is_some() (14 vs 18) and using it for both vlan_macip_lens MACLEN and l4_off.
|
|
||
| /// Compute the IPv4 pseudo-header checksum that must be seeded into the L4 | ||
| /// checksum field so the NIC can complete TCP/UDP checksum offload. | ||
| fn igc_pseudo_cksum(ip: &Ip) -> u16 { |
There was a problem hiding this comment.
igc_pseudo_cksum() and the seed offset computation assume the L4 header lives at ETHER_HDR_LEN + iphlen, but for any IP fragment (non-zero fragment offset or MF=1) the L4 header is not there. The seed is written into payload bytes and the NIC then "completes" a checksum over the wrong field, producing broken packets on the wire.
Suggest bailing out of offload when (ip.ip_off.swap_bytes() & 0x3fff) != 0 and falling back to the software path.
|
|
||
| vlan_macip_lens |= (core::mem::size_of::<EtherHeader>() as u32) << IGC_ADVTXD_MACLEN_SHIFT; | ||
|
|
||
| let NetworkHdr::Ipv4(ip) = &ext.network else { |
There was a problem hiding this comment.
tx_packet_header_flags() sets TCP_CSUM_OUT/UDP_CSUM_OUT for any TCP/UDP transport regardless of network family, but the new capability is CSUM_TCPv4/CSUM_UDPv4 only and this function early-returns for non-IPv4. An IPv6+TCP/UDP packet thus bypasses both smoltcp's software checksum (skipped because cap.checksum.{tcp,udp}.tx() == false) and the NIC offload, and leaves the wire with a zero/garbage L4 checksum.
Either gate the flags on IPv4 in tx_packet_header_flags(), or split the capability bits per family in if_net.rs.
|
|
||
| let ext = match extract_headers(ether_frame.data) { | ||
| Ok(e) => e, | ||
| Err(_) => return Ok((0, base_olinfo, None)), |
There was a problem hiding this comment.
If extract_headers() returns Err, this returns (0, base_olinfo, None) and the data descriptor is posted without offload. But smoltcp's TX layer, observing cap.checksum.{tcp,udp}.tx() == false, did not compute the L4 checksum either, so the frame goes out with an unfilled checksum field.
Treat this as a hard error (e.g. Err(IgcDriverErr::InvalidPacket)) and drop the packet, mirroring how igb/ixgbe handle it.
| // always immediately followed by its data descriptor, and a single queue | ||
| // completes in order, so if the next descriptor is done this one is a | ||
| // consumed context descriptor and can be reclaimed; otherwise stop. | ||
| let next = if idx + 1 == ring_len { 0 } else { idx + 1 }; |
There was a problem hiding this comment.
The look-ahead lacks an idx == tx.next_avail_desc guard. When the ring is empty (next_to_clean == next_avail_desc), if the next slot still has a stale DD bit set (e.g. on a fresh queue before the first reclaim cycle), the loop treats idx as a consumed context descriptor and advances next_to_clean past next_avail_desc, corrupting the igc_desc_unused() accounting.
Add if idx == tx.next_avail_desc { break; } at the top of the loop, and guard the look-ahead so it only fires when next != tx.next_avail_desc.
| ) -> Result<(usize, u32, Option<L4CksumSeed>), IgcDriverErr> { | ||
| let base_olinfo = (ether_frame.data.len() as u32) << IGC_ADVTXD_PAYLEN_SHIFT; | ||
|
|
||
| let ext = match extract_headers(ether_frame.data) { |
There was a problem hiding this comment.
extract_headers() is parsed on every send, walking Ethernet/VLAN/IP/TCP/UDP with several unsafe reads, even when csum_flags is empty and no offload is requested. On the fast path this is the dominant per-packet cost added by this PR.
Early-return at the top: if ether_frame.csum_flags.is_empty() { return Ok((0, base_olinfo, None)); }.
| type_tucmd_mlhl |= IGC_ADVTXD_DTYP_CTXT | IGC_TXD_CMD_DEXT; | ||
| let mss_l4len_idx = l4len << IGC_ADVTXD_L4LEN_SHIFT; | ||
|
|
||
| let desc = &mut tx.tx_desc_ring.as_mut()[head]; |
There was a problem hiding this comment.
igc_tx_ctx_setup writes the context descriptor and updates tx.active_checksum_context before igc_send confirms the ring has room (the in-code comment at L1142-L1143 acknowledges this), which is why the caller has to snapshot and restore saved_ctx on the bail-out path. Functionally it is fine — next_avail_desc does not move so the slot is not booked, the next igc_send overwrites the 16-byte descriptor in full, and the stale mss_l4len_idx overlapping wb.status keeps the DD bit at 0 so igc_txeof is not misled — but the side-effect boundary is fragile and the rollback is easy to forget when extending the function later.
Two cleaner options:
- Make
igc_tx_ctx_setuppure: takeactive_checksum_contextby value and return(new_ctx, Option<ctx_payload>, olinfo_status, cksum_seed).igc_sendcommits both the descriptor write and the state update only after the room check. - Or do the conservative early check that the OpenBSD igc driver uses: at the top of
igc_send, bail out ifigc_desc_unused() < 2(the maximum a single packet can consume). This removes the snapshot/restore dance with minimal churn.
Description
Related links
How was this PR tested?
Notes for reviewers