Skip to content

feat(igc): enable tx checksum offload#694

Open
ytakano wants to merge 4 commits into
tier4:mainfrom
ytakano:igc_tx_csum_offload
Open

feat(igc): enable tx checksum offload#694
ytakano wants to merge 4 commits into
tier4:mainfrom
ytakano:igc_tx_csum_offload

Conversation

@ytakano

@ytakano ytakano commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Description

  • Add ActiveChecksumContext enum to Tx to track and reuse context descriptors
  • Add igc_tx_ctx_setup() to program the Advanced TX Context Descriptor for IPv4, TCP/IPv4, and UDP/IPv4 checksum offload
  • Update igc_send() to prepend a context descriptor when offload context changes
  • Advertise CSUM_IPv4, CSUM_TCPv4, and CSUM_UDPv4 capabilities
  • Enable TCP checksum offload in if_net and update the capability comment

Related links

How was this PR tested?

Notes for reviewers

ytakano and others added 2 commits June 8, 2026 18:30
- Add ActiveChecksumContext enum to Tx to track and reuse context descriptors
- Add igc_tx_ctx_setup() to program the Advanced TX Context Descriptor for
  IPv4, TCP/IPv4, and UDP/IPv4 checksum offload
- Update igc_send() to prepend a context descriptor when offload context changes
- Advertise CSUM_IPv4, CSUM_TCPv4, and CSUM_UDPv4 capabilities
- Enable TCP checksum offload in if_net and update the capability comment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Yuuki Takano <ytakanoster@gmail.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds IPv4/TCPv4/UDPv4 TX checksum offload support for the Intel IGC (I225/I226) driver by introducing advanced TX context descriptors and advertising the corresponding network capabilities so the stack can request hardware checksum insertion.

Changes:

  • Add tracking for the “active” TX checksum context and emit an Advanced TX Context Descriptor when the offload context changes.
  • Advertise CSUM_IPv4, CSUM_TCPv4, and CSUM_UDPv4 capabilities in the IGC driver and enable TCP checksum offload in the network interface capabilities mapping.
  • Extend IGC descriptor definitions/structures to support Advanced Context Descriptors.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
awkernel_lib/src/net/if_net.rs Enables TCP checksum offload reporting to smoltcp when the device advertises CSUM_TCPv4.
awkernel_drivers/src/pcie/intel/igc/igc_defines.rs Adds constants needed to build Advanced TX Context Descriptors for checksum offload.
awkernel_drivers/src/pcie/intel/igc/igc_base.rs Extends the advanced TX descriptor union to include the context descriptor view.
awkernel_drivers/src/pcie/intel/igc.rs Implements context descriptor setup, context reuse tracking, capability advertisement, and send-path changes to prepend context descriptors.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread awkernel_drivers/src/pcie/intel/igc.rs Outdated
Comment thread awkernel_drivers/src/pcie/intel/igc.rs
Comment thread awkernel_drivers/src/pcie/intel/igc.rs
Comment thread awkernel_drivers/src/pcie/intel/igc.rs Outdated
ytakano and others added 2 commits June 10, 2026 06:34
Validated on real I225 hardware (UDP echo + TCP both work, valid on-wire
checksums confirmed via tcpdump on the peer).

- IP header length in the advanced context descriptor was programmed in
  32-bit words instead of bytes (`header_len()` returns IHL). With the wrong
  length the NIC mis-placed the checksums and IP packets never egressed
  correctly; shift IHL left by 2 to get bytes (matches the igb driver).
- Seed the IPv4 pseudo-header checksum into the L4 checksum field before DMA.
  smoltcp leaves it zero when TX checksum is offloaded, so without the seed
  the NIC computed segment-only checksums and every TCP/UDP checksum was wrong
  on the wire. Add `igc_pseudo_cksum()` and write it at the L4 csum offset.
- Restore `active_checksum_context` when `igc_send()` bails out due to
  insufficient descriptors, so the software context cannot desync from what
  the hardware actually saw (addresses the review's state-desync concern).
- Reclaim context descriptors in `igc_txeof()`: they carry no DD writeback,
  so the per-descriptor DD scan would stall at the first one and eventually
  wedge TX once the ring filled. Treat a non-DD descriptor whose successor is
  done as a consumed context descriptor.
- Derive `Copy, Clone` for `ActiveChecksumContext` to support the snapshot.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Factor the L4 checksum seed `(usize, u16)` into a `L4CksumSeed` type alias so
`igc_tx_ctx_setup`'s return type is no longer flagged by clippy::type_complexity.
No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ytakano ytakano marked this pull request as ready for review June 10, 2026 01:40
@ytakano ytakano requested a review from atsushi421 June 10, 2026 02:21
let mut vlan_macip_lens = 0u32;
let mut offload = false;

vlan_macip_lens |= (core::mem::size_of::<EtherHeader>() as u32) << IGC_ADVTXD_MACLEN_SHIFT;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline 802.1Q VLAN tags pass through extract_headers() and are still classified as IPv4/TCP/UDP, but MACLEN is hardcoded to size_of::<EtherHeader>() (14) and l4_off uses ETHER_HDR_LEN. For a frame carrying an 802.1Q tag the real L2 header is 18 bytes, so the NIC computes the L3/L4 checksum over the wrong range and the pseudo-header seed lands at the wrong offset.

Suggest deriving the L2 length from ext.ether_vlan.is_some() (14 vs 18) and using it for both vlan_macip_lens MACLEN and l4_off.


/// Compute the IPv4 pseudo-header checksum that must be seeded into the L4
/// checksum field so the NIC can complete TCP/UDP checksum offload.
fn igc_pseudo_cksum(ip: &Ip) -> u16 {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

igc_pseudo_cksum() and the seed offset computation assume the L4 header lives at ETHER_HDR_LEN + iphlen, but for any IP fragment (non-zero fragment offset or MF=1) the L4 header is not there. The seed is written into payload bytes and the NIC then "completes" a checksum over the wrong field, producing broken packets on the wire.

Suggest bailing out of offload when (ip.ip_off.swap_bytes() & 0x3fff) != 0 and falling back to the software path.


vlan_macip_lens |= (core::mem::size_of::<EtherHeader>() as u32) << IGC_ADVTXD_MACLEN_SHIFT;

let NetworkHdr::Ipv4(ip) = &ext.network else {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tx_packet_header_flags() sets TCP_CSUM_OUT/UDP_CSUM_OUT for any TCP/UDP transport regardless of network family, but the new capability is CSUM_TCPv4/CSUM_UDPv4 only and this function early-returns for non-IPv4. An IPv6+TCP/UDP packet thus bypasses both smoltcp's software checksum (skipped because cap.checksum.{tcp,udp}.tx() == false) and the NIC offload, and leaves the wire with a zero/garbage L4 checksum.

Either gate the flags on IPv4 in tx_packet_header_flags(), or split the capability bits per family in if_net.rs.


let ext = match extract_headers(ether_frame.data) {
Ok(e) => e,
Err(_) => return Ok((0, base_olinfo, None)),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If extract_headers() returns Err, this returns (0, base_olinfo, None) and the data descriptor is posted without offload. But smoltcp's TX layer, observing cap.checksum.{tcp,udp}.tx() == false, did not compute the L4 checksum either, so the frame goes out with an unfilled checksum field.

Treat this as a hard error (e.g. Err(IgcDriverErr::InvalidPacket)) and drop the packet, mirroring how igb/ixgbe handle it.

// always immediately followed by its data descriptor, and a single queue
// completes in order, so if the next descriptor is done this one is a
// consumed context descriptor and can be reclaimed; otherwise stop.
let next = if idx + 1 == ring_len { 0 } else { idx + 1 };

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The look-ahead lacks an idx == tx.next_avail_desc guard. When the ring is empty (next_to_clean == next_avail_desc), if the next slot still has a stale DD bit set (e.g. on a fresh queue before the first reclaim cycle), the loop treats idx as a consumed context descriptor and advances next_to_clean past next_avail_desc, corrupting the igc_desc_unused() accounting.

Add if idx == tx.next_avail_desc { break; } at the top of the loop, and guard the look-ahead so it only fires when next != tx.next_avail_desc.

) -> Result<(usize, u32, Option<L4CksumSeed>), IgcDriverErr> {
let base_olinfo = (ether_frame.data.len() as u32) << IGC_ADVTXD_PAYLEN_SHIFT;

let ext = match extract_headers(ether_frame.data) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract_headers() is parsed on every send, walking Ethernet/VLAN/IP/TCP/UDP with several unsafe reads, even when csum_flags is empty and no offload is requested. On the fast path this is the dominant per-packet cost added by this PR.

Early-return at the top: if ether_frame.csum_flags.is_empty() { return Ok((0, base_olinfo, None)); }.

type_tucmd_mlhl |= IGC_ADVTXD_DTYP_CTXT | IGC_TXD_CMD_DEXT;
let mss_l4len_idx = l4len << IGC_ADVTXD_L4LEN_SHIFT;

let desc = &mut tx.tx_desc_ring.as_mut()[head];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

igc_tx_ctx_setup writes the context descriptor and updates tx.active_checksum_context before igc_send confirms the ring has room (the in-code comment at L1142-L1143 acknowledges this), which is why the caller has to snapshot and restore saved_ctx on the bail-out path. Functionally it is fine — next_avail_desc does not move so the slot is not booked, the next igc_send overwrites the 16-byte descriptor in full, and the stale mss_l4len_idx overlapping wb.status keeps the DD bit at 0 so igc_txeof is not misled — but the side-effect boundary is fragile and the rollback is easy to forget when extending the function later.

Two cleaner options:

  • Make igc_tx_ctx_setup pure: take active_checksum_context by value and return (new_ctx, Option<ctx_payload>, olinfo_status, cksum_seed). igc_send commits both the descriptor write and the state update only after the room check.
  • Or do the conservative early check that the OpenBSD igc driver uses: at the top of igc_send, bail out if igc_desc_unused() < 2 (the maximum a single packet can consume). This removes the snapshot/restore dance with minimal churn.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants