Nmi branch test#1
Open
dg1197 wants to merge 155 commits into
Open
Conversation
When we compile the rust programs with PIE, the compiler creates the Global Offset Table (GOT) to put the address of the extern variables. The GOT is supposed to be fixed at program load time by the dynamic loader. However, we do not have a dynamic loader and therefore, the GOT entries are un-patched and contain absolute addresses. This causes problem when the program is triggered in the kernel -- the use of absolute address will cause the code going to non-existing pages. Add a new GOT fix step when the base program is loaded. Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Add a new trace_printk function only used by inner-unikernel programs. This function always pads a null character at the end. Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Add a new iu_dispatcher_func to dispatch inner-unikernel programs so
that rust panics can be handled. The dispatch have a prototype of:
extern asmlinkage unsigned int iu_dispatcher_func(
const void *ctx,
const struct bpf_insn *insnsi,
unsigned int (*bpf_func)(const void *,
const struct bpf_insn *));
which shares the same signature as bpf_dispatcher_nop_func but differs
in linkage, as it is implemented directly in assembly.
The function will save the stack pointer and frame pointer to designated
per-cpu variables before calling into the program.
If the execution is successful (i.e. no exceptions), the function will
just return normally.
+-----------------------+
| iu_dispatcher_func: |
| movq %rsp %gs:iu_sp |
| movq %rbp %gs:iu_fp | +-----------+
| call *%rdx |--------------->| iu_prog1: |
| | | ... |
| iu_exit: |<---------------| ret |
| ret | +-----------+
| ... |
+-----------------------+
Under exceptional cases (where a rust panic is fired), rust_begin_unwind
(i.e. panic handler) will transfer the control flow to the iu_landingpad
function, which, after dumping some information to the kernel ring
buffer, will issue a direct jump to iu_panic_trampoline, a global label
in the middle of iu_dispatcher_func. The trampoline code restores the
old stack pointer and frame pointer value, effectively unwinding the
stack. It then sets a return value of -EINVAL and jumps to iu_exit to
return from iu_dispatcher_func.
+-----------------------+
| iu_dispatcher_func: |
| movq %rsp, %gs:iu_sp |
| movq %rbp, %gs:iu_fp | +-----------+
| call *%rdx |--------------->| iu_prog1: |
| | +------| ... |
+---->| iu_exit: | | | ret |
| | ret | | +-----------+
| | | |
| | iu_panic_trampoline: |<-----+ | panic!()
| | movq %gs:iu_sp, %rsp | | |
| | movq %gs:iu_fp, %rbp | | | +-------------------------+
| | movq $(-EINVAL), %rax | | +----->| iu_landingpad: |
+-----| jmp iu_exit | | | ... |
+-----------------------+ +---------| jmp iu_panic_trampoline |
+-------------------------+
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
This right now only works for program invocations where bpf_dispatcher_nop_func is used originally. It does cover all tracing programs (i.e. these invoked via trace_call_bpf). Other program types (e.g. XDP) are not supported. Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
- Remove debug prints - Remove commented out code Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Apparently lld does things differently from bfd and mold -- it puts a 0 at the relative relocation address instead of the addend. Let's just directly compute the final value with *ABS*+addend to make it more robust. Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Ruowen Qin <ruowenq2@illinois.edu>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Ruowen Qin <ruowenq2@illinois.edu>
The previous __vmalloc() invocation already has __GFP_ZERO flag set so there is no need to zero the memory again. Plus, the address calculation is incorrect, which causes accidental zeroing of real data. Fixes: 23903f1 ("Rewritten to resolve conflicts") Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
We previously counted the total memory and page counts needed for the program incrementally. This causes problems when the linker (e.g. mold) generates a gap page between LOAD segments, as that gap page will not be counted. Instead, directly calculate the total memory and page counts by aligning the largest memeory address found in the LOAD segments to page boundary. Fixes: 88b2c24 ("Fixed memory conflict for distributed apps") Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
This fixes the following warning from modpost: WARNING: modpost: missing MODULE_DESCRIPTION() in samples/kprobes/kprobe_target.o Fixes: c16cb95 ("samples/kprobe: add kprobe target module") Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: dmo <dganesh3@illiois.edu>
Signed-off-by: dmo <dganesh3@illiois.edu>
Signed-off-by: dmo <dganesh3@illiois.edu>
Signed-off-by: dmo <dganesh3@illiois.edu>
Signed-off-by: Dhanush Ganesh <dganesh3@illinois.edu>
jinghao-jia
pushed a commit
that referenced
this pull request
Jun 14, 2026
mvebu_pwm_suspend() and mvebu_pwm_resume() are called for all GPIO banks during suspend/resume, but not all banks have PWM functionality. GPIO banks without PWM have mvchip->mvpwm set to NULL. Calling mvebu_pwm_suspend() with mvpwm == NULL causes a NULL pointer dereference when it tries to access mvpwm->blink_select. Unable to handle kernel NULL pointer dereference at virtual address 00000020 when write [00000020] *pgd=00000000 Internal error: Oops: 815 [#1] PREEMPT ARM Modules linked in: CPU: 0 UID: 0 PID: 406 Comm: sh Not tainted 6.12.74-rt12-yocto-standard-g4e96f98fb7db-dirty #353 Hardware name: Marvell Armada 370/XP (Device Tree) PC is at regmap_mmio_read+0x38/0x54 LR is at regmap_mmio_read+0x38/0x54 pc : [<c05fd2ac>] lr : [<c05fd2ac>] psr: 200f0013 sp : f0c11d10 ip : 00000000 fp : c100d2f0 r10: c14fb854 r9 : 00000000 r8 : 00000000 r7 : c1799c00 r6 : 00000020 r5 : 00000020 r4 : c179c7c0 r3 : f0a231a0 r2 : 00000020 r1 : 00000020 r0 : 00000000 Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: 135ec059 DAC: 00000051 Call trace: regmap_mmio_read from _regmap_bus_reg_read+0x78/0xac _regmap_bus_reg_read from _regmap_read+0x60/0x154 _regmap_read from regmap_read+0x3c/0x60 regmap_read from mvebu_gpio_suspend+0xa4/0x14c mvebu_gpio_suspend from dpm_run_callback+0x54/0x180 dpm_run_callback from device_suspend+0x124/0x630 device_suspend from dpm_suspend+0x124/0x270 dpm_suspend from dpm_suspend_start+0x64/0x6c dpm_suspend_start from suspend_devices_and_enter+0x140/0x8e8 suspend_devices_and_enter from pm_suspend+0x2fc/0x308 pm_suspend from state_store+0x6c/0xc8 state_store from kernfs_fop_write_iter+0x10c/0x1f8 kernfs_fop_write_iter from vfs_write+0x270/0x468 vfs_write from ksys_write+0x70/0xf0 ksys_write from ret_fast_syscall+0x0/0x54 Add a NULL check for mvchip->mvpwm before calling the PWM suspend/resume functions. Fixes: 757642f ("gpio: mvebu: Add limited PWM support") Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Link: https://patch.msgid.link/20260608084334.2960803-1-yun.zhou@windriver.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
jinghao-jia
pushed a commit
that referenced
this pull request
Jun 14, 2026
mlx5_query_nic_vport_mac_list() sizes its firmware command buffer using the PF's log_max_current_uc/mc_list capabilities. When querying a VF vport with a larger configured max (via devlink), the firmware response can overflow this buffer: BUG: KASAN: slab-out-of-bounds in mlx5_query_nic_vport_mac_list+0x453/0x4c0 [mlx5_core] Read of size 4 at addr ff1100013ffc8a12 by task kworker/u96:2/385 CPU: 12 UID: 0 PID: 385 Comm: kworker/u96:2 Not tainted 7.0.0-rc6+ #1 PREEMPT Hardware name: QEMU Standard PC (Q35 + ICH9, 2009) Workqueue: mlx5_esw_wq esw_vport_change_handler [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x69/0xa0 print_report+0x176/0x4e4 kasan_report+0xc8/0x100 mlx5_query_nic_vport_mac_list+0x453/0x4c0 [mlx5_core] esw_update_vport_addr_list+0x2e3/0xda0 [mlx5_core] esw_vport_change_handle_locked+0xa1f/0x1060 [mlx5_core] esw_vport_change_handler+0x6a/0x90 [mlx5_core] process_one_work+0x87f/0x15e0 worker_thread+0x62b/0x1020 kthread+0x375/0x490 ret_from_fork+0x4dc/0x810 ret_from_fork_asm+0x11/0x20 </TASK> Fix by querying the vport's own HCA caps to size the buffer correctly. Refactor the function to allocate and return the MAC list internally, removing the caller's dependency on knowing the correct max. Fixes: e16aea2 ("net/mlx5: Introduce access functions to modify/query vport mac lists") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260604135849.458060-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
jinghao-jia
pushed a commit
that referenced
this pull request
Jun 14, 2026
… completion rds_ib_xmit_atomic() always programs a masked atomic opcode (IB_WR_MASKED_ATOMIC_CMP_AND_SWP or IB_WR_MASKED_ATOMIC_FETCH_AND_ADD) for every RDS atomic cmsg. But the completion-side switch in rds_ib_send_unmap_op() only handles the non-masked opcodes, so a masked atomic completion falls through to default and returns rm == NULL while send->s_op is left set. rds_ib_send_cqe_handler() then dereferences the NULL rm via rm->m_final_op, oopsing in softirq context. An unprivileged AF_RDS sendmsg() of an atomic cmsg over an active RDS/IB connection triggers it; on hardware that natively accepts masked atomics (mlx4, mlx5) no extra setup is needed. RDS/IB: rds_ib_send_unmap_op: unexpected opcode 0xd in WR! Oops: general protection fault [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000190-0x0000000000000197] RIP: rds_ib_send_cqe_handler+0x25c/0xb10 (net/rds/ib_send.c:282) Call Trace: <IRQ> rds_ib_send_cqe_handler (net/rds/ib_send.c:282) poll_scq (net/rds/ib_cm.c:274) rds_ib_tasklet_fn_send (net/rds/ib_cm.c:294) tasklet_action_common (kernel/softirq.c:943) handle_softirqs (kernel/softirq.c:573) run_ksoftirqd (kernel/softirq.c:479) </IRQ> Kernel panic - not syncing: Fatal exception in interrupt Handle the masked atomic opcodes in the same case as the non-masked ones: they map to the same struct rds_message.atomic union member, so the existing container_of()/rds_ib_send_unmap_atomic() body is correct for them. Fixes: 20c72bd ("RDS: Implement masked atomic operations") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Allison Henderson <achender@kernel.org> Link: https://patch.msgid.link/20260606192447.1179255-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
jinghao-jia
pushed a commit
that referenced
this pull request
Jun 14, 2026
…ister NAT helpers such as nf_nat_h323 store a raw pointer to module text in exp->expectfn (e.g. ip_nat_q931_expect). nf_ct_helper_expectfn_unregister() only unlinks the callback descriptor and never walks the expectation table, so an expectation pending at module removal survives with a dangling exp->expectfn into freed module text. When the expected connection arrives, init_conntrack() invokes exp->expectfn(), now a stale pointer into the unloaded module. Reproduced on a KASAN build by loading the H.323 helpers, creating a Q.931 expectation, unloading nf_nat_h323, then connecting to the expected port: Oops: int3: 0000 [#1] SMP KASAN NOPTI RIP: 0010:0xffffffffa06102d1 init_conntrack.isra.0 (net/netfilter/nf_conntrack_core.c:1862) nf_conntrack_in (net/netfilter/nf_conntrack_core.c:2049) ipv4_conntrack_local (net/netfilter/nf_conntrack_proto.c:223) nf_hook_slow (net/netfilter/core.c:619) __ip_local_out (net/ipv4/ip_output.c:120) __tcp_transmit_skb (net/ipv4/tcp_output.c:1715) tcp_connect (net/ipv4/tcp_output.c:4374) tcp_v4_connect (net/ipv4/tcp_ipv4.c:345) __sys_connect (net/socket.c:2167) Modules linked in: nf_conntrack_h323 [last unloaded: nf_nat_h323] Reaching the dangling state requires CAP_SYS_MODULE in the initial user namespace to remove a NAT helper that still has live expectations, so this is a robustness fix; leaving an expectation pointing at freed text is wrong regardless. Add nf_ct_helper_expectfn_destroy(), which walks the expectation table and drops every expectation whose ->expectfn matches the descriptor being torn down. Call it from each NAT helper's exit path after the existing RCU grace period, so no expectation outlives the code it points at and no extra synchronize_rcu() is introduced. With the fix, the same reproducer runs to completion without the Oops. Fixes: f587de0 ("[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port") Reported-by: Xiang Mei <xmei5@asu.edu> Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
jinghao-jia
pushed a commit
that referenced
this pull request
Jun 14, 2026
The xe driver keeps track of whether to probe display, and whether display hardware is there, using xe->info.probe_display. It gets set to false if there's no display after intel_display_device_probe(). However, the display may also be disabled via fuses, detected at a later time in intel_display_device_info_runtime_init(). In this case, the xe driver does for_each_intel_crtc() on uninitialized mode config in xe_display_flush_cleanup_work(), leading to a NULL pointer dereference, and generally calls display code with display info cleared. Check for intel_display_device_present() after intel_display_device_info_runtime_init(), and reset xe->info.probe_display as necessary. Also do unset_display_features() for completeness, although display runtime init has already done that. This will need to be unified across all cases later. Move intel_display_device_info_runtime_init() call slightly earlier, similar to i915, to avoid a bunch of unnecessary setup for no display cases. Note #1: The xe driver has no business doing low level display plumbing like for_each_intel_crtc() to begin with. It all needs to happen in display code. Note #2: The actual bug is present already in commit 44e6949 ("drm/xe/display: Implement display support"), but the oops was likely introduced later at commit ddf6492 ("drm/xe/display: Make display suspend/resume work on discrete"). Fixes: 44e6949 ("drm/xe/display: Implement display support") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7904 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/6150 Cc: stable@vger.kernel.org # v6.8+ Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/20260515160920.1082842-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 7c3eb9f) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
jinghao-jia
pushed a commit
that referenced
this pull request
Jun 14, 2026
addrconf_get_prefix_route() can return the fib6_null_entry sentinel entry which has a NULL fib6_table pointer. Therefore, before setting the route's expiration time, check that we are not working with this entry, as otherwise a NPD will be triggered [1]. Note that the other callers of addrconf_get_prefix_route() are not susceptible to this bug: 1. addrconf_prefix_rcv(): Requests a route with the 'RTF_ADDRCONF | RTF_PREFIX_RT' flags which are not set on fib6_null_entry. 2. modify_prefix_route(): Fixed by commit a747e02 ("ipv6: avoid possible NULL deref in modify_prefix_route()"). 3. __ipv6_ifa_notify(): Calls ip6_del_rt() which specifically checks for fib6_null_entry and returns an error. [1] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] [...] Call Trace: <TASK> __kasan_check_byte (mm/kasan/common.c:573) lock_acquire.part.0 (kernel/locking/lockdep.c:5842 (discriminator 1)) _raw_spin_lock_bh (kernel/locking/spinlock.c:182 (discriminator 1)) cleanup_prefix_route (net/ipv6/addrconf.c:1280) ipv6_del_addr (net/ipv6/addrconf.c:1342) inet6_addr_del.isra.0 (net/ipv6/addrconf.c:3119) inet6_rtm_deladdr (net/ipv6/addrconf.c:4812) rtnetlink_rcv_msg (net/core/rtnetlink.c:6997) netlink_rcv_skb (net/netlink/af_netlink.c:2555) netlink_unicast (net/netlink/af_netlink.c:1344) netlink_sendmsg (net/netlink/af_netlink.c:1899) __sock_sendmsg (net/socket.c:802 (discriminator 4)) ____sys_sendmsg (net/socket.c:2698) ___sys_sendmsg (net/socket.c:2752) __sys_sendmsg (net/socket.c:2784) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121) Fixes: 5eb902b ("net/ipv6: Remove expired routes with a separated list of routes.") Reported-by: Ji'an Zhou <eilaimemedsnaimel@gmail.com> Reviewed-by: David Ahern <dahern@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260609145448.768318-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.