
255 migrate the so3 build system to infrabase and move to the new so3 logo #256

Open
daniel-rossier wants to merge 92 commits into main from 255-migrate-the-so3-build-system-to-infrabase-and-move-to-the-new-so3-logo

Conversation

@daniel-rossier

Contributor

No description provided.

Daniel Rossier added 30 commits June 12, 2026 12:14
Re-sync the build system from the edgemtech Infrabase tree (without
torizon and e1c), nest the SO3 sources under so3/ to match the Infrabase
per-OS layout, and build so3/usr-so3/rootfs-so3/avz in-tree.

- build/: Infrabase meta-layers re-synced from edgemtech; torizon, e1c
  and verdin removed; new meta-toolchain layer (musl-cross-make recipe
  building the aarch64/arm musl user-space toolchain into build/tmp)
- SO3 sources nested under so3/{so3,usr,rootfs,target}; recipe paths and
  .gitignore updated for the new layout (artifacts re-ignored)
- in-tree recipes: so3 (6.2.0), usr-so3, rootfs-so3, avz (no github
  fetch); u-boot fetched+patched (2022.04, aligned with edgemtech)
- deploy via unprivileged bitbake + sudo -n (meta-filesystem)
- bsp-so3 builds, deploys and boots to so3% standalone (virt64) and as
  an AVZ guest (virt64_avz_so3 ITS, EL2)
…system-to-infrabase-and-move-to-the-new-so3-logo
The old manual qemu/ mechanism (fetch.sh + qemu.patch) is superseded by
the meta-qemu recipe: it fetches the same QEMU 8.2.2 and applies the same
hw/arm/virt.{c,h} patches (CLCD/KMI/PS2). Verified that build.sh -x qemu
rebuilds an equivalent qemu-system-aarch64. qemu/ stays gitignored and is
regenerated on demand.
Revert the avz recipe to fetching SO3 from upstream at a pinned SRCREV
and building the hypervisor (EL2) from it, instead of the in-tree so3/
sources. AVZ is decoupled from the in-tree SO3, which is the guest/
capsule (EL1) under development. Verified: bitbake avz fetches, attaches
into avz/, configures virt64_avz_defconfig and builds avz/so3.bin.
The do_build make invocation relied on a CROSS_COMPILE inherited from the
caller's shell, which broke virt32 (arm) builds when the shell had an
aarch64 CROSS_COMPILE set (cc1: unknown value 'generic-armv7-a' for
-mtune). Pass CROSS_COMPILE=${IB_TOOLCHAIN}- explicitly so virt32 uses
arm-linux-gnueabihf- and virt64 uses aarch64-none-linux-gnu-, matching
atf.bbclass.
Drop the usr/lib/lvgl git submodule (.gitmodules removed) and go back to
the original meta-usr strategy: lvgl is fetched at build time by the
meta-usr lvgl bbappend, gated on the :lvgl OVERRIDE. usr-so3 re-enables
do_fetch/unpack/attach so the bbappend pulls lvgl into usr/lib/lvgl
(do_patch stays noexec — the slv/lvgl integration patches are already
baked into the in-tree usr/). The lvgl bbappend now creates lib/lvgl itself
(no longer pre-created by the submodule). usr/lib/lvgl is gitignored;
meta-usr otherwise realigned with edgemtech.
The bbclass selects the current platform's target (QEMU_TARGET: arm-softmmu
for virt32, aarch64-softmmu for virt64) and, when reconfiguring, appends any
other arch already built under qemu/build so meson does not drop it. Thus
building arm-softmmu then aarch64-softmmu (or vice-versa, e.g. switching
IB_PLATFORM between so3 standalone/avz/capsule) keeps both qemu-system-*
binaries instead of wiping the previous one. do_configure is nostamp so the
accumulation re-evaluates each build.
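The accumulation rule can be illustrated with a small C function that builds the value passed to QEMU's --target-list. This is a sketch under stated assumptions, not the meta-qemu bbclass code: qemu_target_list() is a hypothetical name, and the caller is assumed to have already discovered which targets exist under qemu/build.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: 'current' is the target selected for IB_PLATFORM
 * (e.g. "aarch64-softmmu"); 'built' lists targets already present under
 * qemu/build. The current target comes first, and every other built target
 * is appended so that reconfiguring does not drop its qemu-system-* binary.
 */
void qemu_target_list(const char *current, const char *const *built,
		      size_t nbuilt, char *out, size_t outlen)
{
	snprintf(out, outlen, "%s", current);
	for (size_t i = 0; i < nbuilt; i++) {
		if (strcmp(built[i], current) == 0)
			continue; /* already listed first */
		size_t used = strlen(out);
		snprintf(out + used, outlen - used, ",%s", built[i]);
	}
}
```

Building aarch64-softmmu while arm-softmmu already exists yields "aarch64-softmmu,arm-softmmu", so both binaries survive the reconfigure.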
The SO3 kernel is built in place, so switching IB_PLATFORM between
virt64 and virt32 (aarch64<->arm) leaves a stale .config and object
files behind, producing a wrong-arch kernel. Track the last built
arch in a .ib_last_arch marker and run 'make distclean' only when it
changes, keeping same-arch rebuilds incremental.
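A minimal C sketch of the marker check, assuming the marker holds a single arch string (e.g. "aarch64" or "arm") on one line. arch_changed() is a hypothetical name. The sketch also assumes that a missing marker counts as a change, which the commit does not state.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 when the arch recorded in the marker differs from 'arch'
 * (the caller would then run 'make distclean'), 0 when it matches
 * (incremental rebuild). A missing marker counts as a change (assumption).
 * The marker is rewritten with the current arch in both cases.
 */
int arch_changed(const char *marker_path, const char *arch)
{
	char last[64] = "";
	FILE *f = fopen(marker_path, "r");

	if (f) {
		if (fgets(last, sizeof(last), f))
			last[strcspn(last, "\n")] = '\0'; /* strip newline */
		fclose(f);
	}

	int changed = strcmp(last, arch) != 0;

	f = fopen(marker_path, "w");
	if (f) {
		fprintf(f, "%s\n", arch);
		fclose(f);
	}
	return changed;
}
```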
Two arch-switch bugs surfaced when building SO3 for virt32 (arm) after
virt64 (aarch64):

1. 'OVERRIDES += ":so3"' inserts a leading space, so OVERRIDES became
   "...:arm :so3" and the CPU token parsed as "arm " (trailing space).
   :<cpu> overrides such as IB_MUSL_TARGET:arm then never collapsed, so
   the user-space cmake build got a literal ${IB_MUSL_TARGET} on PATH and
   could not find arm-linux-musleabihf-gcc. Switch to OVERRIDES:append in
   all five SO3 recipes (no inserted space).

2. The usr-so3 cmake build dir caches the toolchain in CMakeCache.txt, so
   switching arch kept emitting aarch64 binaries (an aarch64 init.elf on a
   32-bit kernel -> prefetch abort at boot). Wipe so3/usr/build when the
   arch changes, tracked via a .ib_last_arch marker at the usr/ root.
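Bug 1 comes down to exact token matching. The sketch below assumes that BitBake compares each ':'-separated OVERRIDES token verbatim, which is consistent with the symptom described. has_override() is a hypothetical helper, not a BitBake API.

```c
#include <string.h>

/*
 * Returns 1 if 'name' appears as a whole ':'-separated token of
 * 'overrides'. Tokens are compared byte for byte, so "arm " (with the
 * trailing space left by '+=') does not match "arm".
 */
int has_override(const char *overrides, const char *name)
{
	size_t len = strlen(name);
	const char *p = overrides;

	for (;;) {
		const char *end = strchr(p, ':');
		size_t toklen = end ? (size_t)(end - p) : strlen(p);

		if (toklen == len && strncmp(p, name, len) == 0)
			return 1;
		if (!end)
			return 0;
		p = end + 1;
	}
}
```

With "virt32:arm :so3" the lookup for "arm" fails while "so3" still matches. This is why IB_MUSL_TARGET:arm was silently ignored.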
Both QEMU launch scripts only handled virt64, so with IB_PLATFORM=virt32
they printed the MAC/GDB lines and exited without starting QEMU. Select
QEMU_BIN per platform (qemu-system-arm for virt32) and add a virt32 branch
booting U-Boot directly (-M virt -cpu cortex-a15 -kernel u-boot/u-boot,
sdcard.img.virt32). stg.sh keeps the virtio GPU/keyboard/mouse + SDL
window; the virt64-only guard is widened to accept virt32.
u-boot is built from the meta-uboot recipe (github 2022.04 @ pinned
SRCREV + the SO3 patch set), which fetches and attaches it, backing any
prior copy up to u-boot.back. The committed in-tree u-boot/ was therefore
obsolete and was clobbered on every build, producing a huge spurious
diff. Remove all 18k files from tracking and gitignore /u-boot/, matching
how qemu/ and avz/ are already handled.
The patch set was inherited wholesale from the edgemtech recipe and had
never been regenerated by do_updiff in this repo. It carried two classes
of cruft:

  * duplicate chains — the same source file patched twice (e.g. board.c
    in 0004 and 0077, setexpr.c in 0008/0081, the tools/boot/*.c and the
    defconfigs each appearing in two generations with ./ vs b/ labels),
    the residue of repeated append-only updiff runs across a label-format
    change;
  * build artifacts frozen as patches — hello_world.srec, autoconf.mk,
    autoconf.mk.dep, include/config/uboot.release, include/generated/*
    (dt.h, *_autogenerated.h), lib/efi_selftest/efi_miniapp_*.h.

Regenerated from scratch: diff the pristine fetch against the working
tree (do_diffcompose), drop the old numbered set, promote the staged
one-patch-per-file result (do_updiff). 64 messy patches -> 54 clean,
consolidated, git-labelled patches. e1c_boot.c is kept (compiled but
unused) per decision. Verified: a clean fetch+unpack+patch+build applies
all 54 and produces a working u-boot.

Also completed the do_diffcompose artifact exclude-list in patch.bbclass
(autoconf.mk, autoconf.mk.dep, *.srec, efi_miniapp_*.h) so future updiff
runs stay clean.
ls sets CLOEXEC via fcntl(). arm64 musl issues this as fcntl (NR 25),
which SO3 handles; arm32 (virt32) musl issues the same call as fcntl64
(NR 221), which syscall.tbl never registered -> 'unhandled syscall: 221'
warning and a silently-failing -ENOSYS. Map fcntl64 to the existing
__sys_fcntl handler so virt32 behaves like virt64.
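A hypothetical sketch of a number-indexed arm32 syscall table showing the shape of the fix. sys_fcntl() stands in for __sys_fcntl, and the table size and dispatch function are assumptions. arm64 musl's fcntl (NR 25) lives in its own table and is not shown.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*syscall_fn_t)(unsigned long, unsigned long, unsigned long);

#define NR_fcntl64  221 /* arm32 EABI number used by musl */
#define NR_SYSCALLS 400 /* assumed table size */

/* Stand-in for __sys_fcntl: echoes the command for the sketch. */
long sys_fcntl(unsigned long fd, unsigned long cmd, unsigned long arg)
{
	(void)fd;
	(void)arg;
	return (long)cmd;
}

static const syscall_fn_t syscall_table[NR_SYSCALLS] = {
	/* The fix: fcntl64 reuses the existing fcntl handler. */
	[NR_fcntl64] = sys_fcntl,
};

/* Unregistered numbers fall through to -ENOSYS. */
long syscall_dispatch(unsigned long nr, unsigned long a0, unsigned long a1,
		      unsigned long a2)
{
	if (nr >= NR_SYSCALLS || syscall_table[nr] == NULL)
		return -ENOSYS;
	return syscall_table[nr](a0, a1, a2);
}
```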
Killing a process whose spawned thread was blocked in the kernel hit
'BUG in kernel/thread.c:105' (discard_tcb_in_pcb: WAITING 'not handled
yet'). A sleeping thread sits in __sleep() with a struct timer on its
own kernel stack, so it cannot just be freed — the pending timer would
dangle and later fire on freed memory.

Handle it cooperatively: add a tcb->killed flag; discard_tcb_in_pcb()
flags+wakes WAITING threads (instead of BUG()) and waits for them via
the existing threads_active completion, reaping them afterwards.
A woken thread resumes in __sleep(), stops its own timer, sees killed
and self-terminates with thread_exit() — entirely in kernel, never
returning to the (already-released) user pages. READY threads are still
force-freed (they must not resume into freed user space).

Verified: Ctrl-C of lvgl_demo stress (whose slv tick thread loops in
usleep) no longer panics.

Limitation: only the __sleep() wait is instrumented. A thread killed
while blocked on a futex/mutex would not yet self-terminate; that needs
the same killed-check added to those wait paths.
The 128 KB lvgl heap is too small to build lv_demo_widgets (lv_conf.h's
own note flags this), so the widget tree failed to allocate, nothing
rendered, and the main thread spun in lv_timer_handler() without
reaching a syscall boundary — making Ctrl-C undeliverable. 4 MB fits the
demo comfortably; it is BSS (zero-init) so the .elf on disk is unchanged.
A diagnostic that bypasses LVGL: opens /dev/fb, queries geometry via the
same ioctls slv uses, mmap()s the VRAM and draws colour bars + an
animated square straight into it. Lets us tell apart a broken display
pipeline (PL111 CLCD -> QEMU SDL) from an LVGL-side problem. Ctrl-C to
quit.
fb_mmap() mapped the CLCD VRAM cacheable, which is wrong for a
framebuffer: on real hardware the CPU writes linger in the data cache
and never reach the scanout buffer. Map it non-cacheable (nocache=true).
(Under QEMU/TCG it is cosmetic since the cache is not modelled, but it is
required on real targets.)
SO3 drives the PL111 CLCD + PL050 keyboard/mouse that the so3 QEMU patch
wires unconditionally into '-M virt'; it has no virtio-gpu driver, so the
virtio-gpu/keyboard/mouse devices only added a competing blank console.
More importantly the SDL backend did not present the PL111 console's
surface at all (verified: pl110 renders the framebuffer into the surface
- monitor 'screendump' shows it - yet the SDL window stayed black).
Switching to '-display gtk' shows the panel correctly (and its View menu
lists every console). Drop the virtio-gpu/keyboard/mouse devices.
Paint the colour-bar background once, then per frame only restore the
square's previous rows and redraw it, instead of memcpy-ing the whole
3 MB framebuffer every frame.
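Below is a sketch of the incremental redraw on a linear 32 bpp framebuffer. It assumes an XRGB8888 pixel format and a stride equal to the width; neither is stated above. struct fb and the function names are hypothetical. At an assumed 1024x768 resolution, a full copy moves 1024 * 768 * 4 = 3 MiB per frame, whereas restoring a 64-row band moves 64 * 1024 * 4 = 256 KiB.

```c
#include <stdint.h>
#include <string.h>

struct fb {
	uint32_t *pix;      /* e.g. the mmap()ed VRAM */
	const uint32_t *bg; /* off-screen copy of the background */
	int w, h;           /* pixels; stride assumed equal to w */
};

/* Paint eight vertical colour bars once (XRGB8888 assumed). */
void paint_bars(uint32_t *buf, int w, int h)
{
	static const uint32_t bars[] = { 0xFFFFFF, 0xFFFF00, 0x00FFFF, 0x00FF00,
					 0xFF00FF, 0xFF0000, 0x0000FF, 0x000000 };
	int n = sizeof(bars) / sizeof(bars[0]);

	for (int y = 0; y < h; y++)
		for (int x = 0; x < w; x++)
			buf[(size_t)y * w + x] = bars[x * n / w];
}

/* Copy n background rows starting at y0 back into the framebuffer. */
void restore_rows(struct fb *fb, int y0, int n)
{
	for (int y = y0; y < y0 + n && y < fb->h; y++)
		memcpy(fb->pix + (size_t)y * fb->w, fb->bg + (size_t)y * fb->w,
		       (size_t)fb->w * sizeof(uint32_t));
}

void draw_square(struct fb *fb, int x0, int y0, int size, uint32_t colour)
{
	for (int y = y0; y < y0 + size && y < fb->h; y++)
		for (int x = x0; x < x0 + size && x < fb->w; x++)
			fb->pix[(size_t)y * fb->w + x] = colour;
}

/* One animation step: undo the old square's rows, draw the new square. */
void frame(struct fb *fb, int old_y, int new_x, int new_y, int size,
	   uint32_t colour)
{
	restore_rows(fb, old_y, size);
	draw_square(fb, new_x, new_y, size, colour);
}
```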
The serial IRQ delivered SIGINT to current() - whatever thread happened
to be running when the Ctrl-C key arrived. A foreground app asleep in a
syscall (e.g. usleep) is not the running thread (the idle thread is, with
pcb==NULL), so Ctrl-C was silently dropped; it only worked for CPU-busy
apps. At the shell prompt, the prompt was not reprinted either.

Two parts:

1. Track the foreground console process. Add a global fg_pcb, set by
   sys_do_wait4() to the child a process blocks waiting on (the shell's
   foreground job) and restored to the waiter when it exits. The serial
   IRQ now targets fg_pcb (fallback: current()), so SIGINT reaches the
   foreground app even while it sleeps.

2. Cancel the line at the prompt instead of signalling the shell. When a
   console read is in progress (read_lock held), the IRQ sets serial_intr;
   pl011_get_byte returns ETX and console_getc discards the typed line and
   returns an empty line, so the shell's fgets returns and it reprints the
   prompt once. This avoids musl's sticky-EOF on a 0-byte read and a
   siglongjmp-through-fgets file-lock leak. Matches the driver's existing
   read_lock design comment.

Relies on the cooperative WAITING-thread teardown for the kill path.
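The two parts can be sketched with a deliberately tiny model. Everything here is hypothetical: struct pcb, running, wait4_begin(), wait4_end() and console_readline() are illustrative names, not SO3's scheduler or PL011 driver. The sketch shows only the targeting rule (fg_pcb, falling back to current()) and the ETX line cancel.

```c
#include <stddef.h>
#include <string.h>

struct pcb {
	int pid;
	int pending_sigint;
};

static struct pcb *fg_pcb;  /* foreground console process */
static struct pcb *running; /* what current() returns; NULL for idle */

struct pcb *current(void) { return running; }

/* wait4(): the console passes to the child the waiter blocks on ... */
void wait4_begin(struct pcb *child) { fg_pcb = child; }
/* ... and returns to the waiter once the child exits. */
void wait4_end(struct pcb *waiter) { fg_pcb = waiter; }

/* Ctrl-C outside a console read: signal the foreground job. */
void on_ctrl_c(void)
{
	struct pcb *target = fg_pcb ? fg_pcb : current();

	if (target)
		target->pending_sigint = 1;
}

#define ETX 0x03

/*
 * Ctrl-C during a console read: ETX discards the typed line and an empty
 * line ("\n") is returned, so the shell's fgets() returns normally and the
 * prompt is reprinted. Returns the line length including '\n'.
 */
size_t console_readline(const char *input, char *line, size_t cap)
{
	size_t n = 0;

	for (; *input; input++) {
		if (*input == ETX) {
			n = 0; /* cancel: drop what was typed */
			break;
		}
		if (*input == '\n')
			break;
		if (n + 2 < cap) /* keep room for '\n' and '\0' */
			line[n++] = *input;
	}
	line[n++] = '\n';
	line[n] = '\0';
	return n;
}
```

With the idle thread running (no pcb), on_ctrl_c() still reaches the sleeping foreground app through fg_pcb.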
Mirror the virt32 graphical fix onto the virt64 branch: SO3 drives the
same PL111 CLCD + PL050 (virt64.dts has clcd@08800000 / pl050 nodes), has
no virtio-gpu driver, and the SDL backend does not present the PL111
console. Switch to '-display gtk' and drop the virtio-gpu/keyboard/mouse
devices. The flash0.img AVZ-vs-U-Boot boot heuristic is unchanged.

Untested (no virt64 graphical run this session) but the framebuffer path
is identical to virt32; the kernel-side fixes (non-cacheable fb, Ctrl-C)
are arch-shared.
An interrupted task (e.g. Ctrl-C during a clean) can leave a recipe
WORKDIR that exists but lacks its temp/ subdir. bitbake then cannot
create that task's fifo and fails with
  do_clean: [Errno 2] No such file or directory: .../temp/fifo.NNNN
(hit on 'build.sh -ca bsp-so3'). Before clean/build, scan tmp/work and
remove any workdir missing temp/ (it holds nothing useful) so bitbake
recreates it cleanly.
'build.sh -ca bsp-so3' failed with
  usr-so3 do_clean: [Errno 2] No such file or directory: .../temp/fifo.NNNN

Root cause: the lvgl bbappend's shell do_clean:append ran
'rm -rf ${WORKDIR}/*', which deleted the running clean task's own temp/
(holding its fifo + run script) mid-execution, leaving an empty workdir.
The next clean then could not create its fifo there and failed.

Fix: make do_clean a Python task (usr.bbclass) plus Python do_clean:append
in the usr-so3 recipe and the lvgl bbappend. Python tasks create their
temp dir themselves and use no fifo, so they are robust when the workdir
is fresh/empty. The lvgl append no longer touches WORKDIR (bitbake owns
it); it only purges the fetched lvgl tree (in-tree usr/lib/lvgl, src/lib,
${S}/lib/lvgl). Verified: fresh clean, repeat clean, full 'bsp-so3 -c
clean', and clean->rebuild (aarch64) all succeed.
…2 entries

Remove the IB_PLATFORM:so3 override: SO3 now always builds for the main
IB_PLATFORM. That override was referenced only here and resolved via
OVERRIDES, so a value diverging from IB_PLATFORM silently built SO3 for
the wrong arch. The standalone / AVZ-guest / capsule contexts are not
distinct platforms - they differ only by IB_CONFIG:so3 / IB_TARGET_ITS:so3
(e.g. capsule = virt64_capsule_defconfig + virt64_capsule), which are
independent of the platform variable.

Also add the virt32 counterparts that were missing
(PREFERRED_VERSION_so3, IB_CONFIG:so3, IB_TARGET_ITS:so3, IB_STORAGE_MODE)
and default IB_PLATFORM to virt64.
AVZ is an EL2 hypervisor. The virt64 launcher only enabled EL2
(virtualization=on) when filesystem/flash0.img was present (ATF chain);
booting AVZ via the ITS without ATF used plain -M virt,gic-version=2
(EL1), so AVZ faulted on its first EL2 system-register write
(Synchronous Abort -> reset). Detect the selected so3 ITS from local.conf
and, when it is an avz ITS, add virtualization=on with -kernel u-boot
(virt64_defconfig is EL2-aware). Verified: AVZ now boots.
Daniel Rossier added 16 commits June 16, 2026 15:36
`ls FILE` always called opendir(), which fails on a regular file, so it printed
"cannot open". When opendir() fails, stat the path: if it exists, list it as a
single entry (respecting -l); only report an error when it truly does not exist.
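A user-space sketch of the fallback using POSIX opendir()/stat(). ls_path() is a hypothetical name. Output goes to a caller-supplied stream, and the -l branch is simplified to size and name.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns 0 on success, -1 only if the path does not exist. */
int ls_path(const char *path, int long_fmt, FILE *out)
{
	DIR *d = opendir(path);
	struct stat st;

	if (d) { /* a directory: list its entries */
		struct dirent *e;

		while ((e = readdir(d)) != NULL)
			fprintf(out, "%s\n", e->d_name);
		closedir(d);
		return 0;
	}
	if (stat(path, &st) == 0) { /* not a directory, but it exists */
		if (long_fmt)
			fprintf(out, "%8lld %s\n", (long long)st.st_size, path);
		else
			fprintf(out, "%s\n", path);
		return 0;
	}
	fprintf(out, "ls: %s: No such file or directory\n", path);
	return -1;
}
```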
SO3 had no real-time clock: gettimeofday/clock_gettime returned time since boot,
and FatFS stamped writes with a fixed date (FF_FS_NORTC=1), so file timestamps
were meaningless and touch could not update them.

  - New PL031 RTC driver (devices/rtc/pl031.c, "arm,pl031") exposing
    rtc_get_time() (epoch seconds from the Data Register); rtc nodes added to
    virt64/virt32 dts (QEMU virt seeds the device from the host clock).
  - timer.c: gettimeofday and clock_gettime(CLOCK_REALTIME) now return RTC
    wall-clock time when available (CLOCK_MONOTONIC stays on the boot timer).
  - FatFS: FF_FS_NORTC=0 + get_fattime() from the RTC, so created files get the
    real time; FF_USE_CHMOD=1 to compile f_utime.
  - New utimensat syscall + fat_utime/.utime fop; touch now refreshes a file's
    modification time (NULL times => now) in addition to creating it.

Document the RTC-backed timestamps in user_space.rst.
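Below is a sketch of the two conversions involved: reading the PL031 Data Register (RTCDR, offset 0x000), and packing the resulting epoch time into FatFS's 32-bit timestamp. pl031_read_dr(), fat_time_from_epoch() and the clamping of pre-1980 times are assumptions for illustration, not the driver's code. The sketch also treats the counter as UTC.

```c
#include <stdint.h>
#include <time.h>

#define PL031_DR 0x000 /* Data Register: current counter value */

/* 'base' stands in for the ioremapped PL031 MMIO base. */
uint32_t pl031_read_dr(const volatile uint32_t *base)
{
	return base[PL031_DR / 4];
}

/*
 * FatFS get_fattime() layout: bits 31..25 year-1980, 24..21 month,
 * 20..16 day, 15..11 hour, 10..5 minute, 4..0 seconds/2. Times before
 * 1980 are not representable and are clamped here (assumption).
 */
uint32_t fat_time_from_epoch(time_t t)
{
	struct tm tm;

	if (t < 315532800) /* 1980-01-01T00:00:00Z */
		t = 315532800;
	gmtime_r(&t, &tm);
	return ((uint32_t)(tm.tm_year - 80) << 25) |
	       ((uint32_t)(tm.tm_mon + 1) << 21) |
	       ((uint32_t)tm.tm_mday << 16) |
	       ((uint32_t)tm.tm_hour << 11) |
	       ((uint32_t)tm.tm_min << 5) |
	       ((uint32_t)tm.tm_sec >> 1);
}
```

A get_fattime() along these lines would return fat_time_from_epoch(pl031_read_dr(base)).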
Phase 4, milestone 1 — the full AVZ+SO3 boot image for verdin now builds
(SO3 kernel, AVZ hypervisor and the combined ITB), reusing the existing
imx8mp AVZ/ATF/OP-TEE foundations and staying independent of Torizon and the
e1c capsules (edgem1 keeps the verdin BSP in meta-e1c; this is capsule-free).

  - so3/so3/dts/verdin-imx8mp.dts: SO3 *guest* device tree — verdin hardware
    nodes (GICv3, i.MX UART3, CP15 timer) with the guest RAM window at
    0x48000000 (above AVZ's region), mirroring edgem1's avz-guest dts.
  - so3/so3/dts/Makefile: build verdin-imx8mp.dtb alongside the AVZ dtb.
  - so3/target/verdin_imx8mp_avz_so3.its: single ITS bundling AVZ (0x40080000)
    + AVZ dtb + the SO3 guest (0x48000000) + guest dtb + rootfs.fat.
  - build/conf/local.conf: verdin-imx8mp block (SO3 guest config + ITS, AVZ
    config, imx8mp ATF/OP-TEE plat ids and opts, toolchain) — mirrors edgem1.
  - build/meta-bsp/.../bsp_verdin-imx8mp.inc: clean (capsule-free) platform
    glue; stages imx-boot + the SO3 ITB for flashing.
  - build/meta-filesystem/classes/fs_verdin-imx8mp.bbclass: verdin storage
    class (ported from edgem1's meta-filesystem, no capsule deps).

Verified by building with IB_PLATFORM=verdin-imx8mp: so3 kernel, avz hypervisor
and bsp-so3 do_itb all succeed, producing verdin_imx8mp_avz_so3.itb. Booting on
the real board (flashing imx-boot + ITB, e.g. via TEZI) is the next step.
Build verdin's imx-boot (flash.bin = SPL + ATF(BL31) + OP-TEE + U-Boot via
imx_mkimage) — meta-uboot only had the QEMU U-Boot 2022.04.

  - meta-uboot: port uboot_2024.07.bb (+ 40 Toradex/verdin U-Boot patches) from
    edgem1; do_build assembles flash.bin from ATF's bl31.bin, OP-TEE's tee.bin
    and U-Boot. Decoupled from meta-e1c (IB_E1C_BSP_PATH): the 4 NXP lpddr4 DDR
    training blobs imx_mkimage consumes (68 KB) are vendored capsule-free under
    meta-bsp/recipes-bsp/bsp/verdin-imx8mp/ (one flat level — under bsp/ it is
    understood to be firmware), referenced via IB_IMX_FIRMWARE_PATH.
  - bsp-so3: verdin build deps = usr-so3 + uboot + avz (U-Boot is always needed,
    just wrapped into imx-boot here), via the platform-overridable
    IB_BSP_BUILD_DEPENDS.
  - local.conf: PREFERRED_VERSION_uboot:verdin-imx8mp = 2024.07 + the firmware
    path. virt64 keeps U-Boot 2022.04 (per-platform PREFERRED_VERSION).

Verified: build.sh -a bsp-so3 with IB_PLATFORM=verdin-imx8mp builds ATF, OP-TEE,
AVZ, SO3, U-Boot 2024.07 (flash.bin, 1.9 MB) and the rootfs/usr (63 tasks).
bsp_verdin-imx8mp.inc do_platform_deploy assembles a Toradex TEZI image set from
the official base image (image.json + partition layout) with our imx-boot
(flash.bin), a tiny boot.scr that boots the AVZ+SO3 ITB with bootm, and the ITB itself;
image.json is patched (autoinstall + filelist) to install them. No capsule /
OSTree rootfs — SO3's rootfs rides inside the ITB. Ported from edgem1's verdin
deploy, dropping all e1c/Torizon-rootfs machinery.
SO3 can now run standalone on the Verdin iMX8MP (no AVZ), as well as the AVZ
guest. The two differ by config/ITS/EL; one imx-boot serves both.

  - GIC version decoupled from the platform: SO3's native GICv3 is the AVZ EL2
    hypervisor path only (avz_el2_irq_handle virtualises IRQs to a guest), so
    standalone can't use it. arch/arm64/Kconfig: VERDIN_IMX8MP now `select GIC`
    (not GIC_V3); devices/irq/Kconfig gives GIC_V2/GIC_V3 prompts so the
    defconfig picks the version. The AVZ-guest config keeps GIC_V3; standalone
    uses GIC_V2 against the iMX8MP GICv3's GICv2-compat MMIO interface (GICC,
    ARE_NS=0), reusing SO3's proven EL1 GICv2 path. virt64/rpi4 keep select GIC_V2.
  - configs/verdin-imx8mp_defconfig: standalone (no AVZ; PROC_ENV, RAMDEV rootfs,
    FAT, IPC) + verdin devices (GICv2-compat, i.MX UART3, CP15 timer).
  - dts/verdin-imx8mp_standalone.dts: DRAM @ 0x40000000, GICv2 <GICD, GICC> view.
  - target/verdin_imx8mp_so3.its: SO3 (type kernel) + dtb + rootfs.fat, no AVZ.
  - U-Boot EL1 switch knob IB_UBOOT_SWITCH_EL1: when "1", build imx-boot with
    CONFIG_ARMV8_SWITCH_TO_EL1 so U-Boot hands SO3 off at EL1 (SO3 standalone is
    EL1-native; the AVZ guest keeps EL2). No SO3 changes needed for the EL.
  - boot.scr/deploy generalised to a fixed boot.itb name so one TEZI deploy
    serves both variants. local.conf documents the standalone selection.

Build-verified (IB_PLATFORM=verdin-imx8mp): standalone so3 (GIC_V2) + u-boot with
EL1 switch + verdin_imx8mp_so3.itb; AVZ-guest so3 (GIC_V3) and virt64 (GIC_V2)
still build. Runtime (ARE_NS=0 GICv2-compat, EL1 boot) to validate on hardware.
…ames

so3/target/{virt64,virt32,rpi4_64}.its were symlinks to SO3 ITS variants
(e.g. virt64.its -> virt64_so3.its). SO3 builds never use them: IB_TARGET_ITS:so3:*
names the specific ITS directly (virt64_avz_so3, virt32_so3, ...) and no
script references the bare names. They only fed bsp-linux's ${IB_PLATFORM}.its
bare path, which (pointing at an SO3 ITS) was already vestigial/broken. Remove
them. (.itb files in target/ are gitignored build artifacts.)
The 32-bit Raspberry Pi 4 target was unused (no IB_TARGET_ITS:so3:rpi4; only a
vestigial IB_TARGET_ITS:linux:rpi4_64="rpi4" pointing at it). Remove it whole:
  - so3/target/rpi4.its, dts/rpi4.dts, configs/rpi4_defconfig,
    arch/arm32/rpi4/ (board: Makefile, platsmp.c, mach/io.h)
  - Kconfig RPI4 (arm32 platform choice), Makefile + dts/Makefile RPI4 hooks,
    the `# CONFIG_RPI4 is not set` lines in the virt32 defconfigs
  - local.conf rpi4 entries (IB_PLATFORM/STORAGE_DEVICE/ROOTFS_PARTITION/
    PLAT_CPU:rpi4 and the linux:rpi4_64="rpi4" ITS ref)

rpi4_64 (64-bit Pi) is untouched. Verified: virt32 still builds.
__get_avz_fdt_paddr() and fdt_property_read_u64() read the FIT <load>
property as a fixed 64-bit (8-byte) value. The ITS emits <load> as a
32-bit (4-byte) cell for addresses that fit in 32 bits (e.g.
0x46000000), so the 8-byte read pulls in the following FDT token
(FDT_END_NODE = 0x00000002), producing a bogus pointer 0x4600000000000002.

AVZ then dereferenced that as __fdt_addr in early boot (fdt_ro_probe_ ->
fdt32_ld), taking a Data Abort (esr 0x96000000, address-size fault) at
EL2 before VBAR_EL2 was set, which U-Boot's handler caught and reset.

Read <load>/<entry> according to the property's actual length (4 -> u32,
>=8 -> u64). Covers __get_avz_fdt_paddr's MMU-off byte read and the
loadAgency() load/entry reads via the shared helper.
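A sketch of a length-aware reader. FDT property values are stored as big-endian 32-bit cells, so a fixed 64-bit read of a one-cell <load> picks up the next structure token (FDT_END_NODE, 0x00000002). read_addr_prop() is a hypothetical name, and returning 0 for other lengths is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode by the property's own length: 4 -> u32, >= 8 -> u64. */
uint64_t read_addr_prop(const void *val, int len)
{
	const uint8_t *p = val;

	if (len == 4)
		return be32(p);
	if (len >= 8)
		return ((uint64_t)be32(p) << 32) | be32(p + 4);
	return 0;
}
```

Feeding it the bytes of <0x46000000> followed by FDT_END_NODE gives 0x46000000 with len 4. With len forced to 8, it reproduces the bogus 0x4600000000000002.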
AVZ's loadAgency() expected the guest (kernel + flat_dt + ramdisk) in the
same FIT as avz_dt (single-FIT model). The edgem1 e1c boot — and now so3
itself — keep the guest in a SEPARATE ITB so the components stay
decoupled (AVZ ITB + guest/capsule ITB). e1c-boot already passes that
second ITB in x1 and places its loadables at their load addresses.

- head.S: preserve x1 (the agency/guest ITB paddr) into a new global
  __agency_itb_paddr, alongside x0 -> __fdt_addr (the AVZ FIT).
- loadAgency(): read avz_dt from the AVZ FIT (x0), but read the guest
  load/entry, flat_dt and ramdisk from __agency_itb_paddr (x1). The
  agency's device tree is now its own flat_dt (placed by the bootloader),
  not the AVZ FIT; get_mem_info() and the initrd PA->IPA fixup operate on
  that guest flat_dt.

so3's own AVZ ITS still needs splitting into AVZ ITB + guest ITB (follow-up);
edgem1's e1c ITB already provides the separate guest ITB.
loadAgency() only fixed up pre-existing linux,initrd-start/end in the
guest flat_dt (single-FIT assumption). With the guest in a separate ITB
the flat_dt has no initrd props, so the guest panicked with "VFS: Unable
to mount root fs".

Mirror load_S3C: read the ramdisk loadable (load addr + size) from the
guest ITB, convert PA->IPA, and publish linux,initrd-start/end into the
guest's /chosen (created by e1c-boot). No memcpy — the bootloader already
placed the ramdisk at its load address.
Mirror the edge-m1 e1c component separation on the so3 side: the SO3
guest now lives in its own ITB instead of the single AVZ+guest FIT, so
AVZ loads it from x1 (matching the new loadAgency separate-ITB path).

- target/virt64_avz_so3.its: AVZ ITB (avz kernel + avz_dt, config "avz").
- target/virt64_so3_guest.its: SO3 guest ITB (so3 + flat_dt + ramfs,
  config "capsule"; generic names — no e1c references on the so3 side).
- bsp-so3 do_itb: also build <plat>_so3_guest.itb when its ITS exists.
- bsp_virt64.inc __do_platform_deploy: stage both ITBs + the e1c-boot uEnv
  when a guest ITB is present; else the single-ITB bootm deploy.
- files/uEnv_virt64_avz.txt: load both ITBs and jump via the e1c-boot
  U-Boot command (the command lives in U-Boot, fine to use from so3).
- avz recipe: fetch from local so3 HEAD (carries the AVZ option-B work).
- local.conf: IB_CONFIG:so3:virt64 = virt64_defconfig (SO3 guest runs
  fully-virtualized under AVZ with the ARM timer; virt64.dtb).

Verified: SO3 boots under AVZ from the separate ITB to the `/ %` prompt.
Reflect the split of the AVZ hypervisor and its SO3 guest into two
separate FIT images on virt64 (commit 27cc306): the AVZ ITB
(virt64_avz_so3.its) now carries only the hypervisor + avz_dt, and the
guest lives in virt64_so3_guest.its, loaded by AVZ via U-Boot's e1c-boot
command (AVZ FIT in x0, guest ITB in x1). rpi4_64/verdin still use the
combined single-ITB AVZ image.

- build_system.rst: correct the ITS table, add the guest ITB row, and a
  new "Two-ITB AVZ boot (virt64)" section (label two_itb_boot) covering
  do_itb, __do_platform_deploy and the e1c-boot uEnv.
- architecture.rst: boot-flow prose + step 1 (two ITBs, e1c-boot x0/x1).
- avz.rst: loadAgency parses the guest from the x1 ITB; initrd->/chosen.
- user_guide.rst: note that virt64 builds/deploys two ITBs automatically.
- capsules.rst: cite both the AVZ and guest ITBs for the demo.
Align the so3 ITS naming/layout with the edge-m1 convention, where the
AVZ ITB is <plat>_avz.its (hypervisor + avz_dt only) and the guest is a
separate ITB. Drop the legacy _so3 suffix on the AVZ ITB and extend the
two-ITB split (already done for virt64) to rpi4_64 and verdin.

ITS (so3/target/):
- rename virt64_avz_so3.its -> virt64_avz.its (already AVZ-only).
- split rpi4_64_avz_so3.its    -> rpi4_64_avz.its + rpi4_64_so3_guest.its.
- split verdin_imx8mp_avz_so3.its -> verdin_imx8mp_avz.its + verdin_imx8mp_so3_guest.its.
  (also fixes rpi4's broken incbin path ../avz -> ../../avz.)
- per-platform load addresses preserved.

Build/deploy:
- bsp-so3 do_itb: derive the guest ITS from IB_TARGET_ITS
  (<plat>_avz -> <plat>_so3_guest), not IB_PLATFORM — keeps underscore
  naming on hyphenated platforms (verdin-imx8mp) and avoids a false
  2-ITB positive in standalone mode.
- bsp_arm_common.inc: factor the 2-ITB deploy here, gated on the ITS
  ending in _avz; virt64 and rpi4 both use it. bsp_virt64.inc reduces to
  a delegating call.
- add uEnv_rpi4_64_avz.txt (load both ITBs from FAT + e1c-boot).
- verdin: stage boot_avz.itb + boot_guest.itb + new boot_avz.scr.txt
  (e1c-boot); single-ITB bootm kept for the standalone ITS.
- local.conf: IB_TARGET_ITS:so3:{virt64,verdin-imx8mp} -> *_avz.
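The naming rule used by do_itb can be illustrated in C. The recipe itself is not C, and guest_its_name() is a hypothetical stand-in. The key property is that only an ITS name ending in _avz yields a guest ITS, so a standalone ITS never triggers the two-ITB path.

```c
#include <stdio.h>
#include <string.h>

/*
 * "<plat>_avz" -> "<plat>_so3_guest". Returns 1 and fills 'out' only when
 * target_its ends in "_avz"; returns 0 otherwise (single-ITB deploy).
 */
int guest_its_name(const char *target_its, char *out, size_t outlen)
{
	static const char suffix[] = "_avz";
	size_t n = strlen(target_its), s = sizeof(suffix) - 1;

	if (n <= s || strcmp(target_its + n - s, suffix) != 0)
		return 0;
	snprintf(out, outlen, "%.*s_so3_guest", (int)(n - s), target_its);
	return 1;
}
```

Deriving from the ITS name rather than IB_PLATFORM keeps the underscore form: "verdin_imx8mp_avz" maps to "verdin_imx8mp_so3_guest", not to anything containing "verdin-imx8mp".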

Docs: build_system/architecture/avz/capsules/user_guide updated; the
two-ITB section generalised to all three platforms.

Validated: bitbake parse OK; virt64 AVZ + guest ITBs assemble via mkimage
with the expected sub-images/addresses. rpi4_64/verdin split + e1c-boot
wiring is in place but not yet validated on hardware (noted in the docs).
Daniel Rossier added 9 commits June 17, 2026 14:18
The **/build .gitignore rule (.gitignore:17) silently keeps recipe source
patches out of the index; they must be force-added like the 187 already
tracked. 25 patches referenced by committed recipe metadata (atf, linux,
buildroot, lvgl) were never added, so the clean CI checkout failed parsing
with "Unable to get checksum for ... SRC_URI entry". Force-add them.
**/build silently ignored the whole /build metadata tree, so recipe patches
had to be force-added; a forgotten one broke CI parsing (fixed in a1624e7).
Un-ignore /build, re-ignore output one level down, re-include conf + meta-*
source dirs. Anchor the bare 'atf' rule to /atf so it stops matching the
meta-atf recipe dir. Ignore generated build/conf/auto.conf and *.orig/*.rej.
do_build compiles gcc 12.4.0 with in-tree mpfr/gmp/mpc. When the source
tree has inconsistent timestamps (configure.ac newer than configure, as on
a fresh copy that doesn't preserve mtimes), make tries to regenerate the
autotools files and invokes automake-1.17/autoconf. The so3-env CI image
ships no autotools (and Ubuntu 24.04 has automake 1.16, not the 1.17 mpfr
wants), so do_build died with 'automake-1.17: command not found'.

--disable-maintainer-mode turns the regen rules into no-ops, making the
toolchain build environment- and timestamp-independent. Verified by
reproducing the exact failure in the so3-env container and confirming the
flag builds gcc past the mpfr stage.
Temporary diagnostic: the toolchain build fails only on GitHub-hosted
runners (passes locally and on a self-hosted box with the same image and
commit), and the inner log.do_build is never shown in the CI console.
Print nproc/df/free and tail the failing toolchain log so we can see the
actual error. To be reverted once diagnosed.
The CI failure on 32852c3 was transient: the same build logic passed on
re-run (and passes locally + on a self-hosted box). The runner had ample
disk (85G) and RAM (16G), so the cause was a flaky mirror download:
musl-cross-make fetches tarballs from ftpmirror.gnu.org during do_build with a
no-retry 'wget -c -O'. Override DL_CMD with --tries/--waitretry/--timeout so
a single bad mirror recovers instead of failing the toolchain build.

Also revert the temporary build.yml diagnostic (cedbc42) now that the
root cause is understood; the workflow is back to its clean form.
Reproduces .github/workflows/build.yml without pushing: exports the
git-tracked tree into a throwaway dir under $HOME (snap/rootless Docker
cannot bind-mount /tmp) and runs the exact 'build.sh -k so3' + 'build.sh -x
usr-so3' in the so3-env image, per platform. Mounting only tracked files
means untracked-but-referenced sources fail locally exactly as in CI, and
build/tmp is excluded so the toolchain builds from scratch. Use -r <ref>
for an exact committed state.
The toolchain build failed intermittently in CI (FAIL/FAIL/PASS/PASS/FAIL
across runs), always at musl-toolchain do_build, very early and with no
build output — i.e. a download failure. musl-cross-make's default GNU_SITE
is ftpmirror.gnu.org, which 302-redirects to a random mirror; incomplete
mirrors 404 and wget --tries just re-hits the same redirect. Pin GNU_SITE to
the canonical https://ftp.gnu.org/gnu (complete, no random mirror); keep the
wget retries as a safety net.

Also keep a minimal on-failure dump of the toolchain do_build log in CI so
any residual download flake is diagnosable without a separate commit.
The Check Code Style workflow was red (pre-existing): after the Infrabase
migration the SO3 sources moved under so3/, so check-path 'so3' swept in
vendored code (micropython, libxml2) and check-path 'usr/src' (no longer a
real dir) silently fell back to scanning the whole repo. clang-format also
flagged genuinely-misformatted first-party files.

- Point check-paths at the real nested dirs: so3/so3 (kernel) and so3/usr
  (user space); exclude vendored trees (micropython, libxml2, usr/lib/linux,
  lvgl).
- Reformat the 13 tracked first-party files that violated the repo's own
  .clang-format (5 kernel, 7 usr/lib/slv, fb_test.c).

Verified by replicating the action's exact logic (find + exclude regex,
clang-format 19, --style=file) over the tracked tree: both jobs report 0
failures.
Make the generic build/ files byte-identical to edgem1 where they should
be (meld-minimal), while keeping the torizon/e1c separation intact:

- restore the EDGEMTech copyright headers on the generic layer files
  (meta-so3/meta-qemu/meta-rootfs/meta-filesystem/meta-uboot layer.conf,
  avz/so3 bbclass, bsp-so3, rootfs-so3, so3_6.2.0)
- drop the dead utils_restore_user_ownership() call in usr-so3 (undefined,
  error-path only)
- drop a stray whitespace line in rootfs-linux
daniel-rossier force-pushed the 255-migrate-the-so3-build-system-to-infrabase-and-move-to-the-new-so3-logo branch from 4934926 to 97585d5 (June 17, 2026 17:48)

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migrate the so3 build system to InfraBase and move to the new SO3 logo

1 participant