h264 transport with memory2 support by TomCC7 · Pull Request #2472 · dimensionalOS/dimos

TomCC7 · 2026-06-11T20:41:47Z

Summary

Adds an opt-in H.264 image path for DimOS typed Image streams.

This PR keeps the public module contract as Out[Image] / In[Image], preserves JPEG as the default memory2 image codec, and adds H.264 as an explicit transport/storage codec path:

H264LcmTransport compresses live Image streams over LCM, should be straightforward to extend to other transport as well
Image can now carry raw pixels or encoded H.264 access-unit bytes through encoding and codec_metadata.
H264ImageCodec stores encoded H.264 Image values through the existing memory2 Backend + Codec path.
Demo blueprints exercise synthetic, webcam, replay, Rerun, and JPEG-vs-H.264 storage benchmark flows.

Key design decisions

Preserve `Image` as the public stream type

Modules still publish and subscribe to Image. H.264 is an opt-in codec detail, not a new default module payload type.

Image now has explicit codec fields:

encoding: str = "raw"
codec_metadata: dict[str, Any] = {}

encoding="raw": data is a NumPy pixel array.
encoding="h264": data is one complete H.264 Annex B access unit for one source frame.

Pixel operations require raw images and fail clearly for encoded images.

Keep memory2 generic

H.264 storage uses the normal memory2 codec path:

store.stream("color_image", Image, codec="h264")

There is no H.264-specific memory2 backend, payload-strategy framework, or GOP index table in Store or Backend.

Default image storage remains unchanged:

store.stream("color_image", Image)  # JPEG

Let transport own live encode/decode state

H264LcmTransport has two receive modes:

decode_images=True: normal subscribers receive decoded raw Image values.
decode_images=False: subscribers receive encoded Image(encoding="h264") values, suitable for memory2 recording.

Match Foxglove-style packet shape

Each encoded image contains all H.264 Annex B NAL units emitted for one encoder input frame. This removes RTP-fragment handling from memory2 and LCM consumers.

A complete encoded frame packet is not necessarily independently decodable. P-frames still require prior GOP state.

Current limitations

The first version is best effort over LCM. H.264 delta frames depend on prior GOP state, so a subscriber that joins late, or a replay decoder that starts from an arbitrary timestamp, may first receive frames that cannot be decoded on their own. In that case, the decoder suppresses output until the next keyframe arrives. Video starts showing from that keyframe onward.

This is the same underlying problem for live late-join and memory2 random seek: DimOS does not yet have a uniform way to request, resend, or cache the keyframe needed to bootstrap decoding. A future DimOS QoS layer should solve this at the transport/session level by expressing reliability, keyframe request/resend, durable keyframe cache, or PLI-like recovery policy outside the H.264 codec itself.

ROS2 has a durability QoS setting that can resend old messages when new subscriber joins. Implementing something like that should be able to resolve the current issue.

Other current limits:

H.264 currently supports uint8 RGB, BGR, and grayscale images only.
memory2 replay of H.264 streams emits encoded images; consumers that need pixels must decode through an H.264 decoder session.
The memory2 path has the strongest automated coverage today. We still need more end-to-end connection tests for the full live path across module, transport, H.264 encode/decode, subscriber/replay behavior, and visualization.

Benchmark

Adds demo-h264-storage-benchmark, which records the same frames into two memory2 DBs:

benchmark_jpeg.db uses the default JPEG image codec.
benchmark_h264.db uses H.264 encoded images.

On the OpenCV vtest.avi sample, resized to 320 x 240 and recorded for 150 frames:

Codec	DB size
JPEG	1,884,160 bytes
H.264	1,126,400 bytes

H.264 used 59.8% of the JPEG storage size for this sample.

Verification

Run

uv run dimos run demo-h264-webcam-rerun

codecov · 2026-06-12T19:34:08Z

Codecov Report

❌ Patch coverage is 63.07420% with 418 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
dimos/protocol/video/demo_h264_video_e2e.py	39.10%	218 Missing ⚠️
dimos/protocol/video/h264.py	50.00%	93 Missing and 14 partials ⚠️
dimos/msgs/sensor_msgs/Image.py	71.01%	25 Missing and 15 partials ⚠️
dimos/protocol/pubsub/impl/h264_lcm.py	76.47%	4 Missing and 4 partials ⚠️
...ree/go2/blueprints/smart/unitree_go2_h264_video.py	72.41%	8 Missing ⚠️
dimos/core/transport.py	68.75%	5 Missing ⚠️
dimos/mapping/osm/current_location_map.py	0.00%	3 Missing ⚠️
dimos/memory2/video/h264.py	81.25%	2 Missing and 1 partial ⚠️
...rimental/temporal_memory/temporal_utils/helpers.py	0.00%	3 Missing ⚠️
dimos/memory2/codecs/base.py	75.00%	1 Missing and 1 partial ⚠️
... and 15 more

Flag	Coverage Δ
OS-ubuntu-24.04-arm	`63.82% <59.98%> (-0.03%)`	⬇️
OS-ubuntu-latest	`64.68% <61.92%> (-0.02%)`	⬇️
Py-3.10	`64.67% <61.92%> (-0.02%)`	⬇️
Py-3.11	`64.68% <61.92%> (-0.02%)`	⬇️
Py-3.12	`64.67% <61.92%> (-0.02%)`	⬇️
Py-3.13	`64.67% <61.92%> (-0.02%)`	⬇️
Py-3.14	`64.69% <62.19%> (-0.02%)`	⬇️
Py-3.14t	`64.67% <61.92%> (-0.02%)`	⬇️
SelfHosted-Large	`30.30% <24.13%> (?)`
SelfHosted-Linux	`38.18% <27.32%> (-0.11%)`	⬇️
SelfHosted-macOS	`36.92% <28.15%> (-0.13%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
...imos/experimental/security_demo/security_module.py	`68.22% <100.00%> (ø)`
dimos/mapping/occupancy/visualize_path.py	`100.00% <100.00%> (ø)`
dimos/msgs/sensor_msgs/test_image.py	`87.91% <100.00%> (+1.32%)`	⬆️
dimos/protocol/pubsub/test_registry.py	`96.20% <100.00%> (+0.14%)`	⬆️
dimos/robot/all_blueprints.py	`100.00% <ø> (ø)`
...ueprints/agentic/unitree_go2_agentic_h264_video.py	`100.00% <100.00%> (ø)`
...go2/blueprints/smart/unitree_go2_h264_detection.py	`100.00% <100.00%> (ø)`
...imos/experimental/security_demo/depth_estimator.py	`29.03% <0.00%> (ø)`
dimos/models/vl/moondream.py	`31.86% <0.00%> (ø)`
dimos/models/vl/moondream_hosted.py	`34.28% <0.00%> (ø)`
... and 22 more

... and 18 files with indirect coverage changes

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

greptile-apps · 2026-06-12T20:33:35Z

Greptile Summary

This PR introduces an opt-in H.264 transport and storage path for DimOS Image streams, adding H264LcmTransport, H264ImageCodec, and the AiortcH264Codec/H264Encoder/H264Decoder stack, while preserving Image as the public stream type with JPEG remaining the default codec.

Image gains encoding and codec_metadata fields; lcm_encode/lcm_decode now handle a DIMI1-magic-prefixed packed format for encoded payloads, and all pixel-level operations guard on require_raw.
H264LcmTransport / H264EncoderMixin wire the new codec into the LCM pubsub path with a GopBuffer that suppresses non-decodable frames until the next keyframe.
H264ImageCodec plugs into the existing memory2 Backend/Codec path via CODEC_ID = "h264", letting callers store and replay encoded Image payloads without a new backend type.

Confidence Score: 4/5

Safe to merge with the caveat that the live LCM decode path has two narrow exception-handling gaps that could crash the subscriber thread on corrupted or malformed H.264 data.

The memory2 storage path is well-tested and the GopBuffer logic is correct. The live transport path catches only (VideoDecodeGapError, ValueError) around the PyAV decoder call in H264EncoderMixin.decode; av.error.FFmpegError subclasses raised by PyAV for corrupted H.264 bitstreams would escape through PubSubEncoderMixin.subscribe and crash the subscriber thread. A separate gap exists in _unpack_encoded_image_payload where a payload starting with b'DIMI1' but too short for the 4-byte length field raises struct.error, also not converted to DecodingError. Both are straightforward fixes.

dimos/protocol/pubsub/impl/h264_lcm.py (exception handling in decode) and dimos/msgs/sensor_msgs/Image.py (_unpack_encoded_image_payload)

Important Files Changed

Filename	Overview
dimos/protocol/pubsub/impl/h264_lcm.py	H264EncoderMixin decode catches (VideoDecodeGapError, ValueError) but PyAV av.error.FFmpegError subclasses can still escape and crash the subscriber thread
dimos/protocol/video/h264.py	New H.264 encode/decode core with AiortcH264Codec, H264Encoder, H264Decoder, GopBuffer; GOP gap suppression logic is correct; PyAV error propagation is the main gap
dimos/msgs/sensor_msgs/Image.py	Image gains encoding/codec_metadata fields, new encoded path in lcm_encode/lcm_decode; struct.error from a truncated magic header in _unpack_encoded_image_payload is not guarded
dimos/memory2/video/h264.py	H264ImageCodec stores/restores encoded Images via LCM; CODEC_ID = "h264" and encode/decode checks are correct
dimos/memory2/codecs/base.py	CODEC_ID attribute lookup added to _class_to_id and h264 case added to _make_one; clean and backward compatible
dimos/core/transport.py	H264LcmTransport added; reduce drops **kwargs (previously flagged); start/stop lifecycle mirrors JpegLcmTransport correctly
dimos/protocol/pubsub/registry.py	h264_lcm proto registered and wired to H264LcmTransport; consistent with existing jpeg_lcm pattern
dimos/protocol/video/demo_h264_video_e2e.py	Demo blueprints for webcam, synthetic, replay, Rerun, and storage benchmark; _H264RecorderMixin.start duplicates Recorder.start (previously flagged)

Sequence Diagram

sequenceDiagram
    participant Publisher
    participant H264EncoderMixin
    participant H264Encoder
    participant AiortcH264Codec
    participant LCM

    Publisher->>H264EncoderMixin: encode(raw Image)
    H264EncoderMixin->>H264Encoder: encode(image)
    H264Encoder->>AiortcH264Codec: encode_image(image, force_keyframe)
    AiortcH264Codec-->>H264Encoder: (access_unit bytes, pts)
    H264Encoder-->>H264EncoderMixin: "Image(encoding=h264, codec_metadata)"
    H264EncoderMixin->>LCM: publish(image.lcm_encode())

    participant Subscriber
    participant PubSubMixin as PubSubEncoderMixin
    participant H264Decoder
    participant GopBuffer

    LCM->>PubSubMixin: wrapper_cb(bytes)
    PubSubMixin->>H264EncoderMixin: decode(bytes, topic)
    H264EncoderMixin->>H264EncoderMixin: Image.lcm_decode encoded Image
    alt "decode_images=True"
        H264EncoderMixin->>H264Decoder: decode(encoded Image)
        H264Decoder->>GopBuffer: accept(image)
        alt GOP valid
            GopBuffer-->>H264Decoder: True
            H264Decoder->>AiortcH264Codec: decode_image(image)
            AiortcH264Codec-->>H264Decoder: raw Image
            H264Decoder-->>H264EncoderMixin: raw Image
        else GOP gap
            GopBuffer-->>H264Decoder: False
            H264Decoder-->>H264EncoderMixin: VideoDecodeGapError to DecodingError
        end
    else "decode_images=False"
        H264EncoderMixin-->>PubSubMixin: encoded Image for memory2 recording
    end
    PubSubMixin->>Subscriber: callback(decoded Image)

_{Reviews (4): Last reviewed commit: "fix: address h264 review comments" | Re-trigger Greptile}

leshy · 2026-06-13T00:24:40Z


        annotated = draw_bounding_box(
-            image.data.copy(),
+            image.require_raw("SecurityModule._detection_step").copy(),


why change image type? doesn't this break now if using diff transport? you should transport the actual type modules require, feel free to use some shim that reconstructs the image on .data access

@TomCC7 I think it's better to leave Image as is. If you want to have an encoded image message, can't you create a new message type? I think this change complicates the current Image class too much. We had this in the past as well where Image supported CUDA storage and the Image was overly complex.

For module-to-module transport cases, since all encode-decode are handled within transport layer, the image shouldn't require an overload. This addition was required to store encoded videoframe directly into recorder so that we enjoy smaller file size. And since we are doing encoding in transport layer, the message type can't be changed in the current design.

The other possible choice is to add a dedicated module to convert type, or directly modify recorder's storage process to encode directly. But then we'll need to duplicate the encode logic into separate places.

@paul-nechifor @leshy created a new pr that patches to this pr with a new design. Can check if that's the right direction: #2504. It basically mimic the current way of handling jpeg for transport/storage.

Generally I would say the core conflict here is that we are handling two usecases - a new transport that leverage video as a more efficient encoding during transportation and a new storage format for recorder - both need compression but they are in different layer (so we have to somewhat duplicate the encode/decode logic in two places).

One possible future cleanup I mentioned to @leshy before would be to reposition the recorder closer to the transport/import layer, so it can naturally record the already-encoded transport payload instead of receiving decoded module messages and encoding again for storage.

That would look more like:

+--------------------+ | recorder/storage | | stores encoded pkt | +---------^----------+ | Module Out[Image] -> encode once -> wire packet -> decode -> Module In[Image]

That may be a cleaner long-term architecture because transport and recording could share the same encoded packet path directly. But it is a larger architectural change. This patch keeps the existing module/recorder model and follows the repo’s current JPEG-style layering while avoiding encoded Image semantics.

paul-nechifor · 2026-06-13T23:41:42Z

+        config: Any | None = None,
+        decode_images: bool = True,
+        **kwargs: Any,
+    ) -> None:  # type: ignore[no-untyped-def]


The current transports have type: ignore for legacy reasons. You shouldn't copy that. Use proper types.

paul-nechifor · 2026-06-13T23:45:37Z

+    def is_encoded(self) -> bool:
+        return not self.is_raw
+
+    def require_raw(self, operation: str = "operation") -> np.ndarray[Any, np.dtype[Any]]:


This is a strange API. Why would require_raw need operation just to print a message in case it errors. The callee is already available in the traceback.

yeah these are purely vibed to pass the mypy check. need to be addressed together with how we handle image overloading

TomCC7 added 8 commits June 4, 2026 13:41

spec: openspec init

fe7a7de

feat: add h264 image transport and memory2 storage

bccc44c

refactor: use payload strategy for h264 storage

4e01f9a

refactor: store h264 as encoded images

683e6a9

test: add h264 storage benchmark blueprint

adaf929

refactor: clean up h264 memory2 spillover

c540b21

feat: add distributed test

a19705c

spec: remove

dc578e3

TomCC7 force-pushed the cc/mem2-image-encode branch from d08365e to dc578e3 Compare June 12, 2026 19:02

Merge remote-tracking branch 'origin/main' into cc/mem2-image-encode

32fea13

doc: revert

91ebd3b

TomCC7 changed the title ~~WIP: h264 transport with memory2 support~~ h264 transport with memory2 support Jun 12, 2026

TomCC7 marked this pull request as ready for review June 12, 2026 20:28

TomCC7 requested review from leshy, mustafab0, paul-nechifor and spomichter as code owners June 12, 2026 20:28

greptile-apps Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread dimos/memory2/codecs/base.py

fix: narrow raw image access for mypy

f02984a

TomCC7 requested a review from arkluc as a code owner June 12, 2026 21:00

greptile-apps Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread dimos/protocol/pubsub/impl/h264_lcm.py

TomCC7 added 3 commits June 12, 2026 14:13

fix: align h264 tests with ci constraints

0588e34

test: avoid turbojpeg dependency in codec assertion

12a4239

fix: address h264 review comments

e98edac

github-actions Bot added the ready-to-merge Required CI checks have passed on this PR label Jun 12, 2026

leshy reviewed Jun 13, 2026

View reviewed changes

feat: some more examples

ee9a851

github-actions Bot removed the ready-to-merge Required CI checks have passed on this PR label Jun 13, 2026

github-actions Bot added the ready-to-merge Required CI checks have passed on this PR label Jun 13, 2026

paul-nechifor reviewed Jun 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

h264 transport with memory2 support#2472

h264 transport with memory2 support#2472
TomCC7 wants to merge 15 commits into
mainfrom
cc/mem2-image-encode

TomCC7 commented Jun 11, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jun 12, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Jun 12, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

leshy Jun 13, 2026

Uh oh!

paul-nechifor Jun 13, 2026

Uh oh!

TomCC7 Jun 13, 2026

Uh oh!

TomCC7 Jun 16, 2026 •

edited

Loading

Uh oh!

paul-nechifor Jun 13, 2026

Uh oh!

paul-nechifor Jun 13, 2026

Uh oh!

TomCC7 Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

TomCC7 commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key design decisions

Preserve Image as the public stream type

Keep memory2 generic

Let transport own live encode/decode state

Match Foxglove-style packet shape

Current limitations

Benchmark

Verification

Uh oh!

codecov Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

greptile-apps Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

leshy Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

paul-nechifor Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

TomCC7 Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

TomCC7 Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

paul-nechifor Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

paul-nechifor Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

TomCC7 Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

TomCC7 commented Jun 11, 2026 •

edited

Loading

Preserve `Image` as the public stream type

codecov Bot commented Jun 12, 2026 •

edited

Loading

greptile-apps Bot commented Jun 12, 2026 •

edited

Loading

TomCC7 Jun 16, 2026 •

edited

Loading