Skip to content

h264 transport with memory2 support#2472

Open
TomCC7 wants to merge 15 commits into
mainfrom
cc/mem2-image-encode
Open

h264 transport with memory2 support#2472
TomCC7 wants to merge 15 commits into
mainfrom
cc/mem2-image-encode

Conversation

@TomCC7

@TomCC7 TomCC7 commented Jun 11, 2026

Copy link
Copy Markdown
Member

Summary

Adds an opt-in H.264 image path for DimOS typed Image streams.

This PR keeps the public module contract as Out[Image] / In[Image], preserves JPEG as the default memory2 image codec, and adds H.264 as an explicit transport/storage codec path:

  • H264LcmTransport compresses live Image streams over LCM, should be straightforward to extend to other transport as well
  • Image can now carry raw pixels or encoded H.264 access-unit bytes through encoding and codec_metadata.
  • H264ImageCodec stores encoded H.264 Image values through the existing memory2 Backend + Codec path.
  • Demo blueprints exercise synthetic, webcam, replay, Rerun, and JPEG-vs-H.264 storage benchmark flows.

Key design decisions

Preserve Image as the public stream type

Modules still publish and subscribe to Image. H.264 is an opt-in codec detail, not a new default module payload type.

Image now has explicit codec fields:

encoding: str = "raw"
codec_metadata: dict[str, Any] = {}
  • encoding="raw": data is a NumPy pixel array.
  • encoding="h264": data is one complete H.264 Annex B access unit for one source frame.

Pixel operations require raw images and fail clearly for encoded images.

Keep memory2 generic

H.264 storage uses the normal memory2 codec path:

store.stream("color_image", Image, codec="h264")

There is no H.264-specific memory2 backend, payload-strategy framework, or GOP index table in Store or Backend.

Default image storage remains unchanged:

store.stream("color_image", Image)  # JPEG

Let transport own live encode/decode state

H264LcmTransport has two receive modes:

  • decode_images=True: normal subscribers receive decoded raw Image values.
  • decode_images=False: subscribers receive encoded Image(encoding="h264") values, suitable for memory2 recording.

Match Foxglove-style packet shape

Each encoded image contains all H.264 Annex B NAL units emitted for one encoder input frame. This removes RTP-fragment handling from memory2 and LCM consumers.

A complete encoded frame packet is not necessarily independently decodable. P-frames still require prior GOP state.

Current limitations

The first version is best effort over LCM. H.264 delta frames depend on prior GOP state, so a subscriber that joins late, or a replay decoder that starts from an arbitrary timestamp, may first receive frames that cannot be decoded on their own. In that case, the decoder suppresses output until the next keyframe arrives. Video starts showing from that keyframe onward.

This is the same underlying problem for live late-join and memory2 random seek: DimOS does not yet have a uniform way to request, resend, or cache the keyframe needed to bootstrap decoding. A future DimOS QoS layer should solve this at the transport/session level by expressing reliability, keyframe request/resend, durable keyframe cache, or PLI-like recovery policy outside the H.264 codec itself.

ROS2 has a durability QoS setting that can resend old messages when new subscriber joins. Implementing something like that should be able to resolve the current issue.

Other current limits:

  • H.264 currently supports uint8 RGB, BGR, and grayscale images only.
  • memory2 replay of H.264 streams emits encoded images; consumers that need pixels must decode through an H.264 decoder session.
  • The memory2 path has the strongest automated coverage today. We still need more end-to-end connection tests for the full live path across module, transport, H.264 encode/decode, subscriber/replay behavior, and visualization.

Benchmark

Adds demo-h264-storage-benchmark, which records the same frames into two memory2 DBs:

  • benchmark_jpeg.db uses the default JPEG image codec.
  • benchmark_h264.db uses H.264 encoded images.

On the OpenCV vtest.avi sample, resized to 320 x 240 and recorded for 150 frames:

Codec DB size
JPEG 1,884,160 bytes
H.264 1,126,400 bytes

H.264 used 59.8% of the JPEG storage size for this sample.

Verification

Run

uv run dimos run demo-h264-webcam-rerun

@TomCC7 TomCC7 force-pushed the cc/mem2-image-encode branch from d08365e to dc578e3 Compare June 12, 2026 19:02
@codecov

codecov Bot commented Jun 12, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 63.07420% with 418 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
dimos/protocol/video/demo_h264_video_e2e.py 39.10% 218 Missing ⚠️
dimos/protocol/video/h264.py 50.00% 93 Missing and 14 partials ⚠️
dimos/msgs/sensor_msgs/Image.py 71.01% 25 Missing and 15 partials ⚠️
dimos/protocol/pubsub/impl/h264_lcm.py 76.47% 4 Missing and 4 partials ⚠️
...ree/go2/blueprints/smart/unitree_go2_h264_video.py 72.41% 8 Missing ⚠️
dimos/core/transport.py 68.75% 5 Missing ⚠️
dimos/mapping/osm/current_location_map.py 0.00% 3 Missing ⚠️
dimos/memory2/video/h264.py 81.25% 2 Missing and 1 partial ⚠️
...rimental/temporal_memory/temporal_utils/helpers.py 0.00% 3 Missing ⚠️
dimos/memory2/codecs/base.py 75.00% 1 Missing and 1 partial ⚠️
... and 15 more
Flag Coverage Δ
OS-ubuntu-24.04-arm 63.82% <59.98%> (-0.03%) ⬇️
OS-ubuntu-latest 64.68% <61.92%> (-0.02%) ⬇️
Py-3.10 64.67% <61.92%> (-0.02%) ⬇️
Py-3.11 64.68% <61.92%> (-0.02%) ⬇️
Py-3.12 64.67% <61.92%> (-0.02%) ⬇️
Py-3.13 64.67% <61.92%> (-0.02%) ⬇️
Py-3.14 64.69% <62.19%> (-0.02%) ⬇️
Py-3.14t 64.67% <61.92%> (-0.02%) ⬇️
SelfHosted-Large 30.30% <24.13%> (?)
SelfHosted-Linux 38.18% <27.32%> (-0.11%) ⬇️
SelfHosted-macOS 36.92% <28.15%> (-0.13%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...imos/experimental/security_demo/security_module.py 68.22% <100.00%> (ø)
dimos/mapping/occupancy/visualize_path.py 100.00% <100.00%> (ø)
dimos/msgs/sensor_msgs/test_image.py 87.91% <100.00%> (+1.32%) ⬆️
dimos/protocol/pubsub/test_registry.py 96.20% <100.00%> (+0.14%) ⬆️
dimos/robot/all_blueprints.py 100.00% <ø> (ø)
...ueprints/agentic/unitree_go2_agentic_h264_video.py 100.00% <100.00%> (ø)
...go2/blueprints/smart/unitree_go2_h264_detection.py 100.00% <100.00%> (ø)
...imos/experimental/security_demo/depth_estimator.py 29.03% <0.00%> (ø)
dimos/models/vl/moondream.py 31.86% <0.00%> (ø)
dimos/models/vl/moondream_hosted.py 34.28% <0.00%> (ø)
... and 22 more

... and 18 files with indirect coverage changes

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@TomCC7 TomCC7 changed the title WIP: h264 transport with memory2 support h264 transport with memory2 support Jun 12, 2026
@TomCC7 TomCC7 marked this pull request as ready for review June 12, 2026 20:28
@greptile-apps

greptile-apps Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR introduces an opt-in H.264 transport and storage path for DimOS Image streams, adding H264LcmTransport, H264ImageCodec, and the AiortcH264Codec/H264Encoder/H264Decoder stack, while preserving Image as the public stream type with JPEG remaining the default codec.

  • Image gains encoding and codec_metadata fields; lcm_encode/lcm_decode now handle a DIMI1-magic-prefixed packed format for encoded payloads, and all pixel-level operations guard on require_raw.
  • H264LcmTransport / H264EncoderMixin wire the new codec into the LCM pubsub path with a GopBuffer that suppresses non-decodable frames until the next keyframe.
  • H264ImageCodec plugs into the existing memory2 Backend/Codec path via CODEC_ID = "h264", letting callers store and replay encoded Image payloads without a new backend type.

Confidence Score: 4/5

Safe to merge with the caveat that the live LCM decode path has two narrow exception-handling gaps that could crash the subscriber thread on corrupted or malformed H.264 data.

The memory2 storage path is well-tested and the GopBuffer logic is correct. The live transport path catches only (VideoDecodeGapError, ValueError) around the PyAV decoder call in H264EncoderMixin.decode; av.error.FFmpegError subclasses raised by PyAV for corrupted H.264 bitstreams would escape through PubSubEncoderMixin.subscribe and crash the subscriber thread. A separate gap exists in _unpack_encoded_image_payload where a payload starting with b'DIMI1' but too short for the 4-byte length field raises struct.error, also not converted to DecodingError. Both are straightforward fixes.

dimos/protocol/pubsub/impl/h264_lcm.py (exception handling in decode) and dimos/msgs/sensor_msgs/Image.py (_unpack_encoded_image_payload)

Important Files Changed

Filename Overview
dimos/protocol/pubsub/impl/h264_lcm.py H264EncoderMixin decode catches (VideoDecodeGapError, ValueError) but PyAV av.error.FFmpegError subclasses can still escape and crash the subscriber thread
dimos/protocol/video/h264.py New H.264 encode/decode core with AiortcH264Codec, H264Encoder, H264Decoder, GopBuffer; GOP gap suppression logic is correct; PyAV error propagation is the main gap
dimos/msgs/sensor_msgs/Image.py Image gains encoding/codec_metadata fields, new encoded path in lcm_encode/lcm_decode; struct.error from a truncated magic header in _unpack_encoded_image_payload is not guarded
dimos/memory2/video/h264.py H264ImageCodec stores/restores encoded Images via LCM; CODEC_ID = "h264" and encode/decode checks are correct
dimos/memory2/codecs/base.py CODEC_ID attribute lookup added to _class_to_id and h264 case added to _make_one; clean and backward compatible
dimos/core/transport.py H264LcmTransport added; reduce drops **kwargs (previously flagged); start/stop lifecycle mirrors JpegLcmTransport correctly
dimos/protocol/pubsub/registry.py h264_lcm proto registered and wired to H264LcmTransport; consistent with existing jpeg_lcm pattern
dimos/protocol/video/demo_h264_video_e2e.py Demo blueprints for webcam, synthetic, replay, Rerun, and storage benchmark; _H264RecorderMixin.start duplicates Recorder.start (previously flagged)

Sequence Diagram

sequenceDiagram
    participant Publisher
    participant H264EncoderMixin
    participant H264Encoder
    participant AiortcH264Codec
    participant LCM

    Publisher->>H264EncoderMixin: encode(raw Image)
    H264EncoderMixin->>H264Encoder: encode(image)
    H264Encoder->>AiortcH264Codec: encode_image(image, force_keyframe)
    AiortcH264Codec-->>H264Encoder: (access_unit bytes, pts)
    H264Encoder-->>H264EncoderMixin: "Image(encoding=h264, codec_metadata)"
    H264EncoderMixin->>LCM: publish(image.lcm_encode())

    participant Subscriber
    participant PubSubMixin as PubSubEncoderMixin
    participant H264Decoder
    participant GopBuffer

    LCM->>PubSubMixin: wrapper_cb(bytes)
    PubSubMixin->>H264EncoderMixin: decode(bytes, topic)
    H264EncoderMixin->>H264EncoderMixin: Image.lcm_decode encoded Image
    alt "decode_images=True"
        H264EncoderMixin->>H264Decoder: decode(encoded Image)
        H264Decoder->>GopBuffer: accept(image)
        alt GOP valid
            GopBuffer-->>H264Decoder: True
            H264Decoder->>AiortcH264Codec: decode_image(image)
            AiortcH264Codec-->>H264Decoder: raw Image
            H264Decoder-->>H264EncoderMixin: raw Image
        else GOP gap
            GopBuffer-->>H264Decoder: False
            H264Decoder-->>H264EncoderMixin: VideoDecodeGapError to DecodingError
        end
    else "decode_images=False"
        H264EncoderMixin-->>PubSubMixin: encoded Image for memory2 recording
    end
    PubSubMixin->>Subscriber: callback(decoded Image)
Loading

Reviews (4): Last reviewed commit: "fix: address h264 review comments" | Re-trigger Greptile

Comment thread dimos/memory2/codecs/base.py
@TomCC7 TomCC7 requested a review from arkluc as a code owner June 12, 2026 21:00
Comment thread dimos/protocol/pubsub/impl/h264_lcm.py
@github-actions github-actions Bot added the ready-to-merge Required CI checks have passed on this PR label Jun 12, 2026

annotated = draw_bounding_box(
image.data.copy(),
image.require_raw("SecurityModule._detection_step").copy(),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change image type? doesn't this break now if using diff transport? you should transport the actual type modules require, feel free to use some shim that reconstructs the image on .data access

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomCC7 I think it's better to leave Image as is. If you want to have an encoded image message, can't you create a new message type? I think this change complicates the current Image class too much. We had this in the past as well where Image supported CUDA storage and the Image was overly complex.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For module-to-module transport cases, since all encode-decode are handled within transport layer, the image shouldn't require an overload. This addition was required to store encoded videoframe directly into recorder so that we enjoy smaller file size. And since we are doing encoding in transport layer, the message type can't be changed in the current design.

The other possible choice is to add a dedicated module to convert type, or directly modify recorder's storage process to encode directly. But then we'll need to duplicate the encode logic into separate places.

@TomCC7 TomCC7 Jun 16, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paul-nechifor @leshy created a new pr that patches to this pr with a new design. Can check if that's the right direction: #2504. It basically mimic the current way of handling jpeg for transport/storage.

Generally I would say the core conflict here is that we are handling two usecases - a new transport that leverage video as a more efficient encoding during transportation and a new storage format for recorder - both need compression but they are in different layer (so we have to somewhat duplicate the encode/decode logic in two places).

One possible future cleanup I mentioned to @leshy before would be to reposition the recorder closer to the transport/import layer, so it can naturally record the already-encoded transport payload instead of receiving decoded module messages and encoding again for storage.

That would look more like:

             +--------------------+
             | recorder/storage   |
             | stores encoded pkt |
             +---------^----------+
                       |
Module Out[Image] -> encode once -> wire packet -> decode -> Module In[Image]

That may be a cleaner long-term architecture because transport and recording could share the same encoded packet path directly. But it is a larger architectural change. This patch keeps the existing module/recorder model and follows the repo’s current JPEG-style layering while avoiding encoded Image semantics.

@github-actions github-actions Bot removed the ready-to-merge Required CI checks have passed on this PR label Jun 13, 2026
@github-actions github-actions Bot added the ready-to-merge Required CI checks have passed on this PR label Jun 13, 2026
Comment thread dimos/core/transport.py
config: Any | None = None,
decode_images: bool = True,
**kwargs: Any,
) -> None: # type: ignore[no-untyped-def]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current transports have type: ignore for legacy reasons. You shouldn't copy that. Use proper types.

def is_encoded(self) -> bool:
return not self.is_raw

def require_raw(self, operation: str = "operation") -> np.ndarray[Any, np.dtype[Any]]:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a strange API. Why would require_raw need operation just to print a message in case it errors. The callee is already available in the traceback.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah these are purely vibed to pass the mypy check. need to be addressed together with how we handle image overloading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-to-merge Required CI checks have passed on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants