h264 transport with memory2 support#2472
Conversation
d08365e to
dc578e3
Compare
Greptile SummaryThis PR introduces an opt-in H.264 transport and storage path for DimOS
Confidence Score: 4/5Safe to merge with the caveat that the live LCM decode path has two narrow exception-handling gaps that could crash the subscriber thread on corrupted or malformed H.264 data. The memory2 storage path is well-tested and the GopBuffer logic is correct. The live transport path catches only (VideoDecodeGapError, ValueError) around the PyAV decoder call in H264EncoderMixin.decode; av.error.FFmpegError subclasses raised by PyAV for corrupted H.264 bitstreams would escape through PubSubEncoderMixin.subscribe and crash the subscriber thread. A separate gap exists in _unpack_encoded_image_payload where a payload starting with b'DIMI1' but too short for the 4-byte length field raises struct.error, also not converted to DecodingError. Both are straightforward fixes. dimos/protocol/pubsub/impl/h264_lcm.py (exception handling in decode) and dimos/msgs/sensor_msgs/Image.py (_unpack_encoded_image_payload) Important Files Changed
Sequence DiagramsequenceDiagram
participant Publisher
participant H264EncoderMixin
participant H264Encoder
participant AiortcH264Codec
participant LCM
Publisher->>H264EncoderMixin: encode(raw Image)
H264EncoderMixin->>H264Encoder: encode(image)
H264Encoder->>AiortcH264Codec: encode_image(image, force_keyframe)
AiortcH264Codec-->>H264Encoder: (access_unit bytes, pts)
H264Encoder-->>H264EncoderMixin: "Image(encoding=h264, codec_metadata)"
H264EncoderMixin->>LCM: publish(image.lcm_encode())
participant Subscriber
participant PubSubMixin as PubSubEncoderMixin
participant H264Decoder
participant GopBuffer
LCM->>PubSubMixin: wrapper_cb(bytes)
PubSubMixin->>H264EncoderMixin: decode(bytes, topic)
H264EncoderMixin->>H264EncoderMixin: Image.lcm_decode encoded Image
alt "decode_images=True"
H264EncoderMixin->>H264Decoder: decode(encoded Image)
H264Decoder->>GopBuffer: accept(image)
alt GOP valid
GopBuffer-->>H264Decoder: True
H264Decoder->>AiortcH264Codec: decode_image(image)
AiortcH264Codec-->>H264Decoder: raw Image
H264Decoder-->>H264EncoderMixin: raw Image
else GOP gap
GopBuffer-->>H264Decoder: False
H264Decoder-->>H264EncoderMixin: VideoDecodeGapError to DecodingError
end
else "decode_images=False"
H264EncoderMixin-->>PubSubMixin: encoded Image for memory2 recording
end
PubSubMixin->>Subscriber: callback(decoded Image)
Reviews (4): Last reviewed commit: "fix: address h264 review comments" | Re-trigger Greptile |
|
|
||
| annotated = draw_bounding_box( | ||
| image.data.copy(), | ||
| image.require_raw("SecurityModule._detection_step").copy(), |
There was a problem hiding this comment.
why change image type? doesn't this break now if using diff transport? you should transport the actual type modules require, feel free to use some shim that reconstructs the image on .data access
There was a problem hiding this comment.
@TomCC7 I think it's better to leave Image as is. If you want to have an encoded image message, can't you create a new message type? I think this change complicates the current Image class too much. We had this in the past as well where Image supported CUDA storage and the Image was overly complex.
There was a problem hiding this comment.
For module-to-module transport cases, since all encode-decode are handled within transport layer, the image shouldn't require an overload. This addition was required to store encoded videoframe directly into recorder so that we enjoy smaller file size. And since we are doing encoding in transport layer, the message type can't be changed in the current design.
The other possible choice is to add a dedicated module to convert type, or directly modify recorder's storage process to encode directly. But then we'll need to duplicate the encode logic into separate places.
There was a problem hiding this comment.
@paul-nechifor @leshy created a new pr that patches to this pr with a new design. Can check if that's the right direction: #2504. It basically mimic the current way of handling jpeg for transport/storage.
Generally I would say the core conflict here is that we are handling two usecases - a new transport that leverage video as a more efficient encoding during transportation and a new storage format for recorder - both need compression but they are in different layer (so we have to somewhat duplicate the encode/decode logic in two places).
One possible future cleanup I mentioned to @leshy before would be to reposition the recorder closer to the transport/import layer, so it can naturally record the already-encoded transport payload instead of receiving decoded module messages and encoding again for storage.
That would look more like:
+--------------------+
| recorder/storage |
| stores encoded pkt |
+---------^----------+
|
Module Out[Image] -> encode once -> wire packet -> decode -> Module In[Image]
That may be a cleaner long-term architecture because transport and recording could share the same encoded packet path directly. But it is a larger architectural change. This patch keeps the existing module/recorder model and follows the repo’s current JPEG-style layering while avoiding encoded Image semantics.
| config: Any | None = None, | ||
| decode_images: bool = True, | ||
| **kwargs: Any, | ||
| ) -> None: # type: ignore[no-untyped-def] |
There was a problem hiding this comment.
The current transports have type: ignore for legacy reasons. You shouldn't copy that. Use proper types.
| def is_encoded(self) -> bool: | ||
| return not self.is_raw | ||
|
|
||
| def require_raw(self, operation: str = "operation") -> np.ndarray[Any, np.dtype[Any]]: |
There was a problem hiding this comment.
This is a strange API. Why would require_raw need operation just to print a message in case it errors. The callee is already available in the traceback.
There was a problem hiding this comment.
yeah these are purely vibed to pass the mypy check. need to be addressed together with how we handle image overloading
Summary
Adds an opt-in H.264 image path for DimOS typed
Imagestreams.This PR keeps the public module contract as
Out[Image]/In[Image], preserves JPEG as the default memory2 image codec, and adds H.264 as an explicit transport/storage codec path:H264LcmTransportcompresses liveImagestreams over LCM, should be straightforward to extend to other transport as wellImagecan now carry raw pixels or encoded H.264 access-unit bytes throughencodingandcodec_metadata.H264ImageCodecstores encoded H.264Imagevalues through the existing memory2Backend+Codecpath.Key design decisions
Preserve
Imageas the public stream typeModules still publish and subscribe to
Image. H.264 is an opt-in codec detail, not a new default module payload type.Imagenow has explicit codec fields:encoding="raw":datais a NumPy pixel array.encoding="h264":datais one complete H.264 Annex B access unit for one source frame.Pixel operations require raw images and fail clearly for encoded images.
Keep memory2 generic
H.264 storage uses the normal memory2 codec path:
There is no H.264-specific memory2 backend, payload-strategy framework, or GOP index table in
StoreorBackend.Default image storage remains unchanged:
Let transport own live encode/decode state
H264LcmTransporthas two receive modes:decode_images=True: normal subscribers receive decoded rawImagevalues.decode_images=False: subscribers receive encodedImage(encoding="h264")values, suitable for memory2 recording.Match Foxglove-style packet shape
Each encoded image contains all H.264 Annex B NAL units emitted for one encoder input frame. This removes RTP-fragment handling from memory2 and LCM consumers.
A complete encoded frame packet is not necessarily independently decodable. P-frames still require prior GOP state.
Current limitations
The first version is best effort over LCM. H.264 delta frames depend on prior GOP state, so a subscriber that joins late, or a replay decoder that starts from an arbitrary timestamp, may first receive frames that cannot be decoded on their own. In that case, the decoder suppresses output until the next keyframe arrives. Video starts showing from that keyframe onward.
This is the same underlying problem for live late-join and memory2 random seek: DimOS does not yet have a uniform way to request, resend, or cache the keyframe needed to bootstrap decoding. A future DimOS QoS layer should solve this at the transport/session level by expressing reliability, keyframe request/resend, durable keyframe cache, or PLI-like recovery policy outside the H.264 codec itself.
ROS2 has a durability QoS setting that can resend old messages when new subscriber joins. Implementing something like that should be able to resolve the current issue.
Other current limits:
Benchmark
Adds
demo-h264-storage-benchmark, which records the same frames into two memory2 DBs:benchmark_jpeg.dbuses the default JPEG image codec.benchmark_h264.dbuses H.264 encoded images.On the OpenCV
vtest.avisample, resized to 320 x 240 and recorded for 150 frames:H.264 used 59.8% of the JPEG storage size for this sample.
Verification
Run