Skip to content

Add RPC v2 client protocol constants and URL advertisement#89

Draft
1egoman wants to merge 17 commits into
mainfrom
rpc-v2
Draft

Add RPC v2 client protocol constants and URL advertisement#89
1egoman wants to merge 17 commits into
mainfrom
rpc-v2

Conversation

@1egoman
Copy link
Copy Markdown
Collaborator

@1egoman 1egoman commented May 27, 2026

Work in progress.

Warning

This pull request was LLM generated and has only been lightly reviewed by a human. The author has tested this and confirms it works in the happy path, but no other validation has been done.

A more thorough review of this needs to occur before it could be merged.

1egoman and others added 17 commits May 27, 2026 13:39
Introduces the constants needed to negotiate RPC v1 vs v2 between peers:
- LIVEKIT_CLIENT_PROTOCOL_DEFAULT / LIVEKIT_CLIENT_PROTOCOL_DATA_STREAM_RPC
  enum, mirroring the values other LiveKit SDKs use on ParticipantInfo.
- LIVEKIT_RPC_MAX_ROUND_TRIP_MS for the ack-timeout window.

Also advertises this client's v2 support to the server via a new
&client_protocol=1 URL query parameter. The server propagates it into
the ParticipantInfo other peers see; later commits read it to choose
the transport for each RPC call.

No behavior change yet -- values are not consumed anywhere.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ESP-IDF v5.5.4 renamed DMA2D_DATA_BURST_LENGTH_128 to
PPA_DATA_BURST_LENGTH_128, breaking the gmf_video version transitively
pulled in by esp_capture ~0.7. The newer esp_capture 0.8.x line (latest
0.8.4, released 6 days ago) targets the renamed constants.

This is independent of the RPC v2 work and would have hit main once IDF
5.5 picked up the rename; landing it here unblocks CI for the rpc-v2
branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Post-v5.5.4 IDF on the release-v5.5 branch renamed
DMA2D_DATA_BURST_LENGTH_128 -> PPA_DATA_BURST_LENGTH_128, which breaks
the gmf_video version transitively pulled in by esp_capture ~0.7. Bumping
esp_capture to ~0.8 produced a downstream solver conflict (and triggered
a separate bug in idf_component_manager's failure formatter), so the
quickest path back to green CI is to pin the 5.5 matrix entry to the
v5.5.3 tag, which pre-dates the rename.

This is a stopgap; the proper fix is to update gmf_video (or whichever
ancestor dep pins it) once a compatible release ships.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the wire field that other LiveKit SDKs already populate on
ParticipantInfo. Once read (next commit), peers can be classified as
RPC v1 (DEFAULT = 0) or RPC v2 (DATA_STREAM_RPC = 1) so the SDK can
pick the right transport for each RPC call.

The nanopb toolchain is not available locally, so livekit_models.pb.h
is hand-edited to mirror what protoc-gen-nanopb would emit:
- new int32_t field appended to livekit_pb_participant_info_t
- INIT_DEFAULT / INIT_ZERO macros extended with a trailing 0
- LIVEKIT_PB_PARTICIPANT_INFO_CLIENT_PROTOCOL_TAG = 19
- FIELDLIST X-macro row (STATIC, SINGULAR, INT32, tag 19)

livekit_models.pb.c needs no change because it binds via
PB_BIND(..., AUTO), which derives the descriptor from FIELDLIST.

A proper protoc regen should produce a clean delta against these edits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each ParticipantInfo carries a client_protocol field (chunk 2). To decide
v1 vs v2 RPC transport per peer, the room layer now keeps an
identity -> client_protocol map populated from the existing
on_eng_participant_info callback:

- New entries are added with their advertised protocol (clamped to the
  values we recognize; everything unrecognized falls back to DEFAULT,
  per spec).
- DISCONNECTED participants are removed.
- Local identity is captured on first join and cleared on disconnect.

Helpers (set / remove / get / destroy) are file-static; later commits
expose `get` to the RPC client and server managers via callback when
those managers are wired in.

No public API change; current callers still see ParticipantInfo through
the existing user callback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The reader already receives the sender identity at header time and
forwards it via on_open, but it was dropped before chunk and trailer
callbacks fired. RPC v2 needs sender_identity at response arrival to
validate the response came from the destination of the request (spec
test 14), and other consumers will want it too.

The per-stream descriptor now strdups the sender identity on header
arrival and frees it when the stream ends, exposing it through the
public livekit_data_stream_chunk_t and livekit_data_stream_trailer_t
structs. Zero-initialized struct literals at existing call sites still
compile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The proto-defined map<string, string> attributes on DataStream.Header
and DataStream.Trailer was being silently dropped (FT_IGNORE). Switching
them to FT_CALLBACK lets the writer emit a real attributes map, which
RPC v2 needs for lk.rpc_request_id, lk.rpc_request_method, and friends.

Hand-edited generated files mirror what protoc-gen-nanopb produces:
- pb_callback_t attributes field on header and trailer structs
- New AttributesEntry submessage types with pb_callback_t key/value
- Matching FIELDLISTs, INIT macros, tag defines, PB_BIND entries, and
  message-type aliases. The header CALLBACK macro flips from NULL to
  pb_default_field_callback.

Public API: livekit_data_stream_attribute_t plus attributes/_count on
livekit_data_stream_options_t (sender side) and
livekit_data_stream_header_t (receiver side; populated in a later
commit by the decoder).

Writer side: send_header now wires per-call encode callbacks that
emit each (key, value) pair as a repeated AttributesEntry submessage.
The receive side still drops attributes for now (nanopbs auto-allocated
submessage cant be given a decode callback) -- a follow-up commit will
plumb raw bytes through and do a manual second-pass parse.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The encode side (previous commit) emits attributes through nanopb
callbacks, but the decode side cant: nanopb auto-allocates
DataStream.Header as FT_POINTER and zeroes its pb_callback_t fields,
so the field-level callback gets called with NULL funcs.decode and
silently skips the attribute bytes.

Workaround: plumb the raw DataPacket bytes through the receive chain
and walk them manually with pb_decode_tag / pb_make_string_substream
to extract the attribute map. The standard pb_decode flow is left
unchanged.

Changes:
- protocol.{c,h}: new protocol_data_packet_extract_attributes() and
  matching _free() that descend into a chosen submessage tag (Header
  or Trailer) and decode its repeated AttributesEntry submessages
  into a malloc array.
- peer.{c,h}, engine.{c,h}: on_data_packet signature gains raw bytes.
  The bytes come straight from the WebRTC frame and are valid for the
  duration of the synchronous callback chain.
- livekit.c, data_stream_reader.{c,h}: handle_header receives the raw
  bytes, extracts the attribute map, surfaces it through on_open, and
  frees it after the callback returns.

Trailer attribute decoding is wired through the same helper but no
caller passes the trailers raw bytes yet -- thats fine for RPC v2,
which only needs request/response header attributes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds destination_identities and destination_identities_count to
livekit_data_stream_options_t. The writer deep-copies the list on
open so the caller does not need to keep it alive for the lifetime of
the stream, stamps every outgoing header/chunk/trailer DataPacket with
it, and frees the copies on close (or any failure path).

When the field is NULL or empty the stream continues to broadcast to
every participant in the room, matching the prior behavior. RPC v2
uses this to direct lk.rpc_request and lk.rpc_response streams to a
single peer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This module only owns the handler/server side of RPC: it receives
incoming RpcRequest packets, dispatches them to registered handlers,
sends RpcAck and RpcResponse packets back. The caller side
(performRpc, pending request tracking, response correlation) does not
yet exist and will land in a sibling rpc_client_manager. Naming both
explicitly keeps the split obvious and matches the RPC v2 spec's
recommended naming.

Pure rename: rpc_manager_* identifiers become rpc_server_manager_*,
RPC_MANAGER_ERR_* becomes RPC_SERVER_MANAGER_ERR_*, the .c/.h files
move, and the room field room->rpc_manager becomes room->rpc_server.
The component CMakeLists uses SRC_DIRS so it auto-discovers the
renamed file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The handler side of RPC has been working for a while; this commit fills
in the missing caller side. A new rpc_client_manager owns the pending
request map, runs ack/response timers, and matches incoming RpcAck and
RpcResponse packets back to the originating invocation.

Public surface in livekit.h:
- livekit_rpc_invoke_options_t (destination, method, payload, timeout)
- livekit_room_rpc_invoke()

The result is delivered asynchronously through the existing
on_rpc_result callback on livekit_room_options_t, using the request id
the manager generated internally. The user does not see the id.

Internals:
- khash of pending entries keyed by request_id UUID.
- Two FreeRTOS xTimers per entry. ack-timeout = 7000 ms
  (LIVEKIT_RPC_MAX_ROUND_TRIP_MS), response-timeout per request
  (default 10000 ms). xTimer was chosen over esp_timer because xTimer
  defers xTimerDelete via the timer-service command queue, which makes
  it safe to delete a timer from its own callback. esp_timers
  self-delete behavior is undocumented.
- A mutex serializes invoke / handle_packet / disconnect against the
  timer callbacks. on_result is always delivered outside the mutex to
  avoid re-entrancy deadlocks if the user calls back into the SDK.
- Pending entries are registered BEFORE the v1 packet is sent so that
  a response delivered synchronously during the publish path can match
  (spec tests 11 and 12).
- Oversized v1 payloads are rejected synchronously with
  LIVEKIT_RPC_RESULT_REQUEST_PAYLOAD_TOO_LARGE (spec test 20).
- Late ack/response after a timeout fires are silently dropped because
  the entry is gone (spec test 13).
- Participant DISCONNECT in livekit.c now also calls
  rpc_client_manager_on_participant_disconnect, which fails all pending
  requests targeted at the disconnected identity with
  RECIPIENT_DISCONNECTED (spec tests 10, 23).
- on_eng_data_packet routes RPC_ACK and RPC_RESPONSE to the new
  client manager (server only handles RPC_REQUEST now).

v2 transport for the client is still a TODO -- when the peer advertises
client_protocol=1 we currently fall back to v1 packets. Switching the
outgoing path to a data stream and consuming the response stream is the
next chunk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the destination peer advertises client_protocol=1
(DATA_STREAM_RPC), the client manager now sends the request as a text
data stream on topic lk.rpc_request with attributes
  - lk.rpc_request_id
  - lk.rpc_request_method
  - lk.rpc_request_response_timeout_ms
  - lk.rpc_request_version (2)
The ack mechanism is unchanged (RpcAck packet, same timer logic), so
v2 callers behave identically to v1 until the response transport
diverges. Oversized payloads are no longer rejected when both peers
are v2 -- the 15 KB cap only applies on the v1 path.

Reception of v2 responses goes through a new lk.rpc_response data
stream handler registered by the room layer. The handler forwards
header / chunk / trailer events to rpc_client_manager, which:
- on header arrival: reads lk.rpc_request_id from the headers
  attributes, validates the senders identity against the request
  destination (spec test 14), binds the stream to the pending entry,
  and stops the ack timer
- on chunk arrival: appends to a per-entry response buffer with a
  doubling realloc strategy
- on stream close: removes the entry from the map, NUL-terminates
  the accumulated payload, delivers it as an OK result, frees state

Error responses still arrive as RpcResponse packets (spec mandates
this even for v2-to-v2) and continue to be handled by the existing
RpcResponse packet path. v1 fallback is automatic when the peer is
unknown or advertises client_protocol=0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Server-side counterpart to chunk 8. The room layer now registers a
data stream handler on topic lk.rpc_request that forwards header /
chunk / trailer events to rpc_server_manager.

Receive path:
- on header arrival, read lk.rpc_request_id, lk.rpc_request_method,
  lk.rpc_request_version and lk.rpc_request_response_timeout_ms from
  attributes. The RpcAck packet is sent immediately, BEFORE any
  handler lookup or version check, so the callers ack contract holds
  even for unsupported methods (spec test 5).
- if the version attribute is not 2, reply with UNSUPPORTED_VERSION
  as a v1 RpcResponse packet.
- otherwise allocate an in_flight entry keyed by stream id; chunks
  append to its payload accumulator.
- on stream close, look up the handler by method. Missing handler
  yields UNSUPPORTED_METHOD as a v1 packet. Otherwise the handler
  is invoked synchronously with the NUL-terminated payload.

Response transport now follows the request transport. An invocation
context (manager + caller identity + caller_is_v2) is plumbed
through invocation->ctx so the on_result callback can choose:
- error result (any code != OK): always v1 RpcResponse packet,
  even between two v2 peers (spec mandate)
- success result from a v1 request: v1 RpcResponse packet (the
  15 KB cap still applies on this path)
- success result from a v2 request: a fresh lk.rpc_response data
  stream addressed to the caller with lk.rpc_request_id as an
  attribute. No 15 KB cap.

The v1 packet path is refactored onto the same dispatch_to_handler /
on_result plumbing as the v2 path so future behavior changes (auth,
metrics) land in one place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds test_app/main/test_rpc.c covering a representative subset of the
specs required tests. The tests drive rpc_client_manager and
rpc_server_manager directly with stub send_packet / get_peer_protocol
/ on_result callbacks plus a real data_stream_writer wired up to the
same stub, so they exercise the wire-encoding path end to end without
needing a connected room.

Tests in this initial pass:
- rpc_client_manager
  * v1 invoke happy path (request, ack, response, result)
  * v1 payload too large rejected synchronously (spec test 20)
  * v1 error response propagates code and message (spec test 22)
  * participant disconnect fails pending requests (spec tests 10, 23)
  * v2 invoke opens a stream and not a v1 packet (spec test 1)
  * v2 large payload is not rejected and is split into chunks
    (spec test 2)
  * v2 response from wrong sender is ignored (spec test 14)
- rpc_server_manager
  * v1 request dispatches to handler and sends ack before invoking
  * v1 unknown method still acks then sends UNSUPPORTED_METHOD
    (spec test 5 in its v1 variant)
  * handler error becomes a v1 RpcResponse packet (spec tests 7, 19)
  * v2 request acks immediately on header arrival, dispatches on
    close, sends a lk.rpc_response data stream on success
    (spec tests 3, 5)

Deliberately deferred (need long timer waits or finer hooks):
- ack timeout / response timeout / late ack (spec tests 8, 13, 21)
- fast-remote and publish-window races (spec tests 11, 12)
- concurrent invocations (spec test 15)
- the full v2 caller round-trip with response payload verification --
  the captured DataPacket cannot recover the request_id from the
  encoded attribute callback; future work could expose a peek API
  on the client manager or decode attributes via a second-pass

The test_app components CMakeLists picks up ../../core and
../../protocol as include dirs and adds nanopb to PRIV_REQUIRES so it
can include the private manager headers. A comment in CMakeLists
notes this is a deliberate test-only pattern. The test_app component
also adds livekit/nanopb to its idf_component.yml dependencies for
the same reason.

All new tests are tagged [basic] so the existing pytest harness
(dut.run_all_single_board_cases(group="basic")) picks them up
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a small standalone CMake project under components/livekit/host_test
that builds a slice of the SDK for Linux and exercises
protocol_data_packet_extract_attributes -- the manual second-pass
parser added in chunk 4c for RPC v2 attribute decoding. The parser is
the trickiest piece of new wire-format code in this branch and the
one most worth catching in CI without flashing a board.

The host build compiles only:
- the nanopb runtime (third_party/nanopb/src)
- generated .pb.c files for the protobuf types
- core/protocol.c
- minimal stubs for esp_log.h and cJSON (cJSON is only called from
  signal-channel helpers the tests do not reach)

Three test cases:
- three attributes encoded via the existing nanopb callback path,
  decoded back via extract_attributes, every key/value verified
- header with no attributes -> empty result
- one attribute with a 1023-byte value -> round-tripped intact

The full rpc_client_manager / rpc_server_manager test suite is not
hosted here -- the managers depend on FreeRTOS xTimer / xSemaphore
and esp_timer, which would need either a FreeRTOS POSIX port or
fresh shims. Those tests continue to run on real ESP32 hardware via
the existing pytest-embedded path; host_test/README.md documents
the limitation and a future direction.

A new .github/workflows/host_test.yml runs the build + ctest on every
PR, called from ci.yml alongside the existing IDF cross-compile build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The shim files use informal banner comments instead of the formal
Apache 2.0 header that addlicense -check expects, so the license
job in CI flagged them. Reformatting the existing rationale comment
to sit below a standard LiveKit Apache 2.0 block keeps both the
attribution and the docs-comment intent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@1egoman
Copy link
Copy Markdown
Collaborator Author

1egoman commented May 28, 2026

According to the LLM, this is "done". @ladvoc Passing this off to you to get it over the line!

@1egoman 1egoman requested a review from ladvoc May 28, 2026 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant