feat: Publish OTel process context#324
Conversation
Imported `otel_process_ctx.c` (renamed to cpp) and `otel_process_ctx.h` from <https://github.com/open-telemetry/sig-profiling/tree/5a2e119ce66ad82f8ae7c6204599acce6ed69cde/process-context/c-and-cpp> with no changes.
**What does this PR do?** This PR adds support for publishing the OpenTelemetry Process Context (aka OTEP 4719 -- <https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/profiles/4719-process-ctx.md>). This mechanism is very similar to Datadog's existing "process discovery"/"tracer-info" mechanism, so the actual change in `trace.cpp` is very minimal (this is by design :p ). This data will be read by OTel-compliant tools, of which the eBPF Profiler (which Datadog will also support as part of our Continuous Profiler product) is the first one. **Motivation:** We're adding support for OTEP 4719 to all Datadog SDKs, so dd-trace-cpp is actually one of the last ones to adopt this. See <https://docs.google.com/document/d/1IwjjVJzEChcFPcnVV2N5Kkjg-4_Q4v4Q3ojpxntbdvY/edit?pli=1&tab=t.s7dfru3a9hnl> (Datadog-only link) for an up-to-date list of all SDKs. **Additional Notes:** The `otel_process_ctx.cpp` and `otel_process_ctx.h` come from the reference implementation we're maintaining upstream with OTel in <https://github.com/open-telemetry/sig-profiling/tree/main/process-context/c-and-cpp>. They are already in use in dd-trace-java as well. Review-wise, my suggestion for those would be to focus on correctness -- ideally we want to be able to quickly update to the latest version when needed, which is made easier if we have a minimal delta on upstream. (I'm the one maintaining this upstream so I can propagate fixes/changes as well so we don't have to diverge). **How to test the change?** This change includes test coverage. Also, I've tested this in practice using the `otel_process_ctx_dump.sh` script from the https://github.com/open-telemetry/sig-profiling/ repo. Here's how that looks: ``` $ sudo ./otel_process_ctx_dump.sh 2394947 Found OTEL context for PID 2394947 Start address: 7c072aace000 00000000 4f 54 45 4c 5f 43 54 58 02 00 00 00 c3 01 00 00 |OTEL_CTX........| 00000010 68 5b 15 20 9a 25 21 00 60 f0 6e 16 c5 5b 00 00 |h[. .%!.`.n..[..| 00000020 Parsed struct: otel_process_ctx_signature : "OTEL_CTX" otel_process_ctx_version : 2 otel_process_payload_size : 451 otel_process_monotonic_published_at_ns : 9330018124913512 otel_process_payload : 0x00005bc5166ef060 Payload dump (451 bytes): 00000000 0a af 02 0a 21 0a 1b 64 65 70 6c 6f 79 6d 65 6e |....!..deploymen| 00000010 74 2e 65 6e 76 69 72 6f 6e 6d 65 6e 74 2e 6e 61 |t.environment.na| 00000020 6d 65 12 02 0a 00 0a 3d 0a 13 73 65 72 76 69 63 |me.....=..servic| 00000030 65 2e 69 6e 73 74 61 6e 63 65 2e 69 64 12 26 0a |e.instance.id.&.| 00000040 24 38 32 65 39 38 38 38 64 2d 33 36 65 62 2d 34 |$82e9888d-36eb-4| 00000050 39 63 66 2d 61 63 63 38 2d 36 38 32 39 34 39 63 |9cf-acc8-682949c| 00000060 34 39 32 37 33 0a 20 0a 0c 73 65 72 76 69 63 65 |49273. ..service| 00000070 2e 6e 61 6d 65 12 10 0a 0e 63 70 70 2d 65 78 70 |.name....cpp-exp| 00000080 65 72 69 6d 65 6e 74 0a 15 0a 0f 73 65 72 76 69 |eriment....servi| 00000090 63 65 2e 76 65 72 73 69 6f 6e 12 02 0a 00 0a 1f |ce.version......| 000000a0 0a 16 74 65 6c 65 6d 65 74 72 79 2e 73 64 6b 2e |..telemetry.sdk.| 000000b0 6c 61 6e 67 75 61 67 65 12 05 0a 03 63 70 70 0a |language....cpp.| 000000c0 21 0a 15 74 65 6c 65 6d 65 74 72 79 2e 73 64 6b |!..telemetry.sdk| 000000d0 2e 76 65 72 73 69 6f 6e 12 08 0a 06 76 32 2e 31 |.version....v2.1| 000000e0 2e 31 0a 24 0a 12 74 65 6c 65 6d 65 74 72 79 2e |.1.$..telemetry.| 000000f0 73 64 6b 2e 6e 61 6d 65 12 0e 0a 0c 64 64 2d 74 |sdk.name....dd-t| 00000100 72 61 63 65 2d 63 70 70 0a 0f 0a 09 68 6f 73 74 |race-cpp....host| 00000110 2e 6e 61 6d 65 12 02 0a 00 0a 17 0a 0c 63 6f 6e |.name........con| 00000120 74 61 69 6e 65 72 2e 69 64 12 07 0a 05 31 32 39 |tainer.id....129| 00000130 37 30 12 8e 01 0a 14 64 61 74 61 64 6f 67 2e 70 |70.....datadog.p| 00000140 72 6f 63 65 73 73 5f 74 61 67 73 12 76 0a 74 65 |rocess_tags.v.te| 00000150 6e 74 72 79 70 6f 69 6e 74 2e 62 61 73 65 64 69 |ntrypoint.basedi| 00000160 72 3a 62 75 69 6c 64 2c 65 6e 74 72 79 70 6f 69 |r:build,entrypoi| 00000170 6e 74 2e 77 6f 72 6b 64 69 72 3a 63 70 70 2d 65 |nt.workdir:cpp-e| 00000180 78 70 65 72 69 6d 65 6e 74 2c 65 6e 74 72 79 70 |xperiment,entryp| 00000190 6f 69 6e 74 2e 74 79 70 65 3a 65 78 65 63 75 74 |oint.type:execut| 000001a0 61 62 6c 65 2c 65 6e 74 72 79 70 6f 69 6e 74 2e |able,entrypoint.| 000001b0 6e 61 6d 65 3a 63 70 70 2d 65 78 70 65 72 69 6d |name:cpp-experim| 000001c0 65 6e 74 |ent| 000001c3 Protobuf decode: resource { attributes { key: "deployment.environment.name" value { string_value: "" } } attributes { key: "service.instance.id" value { string_value: "82e9888d-36eb-49cf-acc8-682949c49273" } } attributes { key: "service.name" value { string_value: "cpp-experiment" } } attributes { key: "service.version" value { string_value: "" } } attributes { key: "telemetry.sdk.language" value { string_value: "cpp" } } attributes { key: "telemetry.sdk.version" value { string_value: "v2.1.1" } } attributes { key: "telemetry.sdk.name" value { string_value: "dd-trace-cpp" } } attributes { key: "host.name" value { string_value: "" } } attributes { key: "container.id" value { string_value: "12970" } } } extra_attributes { key: "datadog.process_tags" value { string_value: "entrypoint.basedir:build,entrypoint.workdir:cpp-experiment,entrypoint.type:executable,entrypoint.name:cpp-experiment" } } ```
|
🎯 Code Coverage (details) 🔗 Commit SHA: e2ab668 | Docs | Datadog PR Page | Give us feedback! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2db7a3588b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
BenchmarksBenchmark execution time: 2026-06-22 08:04:55 Comparing candidate commit e2ab668 in PR branch Found 1 performance improvements and 2 performance regressions! Performance is the same for 5 metrics, 0 unstable metrics.
|
This makes it more clear where they came from so in the future we can keep them up-to-date/in-sync.
| const std::shared_ptr<const IDGenerator>& generator); | ||
|
|
||
| // Drops all state, including process discovery and process context. | ||
| ~Tracer(); |
There was a problem hiding this comment.
Declaring a user-defined destructor suppresses the implicit move constructor and move assignment.
Moreover, the implicit copy is now problematic because it will lead to several calls to otel_process_ctx_drop_current(). More precisely, the problem is that:
otel_process_ctx_publish()is called from the constructor (viastore_config(), line 228);otel_process_ctx_drop_current()is called in then destructor (line 124).
They operate on a process-global singleton (published_state in otel_process_ctx.cpp). So the destructor's drop is global, and it fires for every Tracer instance destroyed, including copies and temporaries.
This notably breaks the implementation of nginx-datadog that uses move and return by value on Tracer.
I suggest adding a move-only RAII guard, held by a unique_ptr, so that to transfer the ownership on move. Copy constructor and assignment should be deleted. This is just an idea, maybe a better solution could come from thinking about the whole Otel process context handling.
By the way, TEST_TRACER("move semantics"), line 1885 of test_tracer.cpp is wrong: when the move constructor doesn't exist, it binds to the copy constructor. This test should be fixed!
There was a problem hiding this comment.
Ughhh thanks for spotting this, my C++-fu is not strong enough to have spotted this one.
In c561524 I've added a guard and a mutex that enforces thread-safe last-writer-wins semantics for the process context.
And in 0575cde I've tightened the "move semantics" test to not surprise us again in the future.
| { | ||
| Tracer tracer{*finalized}; | ||
|
|
||
| auto read_result = otel_process_ctx_read(); |
There was a problem hiding this comment.
I think we should also test the published context from the Tracer, so that to check the Tracer→context field mapping.
There was a problem hiding this comment.
I'm not sure I understood this suggestion -- do you mean checking the contents of the "datadog-tracer-info-somethingsomething" file as well, or something else?
(In particular, that code path is already covered I believe in test_tracer.cpp?)
Before this change, Tracer had a user-declared destructor that called
otel_process_ctx_drop_current() directly. That caused two problems:
1. Declaring ~Tracer() suppresses the implicit move constructor and move
assignment, while leaving copy operations in place. Tracer became
accidentally copyable and non-movable -- the exact wrong shape for a
class that owns a process-wide singleton. Callers like nginx-datadog
that return Tracer by value silently bound to the copy ctor, and the
source's destructor then dropped the global context out from under
the live copy.
2. otel_process_ctx_publish / otel_process_ctx_drop_current are
documented as not thread-safe. Concurrent construction or
destruction of Tracers across threads would race on the global
published_state.
This change introduces OtelCtxGuard, a small RAII type that owns the
published OpenTelemetry process context, and a
publish_otel_process_ctx()
factory that returns one. Tracer now holds the guard via
std::unique_ptr<OtelCtxGuard>, which gives us:
- Move-only Tracer. Move ops are implicitly defaulted (the previous
explicit ~Tracer() body is gone) and copy ops are implicitly deleted
through the unique_ptr member. A failed or skipped publish is
represented as a null guard, so the destructor is correctly a no-op
in that case.
- Symmetric publish/drop encapsulation. Both directions through the
C API go through OtelCtxGuard, and tracer.cpp no longer references
otel_process_ctx_publish / otel_process_ctx_drop_current directly.
- Thread safety. A mutex internal to the guard serializes publish and
drop, so concurrent Tracer construction/destruction across threads
is safe.
Multi-instance behavior is "last writer wins": the C API holds at most
one published context per process, and any guard destruction drops
whatever is current. This matches the common case (one Tracer per
process) and is documented at the top of otel_process_ctx_guard.h.
ivoanjo
left a comment
There was a problem hiding this comment.
First of all! Thanks for the through + quick review and big thanks for the patience with me taking such a long time to get back to this. Past few days have been very busy and it took me a bit to address all the feedback
This makes the Tracer no more thread safe. I'm not sure if this is fine or not.
We should investigate deeper about this, and at least publicize clearly this constraint.
Good point. On top of the RAII guard, I've added a mutex to ensure thread-safety.
| { | ||
| Tracer tracer{*finalized}; | ||
|
|
||
| auto read_result = otel_process_ctx_read(); |
There was a problem hiding this comment.
I'm not sure I understood this suggestion -- do you mean checking the contents of the "datadog-tracer-info-somethingsomething" file as well, or something else?
(In particular, that code path is already covered I believe in test_tracer.cpp?)
| const std::shared_ptr<const IDGenerator>& generator); | ||
|
|
||
| // Drops all state, including process discovery and process context. | ||
| ~Tracer(); |
There was a problem hiding this comment.
Ughhh thanks for spotting this, my C++-fu is not strong enough to have spotted this one.
In c561524 I've added a guard and a mutex that enforces thread-safe last-writer-wins semantics for the process context.
And in 0575cde I've tightened the "move semantics" test to not surprise us again in the future.
Description
This PR adds support for publishing the OpenTelemetry Process Context (aka OTEP 4719 -- https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/profiles/4719-process-ctx.md).
This mechanism is very similar to Datadog's existing "process discovery"/"tracer-info" mechanism, so the actual change in
trace.cppis very minimal (this is by design :p ).This data will be read by OTel-compliant tools, of which the eBPF Profiler (which Datadog will also support as part of our Continuous Profiler product) is the first one.
Motivation
We're adding support for OTEP 4719 to all Datadog SDKs, so dd-trace-cpp is actually one of the last ones to adopt this.
See https://docs.google.com/document/d/1IwjjVJzEChcFPcnVV2N5Kkjg-4_Q4v4Q3ojpxntbdvY/edit?pli=1&tab=t.s7dfru3a9hnl (Datadog-only link) for an up-to-date list of all SDKs.
Additional Notes
Jira ticket: PROF-14789
The
otel_process_ctx.cppandotel_process_ctx.hcome from the reference implementation we're maintaining upstream with OTel in https://github.com/open-telemetry/sig-profiling/tree/main/process-context/c-and-cpp.They are already in use in dd-trace-java as well.
Review-wise, my suggestion for those would be to focus on correctness -- ideally we want to be able to quickly update to the latest version when needed, which is made easier if we have a minimal delta on upstream.
(I'm the one maintaining this upstream so I can propagate fixes/changes as well so we don't have to diverge).
How to test the change?
This change includes test coverage. Also, I've tested this in practice using the
otel_process_ctx_dump.shscript from the https://github.com/open-telemetry/sig-profiling/ repo.Here's how that looks: