Skip to content

feat: add byte tracking#745

Draft
OGKevin wants to merge 1 commit into
servo:mainfrom
OGKevin:ogkevin/push/uttvvtlxonpy
Draft

feat: add byte tracking#745
OGKevin wants to merge 1 commit into
servo:mainfrom
OGKevin:ogkevin/push/uttvvtlxonpy

Conversation

@OGKevin

@OGKevin OGKevin commented Jun 16, 2026

Copy link
Copy Markdown

This commit introduced a new feature flag that enables the ability to track byte offsets.

When the source-positions feature is enabled, the tokenizer will track the number of UTF-8 bytes consumed from the input. This is done by giving BufferQueue a bytes_consumed field, which is incremented every time a character is consumed.

The changes in cargo.toml were needed to make this project load as a git submodule in the Cadmus project.

The xhtml-self-closing feature was needed due to EPUBs using XHTML-compatible self-closing on RCDATA/RAWTEXT elements. (baskerville/plato#426 (comment))

Change-Id: 566446e2bca101b7fefdca639c1b4d26
Change-Id-Short: uttvvtlxonpy


🤖 was used to generate some of the rustdocs and tests. Was also used to educate me on the project and brainstorm implementation path.


I have a local branch in Cadmus that added this project as a submodule, works great and seems to not break anything, need to do some more testing by daily driving and asking others to test. (Will cross link once I open PR there, needs a tiny bit of more work)

I'll leave this in draft, so it won't get merged by accident, but I'm ready to get feedback on it. Will mark it as ready once I finish with the changes in Cadmus.


Closes: #734

Ref:

@github-actions github-actions Bot added the V-non-breaking A non-breaking change label Jun 16, 2026
This commit introduced a new feature flag that enables the ability to
track byte offsets.

When the source-positions feature is enabled, the tokenizer will track
the number of UTF-8 bytes consumed from the input so far. This is done
by giving BufferQueue a `bytes_consumed` field, which is incremented
every time a character is consumed.

The changes in cargo.toml were needed to make this project load as a git
submodule in the Cadmus project.

The xhtml-self-closing feature was needed due to EPUBs using
XHTML-compatible self-closing on RCDATA/RAWTEXT elements.

Change-Id: 566446e2bca101b7fefdca639c1b4d26
Change-Id-Short: uttvvtlxonpy
@OGKevin OGKevin force-pushed the ogkevin/push/uttvvtlxonpy branch from a332771 to 0d70705 Compare June 16, 2026 13:01
@github-actions github-actions Bot added V-non-breaking A non-breaking change and removed V-non-breaking A non-breaking change labels Jun 16, 2026
@simonwuelker

Copy link
Copy Markdown
Member

🤖 was used to generate some of the rustdocs and tests. Was also used to educate me on the project and brainstorm implementation path.

Note that servo does not accept AI-generated contributions: https://book.servo.org/contributing/getting-started.html#ai-contributions.

Admittedly we should do a better job of communicating that this applies to all repositories under the servo org, not just servo/servo.

@OGKevin

OGKevin commented Jun 17, 2026

Copy link
Copy Markdown
Author

Ahh, that might make things interesting now. I wasn't aware, mybad.

What now? As I see 2 solutions:

  1. Remove the tests and rustdocs, as thats the easiest way to remove all 🤖 touched code, but would also significantly lower the quality of the PR.
  2. Re-write them, but I'm now biased to what I've seen, reviewed and agreed with, so my manual rewrite might not be significantly different. Unless you have some feedback on it that makes it look diff. However, how would I then proof that was scaffolded by 🤖 is now 100% manually typed?

The tests turned out to be useful as they caught some gaps in the implementation, but since they've "served their purpose" I'm fine with removing them. As In this specific situation, I've similar tests in Cadmus that covers similar cases.

Do note that tests are the biggest diff:
image


I'm not challenging the usage of 🤖, just not sure how to move forward. Unfortunately, my daily life has come to 🤖 code reviewer. So I take the small manual implementations where I can, was just lazy when it came to tests and rustdocs 🙉 😭.

@simonwuelker

simonwuelker commented Jun 17, 2026

Copy link
Copy Markdown
Member

I'd be fine with you rewriting the tests and rustdoc.

but I'm now biased to what I've seen, reviewed and agreed with, so my manual rewrite might not be significantly different.

That's okay.

However, how would I then proof that was scaffolded by 🤖 is now 100% manually typed?

You don't need to prove that. We rely on the honesty of contributors when rejecting AI generated patches (unless the patch is obviously slop). Thanks for being upfront about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

V-non-breaking A non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Byte-accurate source positions in TreeSink

2 participants