Keep TextStringBuilder.reverse from splitting surrogate pairs #756

Merged
garydgregory merged 1 commit into apache:master from alhudz:textstringbuilder-reverse-surrogate
Jun 27, 2026
@alhudz (Contributor) commented Jun 27, 2026

Port of apache/commons-lang#1730 to the Commons Text copy of TextStringBuilder, as requested by @garydgregory.

TextStringBuilder.reverse() swaps the buffer one char at a time, so every surrogate pair is left in low-high order and a supplementary code point becomes malformed UTF-16. StringBuilder/StringBuffer, which this class is documented to mimic, reverse the same input correctly.

Repro: new TextStringBuilder("a😀b").reverse().toString() (a, U+1F600, b).
Before: b\uDE00\uD83Da, a low surrogate ahead of its high surrogate.
After: b😀a, matching new StringBuilder("a😀b").reverse().
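
The before/after difference can be reproduced without Commons Text on the classpath. The sketch below is only an illustration: `ReverseRepro` and `naiveReverse` are hypothetical names, and the method imitates a plain char-by-char swap rather than copying the actual `TextStringBuilder` source. It compares that naive result with `StringBuilder.reverse()`.

```java
public class ReverseRepro {

    // Hypothetical stand-in for a reversal that swaps one char at a time
    // and knows nothing about surrogate pairs.
    static String naiveReverse(String s) {
        char[] buf = s.toCharArray();
        for (int i = 0, j = buf.length - 1; i < j; i++, j--) {
            char tmp = buf[i];
            buf[i] = buf[j];
            buf[j] = tmp;
        }
        return new String(buf);
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (high surrogate U+D83D, low surrogate U+DE00), 'b'
        String input = "a\uD83D\uDE00b";

        String naive = naiveReverse(input);
        String jdk = new StringBuilder(input).reverse().toString();

        // The naive swap leaves the low surrogate ahead of the high surrogate.
        System.out.println(naive.equals("b\uDE00\uD83Da")); // true
        // StringBuilder keeps the pair in high-low order.
        System.out.println(jdk.equals("b\uD83D\uDE00a"));   // true
        // The naive result no longer contains the original code point.
        System.out.println(naive.codePoints().anyMatch(cp -> cp == 0x1F600)); // false
    }
}
```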

Fix: after the char swap, walk the buffer once and swap each adjacent low-high surrogate pair back to high-low, gated on whether any surrogate was seen during the swap. BMP text, lone unpaired surrogates and odd-length buffers are untouched.
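
As a rough sketch of the approach described above (not the merged code), the two passes could look like this on a plain `char[]`. `SurrogateAwareReverse`, `reverse` and `sawSurrogate` are hypothetical names chosen for the illustration; the real change operates on the builder's internal buffer.

```java
public class SurrogateAwareReverse {

    // Hypothetical helper: reverse char by char, then repair surrogate pairs.
    static String reverse(String s) {
        char[] buf = s.toCharArray();
        boolean sawSurrogate = false;

        // Pass 1: plain char swap, noting whether any surrogate was moved.
        for (int i = 0, j = buf.length - 1; i < j; i++, j--) {
            char left = buf[i];
            char right = buf[j];
            buf[i] = right;
            buf[j] = left;
            if (Character.isSurrogate(left) || Character.isSurrogate(right)) {
                sawSurrogate = true;
            }
        }

        // Pass 2 (only if needed): every adjacent low-high pair was a valid
        // high-low pair before the swap, so flip it back.
        if (sawSurrogate) {
            for (int i = 0; i < buf.length - 1; i++) {
                if (Character.isLowSurrogate(buf[i]) && Character.isHighSurrogate(buf[i + 1])) {
                    char low = buf[i];
                    buf[i] = buf[i + 1];
                    buf[i + 1] = low;
                    i++; // skip the second half of the repaired pair
                }
            }
        }
        return new String(buf);
    }

    public static void main(String[] args) {
        String input = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        String expected = new StringBuilder(input).reverse().toString();
        System.out.println(reverse(input).equals(expected)); // true
    }
}
```

Gating the second pass on the flag means BMP-only input pays for a single pass; a lone surrogate sets the flag but never matches the low-high test, so it stays where the swap put it.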

Added TextStringBuilderTest#testReverseSurrogatePairs, which fails on the current tree and passes with the fix.

  • Read the contribution guidelines for this project.
  • Read the ASF Generative Tooling Guidance if you use Artificial Intelligence (AI).
  • I used AI to create any part of, or all of, this pull request. Which AI tool was used to create this pull request, and to what extent did it contribute?
  • Run a successful build using the default Maven goal with mvn; that's mvn on the command line by itself.
  • Write unit tests that match behavioral changes, where the tests fail if the changes to the runtime are not applied. This may not always be possible, but it is a best practice.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Each commit in the pull request should have a meaningful subject line and body. Note that a maintainer may squash commits during the merge process.

@garydgregory garydgregory merged commit f6db25e into apache:master Jun 27, 2026
10 checks passed
@garydgregory (Member) commented

Thank you @alhudz, merged 🚀
