Keep StrBuilder.reverse from splitting surrogate pairs #1730

Merged
garydgregory merged 1 commit into apache:master from alhudz:strbuilder-reverse-surrogate
Jun 25, 2026

Conversation

alhudz commented Jun 24, 2026

Contributor

Repro: new StrBuilder("a" + "😀" + "b").reverse() returns b\uDE00\uD83Da (the low surrogate ahead of its high surrogate), whereas new StringBuilder(...).reverse() returns b😀a.
Cause: reverse() swaps the buffer one char at a time, so every surrogate pair ends up in low-high order and each supplementary code point becomes malformed UTF-16.
Fix: after the char swap, walk the buffer once and swap each adjacent low-high surrogate pair back to high-low. This matches StringBuilder/StringBuffer, which this class is documented to mimic. BMP text, lone surrogates, and odd-length buffers are unaffected by the extra pass.
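
For readers who want to see the two passes side by side, here is a minimal standalone sketch of the approach described above. It is not the code from the merged commit: the class `SurrogateSafeReverse` and its methods `reverseInPlace` and `reverse` are hypothetical names that operate on a plain `char[]` rather than on StrBuilder's internal buffer. Only `Character.isLowSurrogate` and `Character.isHighSurrogate` are real JDK calls.

```java
// Hypothetical illustration only; not the actual StrBuilder implementation.
final class SurrogateSafeReverse {

    // Reverses the first `size` chars of `buf` in place, then repairs any
    // surrogate pairs that the char-wise reversal left in low-high order.
    static void reverseInPlace(char[] buf, int size) {
        // Pass 1: plain char-by-char swap from both ends toward the middle.
        for (int left = 0, right = size - 1; left < right; left++, right--) {
            char tmp = buf[left];
            buf[left] = buf[right];
            buf[right] = tmp;
        }
        // Pass 2: swap each adjacent low-high sequence back to high-low.
        for (int i = 0; i < size - 1; i++) {
            if (Character.isLowSurrogate(buf[i]) && Character.isHighSurrogate(buf[i + 1])) {
                char tmp = buf[i];
                buf[i] = buf[i + 1];
                buf[i + 1] = tmp;
                i++; // skip past the pair that was just repaired
            }
        }
    }

    // Convenience wrapper (an assumption for this sketch) that works on Strings.
    static String reverse(String s) {
        char[] buf = s.toCharArray();
        reverseInPlace(buf, buf.length);
        return new String(buf);
    }

    public static void main(String[] args) {
        String input = "a" + "\uD83D\uDE00" + "b"; // "a😀b"
        String expected = new StringBuilder(input).reverse().toString();
        System.out.println(reverse(input).equals(expected));
    }
}
```

Under these assumptions, the second pass only ever touches a low surrogate immediately followed by a high surrogate, so BMP characters and lone surrogates stay where the first pass put them.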

@garydgregory garydgregory changed the title keep StrBuilder.reverse from splitting surrogate pairs Keep StrBuilder.reverse from splitting surrogate pairs Jun 25, 2026
@garydgregory

Member

Hello @alhudz, the CI is running... please port to Apache Commons Text as this class is deprecated here. TY!

@garydgregory garydgregory merged commit 64c05dc into apache:master Jun 25, 2026
20 of 21 checks passed
alhudz commented Jun 27, 2026

Contributor Author

Done — ported to Commons Text in apache/commons-text#756, same surrogate-pair restore on TextStringBuilder.reverse.
