
Keep WordUtils.wrap from splitting a surrogate pair #1731

Merged
garydgregory merged 1 commit into apache:master from alhudz:wordutils-wrap-surrogate on Jun 25, 2026


alhudz (Contributor) commented Jun 25, 2026

Repro: `WordUtils.wrap("a😀😀😀😀", 4, "\n", true)`, i.e. a leading `a` followed by four U+1F600 (😀) characters.
Cause: with `wrapLongWords` set, a word longer than the column is hard-broken at the fixed char offset `wrapLength + offset`, and the new-line string is inserted there. When that offset lands between the high and low surrogate of a supplementary code point, the pair is split, so a wrap that should be lossless emits a lone high surrogate at the end of one line and a lone low surrogate at the start of the next. In the repro, the first break lands at char index 4, between the two surrogates of the second 😀.
Fix: nudge the break one char forward when it would land inside a pair, keeping the whole code point on the current line. BMP input and the delimiter-based wrap paths are unaffected.
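A minimal sketch of the idea, assuming only the `wrapLongWords` hard-break path matters: it is not the merged Commons Lang patch and skips the whitespace and `wrapOn` handling of the real `WordUtils.wrap`. The class and the methods `safeBreak`, `hardWrap` and `hasLoneSurrogate` are hypothetical names for illustration; the surrogate checks use the JDK's `Character.isHighSurrogate` and `Character.isLowSurrogate`.

```java
// Hypothetical sketch: only the wrapLongWords hard-break path, not the real
// WordUtils.wrap (no whitespace/wrapOn handling). All names are illustrative.
public final class SurrogateSafeWrap {

    private SurrogateSafeWrap() {
    }

    /**
     * Returns breakAt, or breakAt + 1 when breakAt would fall between a high
     * surrogate and its low surrogate, so the whole code point stays on the
     * current line.
     */
    public static int safeBreak(final CharSequence str, final int breakAt) {
        if (breakAt > 0 && breakAt < str.length()
                && Character.isHighSurrogate(str.charAt(breakAt - 1))
                && Character.isLowSurrogate(str.charAt(breakAt))) {
            return breakAt + 1;
        }
        return breakAt;
    }

    /** Hard-breaks str every wrapLength chars without splitting a surrogate pair. */
    public static String hardWrap(final String str, final int wrapLength, final String newLineStr) {
        final int width = Math.max(1, wrapLength); // avoid an endless loop on width < 1
        final StringBuilder out = new StringBuilder(str.length() + 16);
        int offset = 0;
        // Keep breaking while the remainder is longer than the column.
        while (str.length() - offset > width) {
            final int breakAt = safeBreak(str, offset + width);
            out.append(str, offset, breakAt).append(newLineStr);
            offset = breakAt;
        }
        out.append(str, offset, str.length()); // last piece fits on one line
        return out.toString();
    }

    /** True if s contains a surrogate char that is not part of a well-formed pair. */
    public static boolean hasLoneSurrogate(final String s) {
        for (int i = 0; i < s.length(); i++) {
            final char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the matching low surrogate
                } else {
                    return true;
                }
            } else if (Character.isLowSurrogate(c)) {
                return true;
            }
        }
        return false;
    }

    public static void main(final String[] args) {
        // "a" followed by four U+1F600, written as surrogate pairs: 9 chars total.
        final String input = "a\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00";
        // A naive break at index 0 + 4 cuts the second emoji in half.
        System.out.println(hasLoneSurrogate(input.substring(0, 4))); // prints true
        final String wrapped = hardWrap(input, 4, "\n");
        System.out.println(hasLoneSurrogate(wrapped)); // prints false
        System.out.println(wrapped.equals("a\uD83D\uDE00\uD83D\uDE00\n\uD83D\uDE00\uD83D\uDE00")); // prints true
    }
}
```

With the repro input, the guarded break moves from index 4 to 5, so the first line is `a😀😀` (5 chars, one over the column) and the second is `😀😀`. Moving the break back to index 3 would also keep the pair intact; the forward nudge matches the behavior described in the fix above.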

garydgregory changed the title "keep WordUtils.wrap from splitting a surrogate pair" to "Keep WordUtils.wrap from splitting a surrogate pair" on Jun 25, 2026
garydgregory merged commit ae9c787 into apache:master on Jun 25, 2026
20 of 21 checks passed
garydgregory (Member) commented

Merged, TY @alhudz, please port to Apache Commons Text.

alhudz (Contributor, Author) commented Jun 26, 2026

Ported in apache/commons-text#755: same fix, and the wrap test fails on the current Text tree without it.
