
Keep WordUtils.wrap from splitting a surrogate pair #755

Merged: garydgregory merged 1 commit into apache:master from alhudz:wordutils-wrap-surrogate on Jun 26, 2026

alhudz (Contributor) commented Jun 26, 2026

Port of apache/commons-lang#1731 to the Commons Text copy of WordUtils, as requested by @garydgregory.

WordUtils.wrap(str, wrapLength, newLineStr, true) hard-breaks a too-long word at the fixed char offset wrapLength + offset and inserts the new line there. When that offset lands between the high and low surrogate of a supplementary code point, the pair is split, so a wrap that should only insert line breaks instead emits a lone high surrogate at the end of one line and a lone low surrogate at the start of the next.
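The failure condition can be stated with the JDK's Character.isHighSurrogate and Character.isLowSurrogate: cutting at char index i splits a pair exactly when the char at i - 1 is a high surrogate and the char at i is a low surrogate. A minimal sketch; the class and method names (SurrogateCut, splitsPair) are hypothetical and not part of Commons Text.

```java
public final class SurrogateCut {

    /**
     * Hypothetical helper: true when cutting s at char index i would separate
     * a high surrogate (at i - 1) from its low surrogate (at i).
     */
    public static boolean splitsPair(CharSequence s, int i) {
        return i > 0 && i < s.length()
                && Character.isHighSurrogate(s.charAt(i - 1))
                && Character.isLowSurrogate(s.charAt(i));
    }

    public static void main(String[] args) {
        // "a" followed by four U+1F600; each emoji is two chars (a surrogate pair).
        String s = "a\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00";
        // With wrapLength 4, the fixed-offset cuts fall at char indices 4 and 8.
        System.out.println(splitsPair(s, 4)); // true
        System.out.println(splitsPair(s, 8)); // true
    }
}
```

For the repro string, both fixed-offset cuts land inside a pair, which is why the 2nd and 4th emoji break in the output.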

Repro: WordUtils.wrap("a😀😀😀😀", 4, "\n", true) (the letter a followed by four U+1F600).
Before: a😀\uD83D\n\uDE00😀\uD83D\n\uDE00, i.e. the 2nd and 4th emoji are split around the \n.
After: a😀😀\n😀😀, no lone surrogates.
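One way to check the before/after claim is to scan each output for unpaired surrogates. This is a sketch with a hypothetical helper (LoneSurrogates.hasLoneSurrogate); the strings are the Before and After outputs above, written with \u escapes.

```java
public final class LoneSurrogates {

    /** Hypothetical helper: true if s contains any unpaired high or low surrogate. */
    public static boolean hasLoneSurrogate(CharSequence s) {
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate is only valid when immediately followed by a low surrogate.
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i += 2;
                    continue;
                }
                return true;
            }
            if (Character.isLowSurrogate(c)) {
                // Any low surrogate reached here had no high surrogate before it.
                return true;
            }
            i++;
        }
        return false;
    }

    public static void main(String[] args) {
        String before = "a\uD83D\uDE00\uD83D\n\uDE00\uD83D\uDE00\uD83D\n\uDE00";
        String after = "a\uD83D\uDE00\uD83D\uDE00\n\uD83D\uDE00\uD83D\uDE00";
        System.out.println(hasLoneSurrogate(before)); // true
        System.out.println(hasLoneSurrogate(after));  // false
    }
}
```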

Fix: nudge the break one char forward when it would land inside a pair, so the whole code point stays on the current line. BMP input and the delimiter-based wrap paths are unaffected, and the other wrap overloads delegate to this method.
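The nudge rule can be modeled on a stripped-down hard wrap. This is a hypothetical sketch (HardWrapSketch.hardWrap is not the merged patch): it models only the long-word path, with no delimiter search or trimming, and a keepPairs flag toggles the fix so both outputs from the repro can be compared.

```java
public final class HardWrapSketch {

    /**
     * Hypothetical, simplified model of the long-word path only. Breaks every
     * wrapLength chars; when keepPairs is true, a break that would fall between
     * a high and a low surrogate is moved one char forward.
     */
    public static String hardWrap(String str, int wrapLength, String newLineStr, boolean keepPairs) {
        wrapLength = Math.max(1, wrapLength); // avoid a zero-length line
        StringBuilder out = new StringBuilder(str.length() + 16);
        int offset = 0;
        while (str.length() - offset > wrapLength) {
            int end = offset + wrapLength; // end < str.length() here, so charAt(end) is safe
            if (keepPairs
                    && Character.isHighSurrogate(str.charAt(end - 1))
                    && Character.isLowSurrogate(str.charAt(end))) {
                end++; // keep the whole code point on the current line
            }
            out.append(str, offset, end).append(newLineStr);
            offset = end;
        }
        // Whatever remains fits on the last line.
        out.append(str, offset, str.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00";
        String before = "a\uD83D\uDE00\uD83D\n\uDE00\uD83D\uDE00\uD83D\n\uDE00";
        String after = "a\uD83D\uDE00\uD83D\uDE00\n\uD83D\uDE00\uD83D\uDE00";
        System.out.println(hardWrap(s, 4, "\n", false).equals(before)); // true
        System.out.println(hardWrap(s, 4, "\n", true).equals(after));   // true
    }
}
```

For BMP-only input both surrogate checks are always false, so the nudge never fires and the output is identical to the unpatched path, consistent with the claim that BMP input is unaffected.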

Added assertions to WordUtilsTest#testWrap_StringIntStringBoolean that fail on the current tree and pass with the fix.

  • Read the contribution guidelines for this project.
  • Read the ASF Generative Tooling Guidance if you use Artificial Intelligence (AI).
  • I used AI to create any part of, or all of, this pull request. Which AI tool was used to create this pull request, and to what extent did it contribute?
  • Run a successful build using the default Maven goal with mvn; that's mvn on the command line by itself.
  • Write unit tests that match behavioral changes, where the tests fail if the changes to the runtime are not applied. This may not always be possible, but it is a best practice.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Each commit in the pull request should have a meaningful subject line and body. Note that a maintainer may squash commits during the merge process.

garydgregory merged commit 6e8da45 into apache:master on Jun 26, 2026
10 checks passed
garydgregory (Member)

Thank you @alhudz, merged 🚀
