Fix WordUtils.containsAllWords missing words across line breaks #1732

Merged
garydgregory merged 1 commit into apache:master from alhudz:wordutils-containsallwords-newline
Jun 25, 2026
@alhudz

@alhudz alhudz commented Jun 25, 2026

Contributor
  1. containsAllWords builds the per-word regex .*\b<word>\b.* and runs it with matches(), but compiles it without Pattern.DOTALL, so . never matches a line terminator.
  2. matches() must consume the entire input, so a word on a different line from the text that the surrounding .* can cover is never found.
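
The two points above can be reproduced with java.util.regex alone. This is a minimal sketch, not the WordUtils source: the class and method names are hypothetical, and the use of Pattern.quote is an assumption added here so the word is treated literally.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it is not part of Commons Lang.
final class LineBreakMatchDemo {

    // Builds the regex shape described above, .*\b<word>\b.*, with default flags.
    // Pattern.quote is an assumption of this sketch, not taken from the PR.
    static boolean containsWordDefaultFlags(String text, String word) {
        Pattern p = Pattern.compile(".*\\b" + Pattern.quote(word) + "\\b.*");
        // matches() succeeds only if the pattern covers the whole input.
        return p.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Single line: the leading .* covers "foo " and the word is found.
        System.out.println(containsWordDefaultFlags("foo bar", "bar"));  // prints true
        // Two lines: without DOTALL, '.' cannot match '\n', so no full match exists.
        System.out.println(containsWordDefaultFlags("foo\nbar", "bar")); // prints false
    }
}
```

The second call fails because no placement of the two .* runs can absorb the newline, so matches() never covers the full input.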

Repro: WordUtils.containsAllWords("foo\nbar", "bar")
Expected: true (bar is present as a whole word)
Actual: false
Fix: compile the pattern with Pattern.DOTALL so the leading and trailing .* span line breaks. The \b anchors are unchanged, so single-line results and the documented examples behave the same and no false positive is introduced.
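
A rough illustration of the fix, assuming the same regex shape as in the description. The class and method names are hypothetical and Pattern.quote is an addition of this sketch; the merged change itself may differ in detail.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it is not part of Commons Lang.
final class DotallWordMatchDemo {

    // Same regex shape, .*\b<word>\b.*, but compiled with Pattern.DOTALL
    // so that '.' also matches line terminators.
    static boolean containsWord(String text, String word) {
        Pattern p = Pattern.compile(".*\\b" + Pattern.quote(word) + "\\b.*", Pattern.DOTALL);
        return p.matcher(text).matches();
    }

    public static void main(String[] args) {
        // The repro case: the leading .* now spans "foo\n".
        System.out.println(containsWord("foo\nbar", "bar"));  // prints true
        // The \b anchors still reject partial words, on one line or across lines.
        System.out.println(containsWord("foobar", "bar"));    // prints false
        System.out.println(containsWord("foo\nbarn", "bar")); // prints false
    }
}
```

Only the reach of . changes under DOTALL; the \b anchors are evaluated exactly as before, which is why the partial-word cases stay false.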

The same regex is in commons-text WordUtils.containsAllWords.

  • Read the contribution guidelines for this project.
  • Read the ASF Generative Tooling Guidance if you use Artificial Intelligence (AI).
  • I used AI to create any part of, or all of, this pull request. Which AI tool was used to create this pull request, and to what extent did it contribute?
  • Run a successful build using the default Maven goal with mvn; that's mvn on the command line by itself.
  • Write unit tests that match behavioral changes, where the tests fail if the changes to the runtime are not applied. This may not always be possible, but it is a best practice.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Each commit in the pull request should have a meaningful subject line and body. Note that a maintainer may squash commits during the merge process.

@garydgregory changed the title from "fix WordUtils.containsAllWords missing words across line breaks" to "Fix WordUtils.containsAllWords missing words across line breaks" on Jun 25, 2026
@garydgregory

Member

Hello @alhudz
Please port to Apache Commons Text.
Thank you!

@garydgregory garydgregory merged commit 6c4269e into apache:master Jun 25, 2026
20 of 21 checks passed
@alhudz

alhudz commented Jun 26, 2026

Contributor Author

Ported in apache/commons-text#754 — same Pattern.DOTALL fix and a matching regression test.
