perf(relationships): bulk tree persistence, lean hydration and delta-only reindex when saving related content by jcastro-dotcms · Pull Request #36149 · dotCMS/core

jcastro-dotcms · 2026-06-12T18:05:17Z

This PR fixes: #36147

🎯 Problem

Saving a contentlet whose Content Type has several MANY_TO_MANY relationship fields became unusably slow as related-content volume grew: on a real-world dataset (~1,500 parents × ~20 children each, so each child relates back to ~350 parents), a single Save/Publish took from 40 seconds up to several minutes and fired tens of thousands of JDBC queries. Three compounding root causes:

Per-row relationship persistence — every related record cost a getTree + saveTree pair (2–3 queries each) plus one DELETE per row.
Unnecessary full hydration — the save path loaded complete contentlets (fields, permissions, version info) when it only needed identifiers or a count.
Over-broad dependency reindexing — dependenciesLeftToReindex() re-queued every currently-related contentlet for reindexing on every save, because the old-state/new-state comparison read the new DB state on both sides. Saving one child re-indexed hundreds of unchanged parents, each fully hydrated and remapped.

✨ Proposed Changes

Relationship persistence — bulk, batched, and permission-aware (ESContentletAPIImpl, TreeFactory)

deleteRelatedContent now splits into two paths: a fast identifier-only path for the system user / CMS Admins (single bulk DELETE, zero hydration, only removed parents loaded for reindex), and a permission-scoped path for limited users that preserves the legacy contract — relationships to content the user cannot READ are never deleted (bulk delete when the filter removes nothing; chunked IN deletes otherwise).
relateContent collects all rows and persists them through a new TreeFactory.insertTrees: a duplicate-safe batched upsert (dedup by the (child, parent, relation_type) PK, batched DELETE+INSERT wrapped in LocalTransaction so it is atomic for any caller). Duplicate records in the payload (same identifier twice, multiple languages) no longer risk a PK violation.
The tree-order base comes from a single MAX(tree_order)+1 lookup over surviving rows (replacing a full getRelatedContentFromIndex hydration), queried only on the parent branch where it is actually used.

Hydration elimination (ESContentletAPIImpl, VersionableFactory/Impl)

Related identifiers are read straight from the tree table via new TreeFactory identifier lookups — no contentlet hydration to compute delete scopes or counts.
Removed parents that genuinely need reindexing are loaded through a new batched, uncached VersionableFactory.findAllContentletVersionInfos(Collection) (chunked IN, 500 ids/statement) + the existing batch findContentlets, instead of one lookup per identifier. All SQL lives in factory classes per house architecture.

Delta-only dependency reindexing (ESMappingAPIImpl)

dependenciesLeftToReindex() now compares the search index (pre-save state) against the tree table (post-save state) and returns their true symmetric difference: only relationships actually added or removed. Unchanged parents are never re-queued — this is the biggest win for heavily related datasets.
Identifiers are deduped (the index holds one document per language/variant), so multilingual content no longer triggers spurious reindexing; a scroll-search fallback removes the silent 10,000-document truncation.

Hardening (review feedback)

assertTreeColumn allowlist guard on every TreeFactory helper that interpolates a column name into SQL (only parent/child accepted) — enforces what the Semgrep findings flagged conventionally.
Javadoc pass: class-level doc for TreeFactory, contracts on the bulk deletes (caller-enforced permissions), and a full explanation of the new reindex-delta model on dependenciesLeftToReindex.

✅ Checklist

Tests
Translations (N/A — backend only)
Security Implications Contemplated (add notes if applicable)

Security notes: permission semantics are preserved and now covered by an integration test (a limited user's save cannot delete relationships to content they cannot READ — this also fixes a data-loss risk present in the first draft of this PR). The Semgrep CUSTOM_INJECTION-2 findings are false positives (concatenated fragments are compile-time literals or generated ?,?,... placeholders; all user data is bound parameters) and are additionally guarded by the new column-name allowlist.

🧪 Testing

11 new integration tests, all green:

TreeFactoryTest (new class, 7 tests): batch upsert dedup, pre-existing row replacement, scoped deletes, >500-id chunking, ordered id lookups, next-tree-order.
ESContentletAPIImplTest (+3): limited-user permission preservation, duplicate-records save, append-after-preserved ordering. Full class: 90/90 passing.
ESMappingAPITest (+2): reindex delta returns only added+removed parents; multilingual parents are not spuriously re-indexed.

ℹ️ Additional Info

Verified against a production-scale dataset: re-saving heavily related content drops from minutes to seconds, with query counts reduced by orders of magnitude (bulk statements + delta-only reindex instead of per-row queries + full-graph reindex).
A manual backport to 24.12.27 LTS is prepared in parallel (adapted for -source 11), including the test suite.
Behavioral notes for reviewers: orphan tree rows (pointing to fully-deleted content) are now cleaned up by the bulk delete; tree_order values are rebased per save (relative ordering preserved — consumers only ORDER BY tree_order).

Screenshots

N/A — backend-only performance change.

🤖 Generated with Claude Code

…ent-performance-improvements

claude · 2026-06-12T18:06:30Z

Claude finished @dsilvam's task in 3m 1s —— View job

I'll analyze this and get back to you.

jcastro-dotcms · 2026-06-12T18:08:31Z

NOTE TO DEVS

This new PR was created to make the branch match a GitHub ticket number. The initial PR containing the original feedback form Claude Code can be found in the previous version, localed here:

perf(relationships): bulk tree persistence, lean hydration and delta-only reindex when saving related content #36092

Just to be clear, this new PR contains the exact same code changes from the previous one.

github-actions · 2026-06-12T18:08:43Z

❌ Codex Review failed — `openai.gpt-5.5`

The review job failed before producing output. See the run for details.

_{Run: #27433876927}

semgrep-dotcms · 2026-06-12T18:08:49Z

Semgrep found 15 CUSTOM_INJECTION-2 findings:

dotCMS/src/main/java/com/dotmarketing/factories/TreeFactory.java
- L477-478 - Triage
- L477 - Triage
- L477 - Triage
- L477 - Triage
- L517-518 - Triage
- L517-518 - Triage
- L517-518 - Triage
- L579-581 - Triage
- L579-581 - Triage
- L579-580 - Triage
- 3 more - Triage
dotCMS/src/main/java/com/dotmarketing/business/VersionableFactoryImpl.java
- L476-477 - Triage
- L476-477 - Triage

The method identified is susceptible to injection. The input should be validated and properly
escaped.

If this is a critical or high severity finding, please also link this issue in the #security channel in Slack.

…many-relationship-fields-and-large-related-content-volumes-is-extremely-slow

github-actions · 2026-07-02T16:58:25Z

🤖 dotBot Review (Bedrock)

Reviewed 9 file(s); 8 candidate(s) → 3 confirmed, 0 uncertain (unverified, kept for review).

Confirmed findings

🟠 High dotCMS/src/main/java/com/dotmarketing/business/VersionableFactoryImpl.java:473 — Missing variant filter in version info query
The SQL query in VersionableFactoryImpl.java line 473 selects contentlet version info without filtering by variant_id. This will return multiple records per identifier when content exists in multiple variants, leading to incorrect version data being used for reindexing operations. The query's WHERE clause includes 'identifier in (?)' but lacks a variant_id condition, violating the assumption that identifiers are unique across variants.
🟡 Medium dotCMS/src/main/java/com/dotmarketing/business/VersionableFactory.java:291 — Missing implementation for new abstract method
The abstract method 'findAllContentletVersionInfos' was added to VersionableFactory but not implemented in VersionableFactoryImpl. Grep results show no implementation in the PR diff, which would cause AbstractMethodError when the method is called.
🟡 Medium dotCMS/src/main/java/com/dotmarketing/business/VersionableFactory.java:291 — Missing exception declaration in VersionableFactory
The method findAllContentletVersionInfos declares throws DotDataException but similar methods like findContentletVersionInfo throw both DotDataException and DotStateException. Implementation in VersionableFactoryImpl likely throws DotStateException which would violate the interface contract, risking uncaught exceptions.

_{us.deepseek.r1-v1:0 · Run: #28613778124 · tokens: in: 83017 · out: 17864 · total: 100881 · calls: 21 · est. ~$0.209}

Adding missing comma.

github-actions · 2026-07-02T17:15:48Z

+     * Returns the identifiers of the children related to the given parent under the given
+     * relation type, ordered by tree order. Unlike content-returning lookups, this method never
+     * hydrates contentlets — it reads the tree table only.
+     *


🟠 [High] Unhandled DotDataException in getNextTreeOrder

The method getNextTreeOrder calls dc.loadObjectResults().get(0).get('max_order') followed by getInt() without handling DotDataException. Since DotConnect operations can throw DotDataException, this method should either catch it or declare it in throws. The exception propagates to callers like insertTrees, which also doesn't catch it, leading to potential transaction rollbacks and unhandled errors.

…many-relationship-fields-and-large-related-content-volumes-is-extremely-slow

github-actions · 2026-07-02T18:49:32Z

+		final List<String> idList = List.copyOf(identifiers);
+		final List<ContentletVersionInfo> versionInfos = new ArrayList<>();
+		for (int from = 0; from < idList.size(); from += VERSION_INFO_LOOKUP_CHUNK_SIZE) {
+			final List<String> chunk = idList.subList(from,


🟠 [High] Missing variant filter in version info query

The SQL query in VersionableFactoryImpl.java line 473 selects contentlet version info without filtering by variant_id. This will return multiple records per identifier when content exists in multiple variants, leading to incorrect version data being used for reindexing operations. The query's WHERE clause includes 'identifier in (?)' but lacks a variant_id condition, violating the assumption that identifiers are unique across variants.

jcastro-dotcms added 8 commits June 9, 2026 15:57

Creating draft PR with potential fix

e2ecab5

Adding more potential changes

80cd1c9

Merge branch 'main' into support-36004-save-content-with-related-cont…

ec92fa6

…ent-performance-improvements

Merge branch 'main' into support-36004-save-content-with-related-cont…

6b8cb38

…ent-performance-improvements

Merge branch 'main' into support-36004-save-content-with-related-cont…

9144940

…ent-performance-improvements

Adding more potential changes

f1946e1

Implementing feedback from Claude Bot.

48bc346

Merge branch 'main' into support-36004-save-content-with-related-cont…

f648126

…ent-performance-improvements

github-project-automation Bot added this to dotCMS - Product Planning Jun 12, 2026

jcastro-dotcms mentioned this pull request Jun 12, 2026

perf(relationships): bulk tree persistence, lean hydration and delta-only reindex when saving related content #36092

Closed

3 tasks

github-actions Bot mentioned this pull request Jun 12, 2026

Saving contentlets with many-to-many relationship fields and large related-content volumes is extremely slow #36147

Open

9 tasks

github-actions Bot added the Area : Backend PR changes Java/Maven backend code label Jun 12, 2026

claude Bot added the AI: Safe To Rollback label Jun 12, 2026

sfreudenthaler mentioned this pull request Jun 13, 2026

chore(ai-reviews): bump ai-workflows pin to v3.1.4 #36159

Merged

dsilvam approved these changes Jul 2, 2026

View reviewed changes

Merge branch 'main' into issue-36147-Saving-contentlets-with-many-to-…

42d1991

…many-relationship-fields-and-large-related-content-volumes-is-extremely-slow

github-actions Bot removed the AI: Safe To Rollback label Jul 2, 2026

dsilvam enabled auto-merge July 2, 2026 17:02

github-actions Bot reviewed Jul 2, 2026

View reviewed changes

Comment thread dotcms-integration/src/test/java/com/dotcms/MainSuite3a.java Outdated

Update MainSuite3a.java

e59f1eb

Adding missing comma.

github-actions Bot reviewed Jul 2, 2026

View reviewed changes

Merge branch 'main' into issue-36147-Saving-contentlets-with-many-to-…

e7cafa2

…many-relationship-fields-and-large-related-content-volumes-is-extremely-slow

github-actions Bot reviewed Jul 2, 2026

View reviewed changes

aberlour1990-cloud mentioned this pull request Jul 2, 2026

Restructure README and CLAUDE.md — slim both to navigation hubs with links/imports #36192

Open

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf(relationships): bulk tree persistence, lean hydration and delta-only reindex when saving related content#36149

perf(relationships): bulk tree persistence, lean hydration and delta-only reindex when saving related content#36149
jcastro-dotcms wants to merge 11 commits into
mainfrom
issue-36147-Saving-contentlets-with-many-to-many-relationship-fields-and-large-related-content-volumes-is-extremely-slow

jcastro-dotcms commented Jun 12, 2026

Uh oh!

claude Bot commented Jun 12, 2026 •

edited

Loading

Uh oh!

jcastro-dotcms commented Jun 12, 2026

Uh oh!

github-actions Bot commented Jun 12, 2026 •

edited

Loading

Uh oh!

semgrep-dotcms Bot commented Jun 12, 2026

Uh oh!

github-actions Bot commented Jul 2, 2026 •

edited

Loading

Uh oh!

Uh oh!

github-actions Bot Jul 2, 2026

Uh oh!

github-actions Bot Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

jcastro-dotcms commented Jun 12, 2026

🎯 Problem

✨ Proposed Changes

✅ Checklist

🧪 Testing

ℹ️ Additional Info

Screenshots

Uh oh!

claude Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jcastro-dotcms commented Jun 12, 2026

NOTE TO DEVS

Uh oh!

github-actions Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

❌ Codex Review failed — openai.gpt-5.5

Uh oh!

semgrep-dotcms Bot commented Jun 12, 2026

Uh oh!

github-actions Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🤖 dotBot Review (Bedrock)

Confirmed findings

Uh oh!

Uh oh!

github-actions Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude Bot commented Jun 12, 2026 •

edited

Loading

github-actions Bot commented Jun 12, 2026 •

edited

Loading

❌ Codex Review failed — `openai.gpt-5.5`

github-actions Bot commented Jul 2, 2026 •

edited

Loading