Skip to content

perf(relationships): bulk tree persistence, lean hydration and delta-only reindex when saving related content#36149

Open
jcastro-dotcms wants to merge 11 commits into
mainfrom
issue-36147-Saving-contentlets-with-many-to-many-relationship-fields-and-large-related-content-volumes-is-extremely-slow
Open

perf(relationships): bulk tree persistence, lean hydration and delta-only reindex when saving related content#36149
jcastro-dotcms wants to merge 11 commits into
mainfrom
issue-36147-Saving-contentlets-with-many-to-many-relationship-fields-and-large-related-content-volumes-is-extremely-slow

Conversation

@jcastro-dotcms

Copy link
Copy Markdown
Member

This PR fixes: #36147

🎯 Problem

Saving a contentlet whose Content Type has several MANY_TO_MANY relationship fields became unusably slow as related-content volume grew: on a real-world dataset (~1,500 parents × ~20 children each, so each child relates back to ~350 parents), a single Save/Publish took from 40 seconds up to several minutes and fired tens of thousands of JDBC queries. Three compounding root causes:

  1. Per-row relationship persistence — every related record cost a getTree + saveTree pair (2–3 queries each) plus one DELETE per row.
  2. Unnecessary full hydration — the save path loaded complete contentlets (fields, permissions, version info) when it only needed identifiers or a count.
  3. Over-broad dependency reindexingdependenciesLeftToReindex() re-queued every currently-related contentlet for reindexing on every save, because the old-state/new-state comparison read the new DB state on both sides. Saving one child re-indexed hundreds of unchanged parents, each fully hydrated and remapped.

✨ Proposed Changes

Relationship persistence — bulk, batched, and permission-aware (ESContentletAPIImpl, TreeFactory)

  • deleteRelatedContent now splits into two paths: a fast identifier-only path for the system user / CMS Admins (single bulk DELETE, zero hydration, only removed parents loaded for reindex), and a permission-scoped path for limited users that preserves the legacy contract — relationships to content the user cannot READ are never deleted (bulk delete when the filter removes nothing; chunked IN deletes otherwise).
  • relateContent collects all rows and persists them through a new TreeFactory.insertTrees: a duplicate-safe batched upsert (dedup by the (child, parent, relation_type) PK, batched DELETE+INSERT wrapped in LocalTransaction so it is atomic for any caller). Duplicate records in the payload (same identifier twice, multiple languages) no longer risk a PK violation.
  • The tree-order base comes from a single MAX(tree_order)+1 lookup over surviving rows (replacing a full getRelatedContentFromIndex hydration), queried only on the parent branch where it is actually used.

Hydration elimination (ESContentletAPIImpl, VersionableFactory/Impl)

  • Related identifiers are read straight from the tree table via new TreeFactory identifier lookups — no contentlet hydration to compute delete scopes or counts.
  • Removed parents that genuinely need reindexing are loaded through a new batched, uncached VersionableFactory.findAllContentletVersionInfos(Collection) (chunked IN, 500 ids/statement) + the existing batch findContentlets, instead of one lookup per identifier. All SQL lives in factory classes per house architecture.

Delta-only dependency reindexing (ESMappingAPIImpl)

  • dependenciesLeftToReindex() now compares the search index (pre-save state) against the tree table (post-save state) and returns their true symmetric difference: only relationships actually added or removed. Unchanged parents are never re-queued — this is the biggest win for heavily related datasets.
  • Identifiers are deduped (the index holds one document per language/variant), so multilingual content no longer triggers spurious reindexing; a scroll-search fallback removes the silent 10,000-document truncation.

Hardening (review feedback)

  • assertTreeColumn allowlist guard on every TreeFactory helper that interpolates a column name into SQL (only parent/child accepted) — enforces what the Semgrep findings flagged conventionally.
  • Javadoc pass: class-level doc for TreeFactory, contracts on the bulk deletes (caller-enforced permissions), and a full explanation of the new reindex-delta model on dependenciesLeftToReindex.

✅ Checklist

  • Tests
  • Translations (N/A — backend only)
  • Security Implications Contemplated (add notes if applicable)

Security notes: permission semantics are preserved and now covered by an integration test (a limited user's save cannot delete relationships to content they cannot READ — this also fixes a data-loss risk present in the first draft of this PR). The Semgrep CUSTOM_INJECTION-2 findings are false positives (concatenated fragments are compile-time literals or generated ?,?,... placeholders; all user data is bound parameters) and are additionally guarded by the new column-name allowlist.

🧪 Testing

11 new integration tests, all green:

  • TreeFactoryTest (new class, 7 tests): batch upsert dedup, pre-existing row replacement, scoped deletes, >500-id chunking, ordered id lookups, next-tree-order.
  • ESContentletAPIImplTest (+3): limited-user permission preservation, duplicate-records save, append-after-preserved ordering. Full class: 90/90 passing.
  • ESMappingAPITest (+2): reindex delta returns only added+removed parents; multilingual parents are not spuriously re-indexed.

ℹ️ Additional Info

  • Verified against a production-scale dataset: re-saving heavily related content drops from minutes to seconds, with query counts reduced by orders of magnitude (bulk statements + delta-only reindex instead of per-row queries + full-graph reindex).
  • A manual backport to 24.12.27 LTS is prepared in parallel (adapted for -source 11), including the test suite.
  • Behavioral notes for reviewers: orphan tree rows (pointing to fully-deleted content) are now cleaned up by the bulk delete; tree_order values are rebased per save (relative ordering preserved — consumers only ORDER BY tree_order).

Screenshots

N/A — backend-only performance change.

🤖 Generated with Claude Code

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Claude finished @dsilvam's task in 3m 1s —— View job


I'll analyze this and get back to you.

@jcastro-dotcms

Copy link
Copy Markdown
Member Author

NOTE TO DEVS

This new PR was created to make the branch match a GitHub ticket number. The initial PR containing the original feedback form Claude Code can be found in the previous version, localed here:

Just to be clear, this new PR contains the exact same code changes from the previous one.

@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

❌ Codex Review failed — openai.gpt-5.5

The review job failed before producing output. See the run for details.

Run: #27433876927

@semgrep-dotcms

Copy link
Copy Markdown
Contributor

Semgrep found 15 CUSTOM_INJECTION-2 findings:

The method identified is susceptible to injection. The input should be validated and properly
escaped.

If this is a critical or high severity finding, please also link this issue in the #security channel in Slack.

…many-relationship-fields-and-large-related-content-volumes-is-extremely-slow
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

🤖 dotBot Review (Bedrock)

Reviewed 9 file(s); 8 candidate(s) → 3 confirmed, 0 uncertain (unverified, kept for review).

Confirmed findings

  • 🟠 High dotCMS/src/main/java/com/dotmarketing/business/VersionableFactoryImpl.java:473 — Missing variant filter in version info query
    The SQL query in VersionableFactoryImpl.java line 473 selects contentlet version info without filtering by variant_id. This will return multiple records per identifier when content exists in multiple variants, leading to incorrect version data being used for reindexing operations. The query's WHERE clause includes 'identifier in (?)' but lacks a variant_id condition, violating the assumption that identifiers are unique across variants.
  • 🟡 Medium dotCMS/src/main/java/com/dotmarketing/business/VersionableFactory.java:291 — Missing implementation for new abstract method
    The abstract method 'findAllContentletVersionInfos' was added to VersionableFactory but not implemented in VersionableFactoryImpl. Grep results show no implementation in the PR diff, which would cause AbstractMethodError when the method is called.
  • 🟡 Medium dotCMS/src/main/java/com/dotmarketing/business/VersionableFactory.java:291 — Missing exception declaration in VersionableFactory
    The method findAllContentletVersionInfos declares throws DotDataException but similar methods like findContentletVersionInfo throw both DotDataException and DotStateException. Implementation in VersionableFactoryImpl likely throws DotStateException which would violate the interface contract, risking uncaught exceptions.

us.deepseek.r1-v1:0 · Run: #28613778124 · tokens: in: 83017 · out: 17864 · total: 100881 · calls: 21 · est. ~$0.209

@dsilvam dsilvam enabled auto-merge July 2, 2026 17:02
Comment thread dotcms-integration/src/test/java/com/dotcms/MainSuite3a.java Outdated
Adding missing comma.
* Returns the identifiers of the children related to the given parent under the given
* relation type, ordered by tree order. Unlike content-returning lookups, this method never
* hydrates contentlets — it reads the tree table only.
*

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 [High] Unhandled DotDataException in getNextTreeOrder

The method getNextTreeOrder calls dc.loadObjectResults().get(0).get('max_order') followed by getInt() without handling DotDataException. Since DotConnect operations can throw DotDataException, this method should either catch it or declare it in throws. The exception propagates to callers like insertTrees, which also doesn't catch it, leading to potential transaction rollbacks and unhandled errors.

…many-relationship-fields-and-large-related-content-volumes-is-extremely-slow
final List<String> idList = List.copyOf(identifiers);
final List<ContentletVersionInfo> versionInfos = new ArrayList<>();
for (int from = 0; from < idList.size(); from += VERSION_INFO_LOOKUP_CHUNK_SIZE) {
final List<String> chunk = idList.subList(from,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 [High] Missing variant filter in version info query

The SQL query in VersionableFactoryImpl.java line 473 selects contentlet version info without filtering by variant_id. This will return multiple records per identifier when content exists in multiple variants, leading to incorrect version data being used for reindexing operations. The query's WHERE clause includes 'identifier in (?)' but lacks a variant_id condition, violating the assumption that identifiers are unique across variants.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Saving contentlets with many-to-many relationship fields and large related-content volumes is extremely slow

2 participants