perf(relationships): bulk tree persistence, lean hydration and delta-only reindex when saving related content#36149
Conversation
…ent-performance-improvements
…ent-performance-improvements
…ent-performance-improvements
…ent-performance-improvements
NOTE TO DEVSThis new PR was created to make the branch match a GitHub ticket number. The initial PR containing the original feedback form Claude Code can be found in the previous version, localed here: Just to be clear, this new PR contains the exact same code changes from the previous one. |
❌ Codex Review failed —
|
|
Semgrep found 15
The method identified is susceptible to injection. The input should be validated and properly If this is a critical or high severity finding, please also link this issue in the #security channel in Slack. |
…many-relationship-fields-and-large-related-content-volumes-is-extremely-slow
🤖 dotBot Review (Bedrock)Reviewed 9 file(s); 8 candidate(s) → 3 confirmed, 0 uncertain (unverified, kept for review). Confirmed findings
us.deepseek.r1-v1:0 · Run: #28613778124 · tokens: in: 83017 · out: 17864 · total: 100881 · calls: 21 · est. ~$0.209 |
Adding missing comma.
| * Returns the identifiers of the children related to the given parent under the given | ||
| * relation type, ordered by tree order. Unlike content-returning lookups, this method never | ||
| * hydrates contentlets — it reads the tree table only. | ||
| * |
There was a problem hiding this comment.
🟠 [High] Unhandled DotDataException in getNextTreeOrder
The method getNextTreeOrder calls dc.loadObjectResults().get(0).get('max_order') followed by getInt() without handling DotDataException. Since DotConnect operations can throw DotDataException, this method should either catch it or declare it in throws. The exception propagates to callers like insertTrees, which also doesn't catch it, leading to potential transaction rollbacks and unhandled errors.
…many-relationship-fields-and-large-related-content-volumes-is-extremely-slow
| final List<String> idList = List.copyOf(identifiers); | ||
| final List<ContentletVersionInfo> versionInfos = new ArrayList<>(); | ||
| for (int from = 0; from < idList.size(); from += VERSION_INFO_LOOKUP_CHUNK_SIZE) { | ||
| final List<String> chunk = idList.subList(from, |
There was a problem hiding this comment.
🟠 [High] Missing variant filter in version info query
The SQL query in VersionableFactoryImpl.java line 473 selects contentlet version info without filtering by variant_id. This will return multiple records per identifier when content exists in multiple variants, leading to incorrect version data being used for reindexing operations. The query's WHERE clause includes 'identifier in (?)' but lacks a variant_id condition, violating the assumption that identifiers are unique across variants.
This PR fixes: #36147
🎯 Problem
Saving a contentlet whose Content Type has several MANY_TO_MANY relationship fields became unusably slow as related-content volume grew: on a real-world dataset (~1,500 parents × ~20 children each, so each child relates back to ~350 parents), a single Save/Publish took from 40 seconds up to several minutes and fired tens of thousands of JDBC queries. Three compounding root causes:
getTree+saveTreepair (2–3 queries each) plus oneDELETEper row.dependenciesLeftToReindex()re-queued every currently-related contentlet for reindexing on every save, because the old-state/new-state comparison read the new DB state on both sides. Saving one child re-indexed hundreds of unchanged parents, each fully hydrated and remapped.✨ Proposed Changes
Relationship persistence — bulk, batched, and permission-aware (
ESContentletAPIImpl,TreeFactory)deleteRelatedContentnow splits into two paths: a fast identifier-only path for the system user / CMS Admins (single bulkDELETE, zero hydration, only removed parents loaded for reindex), and a permission-scoped path for limited users that preserves the legacy contract — relationships to content the user cannot READ are never deleted (bulk delete when the filter removes nothing; chunkedINdeletes otherwise).relateContentcollects all rows and persists them through a newTreeFactory.insertTrees: a duplicate-safe batched upsert (dedup by the(child, parent, relation_type)PK, batchedDELETE+INSERTwrapped inLocalTransactionso it is atomic for any caller). Duplicate records in the payload (same identifier twice, multiple languages) no longer risk a PK violation.MAX(tree_order)+1lookup over surviving rows (replacing a fullgetRelatedContentFromIndexhydration), queried only on the parent branch where it is actually used.Hydration elimination (
ESContentletAPIImpl,VersionableFactory/Impl)treetable via newTreeFactoryidentifier lookups — no contentlet hydration to compute delete scopes or counts.VersionableFactory.findAllContentletVersionInfos(Collection)(chunkedIN, 500 ids/statement) + the existing batchfindContentlets, instead of one lookup per identifier. All SQL lives in factory classes per house architecture.Delta-only dependency reindexing (
ESMappingAPIImpl)dependenciesLeftToReindex()now compares the search index (pre-save state) against thetreetable (post-save state) and returns their true symmetric difference: only relationships actually added or removed. Unchanged parents are never re-queued — this is the biggest win for heavily related datasets.Hardening (review feedback)
assertTreeColumnallowlist guard on everyTreeFactoryhelper that interpolates a column name into SQL (onlyparent/childaccepted) — enforces what the Semgrep findings flagged conventionally.TreeFactory, contracts on the bulk deletes (caller-enforced permissions), and a full explanation of the new reindex-delta model ondependenciesLeftToReindex.✅ Checklist
Security notes: permission semantics are preserved and now covered by an integration test (a limited user's save cannot delete relationships to content they cannot READ — this also fixes a data-loss risk present in the first draft of this PR). The Semgrep
CUSTOM_INJECTION-2findings are false positives (concatenated fragments are compile-time literals or generated?,?,...placeholders; all user data is bound parameters) and are additionally guarded by the new column-name allowlist.🧪 Testing
11 new integration tests, all green:
TreeFactoryTest(new class, 7 tests): batch upsert dedup, pre-existing row replacement, scoped deletes, >500-id chunking, ordered id lookups, next-tree-order.ESContentletAPIImplTest(+3): limited-user permission preservation, duplicate-records save, append-after-preserved ordering. Full class: 90/90 passing.ESMappingAPITest(+2): reindex delta returns only added+removed parents; multilingual parents are not spuriously re-indexed.ℹ️ Additional Info
-source 11), including the test suite.treerows (pointing to fully-deleted content) are now cleaned up by the bulk delete;tree_ordervalues are rebased per save (relative ordering preserved — consumers onlyORDER BY tree_order).Screenshots
N/A — backend-only performance change.
🤖 Generated with Claude Code