@dsolistorres dsolistorres commented Dec 8, 2025

Closes #33661

This PR addresses performance issues and pagination errors in the site copy job by using the ElasticSearch Scroll API to iterate over large result sets.

Proposed Changes

  • Problem: when copying sites with a large number of contentlets, the copy host job hit deep pagination errors once the offset exceeded ElasticSearch's max_result_window (100,000). Offset-based pagination also degraded performance on large result sets.
  • The indexSearchScroll method in the ESContentFactoryImp class was refactored to expose the ES Scroll API through a new wrapper interface, ESContentletScroll. The PaginatedContentlets class now uses this interface to iterate over results with the Scroll API instead of offsets.
  • SQL queries in HostFactoryImpl were optimized to filter hosts by the structure_inode field of the contentlet table, and to use the ILIKE operator for case-insensitive matching in SQL conditions.
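
To illustrate the scroll-based iteration described in the list above, here is a minimal, self-contained sketch of the pattern. It is not the actual ESContentletScroll or PaginatedContentlets code: the names `BatchScroll`, `InMemoryScroll`, and `ScrollIterator` are hypothetical, and the in-memory class only imitates a server-side scroll cursor so the example runs without ElasticSearch. The point is that the consumer never computes an offset; it asks for the next batch until an empty batch signals the end, then releases the scroll.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical scroll wrapper; NOT the real ESContentletScroll signature. */
interface BatchScroll<T> extends AutoCloseable {
    /** Returns the next batch, or an empty list once the scroll is exhausted. */
    List<T> nextBatch();

    @Override
    void close();
}

/** In-memory fake that imitates scroll semantics (assumption: for illustration only). */
final class InMemoryScroll<T> implements BatchScroll<T> {
    private final List<T> source;
    private final int batchSize;
    private int cursor = 0;      // stands in for the server-side scroll context
    private boolean closed = false;

    InMemoryScroll(List<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.source = source;
        this.batchSize = batchSize;
    }

    @Override
    public List<T> nextBatch() {
        if (closed) {
            throw new IllegalStateException("scroll already closed");
        }
        int end = Math.min(cursor + batchSize, source.size());
        List<T> batch = new ArrayList<>(source.subList(cursor, end));
        cursor = end;
        return batch;
    }

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

/** Yields items one by one, fetching a new batch only when the current one is drained. */
final class ScrollIterator<T> implements Iterator<T> {
    private final BatchScroll<T> scroll;
    private Iterator<T> current = Collections.emptyIterator();
    private boolean exhausted = false;

    ScrollIterator(BatchScroll<T> scroll) {
        this.scroll = scroll;
    }

    @Override
    public boolean hasNext() {
        while (!exhausted && !current.hasNext()) {
            List<T> batch = scroll.nextBatch();
            if (batch.isEmpty()) {
                exhausted = true;
                scroll.close(); // release the scroll as soon as it is drained
            } else {
                current = batch.iterator();
            }
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}

final class ScrollDemo {
    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            ids.add(i);
        }
        Iterator<Integer> it = new ScrollIterator<>(new InMemoryScroll<>(ids, 10));
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        System.out.println("items visited: " + count);
    }
}
```

A real implementation would also need to release the scroll when iteration is abandoned early (for example via try-with-resources), since an open scroll context holds resources on the cluster until it expires.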

Checklist

  • Tests

@dsolistorres force-pushed the issue-33661-optimize-copy-host-job branch 2 times, most recently from 9ef0091 to 163e018, on December 12, 2025 23:34

@dsolistorres force-pushed the issue-33661-optimize-copy-host-job branch from 39318d2 to fec8286 on December 29, 2025 18:54