feat: parallel loop running based on asyncio #932

Merged
you-n-g merged 48 commits into main from multi-proc
Jun 12, 2025
Conversation

@you-n-g
Contributor

@you-n-g you-n-g commented Jun 4, 2025

Description

Motivation and Context

How Has This Been Tested?

  • If you are adding a new feature, test on your own test scripts.

Screenshots of Test Results (if appropriate):

  1. Your own tests:

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

📚 Documentation preview 📚: https://RDAgent--932.org.readthedocs.build/en/932/

@you-n-g you-n-g marked this pull request as ready for review June 6, 2025 15:09
@you-n-g you-n-g merged commit c63e207 into main Jun 12, 2025
9 checks passed
@you-n-g you-n-g deleted the multi-proc branch June 12, 2025 03:44
WinstonLiyt pushed a commit that referenced this pull request Jun 16, 2025
* refactor: split workflow into pkg, add WorkflowTracker & wait_retry

* feat: add async LoopBase with parallel workers and step semaphores

* fix: replace pickle with dill and run blocking tasks via joblib wrapper

* feat: add log format settings, dynamic parallelism & pickle-based snapshot

* fix: default step semaphore to 1 and avoid subprocess when single worker

* merge bowen's changes

* merge tim's changes

* refactor: extract component task mapping, add conditional logger setup

* lint

* refactor: add type hints and safer remain_time metric logging in workflow

* lint

* fix: allow BadRequestError to be pickled via custom copyreg reducer

* fix: stop loop when LoopTerminationError is raised in LoopBase

* lint

* refactor: make log tag context-local using ContextVar for thread safety

* feat: add subproc_step flag and helper to decide subprocess execution

* fix: use ./cache path and normalize relative volume bind paths

* fix: reset loop_idx to 0 on loop restart/resume to ensure correct flow

* fix: avoid chmod on cache and input dirs in Env timeout wrapper

* fix: skip chmod on 'cache' and 'input' dirs using find -prune

* fix: restrict chmod to immediate mount dirs excluding cache/input

* fix: chmod cache and input dirs alongside their contents after entry run

* fix: guard chmod with directory checks for cache and input

* fix: prefix mount_path in chmod command for cache/input dirs

* fix: drop quotes from find exclude patterns to ensure chmod executes

* fix: skip chmod on cache/input directories to avoid warning spam

* feat: support string volume mappings and poll subprocess stdout/stderr

* support remove symbolic link

* test: use dynamic home path and code volume in LocalEnv local_simple

* fix: skip trace and progress update when loop step is withdrawn

* refactor: add clean_workspace util and non-destructive workspace backup

* fix: preserve symlinks when backing up workspace with copytree

* fix: prevent AttributeError when _pbar not yet initialized in LoopBase

* perf: replace shutil.copytree with rsync for faster workspace backup

* fix: cast log directory Path to str in tar command of data science loop

* fix: use portable 'cp -r -P' instead of rsync for workspace backup

* fix: add retry and logging to workspace backup for robustness

* refactor: extract backup_folder helper and reuse in DataScienceRDLoop

* fix: propagate backup errors & default _pbar getattr to avoid error

* fix the division by zero bug

* refactor: execute RD loops via asyncio.run and add necessary imports

* lint

* lint

* lint

---------

Co-authored-by: Xu <v-xuminrui@microsoft.com>
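
The commits above mention an async loop with parallel workers and per-step semaphores that default to 1. As a rough illustration of that general pattern (not the RD-Agent implementation), here is a minimal sketch. The step names, `run_parallel`, `worker`, and `step_limits` are all hypothetical names invented for this example.

```python
import asyncio

# Hypothetical step names; not taken from RD-Agent.
STEPS = ["propose", "code", "run", "feedback"]


async def run_loop(loop_idx, semaphores, log):
    """Run every step of one loop iteration, holding that step's semaphore."""
    for step in STEPS:
        async with semaphores[step]:
            log.append((loop_idx, step))
            await asyncio.sleep(0)  # stand-in for the real work of the step


async def worker(queue, semaphores, log):
    """Pull loop indices from the queue until none are left."""
    while True:
        try:
            loop_idx = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        await run_loop(loop_idx, semaphores, log)


async def run_parallel(n_loops, n_workers, step_limits=None):
    """Run n_loops iterations on n_workers concurrent workers."""
    step_limits = step_limits or {}
    # Assumption for this sketch: each step allows 1 concurrent runner by default.
    semaphores = {s: asyncio.Semaphore(step_limits.get(s, 1)) for s in STEPS}
    queue = asyncio.Queue()
    for i in range(n_loops):
        queue.put_nowait(i)
    log = []
    await asyncio.gather(
        *(worker(queue, semaphores, log) for _ in range(n_workers))
    )
    return log


if __name__ == "__main__":
    for entry in asyncio.run(run_parallel(n_loops=4, n_workers=2)):
        print(entry)
```

With a limit of 1 per step, two workers can be in different steps at the same time, but never in the same step. Raising a step's limit in `step_limits` would let more loops run that step concurrently.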
qew21 pushed a commit that referenced this pull request Jun 16, 2025
licong01-cloud pushed a commit to licong01-cloud/RD-Agent that referenced this pull request Dec 13, 2025
yongbin4 pushed a commit to yongbin4/RD-Agent that referenced this pull request Mar 8, 2026