Skip to content

Conversation

@danieljvickers
Copy link
Member

@danieljvickers danieljvickers commented Dec 13, 2025

User description

User description

Description

I accidentally deleted this branch before it got merged. Alright @sbryngelson, get merging before I delete it again.


PR Type

Bug fix, Enhancement


Description

  • Removed deprecated _OLD macro variants from GPU parallel loop macros

  • Replaced all GPU_PARALLEL_LOOP_OLD calls with GPU_PARALLEL_LOOP in MPI communication code

  • Added #ifdef MFC_MPI guards to post-processing FFT and MPI-specific functions

  • Fixed compilation issues when post-processing is enabled without MPI support


Diagram Walkthrough

flowchart LR
  A["Deprecated _OLD Macros"] -->|Remove| B["Clean Macro Definitions"]
  C["GPU_PARALLEL_LOOP_OLD Calls"] -->|Replace| D["GPU_PARALLEL_LOOP Calls"]
  E["MPI Code in Post-Processing"] -->|Guard with ifdef| F["MFC_MPI Conditional Blocks"]
  B --> G["Fixed Compilation"]
  D --> G
  F --> G
Loading

File Walkthrough

Relevant files
Code cleanup
acc_macros.fpp
Remove deprecated ACC_PARALLEL_LOOP_OLD macro                       

src/common/include/acc_macros.fpp

  • Removed deprecated ACC_PARALLEL_LOOP_OLD macro definition
  • Kept ACC_PARALLEL_LOOP macro unchanged
  • Cleaned up 32 lines of obsolete macro code
+0/-32   
omp_macros.fpp
Remove deprecated OMP_PARALLEL_LOOP_OLD macro                       

src/common/include/omp_macros.fpp

  • Removed deprecated OMP_PARALLEL_LOOP_OLD macro definition
  • Kept OMP_PARALLEL_LOOP macro unchanged
  • Eliminated 47 lines of obsolete OpenMP macro code
+0/-47   
parallel_macros.fpp
Remove deprecated GPU_PARALLEL_LOOP_OLD macro                       

src/common/include/parallel_macros.fpp

  • Removed deprecated GPU_PARALLEL_LOOP_OLD macro definition
  • Kept GPU_PARALLEL_LOOP macro unchanged
  • Cleaned up 22 lines of obsolete wrapper macro code
+0/-17   
Bug fix
m_mpi_common.fpp
Replace GPU_PARALLEL_LOOP_OLD with GPU_PARALLEL_LOOP calls

src/common/m_mpi_common.fpp

  • Replaced all 12 instances of #:call GPU_PARALLEL_LOOP_OLD with
    $:GPU_PARALLEL_LOOP
  • Replaced all 12 instances of #:endcall GPU_PARALLEL_LOOP_OLD with
    $:END_GPU_PARALLEL_LOOP()
  • Fixed indentation and formatting of nested loop structures
  • Updated MPI buffer unpacking code for all three spatial directions
+119/-119
m_start_up.fpp
Add MFC_MPI conditional guards to post-processing functions

src/post_process/m_start_up.fpp

  • Added #ifdef MFC_MPI guard around MPI_ALLREDUCE call in FFT energy
    computation
  • Added #ifdef MFC_MPI guards around s_mpi_transpose_x2y subroutine body
  • Added #ifdef MFC_MPI guards around s_mpi_transpose_y2z subroutine body
  • Added #ifdef MFC_MPI guards around FFT initialization code in
    s_initialize_modules
  • Added #ifdef MFC_MPI guards around s_mpi_FFT_fwd subroutine body
  • Added #ifdef MFC_MPI guards around MPI domain initialization in
    s_initialize_mpi_domain
  • Added #ifdef MFC_MPI guards around MPI communicator cleanup in
    s_finalize_modules
  • Moved num_dims calculation outside MPI conditional block
+24/-0   


CodeAnt-AI Description

Ensure post-processing compiles and runs correctly when MPI or legacy macros are absent

What Changed

  • Replaced deprecated GPU_PARALLEL_LOOP_OLD usages with the current GPU_PARALLEL_LOOP/END_GPU_PARALLEL_LOOP pairs so parallel unpack loops expand with the updated macros
  • Wrapped MPI-dependent reductions, transposes, FFT preparations, domain initialization, and communicator cleanup in conditional checks so post-processing does not invoke MPI functions or allocate MPI buffers when MPI is not enabled
  • Removed legacy OpenACC/OpenMP "OLD" macro definitions to avoid using outdated macro variants during code generation

Impact

✅ Builds post-processing without MPI enabled
✅ Fewer compilation failures due to removed legacy macros
✅ Avoids calling MPI routines when MPI is not present

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

Summary by CodeRabbit

  • Refactor

    • Removed legacy parallel-loop variants and consolidated modern parallel-loop usage for cleaner, more maintainable parallel execution.
    • Rewrote loop-wrapping syntax across MPI-aware sections to use the updated parallel loop pattern.
  • Chores

    • Added preprocessor guards around MPI allocations, communications, and cleanup to keep serial builds unaffected.
    • General cleanup and modernization of parallel/MPI-related code paths.

✏️ Tip: You can customize this high-level summary in your review settings.

@codeant-ai
Copy link

codeant-ai bot commented Dec 13, 2025

CodeAnt AI is reviewing your PR.


Thanks for using CodeAnt! 🎉

We're free for open-source projects. if you're enjoying it, help us grow by sharing.

Share on X ·
Reddit ·
LinkedIn

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 13, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Removes legacy parallel-loop macros (ACC_PARALLEL_LOOP_OLD, OMP_PARALLEL_LOOP_OLD, GPU_PARALLEL_LOOP_OLD), updates call sites to use the new GPU_PARALLEL_LOOP/END markers in MPI packing/unpacking, and wraps MPI-specific startup/FFT code with #ifdef MFC_MPI guards for serial builds.

Changes

Cohort / File(s) Summary
Macro definition removals
src/common/include/acc_macros.fpp, src/common/include/omp_macros.fpp, src/common/include/parallel_macros.fpp
Deleted legacy macros ACC_PARALLEL_LOOP_OLD, OMP_PARALLEL_LOOP_OLD, and GPU_PARALLEL_LOOP_OLD including their full parameter lists and implementation blocks. Existing non-OLD macro definitions remain unchanged.
MPI macro usage migration
src/common/m_mpi_common.fpp
Replaced GPU_PARALLEL_LOOP_OLD(...) call sites with $:GPU_PARALLEL_LOOP(...) and substituted old end markers with $:END_GPU_PARALLEL_LOOP() across MPI recv/send and unpack/pack sections (qbmm and q_comm/pb_in/mv_in branches).
MPI conditional compilation
src/post_process/m_start_up.fpp
Added #ifdef MFC_MPI guards around MPI allocations, communications, FFT initialization/aggregation, domain decomposition, broadcasts, and MPI cleanup. Introduced local num_dims initialization within guarded sections where needed to preserve serial builds.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Review focuses:
    • src/post_process/m_start_up.fpp: verify all MPI blocks are correctly guarded and that num_dims and other locals are in scope for both MPI and non-MPI builds.
    • src/common/m_mpi_common.fpp: confirm every GPU_PARALLEL_LOOP_OLD replacement has matching start/end markers and preserved private/reduction lists.
    • Macro files (acc_macros.fpp, omp_macros.fpp, parallel_macros.fpp): ensure no remaining references to removed OLD macros exist.

Possibly related PRs

Suggested labels

Review effort 4/5, size:XXL

Poem

🐰 Hopping through the code with ears held high,
Old macros left the patch, new markers catch the sky,
Guards fold round MPI so serial builds can play,
Loops now start and end in a tidy, brighter way,
I nibble at bugs — then scamper off to bounce and say hooray! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Code Clean Up Redux' is vague and generic, using non-descriptive terms that do not convey meaningful information about the specific changes in the changeset. Use a more descriptive title that highlights the main changes, such as 'Remove deprecated _OLD macros and add MFC_MPI guards to post-processing' or 'Fix post-processing compilation without MPI by removing legacy macros'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed The PR description is comprehensive and well-structured, covering the motivation, detailed changes across multiple files, and impact. However, it does not follow the provided repository template structure with its required sections and checkboxes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@qodo-code-review
Copy link
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The replacements of GPU_PARALLEL_LOOP_OLD with $:GPU_PARALLEL_LOOP introduce $:END_GPU_PARALLEL_LOOP() terminators. Ensure this macro exists and is correctly paired with $:GPU_PARALLEL_LOOP(...) in this codegen system; otherwise codegen may fail or emit stray directives.

                    $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
                    do l = 0, p
                        do k = 0, n
                            do j = -buff_size, -1
                                do i = 1, nVar
                                    r = (i - 1) + v_size* &
                                        (j + buff_size*((k + 1) + (n + 1)*l))
                                    q_comm(i)%sf(j + unpack_offset, k, l) = real(buff_recv(r), kind=stp)
#if defined(__INTEL_COMPILER)
                                    if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
                                        print *, "Error", j, k, l, i
                                        error stop "NaN(s) in recv"
                                    end if
#endif
                                end do
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                    if (qbmm_comm) then
                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
                        do l = 0, p
                            do k = 0, n
                                do j = -buff_size, -1
                                    do i = nVar + 1, nVar + 4
                                        do q = 1, nb
                                            r = (i - 1) + (q - 1)*4 + v_size* &
                                                (j + buff_size*((k + 1) + (n + 1)*l))
                                            pb_in(j + unpack_offset, k, l, i - nVar, q) = real(buff_recv(r), kind=stp)
                                        end do
                                    end do
                                end do
                            end do
                        end do
                        $:END_GPU_PARALLEL_LOOP()

                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
                        do l = 0, p
                            do k = 0, n
                                do j = -buff_size, -1
                                    do i = nVar + 1, nVar + 4
                                        do q = 1, nb
                                            r = (i - 1) + (q - 1)*4 + nb*4 + v_size* &
                                                (j + buff_size*((k + 1) + (n + 1)*l))
                                            mv_in(j + unpack_offset, k, l, i - nVar, q) = real(buff_recv(r), kind=stp)
                                        end do
                                    end do
                                end do
                            end do
                        end do
                        $:END_GPU_PARALLEL_LOOP()
                    end if
                #:elif mpi_dir == 2
                    $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
                    do i = 1, nVar
                        do l = 0, p
                            do k = -buff_size, -1
                                do j = -buff_size, m + buff_size
                                    r = (i - 1) + v_size* &
                                        ((j + buff_size) + (m + 2*buff_size + 1)* &
                                         ((k + buff_size) + buff_size*l))
                                    q_comm(i)%sf(j, k + unpack_offset, l) = real(buff_recv(r), kind=stp)
#if defined(__INTEL_COMPILER)
                                    if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
                                        print *, "Error", j, k, l, i
                                        error stop "NaN(s) in recv"
                                    end if
#endif
                                end do
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                    if (qbmm_comm) then
                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
                        do i = nVar + 1, nVar + 4
                            do l = 0, p
                                do k = -buff_size, -1
                                    do j = -buff_size, m + buff_size
                                        do q = 1, nb
                                            r = (i - 1) + (q - 1)*4 + v_size* &
                                                ((j + buff_size) + (m + 2*buff_size + 1)* &
                                                 ((k + buff_size) + buff_size*l))
                                            pb_in(j, k + unpack_offset, l, i - nVar, q) = real(buff_recv(r), kind=stp)
                                        end do
                                    end do
                                end do
                            end do
                        end do
                        $:END_GPU_PARALLEL_LOOP()

                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
                        do i = nVar + 1, nVar + 4
                            do l = 0, p
                                do k = -buff_size, -1
                                    do j = -buff_size, m + buff_size
                                        do q = 1, nb
                                            r = (i - 1) + (q - 1)*4 + nb*4 + v_size* &
                                                ((j + buff_size) + (m + 2*buff_size + 1)* &
                                                 ((k + buff_size) + buff_size*l))
                                            mv_in(j, k + unpack_offset, l, i - nVar, q) = real(buff_recv(r), kind=stp)
                                        end do
                                    end do
                                end do
                            end do
                        end do
                        $:END_GPU_PARALLEL_LOOP()
                    end if
                #:else
                    ! Unpacking buffer from bc_z%beg
                    $:GPU_PARALLEL_LOOP(collapse=4,private='[r]')
                    do i = 1, nVar
                        do l = -buff_size, -1
                            do k = -buff_size, n + buff_size
                                do j = -buff_size, m + buff_size
                                    r = (i - 1) + v_size* &
                                        ((j + buff_size) + (m + 2*buff_size + 1)* &
                                         ((k + buff_size) + (n + 2*buff_size + 1)* &
                                          (l + buff_size)))
                                    q_comm(i)%sf(j, k, l + unpack_offset) = real(buff_recv(r), kind=stp)
#if defined(__INTEL_COMPILER)
                                    if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
                                        print *, "Error", j, k, l, i
                                        error stop "NaN(s) in recv"
                                    end if
#endif
                                end do
                            end do
                        end do
                    end do
                    $:END_GPU_PARALLEL_LOOP()

                    if (qbmm_comm) then
                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
                        do i = nVar + 1, nVar + 4
                            do l = -buff_size, -1
                                do k = -buff_size, n + buff_size
                                    do j = -buff_size, m + buff_size
                                        do q = 1, nb
                                            r = (i - 1) + (q - 1)*4 + v_size* &
                                                ((j + buff_size) + (m + 2*buff_size + 1)* &
                                                 ((k + buff_size) + (n + 2*buff_size + 1)* &
                                                  (l + buff_size)))
                                            pb_in(j, k, l + unpack_offset, i - nVar, q) = real(buff_recv(r), kind=stp)
                                        end do
                                    end do
                                end do
                            end do
                        end do
                        $:END_GPU_PARALLEL_LOOP()

                        $:GPU_PARALLEL_LOOP(collapse=5,private='[r]')
                        do i = nVar + 1, nVar + 4
                            do l = -buff_size, -1
                                do k = -buff_size, n + buff_size
                                    do j = -buff_size, m + buff_size
                                        do q = 1, nb
                                            r = (i - 1) + (q - 1)*4 + nb*4 + v_size* &
                                                ((j + buff_size) + (m + 2*buff_size + 1)* &
                                                 ((k + buff_size) + (n + 2*buff_size + 1)* &
                                                  (l + buff_size)))
                                            mv_in(j, k, l + unpack_offset, i - nVar, q) = real(buff_recv(r), kind=stp)
                                        end do
                                    end do
                                end do
                            end do
                        end do
                        $:END_GPU_PARALLEL_LOOP()
MPI Guards

New #ifdef MFC_MPI guards wrap full subroutine bodies (e.g., s_mpi_transpose_x2y, s_mpi_transpose_y2z, s_mpi_FFT_fwd). When MFC_MPI is undefined, these routines become effectively no-ops but still referenced elsewhere. Verify callers handle the non-MPI path to avoid silent skips or uninitialized buffers.

#ifdef MFC_MPI

        allocate (sendbuf(Nx*Nyloc*Nzloc))
        allocate (recvbuf(Nx*Nyloc*Nzloc))

        do dest_rank = 0, num_procs_y - 1
            do l = 1, Nzloc
                do k = 1, Nyloc
                    do j = 1, Nxloc
                        sendbuf(j + (k - 1)*Nxloc + (l - 1)*Nxloc*Nyloc + dest_rank*Nxloc*Nyloc*Nzloc) = data_cmplx(j + dest_rank*Nxloc, k, l)
                    end do
                end do
            end do
        end do

        call MPI_Alltoall(sendbuf, Nxloc*Nyloc*Nzloc, MPI_C_DOUBLE_COMPLEX, &
                          recvbuf, Nxloc*Nyloc*Nzloc, MPI_C_DOUBLE_COMPLEX, MPI_COMM_CART12, ierr)

        do src_rank = 0, num_procs_y - 1
            do l = 1, Nzloc
                do k = 1, Nyloc
                    do j = 1, Nxloc
                        data_cmplx_y(j, k + src_rank*Nyloc, l) = recvbuf(j + (k - 1)*Nxloc + (l - 1)*Nxloc*Nyloc + src_rank*Nxloc*Nyloc*Nzloc)
                    end do
                end do
            end do
        end do

        deallocate (sendbuf)
        deallocate (recvbuf)

#endif

    end subroutine s_mpi_transpose_x2y

    subroutine s_mpi_transpose_y2z
        complex(c_double_complex), allocatable :: sendbuf(:), recvbuf(:)
        integer :: dest_rank, src_rank
        integer :: j, k, l

#ifdef MFC_MPI

        allocate (sendbuf(Ny*Nxloc*Nzloc))
        allocate (recvbuf(Ny*Nxloc*Nzloc))

        do dest_rank = 0, num_procs_z - 1
            do l = 1, Nzloc
                do j = 1, Nxloc
                    do k = 1, Nyloc2
                        sendbuf(k + (j - 1)*Nyloc2 + (l - 1)*(Nyloc2*Nxloc) + dest_rank*Nyloc2*Nxloc*Nzloc) = data_cmplx_y(j, k + dest_rank*Nyloc2, l)
                    end do
                end do
            end do
        end do

        call MPI_Alltoall(sendbuf, Nyloc2*Nxloc*Nzloc, MPI_C_DOUBLE_COMPLEX, &
                          recvbuf, Nyloc2*Nxloc*Nzloc, MPI_C_DOUBLE_COMPLEX, MPI_COMM_CART13, ierr)

        do src_rank = 0, num_procs_z - 1
            do l = 1, Nzloc
                do j = 1, Nxloc
                    do k = 1, Nyloc2
                        data_cmplx_z(j, k, l + src_rank*Nzloc) = recvbuf(k + (j - 1)*Nyloc2 + (l - 1)*(Nyloc2*Nxloc) + src_rank*Nyloc2*Nxloc*Nzloc)
                    end do
                end do
            end do
        end do

        deallocate (sendbuf)
        deallocate (recvbuf)

#endif

    end subroutine s_mpi_transpose_y2z

    impure subroutine s_initialize_modules
        ! Computation of parameters, allocation procedures, and/or any other tasks
        ! needed to properly setup the modules
        integer :: size_n(1), inembed(1), onembed(1)

        call s_initialize_global_parameters_module()
        if (bubbles_euler .and. nb > 1) then
            call s_simpson(weight, R0)
        end if
        if (bubbles_euler .and. .not. polytropic) then
            call s_initialize_nonpoly()
        end if
        if (num_procs > 1) then
            call s_initialize_mpi_proxy_module()
            call s_initialize_mpi_common_module()
        end if
        call s_initialize_boundary_common_module()
        call s_initialize_variables_conversion_module()
        call s_initialize_data_input_module()
        call s_initialize_derived_variables_module()
        call s_initialize_data_output_module()

        ! Associate pointers for serial or parallel I/O
        if (parallel_io .neqv. .true.) then
            s_read_data_files => s_read_serial_data_files
        else
            s_read_data_files => s_read_parallel_data_files
        end if

#ifdef MFC_MPI
        if (fft_wrt) then

            num_procs_x = (m_glb + 1)/(m + 1)
            num_procs_y = (n_glb + 1)/(n + 1)
            num_procs_z = (p_glb + 1)/(p + 1)

            Nx = m_glb + 1
            Ny = n_glb + 1
            Nz = p_glb + 1

            Nxloc = (m_glb + 1)/num_procs_y
            Nyloc = n + 1
            Nyloc2 = (n_glb + 1)/num_procs_z
            Nzloc = p + 1

            Nf = max(Nx, Ny, Nz)

            @:ALLOCATE(data_in(Nx*Nyloc*Nzloc))
            @:ALLOCATE(data_out(Nx*Nyloc*Nzloc))

            @:ALLOCATE(data_cmplx(Nx, Nyloc, Nzloc))
            @:ALLOCATE(data_cmplx_y(Nxloc, Ny, Nzloc))
            @:ALLOCATE(data_cmplx_z(Nxloc, Nyloc2, Nz))

            @:ALLOCATE(En_real(Nxloc, Nyloc2, Nz))
            @:ALLOCATE(En(Nf))

            size_n(1) = Nx
            inembed(1) = Nx
            onembed(1) = Nx

            fwd_plan_x = fftw_plan_many_dft(1, size_n, Nyloc*Nzloc, &
                                            data_in, inembed, 1, Nx, &
                                            data_out, onembed, 1, Nx, &
                                            FFTW_FORWARD, FFTW_MEASURE)

            size_n(1) = Ny
            inembed(1) = Ny
            onembed(1) = Ny

            fwd_plan_y = fftw_plan_many_dft(1, size_n, Nxloc*Nzloc, &
                                            data_out, inembed, 1, Ny, &
                                            data_in, onembed, 1, Ny, &
                                            FFTW_FORWARD, FFTW_MEASURE)

            size_n(1) = Nz
            inembed(1) = Nz
            onembed(1) = Nz

            fwd_plan_z = fftw_plan_many_dft(1, size_n, Nxloc*Nyloc2, &
                                            data_in, inembed, 1, Nz, &
                                            data_out, onembed, 1, Nz, &
                                            FFTW_FORWARD, FFTW_MEASURE)

            call MPI_CART_CREATE(MPI_COMM_WORLD, 3, (/num_procs_x, &
                                                      num_procs_y, num_procs_z/), &
                                 (/.true., .true., .true./), &
                                 .false., MPI_COMM_CART, ierr)
            call MPI_CART_COORDS(MPI_COMM_CART, proc_rank, 3, &
                                 cart3d_coords, ierr)

            call MPI_Cart_SUB(MPI_COMM_CART, (/.true., .true., .false./), MPI_COMM_CART12, ierr)
            call MPI_COMM_RANK(MPI_COMM_CART12, proc_rank12, ierr)
            call MPI_CART_COORDS(MPI_COMM_CART12, proc_rank12, 2, cart2d12_coords, ierr)

            call MPI_Cart_SUB(MPI_COMM_CART, (/.true., .false., .true./), MPI_COMM_CART13, ierr)
            call MPI_COMM_RANK(MPI_COMM_CART13, proc_rank13, ierr)
            call MPI_CART_COORDS(MPI_COMM_CART13, proc_rank13, 2, cart2d13_coords, ierr)

        end if
#endif
    end subroutine s_initialize_modules

    subroutine s_mpi_FFT_fwd

        integer :: j, k, l

#ifdef MFC_MPI

        do l = 1, Nzloc
            do k = 1, Nyloc
                do j = 1, Nx
                    data_in(j + (k - 1)*Nx + (l - 1)*Nx*Nyloc) = data_cmplx(j, k, l)
                end do
            end do
        end do

        call fftw_execute_dft(fwd_plan_x, data_in, data_out)

        do l = 1, Nzloc
            do k = 1, Nyloc
                do j = 1, Nx
                    data_cmplx(j, k, l) = data_out(j + (k - 1)*Nx + (l - 1)*Nx*Nyloc)
                end do
            end do
        end do

        call s_mpi_transpose_x2y !!Change Pencil from data_cmplx to data_cmpx_y

        do l = 1, Nzloc
            do k = 1, Nxloc
                do j = 1, Ny
                    data_out(j + (k - 1)*Ny + (l - 1)*Ny*Nxloc) = data_cmplx_y(k, j, l)
                end do
            end do
        end do

        call fftw_execute_dft(fwd_plan_y, data_out, data_in)

        do l = 1, Nzloc
            do k = 1, Nxloc
                do j = 1, Ny
                    data_cmplx_y(k, j, l) = data_in(j + (k - 1)*Ny + (l - 1)*Ny*Nxloc)
                end do
            end do
        end do

        call s_mpi_transpose_y2z !!Change Pencil from data_cmplx_y to data_cmpx_z

        do l = 1, Nyloc2
            do k = 1, Nxloc
                do j = 1, Nz
                    data_in(j + (k - 1)*Nz + (l - 1)*Nz*Nxloc) = data_cmplx_z(k, l, j)
                end do
            end do
        end do

        call fftw_execute_dft(fwd_plan_z, data_in, data_out)

        do l = 1, Nyloc2
            do k = 1, Nxloc
                do j = 1, Nz
                    data_cmplx_z(k, l, j) = data_out(j + (k - 1)*Nz + (l - 1)*Nz*Nxloc)
                end do
            end do
        end do

#endif

    end subroutine s_mpi_FFT_fwd
NaN Checks

Intel-compiler-only NaN checks in unpack loops remain inside GPU parallel regions. Confirm that these conditionals compile under targeted accelerators and don’t inhibit vectorization/offloading; consider moving checks outside or gating by device.

        ! Associate pointers for serial or parallel I/O
        if (parallel_io .neqv. .true.) then
            s_read_data_files => s_read_serial_data_files
        else
            s_read_data_files => s_read_parallel_data_files
        end if

#ifdef MFC_MPI
        if (fft_wrt) then

            num_procs_x = (m_glb + 1)/(m + 1)
            num_procs_y = (n_glb + 1)/(n + 1)
            num_procs_z = (p_glb + 1)/(p + 1)

            Nx = m_glb + 1
            Ny = n_glb + 1
            Nz = p_glb + 1

            Nxloc = (m_glb + 1)/num_procs_y
            Nyloc = n + 1
            Nyloc2 = (n_glb + 1)/num_procs_z
            Nzloc = p + 1

            Nf = max(Nx, Ny, Nz)

            @:ALLOCATE(data_in(Nx*Nyloc*Nzloc))
            @:ALLOCATE(data_out(Nx*Nyloc*Nzloc))

            @:ALLOCATE(data_cmplx(Nx, Nyloc, Nzloc))
            @:ALLOCATE(data_cmplx_y(Nxloc, Ny, Nzloc))
            @:ALLOCATE(data_cmplx_z(Nxloc, Nyloc2, Nz))

            @:ALLOCATE(En_real(Nxloc, Nyloc2, Nz))
            @:ALLOCATE(En(Nf))

            size_n(1) = Nx
            inembed(1) = Nx
            onembed(1) = Nx

            fwd_plan_x = fftw_plan_many_dft(1, size_n, Nyloc*Nzloc, &
                                            data_in, inembed, 1, Nx, &
                                            data_out, onembed, 1, Nx, &
                                            FFTW_FORWARD, FFTW_MEASURE)

            size_n(1) = Ny
            inembed(1) = Ny
            onembed(1) = Ny

            fwd_plan_y = fftw_plan_many_dft(1, size_n, Nxloc*Nzloc, &
                                            data_out, inembed, 1, Ny, &
                                            data_in, onembed, 1, Ny, &
                                            FFTW_FORWARD, FFTW_MEASURE)

            size_n(1) = Nz
            inembed(1) = Nz
            onembed(1) = Nz

            fwd_plan_z = fftw_plan_many_dft(1, size_n, Nxloc*Nyloc2, &
                                            data_in, inembed, 1, Nz, &
                                            data_out, onembed, 1, Nz, &
                                            FFTW_FORWARD, FFTW_MEASURE)

            call MPI_CART_CREATE(MPI_COMM_WORLD, 3, (/num_procs_x, &
                                                      num_procs_y, num_procs_z/), &
                                 (/.true., .true., .true./), &
                                 .false., MPI_COMM_CART, ierr)
            call MPI_CART_COORDS(MPI_COMM_CART, proc_rank, 3, &
                                 cart3d_coords, ierr)

            call MPI_Cart_SUB(MPI_COMM_CART, (/.true., .true., .false./), MPI_COMM_CART12, ierr)
            call MPI_COMM_RANK(MPI_COMM_CART12, proc_rank12, ierr)
            call MPI_CART_COORDS(MPI_COMM_CART12, proc_rank12, 2, cart2d12_coords, ierr)

            call MPI_Cart_SUB(MPI_COMM_CART, (/.true., .false., .true./), MPI_COMM_CART13, ierr)
            call MPI_COMM_RANK(MPI_COMM_CART13, proc_rank13, ierr)
            call MPI_CART_COORDS(MPI_COMM_CART13, proc_rank13, 2, cart2d13_coords, ierr)

        end if
#endif
    end subroutine s_initialize_modules

    subroutine s_mpi_FFT_fwd

        integer :: j, k, l

#ifdef MFC_MPI

        do l = 1, Nzloc
            do k = 1, Nyloc
                do j = 1, Nx
                    data_in(j + (k - 1)*Nx + (l - 1)*Nx*Nyloc) = data_cmplx(j, k, l)
                end do
            end do
        end do

        call fftw_execute_dft(fwd_plan_x, data_in, data_out)

        do l = 1, Nzloc
            do k = 1, Nyloc
                do j = 1, Nx
                    data_cmplx(j, k, l) = data_out(j + (k - 1)*Nx + (l - 1)*Nx*Nyloc)
                end do
            end do
        end do

        call s_mpi_transpose_x2y !!Change Pencil from data_cmplx to data_cmpx_y

        do l = 1, Nzloc
            do k = 1, Nxloc
                do j = 1, Ny
                    data_out(j + (k - 1)*Ny + (l - 1)*Ny*Nxloc) = data_cmplx_y(k, j, l)
                end do
            end do
        end do

        call fftw_execute_dft(fwd_plan_y, data_out, data_in)

        do l = 1, Nzloc
            do k = 1, Nxloc
                do j = 1, Ny
                    data_cmplx_y(k, j, l) = data_in(j + (k - 1)*Ny + (l - 1)*Ny*Nxloc)
                end do
            end do
        end do

        call s_mpi_transpose_y2z !!Change Pencil from data_cmplx_y to data_cmpx_z

        do l = 1, Nyloc2
            do k = 1, Nxloc
                do j = 1, Nz
                    data_in(j + (k - 1)*Nz + (l - 1)*Nz*Nxloc) = data_cmplx_z(k, l, j)
                end do
            end do
        end do

        call fftw_execute_dft(fwd_plan_z, data_in, data_out)

        do l = 1, Nyloc2
            do k = 1, Nxloc
                do j = 1, Nz
                    data_cmplx_z(k, l, j) = data_out(j + (k - 1)*Nz + (l - 1)*Nz*Nxloc)
                end do
            end do
        end do

#endif

    end subroutine s_mpi_FFT_fwd
@codeant-ai codeant-ai bot added the size:L This PR changes 100-499 lines, ignoring generated files label Dec 13, 2025
Comment on lines +968 to 974
q_comm(i)%sf(j + unpack_offset, k, l) = real(buff_recv(r), kind=stp)
#if defined(__INTEL_COMPILER)
if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
print *, "Error", j, k, l, i
error stop "NaN(s) in recv"
end if
if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
print *, "Error", j, k, l, i
error stop "NaN(s) in recv"
end if
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Fix the array index in the ieee_is_nan check to use j + unpack_offset to match the array element being assigned, ensuring correct NaN validation. [possible issue, importance: 8]

Suggested change
q_comm(i)%sf(j + unpack_offset, k, l) = real(buff_recv(r), kind=stp)
#if defined(__INTEL_COMPILER)
if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
print *, "Error", j, k, l, i
error stop "NaN(s) in recv"
end if
if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
print *, "Error", j, k, l, i
error stop "NaN(s) in recv"
end if
#endif
q_comm(i)%sf(j + unpack_offset, k, l) = real(buff_recv(r), kind=stp)
#if defined(__INTEL_COMPILER)
if (ieee_is_nan(q_comm(i)%sf(j + unpack_offset, k, l))) then
print *, "Error", j, k, l, i
error stop "NaN(s) in recv"
end if
#endif
Comment on lines 1139 to 1145
impure subroutine s_initialize_mpi_domain

num_dims = 1 + min(1, n) + min(1, p)

#ifdef MFC_MPI
! Initialization of the MPI environment
call s_mpi_initialize()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Move the num_dims calculation back inside the #ifdef MFC_MPI block to prevent using uninitialized variables n and p when compiling without MPI. [possible issue, importance: 9]

Suggested change
impure subroutine s_initialize_mpi_domain
num_dims = 1 + min(1, n) + min(1, p)
#ifdef MFC_MPI
! Initialization of the MPI environment
call s_mpi_initialize()
impure subroutine s_initialize_mpi_domain
#ifdef MFC_MPI
! Initialization of the MPI environment
call s_mpi_initialize()
! Processor with rank 0 assigns default user input values prior to reading
! from the input file. This is necessary because the other processors do not
! have access to it.
if (proc_rank == 0) call s_set_default_user_inputs()
! The root processor (rank = 0) broadcasts the user-defined parameters to
! all other processors. This must be done before the computational domain
! can be decomposed among all available processors.
call s_mpi_bcast_user_inputs()
num_dims = 1 + min(1, n) + min(1, p)
@codeant-ai
Copy link

codeant-ai bot commented Dec 13, 2025

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • Device vs host code
    The new $:GPU_PARALLEL_LOOP regions may run on the device (GPU). Host-only operations (printing, ieee_is_nan, error stop, MPI calls, or any library calls not available on device) inside those regions will fail or produce undefined behaviour. Verify the new macro semantics and ensure any host-only checks are moved out or guarded.

  • FFT / MPI initialization coupling
    FFT and MPI initialisation code is now guarded by MFC_MPI. Ensure the variables used (Nx, Nxloc, fwd_plan_x, etc.) are consistently initialized only when both FFT is requested (fft_wrt) and MPI is enabled; check call-sites that might assume these are valid even when one of those conditions is false.

  • s_mpi_FFT_fwd guard correctness
    The entire FFT forward routine body is now wrapped with #ifdef MFC_MPI. Verify that callers of s_mpi_FFT_fwd won't rely on side-effects when MFC_MPI is undefined, and that wrapper placement prevents accidental use of unallocated FFT buffers or plans.

  • MPI collective call
    The added Allreduce call may not match the MPI module's procedure naming or intended communicator. Confirm the symbol MPI_ALLREDUCE is available (or should be MPI_Allreduce), that the MPI datatype (mpi_p) is correct for a real array En, and that the correct communicator is used (MPI_COMM_WORLD vs a Cartesian communicator).

@codeant-ai
Copy link

codeant-ai bot commented Dec 13, 2025

CodeAnt AI finished reviewing your PR.

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 5 files

Prompt for AI agents (all 1 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/post_process/m_start_up.fpp">

<violation number="1" location="src/post_process/m_start_up.fpp:1141">
P1: `num_dims` is calculated before `n` and `p` are read from the input file. The `s_read_input_file()` call that populates `n` and `p` is inside the `#ifdef MFC_MPI` block below this line. This line should be moved inside the MPI block, after the input file is read.</violation>
</file>

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR


impure subroutine s_initialize_mpi_domain

num_dims = 1 + min(1, n) + min(1, p)
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: num_dims is calculated before n and p are read from the input file. The s_read_input_file() call that populates n and p is inside the #ifdef MFC_MPI block below this line. This line should be moved inside the MPI block, after the input file is read.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/post_process/m_start_up.fpp, line 1141:

<comment>`num_dims` is calculated before `n` and `p` are read from the input file. The `s_read_input_file()` call that populates `n` and `p` is inside the `#ifdef MFC_MPI` block below this line. This line should be moved inside the MPI block, after the input file is read.</comment>

<file context>
@@ -1118,9 +1132,15 @@ contains
 
     impure subroutine s_initialize_mpi_domain
+
+        num_dims = 1 + min(1, n) + min(1, p)
+
+#ifdef MFC_MPI
</file context>
Fix with Cubic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is interesting. i'm surprised it hasn't come up as an issue before w/ no-mpi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specifically, that it vaguely appears that num_dims is only set in this subroutine, so remains unset in --no-mpi configurations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! I've saved this as a new learning to improve future reviews.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question:

specifically, that it vaguely appears that num_dims is only set in this subroutine, so remains unset in --no-mpi configurations

Answer:

You can steer Qodo Merge to reliably catch this kind of “no-MPI path leaves a variable unset / initialized too early” issue by tightening Improve/Review guidance and raising severity for build-time/runtime correctness over style:

  • Instruct Improve to look for guard-dependent init, early computations, and variables set only in one preprocessor branch:
    [pr_code_suggestions]
    extra_instructions="""
    Focus on correctness over style:

    • Flag variables initialized only inside conditional compile-time guards (e.g., #ifdef MFC_MPI) when those values are later used in both guarded and unguarded paths.
    • Detect computations that occur before required configuration is loaded (e.g., num_dims computed before n/p are read).
    • Prefer moving such computations after input parsing, or duplicating minimal initialization in the non-MPI path.
    • When suggesting fixes, show both MPI and non-MPI branches and ensure num_dims is set in --no-mpi builds.
      """
  • Make Improve more thorough for these cases (optional):
    [pr_code_suggestions]
    suggestions_depth = "exhaustive"
    focus_only_on_problems = true

  • Nudge Review to emphasize these checks and surface them prominently:
    [pr_reviewer]
    extra_instructions="""
    Prioritize build/runtime correctness:

    • Verify that critical initialization (reading inputs, defaults) happens in both MPI and non-MPI builds.
    • Identify variables computed before their inputs are loaded.
    • Call out any value set inside #ifdef that’s later used outside the guard.
      """

Tip: If you want high-signal, fewer-but-critical findings, keep suggestions_depth="regular" and set:
[pr_code_suggestions]
suggestions_score_threshold = 6

This configuration will guide Qodo Merge to propose moving num_dims computation after input parsing and ensuring a non-MPI initialization path, directly addressing the concern.

Relevant Sources:

@cursor
Copy link

cursor bot commented Dec 13, 2025

You have run out of free Bugbot PR reviews for this billing cycle. This will reset on January 8.

To receive reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

@codeant-ai
Copy link

codeant-ai bot commented Dec 13, 2025

CodeAnt AI is running Incremental review


Thanks for using CodeAnt! 🎉

We're free for open-source projects. if you're enjoying it, help us grow by sharing.

Share on X ·
Reddit ·
LinkedIn

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻��� Duplicate comments (1)
src/common/m_mpi_common.fpp (1)

994-1000: Fix NaN check to validate the element that was just assigned (offset bug).

You assign into halo-shifted indices (e.g., j + unpack_offset / k + unpack_offset / l + unpack_offset) but the Intel-only ieee_is_nan checks read the unshifted indices, which can (a) miss NaNs in the received data and/or (b) falsely trip on unrelated/uninitialized halo cells. This was also flagged in prior review comments.

#if defined(__INTEL_COMPILER)
-  if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
+  if (ieee_is_nan(q_comm(i)%sf(j + unpack_offset, k, l))) then
     print *, "Error", j, k, l, i
     error stop "NaN(s) in recv"
   end if
#endif
#if defined(__INTEL_COMPILER)
-  if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
+  if (ieee_is_nan(q_comm(i)%sf(j, k + unpack_offset, l))) then
     print *, "Error", j, k, l, i
     error stop "NaN(s) in recv"
   end if
#endif
#if defined(__INTEL_COMPILER)
-  if (ieee_is_nan(q_comm(i)%sf(j, k, l))) then
+  if (ieee_is_nan(q_comm(i)%sf(j, k, l + unpack_offset))) then
     print *, "Error", j, k, l, i
     error stop "NaN(s) in recv"
   end if
#endif

Also applies to: 1049-1055, 1108-1114

🧹 Nitpick comments (1)
src/common/m_mpi_common.fpp (1)

995-999: Avoid error stop in these parallel-loop regions; prefer project abort path (and verify compiler/offload behavior).

Project guidelines prohibit stop/error stop and also discourage such termination paths in GPU/device code. Since these checks sit inside the GPU-parallel wrapper, please verify this block is never emitted in accelerator-offload builds; otherwise, consider switching to a host-safe debug check or an abort routine consistent with the codebase. Based on coding guidelines/learnings, avoid stop/error stop and use the project’s abort mechanism.

#if defined(__INTEL_COMPILER)
   if (ieee_is_nan(q_comm(i)%sf(...))) then
     print *, "Error", j, k, l, i
-    error stop "NaN(s) in recv"
+    call s_mpi_abort('NaN(s) in recv')
   end if
#endif

Also applies to: 1050-1054, 1109-1113

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6d5760b and c24ca01.

📒 Files selected for processing (1)
  • src/common/m_mpi_common.fpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{fpp,f90}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{fpp,f90}: Use 2-space indentation; continuation lines align beneath &
Use lower-case keywords and intrinsics (do, end subroutine, etc.)
Name modules with m_ pattern (e.g., m_transport)
Name public subroutines with s_ pattern (e.g., s_compute_flux)
Name public functions with f
_ pattern
Keep subroutine size ≤ 500 lines, helper subroutines ≤ 150 lines, functions ≤ 100 lines, files ≤ 1000 lines
Limit routine arguments to ≤ 6; use derived-type params struct if more are needed
Forbid goto statements (except in legacy code), COMMON blocks, and save globals
Every argument must have explicit intent; use dimension/allocatable/pointer as appropriate
Call s_mpi_abort() for errors, never use stop or error stop

**/*.{fpp,f90}: Indent 2 spaces; continuation lines align under &
Use lower-case keywords and intrinsics (do, end subroutine, etc.)
Name modules with m_<feature> prefix (e.g., m_transport)
Name public subroutines as s_<verb>_<noun> (e.g., s_compute_flux) and functions as f_<verb>_<noun>
Keep private helpers in the module; avoid nested procedures
Enforce size limits: subroutine ≤ 500 lines, helper ≤ 150, function ≤ 100, module/file ≤ 1000
Limit subroutines to ≤ 6 arguments; otherwise pass a derived-type 'params' struct
Avoid goto statements (except unavoidable legacy); avoid global state (COMMON, save)
Every variable must have intent(in|out|inout) specification and appropriate dimension / allocatable / pointer
Use s_mpi_abort(<msg>) for error termination instead of stop
Use !> style documentation for header comments; follow Doxygen Fortran format with !! @param and !! @return tags
Use implicit none statement in all modules
Use private declaration followed by explicit public exports in modules
Use derived types with pointers for encapsulation (e.g., pointer, dimension(:,:,:) => null())
Use pure and elemental attributes for side-effect-free functions; combine them for array ...

Files:

  • src/common/m_mpi_common.fpp
src/**/*.fpp

📄 CodeRabbit inference engine (.cursor/rules/mfc-agent-rules.mdc)

src/**/*.fpp: Use .fpp file extension for Fypp preprocessed files; CMake transpiles them to .f90
Start module files with Fypp include for macros: #:include 'macros.fpp'
Use the fypp ASSERT macro for validating conditions: @:ASSERT(predicate, message)
Use fypp macro @:ALLOCATE(var1, var2) for device-aware allocation instead of standard Fortran allocate
Use fypp macro @:DEALLOCATE(var1, var2) for device-aware deallocation instead of standard Fortran deallocate

Files:

  • src/common/m_mpi_common.fpp
🧠 Learnings (12)
📓 Common learnings
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Wrap tight loops with `$:GPU_PARALLEL_FOR(private='[...]', copy='[...]')` macro; add `collapse=n` for safe nested loop merging
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Wrap tight GPU loops with !$acc parallel loop gang vector default(present) reduction(...); add collapse(n) when safe; declare loop-local variables with private(...)
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Do not use OpenACC or OpenMP directives directly; use Fypp macros from `src/common/include/parallel_macros.fpp` instead
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Declare loop-local variables with `private='[...]'` in GPU parallel loop macros
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Mark GPU-callable helpers with `$:GPU_ROUTINE(function_name='...', parallelism='[seq]')` immediately after declaration
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Ensure GPU code compiles with Cray ftn, NVIDIA nvfortran, GNU gfortran, and Intel ifx/ifort compilers
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to **/*.{fpp,f90} : Favor array operations over explicit loops when possible; use `collapse=N` directive to optimize nested loops
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Avoid stop/error stop inside GPU device code
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Allocate large GPU arrays with managed memory or move them into persistent !$acc enter data regions at start-up
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Compile with Cray `ftn` or NVIDIA `nvfortran` for GPU offloading; also build CPU-only with GNU `gfortran` and Intel `ifx`/`ifort` for portability
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Wrap tight loops with `$:GPU_PARALLEL_FOR(private='[...]', copy='[...]')` macro; add `collapse=n` for safe nested loop merging

Applied to files:

  • src/common/m_mpi_common.fpp
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Wrap tight GPU loops with !$acc parallel loop gang vector default(present) reduction(...); add collapse(n) when safe; declare loop-local variables with private(...)

Applied to files:

  • src/common/m_mpi_common.fpp
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Mark GPU-callable helpers with `$:GPU_ROUTINE(function_name='...', parallelism='[seq]')` immediately after declaration

Applied to files:

  • src/common/m_mpi_common.fpp
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Declare loop-local variables with `private='[...]'` in GPU parallel loop macros

Applied to files:

  • src/common/m_mpi_common.fpp
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Ensure GPU code compiles with Cray ftn, NVIDIA nvfortran, GNU gfortran, and Intel ifx/ifort compilers

Applied to files:

  • src/common/m_mpi_common.fpp
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Do not use OpenACC or OpenMP directives directly; use Fypp macros from `src/common/include/parallel_macros.fpp` instead

Applied to files:

  • src/common/m_mpi_common.fpp
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Avoid stop/error stop inside GPU device code

Applied to files:

  • src/common/m_mpi_common.fpp
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Allocate large GPU arrays with managed memory or move them into persistent !$acc enter data regions at start-up

Applied to files:

  • src/common/m_mpi_common.fpp
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to **/*.{fpp,f90} : Use `s_mpi_abort(<msg>)` for error termination instead of `stop`

Applied to files:

  • src/common/m_mpi_common.fpp
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to **/*.{fpp,f90} : Favor array operations over explicit loops when possible; use `collapse=N` directive to optimize nested loops

Applied to files:

  • src/common/m_mpi_common.fpp
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Allocate large arrays with `managed` or move them into a persistent `$:GPU_ENTER_DATA(...)` region at start-up

Applied to files:

  • src/common/m_mpi_common.fpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: Detect File Changes
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Georgia Tech | Phoenix (gpu-acc)
  • GitHub Check: Oak Ridge | Frontier (cpu)
  • GitHub Check: Georgia Tech | Phoenix (cpu)
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Github (macos, mpi, debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Oak Ridge | Frontier (gpu-omp)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Georgia Tech | Phoenix (gpu-omp)
  • GitHub Check: Oak Ridge | Frontier (gpu-acc)
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Build & Publish
🔇 Additional comments (1)
src/common/m_mpi_common.fpp (1)

987-1061: Loop-wrapper migration looks consistent; END markers are correctly paired.

The $:GPU_PARALLEL_LOOP(...)/$:END_GPU_PARALLEL_LOOP() blocks appear properly scoped around the intended nested loops, and the r index math still matches the pack-side layout (including the +1 / +buff_size shifts for negative halo indices).

Also applies to: 1099-1119, 1008-1039, 1063-1096, 1122-1157

@sbryngelson
Copy link
Member

get merging before I delete it again.

-- @danieljvickers

This is the kind of adversarial mentor-mentee relationship @henryleberre appreciates (prefers, arguably). Well, it worked the first time!

@codecov
Copy link

codecov bot commented Dec 13, 2025

Codecov Report

❌ Patch coverage is 16.92308% with 54 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.08%. Comparing base (9a3093b) to head (c24ca01).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
src/common/m_mpi_common.fpp 18.03% 50 Missing ⚠️
src/post_process/m_start_up.fpp 0.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1087      +/-   ##
==========================================
- Coverage   44.13%   44.08%   -0.06%     
==========================================
  Files          71       71              
  Lines       20279    20332      +53     
  Branches     1981     1981              
==========================================
+ Hits         8951     8963      +12     
- Misses      10195    10236      +41     
  Partials     1133     1133              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
@sbryngelson sbryngelson merged commit 8ae42aa into MFlowCode:master Dec 13, 2025
60 of 62 checks passed
DimAdam-01 pushed a commit to DimAdam-01/MFC-Adam that referenced this pull request Dec 13, 2025
…d macros (MFlowCode#1087)

Co-authored-by: Spencer Bryngelson <sbryngelson@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Review effort 3/5 size:L This PR changes 100-499 lines, ignoring generated files

2 participants