Conversation
CodeAnt AI is running Incremental review.
```fortran
! 3D Patch Geometries
if (p > 0) then

    $:GPU_PARALLEL_LOOP(private='[i,patch_id,patch_geometry]', copy='[gps]', copyin='[patch_ib,Np]')
```
Suggestion: The GPU parallel loop over ghost points in the 3D branch calls the STL model levelset routine, which accesses the global models array, but models is not included in the loop's copyin list, so on GPU offload the kernel will read models from host memory or uninitialized device memory, causing incorrect results or runtime errors; models should be explicitly mapped to the device. [possible bug]
Severity Level: Major ⚠️
- ❌ STL model levelset incorrect on GPU.
- ⚠️ GPU-enabled simulations using STL IB unreliable.
- ⚠️ s_model_levelset (models reads) produces garbage.

Suggested change:
```diff
- $:GPU_PARALLEL_LOOP(private='[i,patch_id,patch_geometry]', copy='[gps]', copyin='[patch_ib,Np]')
+ $:GPU_PARALLEL_LOOP(private='[i,patch_id,patch_geometry]', copy='[gps]', copyin='[patch_ib,models,Np]')
```
Steps of Reproduction ✅
1. Build and run with GPU offload enabled so the $:GPU_PARALLEL_LOOP is executed on device
(the loop is declared in src/simulation/m_compute_levelset.fpp around lines 39-40).
2. Ensure at least one ghost point belongs to an STL model patch (patch_geometry == 12) so
s_apply_levelset invokes s_model_levelset from inside the GPU-parallel loop
(s_model_levelset uses models(...) at lines ~661-666 in the s_model_levelset routine).
3. At runtime the GPU kernel will execute s_model_levelset but the loop's copyin list does
not include the global models array, so the kernel will attempt to read models on device
without an explicit device mapping.
4. Observe either incorrect gp%levelset/gp%levelset_norm values for STL patches or offload
runtime errors depending on the compiler/device mapping behavior; the incorrect values
originate from reads of unmapped/invalid device memory for models.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** src/simulation/m_compute_levelset.fpp
**Line:** 39:39
**Comment:**
*Possible Bug: The GPU parallel loop over ghost points in the 3D branch calls the STL model levelset routine, which accesses the global `models` array, but `models` is not included in the loop's `copyin` list, so on GPU offload the kernel will read `models` from host memory or uninitialized device memory, causing incorrect results or runtime errors; `models` should be explicitly mapped to the device.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
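For context, the manual deep-copy pattern that such a mapping requires for an array of derived types with allocatable components can be seen in a minimal self-contained OpenACC sketch below. It uses plain directives rather than MFC's fypp GPU macros, and the reduced model_t type is hypothetical; only the mapping pattern itself is the point.

```fortran
program deep_copy_demo
    implicit none
    ! Hypothetical reduced type; MFC's t_model_array carries more components.
    type :: model_t
        real, allocatable :: boundary_v(:)
    end type model_t
    type(model_t), allocatable :: models(:)
    real :: total
    integer :: i, j

    allocate (models(2))
    allocate (models(1)%boundary_v(4), models(2)%boundary_v(4))
    models(1)%boundary_v = 1.0
    models(2)%boundary_v = 2.0

    ! Manual deep copy: map the parent array first, then each allocatable
    ! component; OpenACC attaches the device pointers automatically.
    !$acc enter data copyin(models)
    do i = 1, size(models)
        !$acc enter data copyin(models(i)%boundary_v)
    end do

    total = 0.0
    ! The kernel can now assert presence instead of copying per launch.
    !$acc parallel loop collapse(2) present(models) reduction(+:total)
    do i = 1, 2
        do j = 1, 4
            total = total + models(i)%boundary_v(j)
        end do
    end do

    print *, 'total =', total   ! expect 12.0

    do i = 1, size(models)
        !$acc exit data delete(models(i)%boundary_v)
    end do
    !$acc exit data delete(models)
end program deep_copy_demo
```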
```fortran
normal_vector = xy_local
normal_vector(2) = normal_vector(2)*(ellipse_coeffs(1)/ellipse_coeffs(2))**2._wp ! get the normal direction via the coordinate transformation method
normal_vector = normal_vector/sqrt(dot_product(normal_vector, normal_vector)) ! normalize the vector
```
Suggestion: In the ellipse levelset computation, the normal vector is normalized without checking for zero length; when the ghost point lies exactly at the ellipse centroid, this makes the dot product zero and causes a division by zero, leading to NaNs in both the normal and the levelset value. [division by zero]
Severity Level: Major ⚠️
- ❌ Ellipse IB levelset computation returns NaN.
- ⚠️ s_apply_levelset for ellipse patches unreliable.
- ⚠️ Downstream IB operations consume NaN normals.

Suggested change (guard inserted before the normalization):
```diff
+ if (f_approx_equal(dot_product(normal_vector, normal_vector), 0._wp)) then
+     gp%levelset = 0._wp
+     gp%levelset_norm = 0._wp
+     return
+ end if
  normal_vector = normal_vector/sqrt(dot_product(normal_vector, normal_vector)) ! normalize the vector
```
Steps of Reproduction ✅
1. Ensure a ghost point is processed by s_apply_levelset in file
src/simulation/m_compute_levelset.fpp (s_apply_levelset branches call s_ellipse_levelset
for patch_geometry==6; see s_apply_levelset around lines 61-66 in the PR hunk).
2. Configure an ellipse IB patch (patch_ib(...)%geometry == 6) whose centroid equals a
grid cell center so that xy_local computed in s_ellipse_levelset becomes exactly zero
(s_ellipse_levelset computes xy_local at lines ~410-412, then uses it at lines 413-416).
3. Run the simulation/path that triggers s_apply_levelset; execution enters
s_ellipse_levelset (file src/simulation/m_compute_levelset.fpp, around lines 410-426) and
reaches the normalization at lines 413-415.
4. At line 415 the code divides by sqrt(dot_product(normal_vector, normal_vector)) which
is zero, producing a division-by-zero and resulting NaNs in gp%levelset_norm (line 416)
and potentially in subsequent gp%levelset computation (line 424). Observe NaN values in
gp%levelset_norm/gp%levelset for that ghost point.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** src/simulation/m_compute_levelset.fpp
**Line:** 415:415
**Comment:**
*Division By Zero: In the ellipse levelset computation, the normal vector is normalized without checking for zero length; when the ghost point lies exactly at the ellipse centroid, this makes the dot product zero and causes a division by zero, leading to NaNs in both the normal and the levelset value.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.

```fortran
quadratic_coeffs(3) = (xy_local(1)/ellipse_coeffs(1))**2._wp + (xy_local(2)/ellipse_coeffs(2))**2._wp - 1._wp

! compute the levelset with the quadratic equation [ -B + sqrt(B^2 - 4AC) ] / 2A
gp%levelset = -0.5_wp*(-quadratic_coeffs(2) + sqrt(quadratic_coeffs(2)**2._wp - 4._wp*quadratic_coeffs(1)*quadratic_coeffs(3)))/quadratic_coeffs(1)
```
Suggestion: The quadratic formula used to compute the ellipse levelset has an extra leading minus sign, so instead of computing (-B + sqrt(B^2 - 4AC)) / (2A) as indicated by the comment, it returns the negative of that value, flipping the sign of the distance and potentially inverting the inside/outside convention. [logic error]
Severity Level: Critical 🚨
- ❌ Ellipse levelset sign inverted.
- ⚠️ Ghost-point boundary classification incorrect.
- ⚠️ IB boundary condition application affected.

Suggested change:
```diff
- gp%levelset = -0.5_wp*(-quadratic_coeffs(2) + sqrt(quadratic_coeffs(2)**2._wp - 4._wp*quadratic_coeffs(1)*quadratic_coeffs(3)))/quadratic_coeffs(1)
+ gp%levelset = 0.5_wp*(-quadratic_coeffs(2) + sqrt(quadratic_coeffs(2)**2._wp - 4._wp*quadratic_coeffs(1)*quadratic_coeffs(3)))/quadratic_coeffs(1)
```
Steps of Reproduction ✅
1. Trigger s_apply_levelset in src/simulation/m_compute_levelset.fpp for a 2D ghost point
whose patch_ib geometry is an ellipse (s_apply_levelset calls s_ellipse_levelset for
geometry==6; see the s_apply_levelset call in the 2D branch around lines 61-66).
2. Inside s_ellipse_levelset (file src/simulation/m_compute_levelset.fpp, around lines
410-426) the code computes quadratic_coeffs and then evaluates gp%levelset at line 424
using the quadratic formula.
3. Because the assignment at line 424 includes a leading negative sign, the computed
gp%levelset is the negative of the intended root; run a unit test that places a point
known to be outside the ellipse and verify the sign of gp%levelset is inverted compared to
mathematical expectation.
4. Observe that the levelset sign is flipped for all ellipse ghost points (incorrect
inside/outside convention for ellipses).

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** src/simulation/m_compute_levelset.fpp
**Line:** 424:424
**Comment:**
*Logic Error: The quadratic formula used to compute the ellipse levelset has an extra leading minus sign, so instead of computing (-B + sqrt(B^2 - 4AC)) / (2A) as indicated by the comment, it returns the negative of that value, flipping the sign of the distance and potentially inverting the inside/outside convention.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
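For reference, the intended root can be worked out from scratch. Assuming, consistently with quadratic_coeffs(3) in the hunk, that the coefficients come from intersecting the ray $\mathbf{p} + t\,\mathbf{n}$ with the ellipse (semi-axes $a$, $b$ standing for ellipse_coeffs(1:2)):

$$\left(\frac{p_1 + t\,n_1}{a}\right)^2 + \left(\frac{p_2 + t\,n_2}{b}\right)^2 = 1 \;\Longrightarrow\; A t^2 + B t + C = 0,$$

$$A = \frac{n_1^2}{a^2} + \frac{n_2^2}{b^2}, \qquad B = 2\left(\frac{p_1 n_1}{a^2} + \frac{p_2 n_2}{b^2}\right), \qquad C = \frac{p_1^2}{a^2} + \frac{p_2^2}{b^2} - 1,$$

$$t = \frac{-B + \sqrt{B^2 - 4AC}}{2A}.$$

The extra leading minus in the code returns $-t$ instead of $t$. Note also that $C < 0$ inside the ellipse and $C > 0$ outside, which is why the sign convention propagates to inside/outside classification downstream.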
src/simulation/m_ib_patches.fpp
Outdated
```fortran
! update allocatables
$:GPU_UPDATE(device='[models]')
$:GPU_UPDATE(device='[models(patch_id)%boundary_v]')
$:GPU_UPDATE(device='[models(patch_id)%interpolated_boundary_v]')
```
✅ Suggestion: The GPU update macro is invoked on models(patch_id)%interpolated_boundary_v even when no interpolation was performed, so the allocatable component may be unallocated and any internal size/shape query in the macro will trigger a runtime error when interpolate is false. [possible bug]
Severity Level: Major ⚠️
- ❌ STL model initialization may crash on startup.
- ⚠️ Some STL models skip interpolation frequently.
- ⚠️ GPU device update macros interact with allocatables.
- ❌ Loading of 2D/3D STL patches may be aborted.

Suggested change:
```diff
- $:GPU_UPDATE(device='[models(patch_id)%interpolated_boundary_v]')
+ if (allocated(models(patch_id)%interpolated_boundary_v)) then
+     $:GPU_UPDATE(device='[models(patch_id)%interpolated_boundary_v]')
+ end if
```
Steps of Reproduction ✅
1. Prepare an input case that includes an STL patch but where interpolation is not
required (patch_ib(...)%model_interpolate → f_check_interpolation_2D/3D sets interpolate =
.false.). This path executes inside s_instantiate_STL_models in
src/simulation/m_ib_patches.fpp (see the s_instantiate_STL_models block around lines
236-247, specifically the update allocatables region at lines 239-243).
2. Run the simulation initialization so s_instantiate_STL_models executes: f_model_read →
f_check_interpolation_* sets interpolate = .false. and the code does NOT allocate
models(patch_id)%interpolated_boundary_v (allocation only happens inside the IF
interpolate branch just above lines 239-243).
3. When execution reaches the unguarded GPU update sequence at lines 239-243, the macro
$:GPU_UPDATE(device='[models(patch_id)%interpolated_boundary_v]') is invoked while
models(patch_id)%interpolated_boundary_v remains unallocated.
4. Depending on the macro expansion and GPU toolchain behavior, the macro may query
allocation/shape or attempt device update and trigger a runtime error (allocation query or
segfault) during initialization. If compilation emits only warnings, no runtime failure
occurs; otherwise a crash/reported runtime error will reproduce the problem.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** src/simulation/m_ib_patches.fpp
**Line:** 242:242
**Comment:**
*Possible Bug: The GPU update macro is invoked on `models(patch_id)%interpolated_boundary_v` even when no interpolation was performed, so the allocatable component may be unallocated and any internal size/shape query in the macro will trigger a runtime error when `interpolate` is false.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.

```fortran
xyz_local = [x_cc(i) - center(1), y_cc(j) - center(2), 0._wp]
if (p > 0) then
    point(3) = z_cc(k)
    xyz_local(3) = z_cc(k) - center(3)
end if
xyz_local = matmul(inverse_rotation, xyz_local)

if (grid_geometry == 3) then
    point = f_convert_cyl_to_cart(point)
    xyz_local = f_convert_cyl_to_cart(xyz_local)
```
Suggestion: In the STL IB patch, the code rotates coordinates in the native grid frame and only then applies a cylindrical-to-Cartesian conversion via f_convert_cyl_to_cart, which assumes [x,r,θ] input; this mismatched order and interpretation will misplace the geometry for cylindrical grids and produce incorrect immersed boundary marking. [logic error]
Severity Level: Critical 🚨
- ❌ Incorrect immersed-boundary placement in cylindrical grids.
- ⚠️ s_ib_model incorrectly transforms coordinates.
- ⚠️ s_apply_ib_patches produces wrong ib_markers_sf.
- ❌ Cylindrical STL simulations yield invalid geometry.

Suggested change:
```diff
- xyz_local = [x_cc(i) - center(1), y_cc(j) - center(2), 0._wp]
- if (p > 0) then
-     point(3) = z_cc(k)
-     xyz_local(3) = z_cc(k) - center(3)
- end if
- xyz_local = matmul(inverse_rotation, xyz_local)
- if (grid_geometry == 3) then
-     point = f_convert_cyl_to_cart(point)
-     xyz_local = f_convert_cyl_to_cart(xyz_local)
+ if (grid_geometry == 3) then
+     call s_convert_cylindrical_to_cartesian_coord(y_cc(j), z_cc(k))
+     xyz_local = [x_cc(i) - center(1), cart_y - center(2), 0._wp]
+     if (p > 0) then
+         xyz_local(3) = cart_z - center(3)
+     end if
+ else
+     xyz_local = [x_cc(i) - center(1), y_cc(j) - center(2), 0._wp]
+     if (p > 0) then
+         xyz_local(3) = z_cc(k) - center(3)
+     end if
+ end if
+ xyz_local = matmul(inverse_rotation, xyz_local)
```
Steps of Reproduction ✅
1. Configure a simulation using cylindrical coordinates (grid_geometry == 3) and include
an STL patch (patch_ib geometry 5 or 12) so s_ib_model is executed. The relevant code
resides in src/simulation/m_ib_patches.fpp inside subroutine s_ib_model around lines
930-941 (xyz_local handling).
2. Start the simulation so s_apply_ib_patches calls s_ib_model for the STL patch
(s_apply_ib_patches dispatches to s_ib_model for geometry 5/12; s_ib_model contains the
xyz_local formation at lines 933-940).
3. In the current code path the cell coordinates are translated and rotated
(matmul(inverse_rotation, xyz_local)) with cylindrical components still in native [x, r,
theta] ordering, and only afterwards f_convert_cyl_to_cart is applied (line 940). This
converts a vector whose components have already been rotated, misplacing the geometry.
4. Observe incorrect IB marking: ib_markers_sf(i,j,k) assignments (later in s_ib_model)
will not match the physical STL location in cylindrical runs — reproduce by comparing
expected STL position vs ib_markers_sf mask output for a simple STL centered at known
radius/angle.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** src/simulation/m_ib_patches.fpp
**Line:** 933:940
**Comment:**
*Logic Error: In the STL IB patch, the code rotates coordinates in the native grid frame and only then applies a cylindrical-to-Cartesian conversion via `f_convert_cyl_to_cart`, which assumes `[x,r,θ]` input; this mismatched order and interpretation will misplace the geometry for cylindrical grids and produce incorrect immersed boundary marking.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
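The non-commutativity this comment relies on is easy to demonstrate with a small self-contained program; the cyl2cart helper below encodes the (x, r, θ) → (x, y, z) convention that f_convert_cyl_to_cart is assumed to follow, and the rotation matrix is an arbitrary example:

```fortran
program order_matters
    implicit none
    real, parameter :: pi = 3.14159265
    ! An arbitrary 90-degree rotation about the x-axis (column-major reshape).
    real, parameter :: R(3, 3) = reshape([1., 0., 0., 0., 0., 1., 0., -1., 0.], [3, 3])
    real :: v(3), rot_first(3), conv_first(3)

    v = [0.0, 1.0, pi/2.0]                 ! a native (x, r, theta) sample point

    rot_first = cyl2cart(matmul(R, v))     ! rotate, then convert (flagged order)
    conv_first = matmul(R, cyl2cart(v))    ! convert, then rotate (intended order)

    print *, 'rotate-then-convert:', rot_first
    print *, 'convert-then-rotate:', conv_first   ! the two disagree

contains

    ! Assumed convention for f_convert_cyl_to_cart: (x, r, theta) -> (x, y, z).
    pure function cyl2cart(u) result(w)
        real, intent(in) :: u(3)
        real :: w(3)
        w = [u(1), u(2)*cos(u(3)), u(2)*sin(u(3))]
    end function cyl2cart

end program order_matters
```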
CodeAnt AI Incremental review completed.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/common/m_model.fpp (3)
1060-1114: ⚠️ Potential issue | 🟡 Minor

Potential out-of-bounds access when model%ntrs == 0.

If model%ntrs is zero, the loop at line 1081 never executes and tri_idx remains 0. Line 1111 then accesses model%trs(0)%n(1), which is an out-of-bounds array access. While an empty model is unlikely in practice, this will cause undefined behavior or a crash with bounds-checking enabled.

🛡️ Proposed guard
```diff
+ if (tri_idx == 0) then
+     normals = 0._wp
+     distance = 0._wp
+     return
+ end if
+
  normals(1) = model%trs(tri_idx)%n(1)
  normals(2) = model%trs(tri_idx)%n(2)
  normals(3) = model%trs(tri_idx)%n(3)
```
1159-1194: ⚠️ Potential issue | 🟡 Minor

Same zero-index risk: idx_buffer stays 0 when boundary_edge_count == 0.

Analogous to tri_idx in f_distance_normals_3D, if boundary_edge_count is zero, the loop never executes and idx_buffer remains 0, causing boundary_v(0, 3, 1) at line 1190, an out-of-bounds access.

🛡️ Proposed guard
```diff
+ if (idx_buffer == 0) then
+     normals = 0._wp
+     return
+ end if
+
  normals(1) = boundary_v(idx_buffer, 3, 1)
```
1125-1152: ⚠️ Potential issue | 🟠 Major

Remove VLA and simplify to scalar running minimum in GPU [seq] routine.

The automatic array dist_buffer(1:boundary_edge_count) (line 1136) is a variable-length array whose size depends on a runtime argument. In GPU device code, nvfortran enforces a 512 KiB compile-time stack-frame limit per procedure, and automatic arrays of unknown size easily exceed this or cause runtime stack overflow. Since the array is only used to compute minval, eliminate it entirely and track the minimum with a scalar accumulator in the loop. This pattern is already used in similar functions in this file (e.g., f_interpolated_distance). Also remove the commented-out old declaration on line 1131.

⚡ Proposed simplification
```diff
  function f_distance(boundary_v, boundary_edge_count, point) result(distance)
      $:GPU_ROUTINE(parallelism='[seq]')
      integer, intent(in) :: boundary_edge_count
      real(wp), intent(in), dimension(:, :, :) :: boundary_v
-     ! real(wp), intent(in), dimension(1:boundary_edge_count, 1:3, 1:2) :: boundary_v
      real(wp), dimension(1:3), intent(in) :: point
      integer :: i
      real(wp) :: dist_buffer1, dist_buffer2
-     real(wp), dimension(1:boundary_edge_count) :: dist_buffer
      real(wp) :: distance

-     distance = 0._wp
+     distance = huge(1._wp)
      do i = 1, boundary_edge_count
          dist_buffer1 = sqrt((point(1) - boundary_v(i, 1, 1))**2 + &
              & (point(2) - boundary_v(i, 1, 2))**2)
          dist_buffer2 = sqrt((point(1) - boundary_v(i, 2, 1))**2 + &
              & (point(2) - boundary_v(i, 2, 2))**2)
-         dist_buffer(i) = minval((/dist_buffer1, dist_buffer2/))
+         distance = min(distance, min(dist_buffer1, dist_buffer2))
      end do
-     distance = minval(dist_buffer)
-
  end function f_distance
```
🧹 Nitpick comments (2)
src/common/m_model.fpp (2)

490-511: GPU directive is intentionally disabled; the TODO should be tracked.

The ! $:GPU_ROUTINE is commented out because random_number is not callable from GPU device code. The fibonacci-sphere approach in the comment block is a viable deterministic replacement that would also eliminate the non-reproducibility of the ray-casting test.

Consider opening an issue to track this so the TODO doesn't go stale, especially since the PR objective explicitly targets GPU-parallelized levelset computation and this function (f_model_is_inside) remains a CPU-only bottleneck for STL models.

Would you like me to open an issue to track replacing random_number with the fibonacci-sphere approach to enable GPU execution of f_model_is_inside?
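For the record, a minimal sketch of what the deterministic fibonacci-sphere directions could look like; the function name and its integration into f_model_is_inside are hypothetical, and the local wp parameter stands in for MFC's working precision:

```fortran
program fib_sphere_demo
    implicit none
    integer, parameter :: wp = kind(1.d0)   ! stand-in for m_precision_select's wp
    integer, parameter :: n_rays = 32
    integer :: i

    do i = 1, n_rays
        ! Each direction is unit length; together they cover the sphere evenly.
        print '(3f10.6)', f_fibonacci_direction(i, n_rays)
    end do

contains

    pure function f_fibonacci_direction(i, n) result(dir)
        integer, intent(in) :: i, n     ! ray index (1..n) and total ray count
        real(wp) :: dir(3)
        real(wp) :: phi, z, r
        real(wp), parameter :: golden_angle = 2.399963229728653_wp ! pi*(3 - sqrt(5))

        z = 1._wp - 2._wp*(real(i, wp) - 0.5_wp)/real(n, wp)   ! z strictly inside (-1, 1)
        r = sqrt(max(1._wp - z*z, 0._wp))
        phi = golden_angle*real(i - 1, wp)
        dir = [r*cos(phi), r*sin(phi), z]
    end function f_fibonacci_direction

end program fib_sphere_demo
```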
8-26: File exceeds the 1000-line guideline limit (1251 lines).

While this is pre-existing, the refactor could be an opportunity to split geometry-specific helpers (e.g., interpolation routines, boundary-check routines) into a separate module. This would also align with the PR's broader goal of consolidating levelset logic into dedicated modules. As per coding guidelines: "Keep … files ≤ 1000."
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/simulation/m_ib_patches.fpp">
<violation number="1" location="src/simulation/m_ib_patches.fpp:242">
P2: Guard the GPU update on interpolated_boundary_v with an allocated() check to avoid touching an unallocated array when interpolation is skipped.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/simulation/m_compute_levelset.fpp`:
- Around line 239-240: z_min/z_max are computed in global coordinates (using
center(3)) but later compared to the already-translated local coordinate
xyz_local(3), causing wrong distances and normals; change the code so z bounds
are expressed in the local frame (e.g. z_min = -lz/2 and z_max = +lz/2) or
compute z_min/z_max after subtracting center (so they are relative to the
translated origin) and then use those when computing dist_side and normals for
xyz_local(3). Update the analogous occurrence around line ~288 as well (keep
references: z_min, z_max, center, lz, xyz_local, dist_side). A minimal local-frame sketch follows this list.
- Around line 64-66: The 2D GPU_PARALLEL_LOOP uses copy='[gps]' and
patch_ib(1:num_ibs) which is inconsistent with the 3D path; change the 2D loop's
data clauses to use present='[gps]' (since gps is already on-device via
GPU_UPDATE) and make the patch_ib clause consistent with the 3D loop (use the
full patch_ib array form rather than the sliced (1:num_ibs)), leaving
private='[i,patch_id,patch_geometry]' and copyin='[Np,patch_ib]' (or the same
copyin form used in the 3D branch) so the kernel uses present gps and a
consistent patch_ib specifier.
- Around line 39-60: The GPU loops call s_model_levelset which dereferences the
host array-of-structs models (e.g. models(patch_id)%interpolate,
%boundary_edge_count, %total_vertices, %model, %interpolated_boundary_v,
%boundary_v) but models is not included in the GPU data clauses; add models to
device data so the kernel won't fault. Best fix: after
s_instantiate_STL_models() returns, create a persistent GPU data region (using
$:GPU_ENTER_DATA(present='[models]', copyin='[models]') / $:END_GPU_ENTER_DATA()
or the project macro equivalent) to keep models on-device for reuse;
alternatively, add models to the copyin list of the $:GPU_PARALLEL_LOOP(...)
that calls s_model_levelset (and the corresponding 2D loop) so models is present
during the loops. Ensure the same symbol name models is used in the data clause
and that its lifetime covers both loops.
In `@src/simulation/m_ibm.fpp`:
- Around line 72-73: s_finalize_ibm_module must free the allocatable array
models and each of its allocatable components to avoid host/device leaks: in
s_finalize_ibm_module iterate over models (if allocated(models)) and for each
element check and deallocate its allocatable fields model, boundary_v, and
interpolated_boundary_v (use ALLOCATED() guards), then deallocate the models
array itself (again guarding with ALLOCATED()) and handle any device-specific
deallocation if these components were moved to the device; update
s_finalize_ibm_module to perform these checks and DEALLOCATE calls for model,
boundary_v, interpolated_boundary_v, then DEALLOCATE(models).
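As referenced in the first item, a minimal sketch of the local-frame form; lz, xyz_local, and dist_side follow that item's naming, everything else (values, the wp stand-in) is illustrative:

```fortran
program local_frame_demo
    implicit none
    integer, parameter :: wp = kind(1.d0)    ! stand-in for MFC's working precision
    real(wp) :: lz, z_min, z_max, dist_side
    real(wp) :: xyz_local(3)

    lz = 2.0_wp                              ! patch extent along z (illustrative)
    xyz_local = [0.0_wp, 0.0_wp, 0.7_wp]     ! coordinate already translated by -center

    ! Bounds expressed in the same translated frame as xyz_local:
    z_min = -0.5_wp*lz
    z_max = +0.5_wp*lz

    dist_side = min(abs(xyz_local(3) - z_min), abs(xyz_local(3) - z_max))
    print *, 'distance to nearer z bound =', dist_side   ! expect 0.3
end program local_frame_demo
```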
🧹 Nitpick comments (3)
src/simulation/m_compute_levelset.fpp (2)
557-559: Use wp real literals in array constructors for type consistency.

gp%levelset_norm = (/1, 0, 0/) creates an integer array that gets implicitly converted. While Fortran handles this correctly for these values, using explicit real literals is more consistent with the rest of the file (e.g., line 292 uses 0._wp). The same pattern appears in s_cylinder_levelset (lines 601–612) with (/1, 0, 0/), (/0, 1, 0/), etc.

✏️ Example fix for sphere
```diff
- gp%levelset_norm = (/1, 0, 0/)
+ gp%levelset_norm = (/1._wp, 0._wp, 0._wp/)
```
310-311: Stale docstring: s_rectangle_levelset is labeled "Initialize IBM module".

The !> Initialize IBM module comment on line 310 is copy-paste from a different subroutine and does not describe what this routine does.

✏️ Fix docstring
```diff
- !> Initialize IBM module
- subroutine s_rectangle_levelset(gp)
+ !> Computes the levelset distance and normal for a rectangle patch
+ subroutine s_rectangle_levelset(gp)
```

src/simulation/m_ibm.fpp (1)
97-98: Remove the commented-out GPU_ENTER_DATA line.

The commented-out directive on line 97 is dead code. The GPU_UPDATE on line 98 is the intended replacement; clean up the leftover.

🧹 Proposed fix
```diff
- ! $:GPU_ENTER_DATA(copyin='[patch_ib(1:num_ibs)]')
  $:GPU_UPDATE(device='[patch_ib(1:num_ibs)]')
```
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/common/m_model.fpp (3)
1125-1152: ⚠️ Potential issue | 🟠 Major

VLA dist_buffer inside a GPU_ROUTINE(seq) function may fail on GPU devices.

Line 1136 declares dist_buffer(1:boundary_edge_count) as an automatic (stack) array with a runtime size. GPU device stacks are typically very limited, and some compilers (nvfortran) may reject or silently miscompile VLAs in device code. This can be eliminated by tracking the running minimum directly, which also avoids the extra minval pass. Also, line 1131 is a leftover commented-out signature; please remove it.

🔧 Proposed fix: eliminate VLA with running minimum
```diff
  function f_distance(boundary_v, boundary_edge_count, point) result(distance)
      $:GPU_ROUTINE(parallelism='[seq]')
      integer, intent(in) :: boundary_edge_count
      real(wp), intent(in), dimension(:, :, :) :: boundary_v
-     ! real(wp), intent(in), dimension(1:boundary_edge_count, 1:3, 1:2) :: boundary_v
      real(wp), dimension(1:3), intent(in) :: point
      integer :: i
      real(wp) :: dist_buffer1, dist_buffer2
-     real(wp), dimension(1:boundary_edge_count) :: dist_buffer
      real(wp) :: distance

-     distance = 0._wp
+     distance = huge(1._wp)
      do i = 1, boundary_edge_count
          dist_buffer1 = sqrt((point(1) - boundary_v(i, 1, 1))**2 + &
              & (point(2) - boundary_v(i, 1, 2))**2)
          dist_buffer2 = sqrt((point(1) - boundary_v(i, 2, 1))**2 + &
              & (point(2) - boundary_v(i, 2, 2))**2)
-         dist_buffer(i) = minval((/dist_buffer1, dist_buffer2/))
+         distance = min(distance, min(dist_buffer1, dist_buffer2))
      end do
-     distance = minval(dist_buffer)
-
  end function f_distance
```
1159-1194: ⚠️ Potential issue | 🟡 Minor

Potential out-of-bounds access when boundary_edge_count == 0.

If boundary_edge_count is 0, idx_buffer stays at 0 (line 1174), and the access at line 1190 (boundary_v(idx_buffer, 3, 1)) is an out-of-bounds read. Since this is a GPU_ROUTINE(seq), it cannot call s_mpi_abort, but an early-return guard would prevent the OOB.

🛡️ Proposed defensive guard
```diff
  dist_buffer = 0._wp
  dist_min = initial_distance_buffer
  idx_buffer = 0

+ if (boundary_edge_count == 0) then
+     normals = 0._wp
+     return
+ end if
+
  do i = 1, boundary_edge_count
```
1060-1116: ⚠️ Potential issue | 🟡 Minor

Same OOB risk: tri_idx = 0 when model%ntrs == 0.

Lines 1111–1113 access model%trs(tri_idx)%n(...) with tri_idx initialized to 0 (line 1080). If the model has zero triangles, this is an out-of-bounds read. Same pattern as f_normals above.
🤖 Fix all issues with AI agents
In `@src/common/m_model.fpp`:
- Line 65: Remove the extra space between the allocate keyword and the opening
parenthesis in the allocation call so it matches the formatter expectation:
change the call to use allocate(model%trs(model%ntrs)) (referencing the allocate
invocation and the model%trs/model%ntrs symbols) — ensure no space exists
between "allocate" and "(".
In `@src/simulation/m_ib_patches.fpp`:
- Around line 216-220: The numeric literal 0.e5_wp is effectively zero and
misleading; in the grid padding assignments for grid_mm (when computing extents
from x_cc/y_cc/z_cc) replace the erroneous 0.e5_wp multiplier with the intended
half-cell padding 0.5_wp (e.g., use
0.5_wp*dx(0)/0.5_wp*dx(m)/0.5_wp*dy(0)/0.5_wp*dy(n)/0.5_wp*dz(0)/0.5_wp*dz(p) in
the grid_mm lines) — if you really intended zero padding instead, replace
0.e5_wp with the explicit 0._wp to make that clear.
🧹 Nitpick comments (1)
src/simulation/m_ib_patches.fpp (1)
487-489: Inconsistent allocation: plain allocate instead of @:ALLOCATE.

The 2D s_ib_airfoil (lines 333–334) was updated to use @:ALLOCATE for airfoil_grid_u and airfoil_grid_l, but the 3D s_ib_3D_airfoil still uses plain allocate. This inconsistency means the 3D path may not properly allocate in GPU-managed memory. As per coding guidelines, use @:ALLOCATE for device-aware allocation in src/**/*.fpp.

♻️ Proposed fix
```diff
  if (.not. allocated(airfoil_grid_u)) then
-     allocate (airfoil_grid_u(1:Np))
-     allocate (airfoil_grid_l(1:Np))
+     @:ALLOCATE(airfoil_grid_u(1:Np))
+     @:ALLOCATE(airfoil_grid_l(1:Np))
```
src/common/m_model.fpp
Outdated
```fortran
model%ntrs = nTriangles

allocate (model%trs(model%ntrs))
```

Suggested change:
```diff
- allocate (model%trs(model%ntrs))
+ allocate(model%trs(model%ntrs))
```
Pipeline failure: allocate spacing.
The CI formatting check reports that this line has a space between allocate and (. Remove the space to match the expected format.
✏️ Suggested fix
```diff
- allocate (model%trs(model%ntrs))
+ allocate(model%trs(model%ntrs))
```

🤖 Prompt for AI Agents
In `@src/common/m_model.fpp` at line 65, Remove the extra space between the
allocate keyword and the opening parenthesis in the allocation call so it
matches the formatter expectation: change the call to use
allocate(model%trs(model%ntrs)) (referencing the allocate invocation and the
model%trs/model%ntrs symbols) — ensure no space exists between "allocate" and
"(".
```fortran
grid_mm(1, :) = (/minval(x_cc(0:m)) - 0.e5_wp*dx(0), maxval(x_cc(0:m)) + 0.e5_wp*dx(m)/)
grid_mm(2, :) = (/minval(y_cc(0:n)) - 0.e5_wp*dy(0), maxval(y_cc(0:n)) + 0.e5_wp*dy(n)/)

if (p > 0) then
    grid_mm(3, :) = (/minval(z_cc(0:p)) - 0.e5_wp*dz(0), maxval(z_cc(0:p)) + 0.e5_wp*dz(p)/)
```
0.e5_wp evaluates to 0.0 — likely a typo.
0.e5_wp in Fortran is 0.0 × 10^5 = 0.0, so the padding term 0.e5_wp*dx(0) is always zero. If the intent was to add half-cell padding for display, this should be 0.5_wp*dx(0). If zero padding is intentional, use 0._wp for clarity instead of the misleading 0.e5_wp.
🤖 Prompt for AI Agents
In `@src/simulation/m_ib_patches.fpp` around lines 216 - 220, The numeric literal
0.e5_wp is effectively zero and misleading; in the grid padding assignments for
grid_mm (when computing extents from x_cc/y_cc/z_cc) replace the erroneous
0.e5_wp multiplier with the intended half-cell padding 0.5_wp (e.g., use
0.5_wp*dx(0)/0.5_wp*dx(m)/0.5_wp*dy(0)/0.5_wp*dy(n)/0.5_wp*dz(0)/0.5_wp*dz(p) in
the grid_mm lines) — if you really intended zero padding instead, replace
0.e5_wp with the explicit 0._wp to make that clear.
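A three-line check of the literal's value, for anyone wanting to confirm the claim; note that the mantissa is zero, so any exponent (including the e-5 form suggested elsewhere in this thread) still yields 0.0:

```fortran
program exponent_typo_demo
    implicit none
    integer, parameter :: wp = kind(1.d0)  ! stand-in for MFC's working precision
    print *, 0.e5_wp    ! 0.0 x 10^5  -> 0.0
    print *, 0.e-5_wp   ! 0.0 x 10^-5 -> still 0.0; only the mantissa matters
    print *, 0.5_wp     ! the half-cell padding factor this comment suggests
end program exponent_typo_demo
```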
CodeAnt AI is running Incremental review.
```fortran
integer :: i, patch_id, patch_geometry

$:GPU_UPDATE(device='[gps(1:num_gps)]')
```
Suggestion: The explicit device update of the gps array before the GPU parallel loop is redundant with the copy='[gps]' clause on the GPU loop and, in typical accelerator models (e.g., OpenACC/OpenMP target), issuing an update for data that is not yet present on the device can produce a runtime "variable not present on device" error; removing this update and relying on the copy clause avoids this inconsistency. [possible bug]
Severity Level: Major ⚠️
- ❌ Levelset GPU path may abort at s_apply_levelset (src/...m_compute_levelset.fpp:29).
- ⚠️ GPU-accelerated runs of IB levelset computation unstable.
- ⚠️ Affects simulations using device backends (GPU/accelerator).

Suggested change:
```diff
- $:GPU_UPDATE(device='[gps(1:num_gps)]')
+ ! gps data is transferred via the GPU_PARALLEL_LOOP copy clause; no explicit device update needed here
```
Steps of Reproduction ✅
1. Execute code path that runs subroutine s_apply_levelset defined in
src/simulation/m_compute_levelset.fpp:29 (impure subroutine s_apply_levelset(gps,
num_gps)).
2. At runtime s_apply_levelset reaches the explicit device update on line 36:
"$:GPU_UPDATE(device='[gps(1:num_gps)]')".
3. Immediately after the update the code enters the GPU parallel region beginning at the
GPU_PARALLEL_LOOP on line 41 which contains copy='[gps]'. If the accelerator runtime does
not already have a device allocation for gps, the explicit GPU_UPDATE on line 36 can cause
the accelerator to error with "variable not present on device" (device update invoked
before device allocation).
4. Observable failure: execution abort or runtime accelerator error reported at or before
line 36 when running with GPU/accelerator enabled. If the explicit update is removed, the
gpu-parallel loop's copy clause at line 41 will allocate and transfer gps as needed and
avoid the pre-update error.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** src/simulation/m_compute_levelset.fpp
**Line:** 36:36
**Comment:**
*Possible Bug: The explicit device update of the `gps` array before the GPU parallel loop is redundant with the `copy='[gps]'` clause on the GPU loop and, in typical accelerator models (e.g., OpenACC/OpenMP target), issuing an update for data that is not yet present on the device can produce a runtime "variable not present on device" error; removing this update and relying on the `copy` clause avoids this inconsistency.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
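The present-before-update rule this comment relies on can be seen in a minimal self-contained OpenACC program (plain directives rather than MFC's fypp macros; names illustrative). The same requirement applies to the host update flagged in the next comment:

```fortran
program update_semantics_demo
    implicit none
    integer :: i
    real :: gps(8)
    gps = 1.0

    ! "update device" is only valid for data that is already present on the
    ! device; an unstructured data region (or a copy clause) must create it first.
    !$acc enter data create(gps)
    !$acc update device(gps)

    !$acc parallel loop present(gps)
    do i = 1, 8
        gps(i) = gps(i) + 1.0
    end do

    !$acc update self(gps)      ! copy results back while gps is still present
    !$acc exit data delete(gps)

    print *, gps(1)             ! expect 2.0
end program update_semantics_demo
```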
```fortran
end if

$:GPU_UPDATE(host='[gps(1:num_gps)]')
```
Suggestion: The explicit host update of the gps array after the GPU parallel loop is redundant with the copy='[gps]' clause on the GPU loop and, if the device copy created by the loop is deallocated at loop exit (as is typical), a subsequent host update for non-present data can raise a runtime error; removing this update ensures consistent device data lifetime. [possible bug]
Severity Level: Major ⚠️
- ❌ Post-levelset GPU-to-host transfer may abort simulation.
- ⚠️ Host-visible gps updates redundant with loop copy clause.
- ⚠️ Affects GPU-enabled runs of s_apply_levelset (src/...m_compute_levelset.fpp:29).

Suggested change:
```diff
- $:GPU_UPDATE(host='[gps(1:num_gps)]')
+ ! gps data is already copied back to the host by the GPU_PARALLEL_LOOP copy clause; no explicit host update needed here
```
Steps of Reproduction ✅
1. Run the same s_apply_levelset subroutine in src/simulation/m_compute_levelset.fpp:29
which executes a GPU_PARALLEL_LOOP (3D branch at ~line 41 or 2D branch at ~line 66) using
copy='[gps]'.
2. The GPU_PARALLEL_LOOP completes (END_GPU_PARALLEL_LOOP around lines 60/85) and typical
accelerator runtimes will deallocate per-loop device copies that were created by the
loop's copy clause.
3. Immediately after the loop the code executes the explicit host update at line 90:
"$:GPU_UPDATE(host='[gps(1:num_gps)]')". If the runtime has already released the device
allocation for gps, the explicit host update attempts to access a non-existent device copy
and may fail with "variable not present on device".
4. Observable failure: runtime error or abort reported at the host-update statement
(src/simulation/m_compute_levelset.fpp:90) on GPU-enabled runs. Relying on the loop's copy
clause avoids this double-management of gps memory.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** src/simulation/m_compute_levelset.fpp
**Line:** 90:90
**Comment:**
*Possible Bug: The explicit host update of the `gps` array after the GPU parallel loop is redundant with the `copy='[gps]'` clause on the GPU loop and, if the device copy created by the loop is deallocated at loop exit (as is typical), a subsequent host update for non-present data can raise a runtime error; removing this update ensures consistent device data lifetime.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
CodeAnt AI Incremental review completed.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/common/m_model.fpp (3)
1060-1116: ⚠️ Potential issue | 🟡 Minor

Potential out-of-bounds access when model%ntrs == 0.

With the GPU_ROUTINE directive now marking this subroutine as device-callable, the pre-existing edge case becomes more concerning: if model%ntrs is 0, the loop on line 1081 is skipped, tri_idx remains 0 (line 1080), and lines 1111–1113 access model%trs(0)%n, which is an out-of-bounds read. On a GPU this is undefined behavior with no clean abort path.

🛡️ Proposed guard
```diff
+ if (tri_idx == 0) then
+     distance = 0._wp
+     normals = 0._wp
+     return
+ end if
+
  normals(1) = model%trs(tri_idx)%n(1)
```
1159-1194: ⚠️ Potential issue | 🟡 Minor

Same zero-count out-of-bounds risk as f_distance_normals_3D.

If boundary_edge_count is 0, idx_buffer stays 0 (line 1174) and lines 1190–1192 read boundary_v(0, …). Now that this is a GPU device routine, the undefined access is silent. Consider adding an early-return guard, mirroring the fix suggested for f_distance_normals_3D.
1125-1152: ⚠️ Potential issue | 🟠 Major

Variable-length automatic array in GPU device code.

dist_buffer(1:boundary_edge_count) (line 1136) is a stack-allocated VLA inside a GPU_ROUTINE(seq) function. Device threads have very limited stack space; a large boundary_edge_count will overflow it. Moreover, some GPU compilers (nvfortran) may reject automatic arrays in device routines outright. The array is only used to collect per-edge minimums and then take minval. Replace it with a scalar running minimum:

♻️ Proposed refactor
```diff
  function f_distance(boundary_v, boundary_edge_count, point) result(distance)
      $:GPU_ROUTINE(parallelism='[seq]')
      integer, intent(in) :: boundary_edge_count
      real(wp), intent(in), dimension(:, :, :) :: boundary_v
      real(wp), dimension(1:3), intent(in) :: point
      integer :: i
      real(wp) :: dist_buffer1, dist_buffer2
-     real(wp), dimension(1:boundary_edge_count) :: dist_buffer
      real(wp) :: distance

-     distance = 0._wp
+     distance = initial_distance_buffer
      do i = 1, boundary_edge_count
          dist_buffer1 = sqrt((point(1) - boundary_v(i, 1, 1))**2 + &
              & (point(2) - boundary_v(i, 1, 2))**2)
          dist_buffer2 = sqrt((point(1) - boundary_v(i, 2, 1))**2 + &
              & (point(2) - boundary_v(i, 2, 2))**2)
-         dist_buffer(i) = minval((/dist_buffer1, dist_buffer2/))
+         distance = min(distance, min(dist_buffer1, dist_buffer2))
      end do
-     distance = minval(dist_buffer)
-
  end function f_distance
```

Does nvfortran support automatic arrays in OpenACC device routines?
🤖 Fix all issues with AI agents
In `@src/simulation/m_compute_levelset.fpp`:
- Around line 310-311: The Doxygen header above subroutine s_rectangle_levelset
is a copy-paste artifact ("Initialize IBM module") and should be replaced with a
concise description of what s_rectangle_levelset actually does: compute and
initialize a rectangular level-set (or populate level-set fields for a
rectangle), list key inputs/outputs (e.g., gp or grid/state being modified), and
any important assumptions; update the comment block associated with
s_rectangle_levelset accordingly so it accurately documents the rectangle
levelset computation and its parameters.
- Around line 643-648: The Doxygen `@param` tags are out-of-date for subroutine
s_model_levelset(gp): remove or replace the nonexistent parameters (patch_id,
STL_levelset, STL_levelset_norm) and instead document the actual argument gp and
its purpose (e.g., type/structure and role in the routine) in the comment header
above s_model_levelset; alternatively, if those other parameters are meant to be
inputs, update the subroutine signature to include them consistently—ensure the
docblock and the subroutine argument list match and use correct Doxygen `@param`
entries for gp (and any real parameters).
- Around line 558-559: The assignment to gp%levelset_norm uses an integer array
constructor (/1, 0, 0/) which relies on implicit conversion and doesn’t ensure
working precision; replace it with wp-suffixed real literals to match the module
precision (e.g., set gp%levelset_norm to a real(wp) array like (1._wp, 0._wp,
0._wp)) so the values are stored in the correct precision and consistent with
f_approx_equal and other wp-typed variables.
- Around line 708-714: The comments for the interpolate branches are swapped:
update the comment above the interpolate==.true. branch (where gp%levelset is
set via f_interpolated_distance using models(patch_id)%interpolated_boundary_v
and total_vertices) to say it is computing the shortest distance to the
interpolated model boundary, and update the comment above the
interpolate==.false. branch (where gp%levelset is set via f_distance using
models(patch_id)%boundary_v and boundary_edge_count) to say it is computing the
shortest distance to the actual model boundary; leave the code (interpolate,
gp%levelset, f_interpolated_distance, f_distance, xyz_local) unchanged.
- Line 419: Normalize normal_vector safely by guarding against division by zero:
compute the squared norm via dot_product(normal_vector, normal_vector), clamp it
with sgm_eps (e.g. use max(dot_product(normal_vector, normal_vector), sgm_eps)),
take the sqrt and divide by that clamped value instead of dividing by raw
sqrt(dot_product(...)); update the normalization at the end of the
m_compute_levelset routine where normal_vector is normalized to use this clamped
denominator (refer to normal_vector, dot_product and sgm_eps). A minimal sketch of this clamped normalization follows this list.
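A minimal sketch of the clamped normalization described in the last item; the sgm_eps value is assumed for this demo, in MFC it is a small positive module parameter:

```fortran
program clamped_norm_demo
    implicit none
    integer, parameter :: wp = kind(1.d0)       ! stand-in for MFC's working precision
    real(wp), parameter :: sgm_eps = 1.e-16_wp  ! assumed value for this demo
    real(wp) :: normal_vector(3), norm2_clamped

    normal_vector = 0._wp                       ! degenerate case: zero-length normal

    norm2_clamped = max(dot_product(normal_vector, normal_vector), sgm_eps)
    normal_vector = normal_vector/sqrt(norm2_clamped)   ! finite (zero) instead of NaN

    print *, normal_vector                      ! prints zeros, no NaN
end program clamped_norm_demo
```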
🧹 Nitpick comments (4)
src/common/m_model.fpp (3)
490-511: Commented-out GPU directive; consider implementing the fibonacci sphere now.

The GPU_ROUTINE directive is commented out (line 490) while all other routines in this file (f_distance_normals_3D, f_distance, f_normals, f_interpolated_distance) have active directives. The TODO block (lines 504–511) correctly identifies random_number as the blocker and proposes a fibonacci sphere replacement.

Since this PR's objective is to enable GPU-parallelized levelset computation, leaving f_model_is_inside as the sole non-GPU-callable routine may limit STL model support on GPU. If the fibonacci sphere approach is validated, implementing it here would make the GPU story complete.
1130-1131: Remove commented-out old declaration.

Line 1131 is a leftover from the previous fixed-shape signature. Dead commented-out code adds noise.
1-27: File exceeds 1000-line guideline (1251 lines).

This is pre-existing, but with the new m_compute_levelset module being introduced in this PR, it may be a good time to evaluate whether some of the interpolation routines (f_interpolate_2D, f_interpolate_3D, f_check_interpolation_2D, f_check_interpolation_3D) could be split into a separate module to bring this file under the limit. As per coding guidelines: "Keep … files ≤ 1000 lines."

src/simulation/m_compute_levelset.fpp (1)
598-613: Integer array constructors for dist_sides_vec and dist_surface_vec.

Same wp precision concern as the sphere subroutine: (/1, 0, 0/), (/0, 1, 0/), etc. produce default-real values rather than wp-precision. These are then used in dot_product and arithmetic with real(wp) variables.

Proposed fix (representative; apply to all six occurrences)
```diff
- dist_sides_vec = (/1, 0, 0/)
- dist_surface_vec = (/0, 1, 1/)
+ dist_sides_vec = (/1._wp, 0._wp, 0._wp/)
+ dist_surface_vec = (/0._wp, 1._wp, 1._wp/)
```

As per coding guidelines: "Use wp (working precision) parameter from m_precision_select instead of hardcoded precision."
```fortran
!> Initialize IBM module
subroutine s_rectangle_levelset(gp)
```
Misleading Doxygen: "Initialize IBM module" on a rectangle levelset subroutine.
Copy-paste artifact. Should describe the rectangle levelset computation.
Proposed fix
```diff
- !> Initialize IBM module
+ !> Compute the levelset and normal for a rectangle patch
  subroutine s_rectangle_levelset(gp)
```

🤖 Prompt for AI Agents
In `@src/simulation/m_compute_levelset.fpp` around lines 310 - 311, The Doxygen
header above subroutine s_rectangle_levelset is a copy-paste artifact
("Initialize IBM module") and should be replaced with a concise description of
what s_rectangle_levelset actually does: compute and initialize a rectangular
level-set (or populate level-set fields for a rectangle), list key
inputs/outputs (e.g., gp or grid/state being modified), and any important
assumptions; update the comment block associated with s_rectangle_levelset
accordingly so it accurately documents the rectangle levelset computation and
its parameters.
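One possible corrected header, sketched only to show the shape of the fix; the @param wording is an assumption about what gp holds:

```fortran
!> Computes the levelset distance and outward normal for a rectangular
!! IB patch at a single ghost point.
!! @param gp Ghost point whose levelset and levelset_norm fields are set
subroutine s_rectangle_levelset(gp)
```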
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/simulation/m_compute_levelset.fpp`:
- Around line 36-41: The GPU loops use copy='[gps]' while gps is already moved
to the device by GPU_UPDATE(device='[gps(1:num_gps)]'), causing
redundant/conflicting data regions; in the GPU_PARALLEL_LOOP pragmas (the loops
that list private='[i,patch_id,patch_geometry]', copy='[gps]',
copyin='[patch_ib,Np]') replace copy='[gps]' with present='[gps]' (and apply the
same change to the second loop around line 66) so the loops reference the
existing device-resident gps and rely on the GPU_UPDATE(host=...) copy-back
later.
- Around line 118-119: The assignment uses an integer literal 0 for the real(wp)
array element gp%levelset_norm; change the literal to the working precision kind
(0._wp) to avoid implicit conversion and match the fix in
s_sphere_levelset—update the assignment in the block where f_approx_equal(dist,
0._wp) is checked so it reads gp%levelset_norm = 0._wp (ensure _wp from
m_precision_select is used).
🧹 Nitpick comments (3)
src/simulation/m_compute_levelset.fpp (3)
156-197: Multiple integer literals assigned to real(wp) variables.

Lines 175, 181, 195 assign 0 (integer) to dist_vec(3) which is real(wp). Use 0._wp consistently.

Proposed fix (representative)
```diff
- dist_vec(3) = 0
+ dist_vec(3) = 0._wp
```

Apply at lines 175, 181, and 195.
222-222: Unused variable length_z.

length_z is declared but never referenced; lz is used instead. Remove it to avoid confusion.

Proposed fix
```diff
- real(wp) :: length_z
```
598-613: Integer array constructors assigned to real(wp) vectors.

Lines 601–612 use (/1, 0, 0/), (/0, 1, 0/), etc. for dist_sides_vec and dist_surface_vec which are real(wp). Use the _wp suffix on all literals for type safety and precision consistency.

Proposed fix (representative)
```diff
- dist_sides_vec = (/1, 0, 0/)
- dist_surface_vec = (/0, 1, 1/)
+ dist_sides_vec = (/1._wp, 0._wp, 0._wp/)
+ dist_surface_vec = (/0._wp, 1._wp, 1._wp/)
```

Apply similarly for lines 606–607 and 611–612.
```fortran
$:GPU_UPDATE(device='[gps(1:num_gps)]')

! 3D Patch Geometries
if (p > 0) then

    $:GPU_PARALLEL_LOOP(private='[i,patch_id,patch_geometry]', copy='[gps]', copyin='[patch_ib,Np]')
```
copy='[gps]' in both GPU loops conflicts with the GPU_UPDATE(device=...) at line 36.
Line 36 moves gps to the device via GPU_UPDATE. The subsequent copy='[gps]' in both loops (lines 41, 66) creates a separate structured data region that redundantly allocates/copies gps, potentially conflicting with the unstructured data update. Use present='[gps]' instead, since gps is already on the device, and rely on the GPU_UPDATE(host=...) at line 90 for the copy-back.
Proposed fix
```diff
- $:GPU_PARALLEL_LOOP(private='[i,patch_id,patch_geometry]', copy='[gps]', copyin='[patch_ib,Np]')
+ $:GPU_PARALLEL_LOOP(private='[i,patch_id,patch_geometry]', present='[gps]', copyin='[patch_ib,Np]')
```

Apply similarly at line 66.

Also applies to: 64-66
🤖 Prompt for AI Agents
In `@src/simulation/m_compute_levelset.fpp` around lines 36 - 41, The GPU loops
use copy='[gps]' while gps is already moved to the device by
GPU_UPDATE(device='[gps(1:num_gps)]'), causing redundant/conflicting data
regions; in the GPU_PARALLEL_LOOP pragmas (the loops that list
private='[i,patch_id,patch_geometry]', copy='[gps]', copyin='[patch_ib,Np]')
replace copy='[gps]' with present='[gps]' (and apply the same change to the
second loop around line 66) so the loops reference the existing device-resident
gps and rely on the GPU_UPDATE(host=...) copy-back later.
```fortran
if (f_approx_equal(dist, 0._wp)) then
    gp%levelset_norm = 0
```
Integer literal 0 assigned to real(wp) array.
Same issue that was fixed for s_sphere_levelset at line 559 — use 0._wp for consistency and to avoid implicit conversion.
Proposed fix
```diff
- gp%levelset_norm = 0
+ gp%levelset_norm = 0._wp
```

As per coding guidelines: "Use wp (working precision) parameter from m_precision_select instead of hardcoded precision like real*8."
🤖 Prompt for AI Agents
In `@src/simulation/m_compute_levelset.fpp` around lines 118 - 119, The assignment
uses an integer literal 0 for the real(wp) array element gp%levelset_norm;
change the literal to the working precision kind (0._wp) to avoid implicit
conversion and match the fix in s_sphere_levelset—update the assignment in the
block where f_approx_equal(dist, 0._wp) is checked so it reads gp%levelset_norm
= 0._wp (ensure _wp from m_precision_select is used).
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/simulation/m_ib_patches.fpp (1)
64-88: ⚠️ Potential issue | 🟠 Major

Remove the optional attribute from the ib_markers_sf parameter.

Line 66 declares ib_markers_sf as optional, but it is used unconditionally throughout the subroutine and passed directly to geometry-specific subroutines (s_ib_sphere, s_ib_cuboid, s_ib_cylinder, s_ib_3D_airfoil, s_ib_model, s_ib_ellipse, s_ib_circle, s_ib_rectangle, s_ib_airfoil) that declare it as non-optional intent(inout). Both call sites in m_ibm.fpp (lines 111 and 1002) always provide this argument. Remove the optional attribute to match actual usage and prevent potential undefined behavior.
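A minimal illustration of the hazard (illustrative names, not MFC's): an absent optional dummy may only appear inside present(); forwarding it to a non-optional dummy, as the flagged code effectively does, is undefined behavior.

```fortran
program optional_demo
    implicit none
    integer :: a(3)
    a = 5
    call s_outer(a)    ! fine: argument supplied
    call s_outer()     ! fine only because s_outer guards with present()
contains
    subroutine s_outer(arr)
        integer, intent(inout), optional :: arr(:)
        if (present(arr)) then
            call s_inner(arr)   ! safe: referenced only when present
        end if
        ! An unguarded `call s_inner(arr)` would be undefined behavior when
        ! arr is absent, because s_inner's dummy argument is not optional.
    end subroutine s_outer
    subroutine s_inner(arr)
        integer, intent(inout) :: arr(:)
        arr = 0
    end subroutine s_inner
end program optional_demo
```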
🤖 Fix all issues with AI agents
In `@src/simulation/m_ib_patches.fpp`:
- Line 143: Replace the raw Fortran allocate call with the Fypp device-aware
macro: in the allocation of models(patch_id)%model (referencing the models
array, patch_id index and model component), remove the plain "allocate" and use
the Fypp macro invocation @:ALLOCATE(models(patch_id)%model) so the allocation
becomes device-aware and satisfies the pipeline formatting rule.
- Around line 58-60: The allocation for GPU-managed array models (declared with
GPU_DECLARE) must use the fypp device-aware macro rather than plain Fortran
allocate: replace the plain allocate of models(patch_id)%model with
@:ALLOCATE(models(patch_id)%model) so the memory is allocated on the device
(symbols: models, GPU_DECLARE, models(patch_id)%model, @:ALLOCATE). Also fix the
scalar concatenation bug by replacing constructs that form (/dx, dy, dz/) with
scalar minima using minval(dx), minval(dy), minval(dz) (addresses the
concatenation at the spots handling dx/dy/dz) and correct the typo 0.e5_wp to
0.e-5_wp where tolerance is intended (fix occurrences around the 215–219
region). A short demonstration of the (/dx, dy, dz/) pitfall follows this list.
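As noted at the end of the second item, a quick demonstration of the array-constructor pitfall; dx/dy/dz here are small dummies, not MFC's grid arrays:

```fortran
program constructor_pitfall_demo
    implicit none
    real, allocatable :: dx(:), dy(:), dz(:)
    dx = [0.1, 0.2]; dy = [0.3, 0.4]; dz = [0.5, 0.6]

    print *, size([dx, dy, dz])                       ! 6: whole arrays concatenated
    print *, minval([dx, dy, dz])                     ! works, but hides the intent
    print *, min(minval(dx), minval(dy), minval(dz))  ! explicit scalar minima
end program constructor_pitfall_demo
```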
🧹 Nitpick comments (1)
src/simulation/m_ib_patches.fpp (1)
928-958: Serial triple-nested loop over the entire grid is a significant performance bottleneck.

The TODO at line 928 explains that f_model_is_inside uses random_number(), which is non-GPU-compatible. Until resolved, this loop runs serially over all (m+1)×(n+1)×(p+1) cells, which will be very slow on large grids.

Additionally, the loop order (i outermost, k innermost at lines 930–932) is suboptimal for Fortran's column-major layout with ib_markers_sf(i, j, k). Consider swapping to k-j-i ordering (k outermost, i innermost) to improve cache locality while this remains CPU-only.

Suggested loop reorder for better cache performance
```diff
- do i = 0, m
-     do j = 0, n
-         do k = 0, p
+ do k = 0, p
+     do j = 0, n
+         do i = 0, m
```
```fortran
type(t_model_array), allocatable, target :: models(:)
$:GPU_DECLARE(create='[models]')
!! array of STL models that can be allocated and then used in IB marker and levelset compute
```
Use @:ALLOCATE macro for device-aware allocation at line 143.
Line 143 allocates models(patch_id)%model using standard Fortran allocate(), but models is GPU-managed (declared with GPU_DECLARE at line 59). Per coding guidelines, use the fypp @:ALLOCATE() macro for all device-aware memory: @:ALLOCATE(models(patch_id)%model). This also resolves the pipeline formatting check failure.
Additionally, the prior review flagged two issues still present in the code:
- Lines 183, 200, 202: (/dx, dy, dz/) concatenates allocatable arrays instead of scalars. Should use minval() (partially done at line 185 for 2D).
- Lines 215–219: 0.e5_wp appears to be a typo for 0.e-5_wp (current value is 0.0).
🤖 Prompt for AI Agents
In `@src/simulation/m_ib_patches.fpp` around lines 58 - 60, The allocation for
GPU-managed array models (declared with GPU_DECLARE) must use the fypp
device-aware macro rather than plain Fortran allocate: replace the plain
allocate of models(patch_id)%model with @:ALLOCATE(models(patch_id)%model) so
the memory is allocated on the device (symbols: models, GPU_DECLARE,
models(patch_id)%model, @:ALLOCATE). Also fix the scalar concatenation bug by
replacing constructs that form (/dx, dy, dz/) with scalar minima using
minval(dx), minval(dy), minval(dz) (addresses the concatenation at the spots
handling dx/dy/dz) and correct the typo 0.e5_wp to 0.e-5_wp where tolerance is
intended (fix occurrences around the 215–219 region).
src/simulation/m_ib_patches.fpp
Outdated
```fortran
do patch_id = 1, num_ibs
    if (patch_ib(patch_id)%geometry == 5 .or. patch_ib(patch_id)%geometry == 12) then
        allocate(models(patch_id)%model)
```
Use @:ALLOCATE macro for device-aware allocation.
Per coding guidelines, use the Fypp @:ALLOCATE macro instead of raw allocate. This also addresses the pipeline formatting failure flagged at this location.
Proposed fix
```diff
- allocate(models(patch_id)%model)
+ @:ALLOCATE(models(patch_id)%model)
```

As per coding guidelines: "Use fypp macro @:ALLOCATE(var1, var2) for device-aware allocation instead of standard Fortran allocate"
🤖 Prompt for AI Agents
In `@src/simulation/m_ib_patches.fpp` at line 143, Replace the raw Fortran
allocate call with the Fypp device-aware macro: in the allocation of
models(patch_id)%model (referencing the models array, patch_id index and model
component), remove the plain "allocate" and use the Fypp macro invocation
@:ALLOCATE(models(patch_id)%model) so the allocation becomes device-aware and
satisfies the pipeline formatting rule.
…han on frontier. We should look into that eventually
@sbryngelson I ran the Phoenix ACC test suite manually and got it to pass, but every time I submit on the runners I get this weird device error that I have never seen before, printed over and over before the job gets killed. Any idea what this could be?
User description
Description
This PR is a significant refactor of the levelset code. It contains:
Fixes #1011
Type of change
Scope
If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.
How Has This Been Tested?
This currently passes all tests on macOS and Ubuntu operating systems using the GNU compiler.
Checklist
- Documentation (docs/)
- Examples under examples/ that demonstrate my new feature performing as expected. They run to completion and demonstrate "interesting physics"
- ./mfc.sh format before committing my code

If your code changes any code source files (anything in src/simulation):
To make sure the code is performing as expected on GPU devices, I have:
- nvtx ranges so that they can be identified in profiles
- ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
- ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR

PR Type
Enhancement, Refactor
Description
Removed levelset computation from preprocessing: IB markers and levelset values are no longer computed during pre-processing; all computations now occur during simulation runtime
Eliminated global levelset arrays: Replaced global levelset and levelset_norm arrays with local storage in the ghost_point derived type, reducing memory allocation by orders of magnitude (a type sketch follows this list)

Parallelized levelset computation over ghost points: Refactored levelset code to parallelize over ghost points instead of grid cells, improving memory efficiency and GPU performance
GPU-accelerated STL model processing: Implemented GPU compute for STL IB markers and levelset values by loading STL models into global memory with separate parallelized loops
New m_compute_levelset module: Created a dedicated module for all levelset computations with geometry-specific functions (circles, rectangles, ellipses, spheres, cylinders, airfoils, STL models)
Added STL model configuration parameters: Extended IB patch parameters to support model_filepath, model_spc, and model_threshold, and transformation parameters (model_translate, model_rotate)
Simplified data I/O: Removed levelset and levelset norm from preprocessing output; added dedicated IB data output subroutines for simulation runtime
Updated test infrastructure: Adjusted grid resolution for circle test cases and updated golden test files to reflect removal of pre-computed IB markers
Diagram Walkthrough
File Walkthrough
8 files
m_ib_patches.fpp
Refactor levelset computation from preprocessing to simulation runtime
src/simulation/m_ib_patches.fpp
- Added m_compute_levelset module dependency and levelset computation calls from s_apply_ib_patches
- Added s_instantiate_STL_models() subroutine to load and preprocess STL models during initialization
- Refactored s_ib_model() to use pre-instantiated models from the global models array instead of computing levelsets inline
- Removed eta variable declarations and replaced them with local-scope usage in geometry-specific functions
- Added offset variable to airfoil subroutines for cleaner centroid-offset handling
m_data_output.fpp
Remove levelset data output from preprocessing
src/pre_process/m_data_output.fpp
- Removed ib_markers, levelset, and levelset_norm parameters from data output subroutine signatures
- Removed writing of levelset and levelset norm data
- Updated s_write_abstract_data_files() accordingly

m_ibm.fpp
Refactor IBM module to use ghost-point-based levelsets
src/simulation/m_ibm.fpp
- Removed global levelset and levelset_norm arrays and replaced them with ghost-point-local storage
- Calls s_instantiate_STL_models() during IBM setup
- Updated s_apply_ib_patches() calls to remove levelset parameters
- Updated s_compute_image_points() to use levelset values from the ghost_point structure instead of global arrays
- Updated s_update_mib() to call the new s_apply_levelset() instead of computing levelsets inline
- Removed patch_id_fp array allocation

m_start_up.fpp
Remove levelset restart file I/O from simulation startup
src/simulation/m_start_up.fpp
- Levelset data is no longer read from restart files
- Calls s_write_ib_data_file() after IBM setup during initialization
- Updated the s_read_ic_data_files() call signature by removing the IB marker parameter
m_start_up.fpp
Remove levelset handling from preprocessing startup
src/pre_process/m_start_up.fpp
- Removed the m_ib_patches module import
- Removed ib_markers, levelset, and levelset_norm variable declarations and allocations
- Updated the s_read_ic_data_files() interface by removing the IB marker parameter
m_mpi_common.fpp
Simplify MPI data initialization for IBM
src/common/m_mpi_common.fpp
- Removed levelset and levelset_norm parameters from the s_initialize_mpi_data() subroutine
- … structures
m_initial_condition.fpp
Remove IBM-related variables from preprocessing
src/pre_process/m_initial_condition.fpp
- Removed the m_ib_patches module import
- Removed ib_markers, levelset, and levelset_norm variable declarations
- Removed the s_apply_ib_patches() call from preprocessing

m_time_steppers.fpp
Remove levelset parameters from immersed boundary update
src/simulation/m_time_steppers.fpp
- Updated the s_update_mib subroutine call by removing levelset and levelset_norm parameters
- Levelset data is now stored within the simulation rather than passed as arguments
5 files
m_compute_levelset.fpp
New module for ghost-point-based levelset computation
src/simulation/m_compute_levelset.fpp
- Computes levelsets for immersed boundaries
- Implements the s_apply_levelset() subroutine, which computes levelsets for ghost points instead of grid cells (see the sketch below)
- Provides geometry-specific functions for circles, rectangles, ellipses, spheres, cylinders, airfoils, and STL models
- Stores results in the ghost_point derived type instead of global arrays
- Parallelizes over ghost points for improved memory efficiency
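The following is a hedged, self-contained sketch of the ghost-point-centric pattern this module uses; the type layout and the circle distance math are simplified stand-ins, not the module's actual interface.

```fortran
program ghost_point_levelset_sketch
    implicit none
    integer, parameter :: wp = kind(1.d0)

    type :: ghost_point
        real(wp) :: loc(2)            ! ghost-point coordinates
        real(wp) :: levelset          ! signed distance, stored locally
        real(wp) :: levelset_norm(2)  ! outward normal, stored locally
    end type ghost_point

    type(ghost_point) :: gps(3)
    real(wp), parameter :: centroid(2) = (/0._wp, 0._wp/), radius = 1._wp
    real(wp) :: dist
    integer :: i

    gps(1)%loc = (/1.5_wp, 0.0_wp/)
    gps(2)%loc = (/0.0_wp, 0.5_wp/)
    gps(3)%loc = (/-2.0_wp, 0.0_wp/)

    ! One pass over the ghost points only: O(number of ghost points) work and
    ! storage, instead of levelset arrays sized to the whole grid.
    do i = 1, size(gps)
        dist = norm2(gps(i)%loc - centroid)
        gps(i)%levelset = dist - radius
        gps(i)%levelset_norm = (gps(i)%loc - centroid)/max(dist, tiny(1._wp))
    end do

    print *, gps(1)%levelset, gps(2)%levelset, gps(3)%levelset
end program ghost_point_levelset_sketch
```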
m_data_output.fpp
Add dedicated IB data output subroutines
src/simulation/m_data_output.fpp
- Added s_write_serial_ib_data() and s_write_parallel_ib_data() for IB data output
- Added an s_write_ib_data_file() wrapper to handle both serial and parallel I/O (a dispatch sketch follows this entry)
- … routines
- … parameters
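A rough sketch of the serial/parallel dispatch idea follows; the routine names echo the PR, but the bodies and the parallel_io flag handling are illustrative assumptions, not the actual implementation.

```fortran
program ib_output_dispatch_sketch
    implicit none
    logical :: parallel_io

    parallel_io = .false.
    call write_ib_data_file(parallel_io)

contains

    subroutine write_ib_data_file(use_parallel)
        logical, intent(in) :: use_parallel
        ! Single entry point; callers never need to know which backend runs.
        if (use_parallel) then
            call write_parallel_ib_data()
        else
            call write_serial_ib_data()
        end if
    end subroutine write_ib_data_file

    subroutine write_serial_ib_data()
        print *, 'serial IB data write (illustrative)'
    end subroutine write_serial_ib_data

    subroutine write_parallel_ib_data()
        print *, 'parallel IB data write (illustrative)'
    end subroutine write_parallel_ib_data

end program ib_output_dispatch_sketch
```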
m_global_parameters.fpp
Add default initialization for IB patch parameters
src/simulation/m_global_parameters.fpp
- Initializes the patch_ib() array with default values for all IB patch parameters
- Covers the new STL model parameters (filepath, spc, threshold, translate, rotate); a default-initialization sketch follows this entry
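As a hedged illustration of default-initialized IB patch parameters: the field names follow the PR, while the sentinel values and type layout are invented for this sketch.

```fortran
program patch_ib_defaults_sketch
    implicit none
    integer, parameter :: wp = kind(1.d0)
    real(wp), parameter :: dflt_real = -1.e6_wp   ! invented sentinel value

    type :: ib_patch_parameters
        integer :: geometry = -100
        character(len=200) :: model_filepath = ' '
        real(wp) :: model_translate(3) = dflt_real
        real(wp) :: model_rotate(3) = dflt_real
    end type ib_patch_parameters

    type(ib_patch_parameters) :: patch_ib(4)

    ! With defaults baked into the type, every element of patch_ib() starts in
    ! a well-defined state before the case file overrides selected fields.
    print *, patch_ib(1)%geometry, trim(patch_ib(1)%model_filepath)
end program patch_ib_defaults_sketch
```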
m_derived_types.fpp
Add derived types for model storage and ghost point levelsets
src/common/m_derived_types.fpp
- Added the t_model_array derived type to store STL model data with boundary vertices and interpolation information
- Extended the ghost_point derived type with levelset and levelset_norm fields for local storage
- Levelset data now lives on the ghost point structure (see the sketch below)
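A minimal sketch of the two derived types named above, assuming simplified shapes and kinds; the component names follow the walkthrough, everything else is illustrative.

```fortran
module derived_types_sketch
    implicit none
    integer, parameter :: wp = kind(1.d0)

    type :: t_model_array
        ! Per-patch STL model storage: boundary vertices plus whatever
        ! interpolation metadata the levelset evaluation needs.
        real(wp), allocatable :: boundary_v(:, :)
        logical :: interpolate = .false.
    end type t_model_array

    type :: ghost_point
        real(wp) :: loc(3)
        integer :: ib_patch_id
        ! Levelset data lives on the ghost point itself instead of in global
        ! grid-sized arrays, which is where the memory saving comes from.
        real(wp) :: levelset
        real(wp) :: levelset_norm(3)
    end type ghost_point
end module derived_types_sketch
```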
case_dicts.py
Add STL model and transformation parameters to IB configuration
toolchain/mfc/run/case_dicts.py
- Added model_filepath, model_spc, and model_threshold
- Added model_translate and model_rotate for each spatial direction
- Added model_scale parameter for potential future use

7 files
cases.py
Adjust grid resolution for circle test cases
toolchain/mfc/test/cases.py
- Uses n=49 grid resolution for improved precision
- Affects test cases with circular geometries
golden.txt
Remove IB markers from golden test output
tests/7FA04E95/golden.txt
- Removed the ib_markers output line from golden test data
- … and stored globally

golden.txt
Remove IB markers output from test golden file
tests/5600D63B/golden.txt
- Removed D/ib_markers.00.dat with all zero values
- Replaced the D/cons.5.00.000050.dat line with updated values

golden-metadata.txt
Update test metadata with new build environment details
tests/7F70E665/golden-metadata.txt
- Added OpenMP : OFF configuration line

golden-metadata.txt
Update test metadata with new build environment details
tests/F60D6594/golden-metadata.txt
- Added OpenMP : OFF configuration line

golden-metadata.txt
Update test metadata with new build environment details
tests/4F5A5E32/golden-metadata.txt
- Added OpenMP : OFF configuration line

golden-metadata.txt
Update test metadata with new build environment details
tests/8D8F6424/golden-metadata.txt
- Added OpenMP : OFF configuration line

3 files
golden-metadata.txt
Update golden test metadata and build environment
tests/B0CE19C5/golden-metadata.txt
- Updated commit hash
- Updated build environment details (paths, system information)

golden-metadata.txt
Update golden test metadata and build environment
tests/7DCE34B4/golden-metadata.txt
- Updated commit hash
- Updated build environment details (paths, system information)

golden-metadata.txt
Update golden test metadata and build environment
tests/6171E9D4/golden-metadata.txt
- Updated commit hash
- Updated build environment details (paths, system information)
- … specifications
63 files
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Tests
CodeAnt-AI Description
Refactor levelset and STL immersed-boundary handling to compute values per ghost point
What Changed
Impact
✅ Lower memory during simulation
✅ Support for rotated STL immersed boundaries
✅ IB marker files available at startup and restart

💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again, by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.