SIGSEGV Fault in parallel, but not serial

ryan_rjlg
Posts: 4
Joined: 19 Feb 2024, 21:43

Post by ryan_rjlg »

Hello,

I have been working on a simple electromagnetic model using CoilSolver and MagnetoDynamics. The model solves fine in serial, but always fails with a SIGSEGV fault when I run it in parallel. I don't think it is an issue with my installation, as I have solved thermal models in parallel. The fault also occurs on both a Windows and an Ubuntu install.

I have tried many different meshes and variations on the geometry without success. I have also tried partitioning with Zoltan on Ubuntu and received a similar error.

I have attached my .sif here and a much coarser version of the mesh I'm using (because of the file size limit). With this coarse mesh, it solves well in serial but not in parallel, just as with much finer meshes. The mesh is partitioned for 4 cores.

I would really appreciate any guidance. Thank you for all the work on this software--I've really enjoyed working with it so far.

-Ryan
Attachments
my_case_u100_transient_50Hz.sif (5.71 KiB)
Mesh_1_coarser.zip (917.48 KiB)
kevinarden
Posts: 2418
Joined: 25 Jan 2019, 01:28

Post by kevinarden »

This is almost always due to the partitioning strategy. In this case I expect that
Coil Closed = Logical True
means that the whole coil has to be in one partition, not cut across multiple partitions.
The same is true for some BCs, such as mortar conditions.

Looks like some of the parallel processes are failing because some of them do not have access to the whole coil.

Post by ryan_rjlg »

Thank you. I used ElmerGrid to partition the mesh so that the coil lies entirely in one partition, but got the same result. I also reordered the partitions so that the coil is always in the first partition, again with the same result. A picture of two partitions is attached.

Any other advice would be appreciated, but I also understand if this is just a fringe case.

Thanks.

-Ryan
Attachments
Mesh1_partcell-002.png (265.54 KiB)

Post by kevinarden »

It is a difficult problem to diagnose. Not every problem can be parallelized using mesh partitioning alone. The Elmer strategy appears to be to partition the mesh and have each process act on its own mesh partition. Some codes parallelize the solution of the equations, but each process has full access to the mesh. It may be that the coil solver needs the entire coil, and the entire infinity boundary condition, in every process. This would mean it is not a good candidate for parallelization using mesh partitions.
raback
Site Admin
Posts: 4851
Joined: 22 Aug 2009, 11:57
Location: Espoo, Finland

Post by raback »

Hi

I changed the MPI communicator and the solver may now function better. Prior to this it did work in parallel, but only if the coil was active in all partitions.
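
The "active in all partitions" condition can be checked on a partitioning before running an older build. The following is only a rough Python sketch, not Elmer code: the function name, the dictionaries, and the toy data are all hypothetical, and it assumes you have already extracted, for each element, its partition number and its body id from the mesh files.

```python
def partitions_missing_coil(element_partition, element_body, coil_bodies, n_partitions):
    """Return the sorted partition ids (1-based) that own no coil element.

    element_partition: dict element id -> partition id (hypothetical input)
    element_body:      dict element id -> body id (hypothetical input)
    coil_bodies:       set of body ids that make up the coil
    """
    seen = set()
    for elem, part in element_partition.items():
        if element_body[elem] in coil_bodies:
            seen.add(part)
    return sorted(set(range(1, n_partitions + 1)) - seen)


# Toy data, invented for illustration: 8 elements, body 2 is the coil.
element_body = {1: 1, 2: 2, 3: 2, 4: 1, 5: 2, 6: 1, 7: 2, 8: 1}
# Coil elements spread over all 4 partitions.
good = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4}
# Whole coil kept inside partition 1.
bad = {1: 1, 2: 1, 3: 1, 4: 2, 5: 1, 6: 3, 7: 1, 8: 4}

print(partitions_missing_coil(good, element_body, {2}, 4))  # []
print(partitions_missing_coil(bad, element_body, {2}, 4))   # [2, 3, 4]
```

An empty list means every partition owns part of the coil. A partitioning that keeps the whole coil in one partition is the case that reports missing partitions.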

The mortar stuff would be much more difficult to implement fully in parallel. The issue there is that all search algorithms should be parallel, and the code should also be able to create matrix entities ad hoc such that for entry "ij" the dofs "i" and "j" could lie in any partition. So there we have to do special partitioning, as Kevin pointed out.

-Peter

Post by ryan_rjlg »

Hi Kevin and Peter,

Thank you both very much for the help.

I modified my geometry so that the center of the coil is the origin. Then it is trivial to get 4 or 8 partitions with the coil in all partitions with -partcell 220 or -partcell 222. Both solve, with roughly 2x and 3x improvements in solution time respectively, for 3 timesteps.
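
To illustrate why centering the coil helps, here is a small Python sketch. It is not ElmerGrid's actual algorithm: it simply assumes a 2x2 split of the xy-plane at the origin, samples points on a ring that stands in for the coil, and records which cells the ring touches. The function names and the ring dimensions are made up for the example.

```python
import math


def quadrant(x, y):
    """Index 0..3 of the 2x2 cell (split at the origin) containing (x, y)."""
    return (1 if x >= 0.0 else 0) + 2 * (1 if y >= 0.0 else 0)


def cells_touched_by_ring(center, radius, n=360):
    """Set of 2x2 cells hit by n points sampled on a ring (a stand-in coil)."""
    cx, cy = center
    cells = set()
    for k in range(n):
        a = 2.0 * math.pi * k / n
        cells.add(quadrant(cx + radius * math.cos(a), cy + radius * math.sin(a)))
    return cells


centered = cells_touched_by_ring((0.0, 0.0), 1.0)  # coil centered at the origin
offset = cells_touched_by_ring((3.0, 3.0), 1.0)    # coil well away from the origin

print(len(centered), len(offset))  # 4 1
```

With the ring centered at the origin every cell contains part of it, while the offset ring sits entirely in one cell, which is the situation that failed before the fix.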

I also installed the nightly build for today and was able to solve a model partitioned as -metis 8 -partdual. Thank you for that significant improvement.

Thanks again. I've really enjoyed using Elmer.

-Ryan

Post by raback »

Hi Ryan,

You jumped quickly to the parallel simulations of Elmer! Maybe you can show some nice color pictures. Do you have a reference for this case? It would be a nice addition to the "elmer-elmag" repo!

Your settings might not be fully optimal. For example, I think you didn't provide any BCs for A. The linear solvers very much like those. In your mesh the BCs were all grouped into one, so it is difficult to pick from there. However, you could try the following:

Boundary Condition 1
  Name = "ExtBC"
  Default External BC = Logical True
  A {e} = Real 0.0
End 
The idea of the "Default..." keywords is that you can more easily pick the correct boundaries from among those that were not set in any "Target Boundaries" list. See test case "DefaultIntExtBCs".

Also, you could try fixing the input current density already in the CoilSolver. That may work equally well with less effort.

-Peter

Post by ryan_rjlg »

Hi Peter,

Thank you for the suggestions!

My goal with this toy model is to make sure I understand the modeling procedure and can correctly simulate saturation behavior in soft magnetic materials. I have a few more details to check, but this seems to be working well. When I have everything sorted out I would be happy to provide it as an example case if it would be useful.

If the modeling of my commercial case looks promising over the next month or two, I will be transitioning into real experiments. If I get to that phase I would be happy to contribute a reference case for an AC coil on a closed ferrite toroid. The experimental part should be relatively straightforward.

Thanks again.

-Ryan