
ElmerSolver_mpi stuck in a loop

Posted: 19 Mar 2021, 06:04
by spacedout
Good morning

I think the code is running in an infinite loop when I execute

mpirun -np 2 ElmerSolver_mpi case.sif

where case.sif contains

Solver 1
Equation = "potential"
Variable = -global Whatever

Exported Variable 1 = -global setflag

Exec Solver = Always
Procedure = "volt" "voltage"
End

Solver 2
Exec Condition = Equals setflag
Equation = "results"
Procedure = "ResultOutputSolve" "ResultOutputSolver"
Output File Name = "parav"
Vtu Format = Logical True
Single Precision = Logical True ! double precision is the default
Scalar Field 1 = String Potential
Vector Field 1 = String Velocity
End

and where volt.F90 contains

SUBROUTINE voltage( Model,Solver,dt,TransientSimulation )

..........
IF( ParEnv % MyPe /= 0 )RETURN

setfgVar => VariableGet( Solver % Mesh % Variables, 'setflag' )

...........

setfgVar % Values(1) = 1.0

...........
END SUBROUTINE voltage


This does not happen with
mpirun -np 1 ElmerSolver_mpi case.sif
or more simply
ElmerSolver case.sif

I am not sure how to go about debugging the file ResultOutputSolve.F90 with gdb, or with any debugger for that matter.

All comments appreciated
Marc

Re: ElmerSolver_mpi stuck in a loop

Posted: 19 Mar 2021, 12:36
by kevinarden
Did you first partition the mesh into 2 partitions, and did it partition without errors? Then, did you update the sif to point to the newly partitioned mesh?

Re: ElmerSolver_mpi stuck in a loop

Posted: 19 Mar 2021, 13:58
by raback
Hi

When you return early from one MPI process, have you considered that the others may want to synchronize and are still waiting...

-Peter
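
Peter's point can be illustrated outside Elmer with a minimal plain-MPI sketch. It assumes the standard Fortran `mpi` module and an MPI compiler wrapper such as mpif90, and the program name is hypothetical. This is not Elmer's API. Instead of returning early on every rank other than 0, only the rank-0 work is guarded by an IF block, and every rank then takes part in the same collective call (`MPI_Bcast`). No process is left waiting, and all of them end up with rank 0's value of the flag.

```fortran
! Minimal plain-MPI sketch (not Elmer code): rank-0-only work guarded by
! an IF block instead of an early RETURN, followed by a collective call
! that every rank reaches.
program bcast_flag
  use mpi
  implicit none
  integer :: ierr, rank
  double precision :: setflag

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  setflag = -1.0d0            ! same default on every rank

  ! Replaces "IF (rank /= 0) RETURN": only the work is skipped on the
  ! other ranks, not the communication that follows.
  if (rank == 0) then
     setflag = 1.0d0          ! work that only rank 0 does
  end if

  ! All ranks call MPI_Bcast, so none is left waiting; afterwards
  ! every rank holds rank 0's value of setflag.
  call MPI_Bcast(setflag, 1, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

  print '(a,i0,a,f5.1)', 'rank ', rank, ': setflag = ', setflag

  call MPI_Finalize(ierr)
end program bcast_flag
```

Run with something like `mpirun -np 2 ./bcast_flag`. Both ranks should report a setflag of 1.0, whereas an early RETURN before a collective call would leave the remaining ranks blocked in that call.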

Re: ElmerSolver_mpi stuck in a loop

Posted: 19 Mar 2021, 22:19
by spacedout
I did

ElmerGrid 2 2 meshdirname -partdual -metiskway 2

and the output shows no errors and reports that it was successful.
I can see the 2 partitions under the subfolder partitioning.2 of the folder meshdirname.

If case.sif contains something like

Header
Mesh DB "." "meshdirname/partitioning.2"
End

the program aborts immediately with a warning about a non-existent partition 2.


Therefore I stick with using

Header
Mesh DB "." "meshdirname"
End


Now in more detail, my file volt.F90 also contains field variables:

SUBROUTINE voltage( Model,Solver,dt,TransientSimulation )

..........

IF( ParEnv % MyPe /= 0 )RETURN

setfgVar => VariableGet( Solver % Mesh % Variables, 'setflag' )

...........

setfgVar % Values(1) = 1.0

...........


DO i=1,Model % NumberOfNodes

j = fehdPerm(i)
IF( j == 0 ) CYCLE

DO k=1,DIM

aveFEHD(DIM*(j-1)+k) = 0.0

END DO

END DO

..........
END SUBROUTINE voltage

I presume

mpirun -np 2 ElmerSolver_mpi case.sif

will use the subfolder partitioning.2 to find the mesh, and that only one processor (ParEnv % MyPe = 0) knows what to do with the above loop over the entire mesh. So all field variables are taken care of by one processor, and the other processor does not need to do anything at all.

You can of course correct me if I am wrong in my assumptions.

Re: ElmerSolver_mpi stuck in a loop

Posted: 19 Mar 2021, 22:58
by raback
Hi

In MPI all processes typically carry out their own task on their own piece of data. Communication must be done explicitly using MPI commands. Partition 0 roughly owns half of the mesh and partition 1 the rest.

Maybe you could add

Max Output Level = 20
Max Output Partition = 2
to get more data from both processes to see where the code freezes.

-Peter
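
The idea that each process works only on its own piece of data can be shown with another minimal plain-MPI sketch. This is not Elmer code. The node count `nglobal`, the contiguous block split and the per-node value are hypothetical stand-ins for a mesh partition. Each rank loops only over the nodes it owns, and a result for the whole mesh exists only after an explicit `MPI_Allreduce`. In a real Elmer partitioning, nodes on partition boundaries appear in more than one partition, so a sum over nodes would have to count each shared node only once. The sketch sidesteps this by using non-overlapping blocks.

```fortran
! Minimal plain-MPI sketch (not Elmer code): each rank sums over the
! nodes it owns, then an explicit reduction combines the partial sums.
program partial_sum_allreduce
  use mpi
  implicit none
  integer, parameter :: nglobal = 10   ! hypothetical total node count
  integer :: ierr, rank, nprocs, i, first, last
  double precision :: local_sum, global_sum

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Contiguous, non-overlapping block of "nodes" owned by this rank
  ! (a stand-in for this rank's mesh partition).
  first = rank * nglobal / nprocs + 1
  last  = (rank + 1) * nglobal / nprocs

  local_sum = 0.0d0
  do i = first, last
     local_sum = local_sum + dble(i)   ! per-node value, local nodes only
  end do

  ! Without this call, no rank ever sees the total over all nodes.
  call MPI_Allreduce(local_sum, global_sum, 1, MPI_DOUBLE_PRECISION, &
                     MPI_SUM, MPI_COMM_WORLD, ierr)

  print '(a,i0,a,f6.1,a,f6.1)', 'rank ', rank, ': local sum = ', &
        local_sum, ', global sum = ', global_sum

  call MPI_Finalize(ierr)
end program partial_sum_allreduce
```

With `mpirun -np 2`, rank 0 sums nodes 1 to 5 and rank 1 sums nodes 6 to 10. Both should print a global sum of 55.0 (1 + 2 + ... + 10), and the same total should come out for any number of processes.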

Re: ElmerSolver_mpi stuck in a loop

Posted: 19 Mar 2021, 23:12
by kevinarden
It has been a while since I coded and compiled MPI programs, but I remember having to use an MPI compiler or set flags to get MPI to work for a program. I have not tested a user subroutine compiled with elmerf90 under MPI. Perhaps elmerf90 handles it, or the technology has moved on since my previous experience.

I tried the same with one of my user subroutines compiled with elmerf90 and it worked fine, with no issues, so the above does not appear to be the issue.

Re: ElmerSolver_mpi stuck in a loop

Posted: 19 Mar 2021, 23:19
by kevinarden
If you want to share the mesh, subroutine, and sif file, I can do an independent check.

Re: ElmerSolver_mpi stuck in a loop

Posted: 20 Mar 2021, 05:37
by spacedout
For case.sif, I added

Max Output Level = 20
Max Output Partition = 2

in the Simulation section

and with its solvers now defined as

Solver 1
Equation = "potential"
Variable = -global Whatever

Exported Variable 1 = -global setflag

Exec Solver = Always
Procedure = "volt" "voltage"
End


Solver 2
Exec Condition = Equals setflag
Equation = "result vtu"
Procedure = "ResultOutputSolve" "ResultOutputSolver"
Output File Name = "parav"
Vtu Format = Logical True
Single Precision = Logical True ! double precision is the default
Scalar Field 1 = String Potential
Vector Field 1 = String Velocity

..........

End

and with my volt.F90 now reduced to simply
setfgVar => VariableGet( Solver % Mesh % Variables, 'setflag' )
setfgVar % Values(1) = 1.0

IF( ParEnv % MyPe /= 0 )RETURN

setfgVar % Values(1) = -1.0

RETURN

the results are

UpdateDependentObjects: Part1: Updating objects depending on primary field in steady state
DerivateExportedVariables: Part1: Derivating variables, if any!
UpdateDependentObjects: Part0: Updating objects depending on primary field in steady state
DerivateExportedVariables: Part0: Derivating variables, if any!

---- now program is frozen

whereas with

setfgVar => VariableGet( Solver % Mesh % Variables, 'setflag' )
setfgVar % Values(1) = -1.0

IF( ParEnv % MyPe /= 0 )RETURN

setfgVar % Values(1) = 1.0

RETURN

the results are

UpdateDependentObjects: Part1: Updating objects depending on primary field in steady state
DerivateExportedVariables: Part1: Derivating variables, if any!
UpdateDependentObjects: Part0: Updating objects depending on primary field in steady state
DerivateExportedVariables: Part0: Derivating variables, if any!
SetActiveElementsTable: Part0: Creating active element table for: result vtu
SetActiveElementsTable: Part0: Number of active elements found : 8029

---- now program is frozen

However, if volt.F90 is reduced to
setfgVar => VariableGet( Solver % Mesh % Variables, 'setflag' )
setfgVar % Values(1) = 1.0

RETURN
then the program does not freeze.

Quite fantastic! You would think setflag is a global variable, but it is as if there is a separate setflag variable for each processor.

Also, I am not sure how easy it is to change volt.F90 to incorporate MPI communications of the sort I saw inside MainUtils.F90.

Have a nice weekend
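
The per-processor behaviour of setflag is consistent with each MPI process holding its own copy of the value. In that case partition 0 and partition 1 can disagree, and presumably only one of them enters the output solver and then waits on the other. Below is a minimal plain-MPI sketch of making such a decision consistent before branching. This is not Elmer code: it assumes the standard `mpi` module, and the summed `contribution` is a hypothetical stand-in for whatever communication the output step performs. The local flags are reduced with `MPI_MAX`, so every rank takes the same branch. Inside volt.F90 the analogous change would be to let every partition compute its flag and reduce it before storing it in setflag, rather than returning early. Elmer's ParallelUtils.F90 appears to provide a `ParallelReduction` function for this kind of reduction, but its exact arguments should be checked in the installed version.

```fortran
! Minimal plain-MPI sketch (not Elmer code): make a per-rank flag
! consistent before branching into a step that uses collective calls.
program consistent_flag
  use mpi
  implicit none
  integer :: ierr, rank
  double precision :: local_flag, global_flag, contribution, total

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! Mimic the freezing case: only rank 0 sets a positive flag.
  if (rank == 0) then
     local_flag = 1.0d0
  else
     local_flag = -1.0d0
  end if

  ! Branching on local_flag here would send rank 0 into the collective
  ! below on its own, where it would wait forever. Reduce first instead.
  call MPI_Allreduce(local_flag, global_flag, 1, MPI_DOUBLE_PRECISION, &
                     MPI_MAX, MPI_COMM_WORLD, ierr)

  total = 0.0d0
  if (global_flag > 0.0d0) then
     ! Hypothetical stand-in for the communication done by an output
     ! step: every rank reaches this call, so it completes.
     contribution = dble(rank + 1)
     call MPI_Allreduce(contribution, total, 1, MPI_DOUBLE_PRECISION, &
                        MPI_SUM, MPI_COMM_WORLD, ierr)
     print '(a,i0,a,f6.1)', 'rank ', rank, ': step ran, total = ', total
  else
     print '(a,i0,a)', 'rank ', rank, ': step skipped on all ranks'
  end if

  call MPI_Finalize(ierr)
end program consistent_flag
```

With `mpirun -np 2`, both ranks should take the same branch and report total = 3.0 (1 + 2). Using `MPI_MAX` means the step runs if any rank asks for it. `MPI_MIN` would instead require all ranks to agree.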

Re: ElmerSolver_mpi stuck in a loop

Posted: 21 Mar 2021, 19:22
by spacedout
Good day kevinarden

I have attached a barebones mesh and program. Download all 3 files into the same folder and execute

ElmerGrid 1 2 rect.grd

ElmerGrid 2 2 rect -partdual -metiskway 2

elmerf90 volt.F90 -o volt.so

mpirun -np 2 ElmerSolver_mpi case.sif

within that folder.

The program will freeze almost right away. And of course, if you comment out the line

IF( ParEnv % MyPe /= 0 )RETURN

in volt.F90, the program runs normally.

These two lines in the Simulation section of case.sif, as suggested by Peter, are useful for debugging.

Max Output Level = 20
Max Output Partition = 2

Have a nice end of weekend
Marc

Re: ElmerSolver_mpi stuck in a loop

Posted: 21 Mar 2021, 19:52
by kevinarden
It happens on my system exactly as you describe.