Good morning
I think the code is running in an infinite loop when I execute
mpirun -np 2 ElmerSolver_mpi case.sif
where case.sif contains
Solver 1
Equation = "potential"
Variable = -global Whatever
Exported Variable 1 = -global setflag
Exec Solver = Always
Procedure = "volt" "voltage"
End
Solver 2
Exec Condition = Equals setflag
Equation = "results"
Procedure = "ResultOutputSolve" "ResultOutputSolver"
Output File Name = "parav"
Vtu Format = Logical True
Single Precision = Logical True ! double precision is the default
Scalar Field 1 = String Potential
Vector Field 1 = String Velocity
End
and where volt.F90 contains
SUBROUTINE voltage( Model,Solver,dt,TransientSimulation )
..........
IF( ParEnv % MyPe /= 0 )RETURN
setfgVar => VariableGet( Solver % Mesh % Variables, 'setflag' )
...........
setfgVar % Values(1) = 1.0
...........
END SUBROUTINE voltage
This does not happen with
mpirun -np 1 ElmerSolver_mpi case.sif
or more simply
ElmerSolver case.sif
I am not sure how to go about debugging file ResultOutputSolve.F90 with gdb or any debugger for that matter.
All comments appreciated
Marc
ElmerSolver_mpi stuck in a loop
-
- Posts: 2237
- Joined: 25 Jan 2019, 01:28
- Antispam: Yes
Re: ElmerSolver_mpi stuck in a loop
Did you first partition the mesh into 2 partitions, and did the partitioning finish without errors? Did you then update the sif to point to the new partitioned mesh?
-
- Site Admin
- Posts: 4812
- Joined: 22 Aug 2009, 11:57
- Antispam: Yes
- Location: Espoo, Finland
- Contact:
Re: ElmerSolver_mpi stuck in a loop
Hi
When you return early from one MPI process, have you considered that the other processes may still want to sync and will be left waiting?
-Peter
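A minimal sketch of one way to avoid the early return, assuming the rest of volt.F90 looks like the fragments quoted in the first post. Every process sets the flag, and only the work that really belongs to process 0 is guarded by an IF block. The declarations and the Info message are assumptions, since the original file is only shown in part.

```fortran
SUBROUTINE voltage( Model,Solver,dt,TransientSimulation )
  USE DefUtils
  IMPLICIT NONE
  TYPE(Solver_t) :: Solver
  TYPE(Model_t) :: Model
  REAL(KIND=dp) :: dt
  LOGICAL :: TransientSimulation

  TYPE(Variable_t), POINTER :: setfgVar

  ! Every MPI process looks up its own copy of the flag ...
  setfgVar => VariableGet( Solver % Mesh % Variables, 'setflag' )
  IF( .NOT. ASSOCIATED( setfgVar ) ) THEN
    CALL Fatal( 'voltage', 'Variable setflag not found' )
  END IF

  ! ... and sets it to the same value, so that all processes
  ! see the same flag when a later solver checks it.
  setfgVar % Values(1) = 1.0_dp

  ! Work meant for process 0 only goes inside a block
  ! instead of behind an early RETURN (placeholder work shown here).
  IF( ParEnv % MyPe == 0 ) THEN
    CALL Info( 'voltage', 'Process 0 doing its extra work', Level=5 )
  END IF
END SUBROUTINE voltage
```

With this layout no process leaves the subroutine before the flag is set, which is presumably what the other processes rely on when they reach the next solver.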
Re: ElmerSolver_mpi stuck in a loop
I did
ElmerGrid 2 2 meshdirname -partdual -metiskway 2
and output shows no errors and claims it was successful
I can see the 2 partitions under subfolder partitioning.2 of folder meshdirname
If case.sif contains something like
Header
Mesh DB "." "meshdirname/partitioning.2"
End
the program aborts immediately and you observe a warning about a non-existent partition 2
Therefore I stick with using
Header
Mesh DB "." "meshdirname"
End
Now in more detail, my file volt.F90 also handles field variables:
SUBROUTINE voltage( Model,Solver,dt,TransientSimulation )
..........
IF( ParEnv % MyPe /= 0 )RETURN
setfgVar => VariableGet( Solver % Mesh % Variables, 'setflag' )
...........
setfgVar % Values(1) = 1.0
...........
DO i=1,Model % NumberOfNodes
j = fehdPerm(i)
IF( j == 0 ) CYCLE
DO k=1,DIM
aveFEHD(DIM*(j-1)+k) = 0.0
END DO
END DO
..........
END SUBROUTINE voltage
I presume
mpirun -np 2 ElmerSolver_mpi case.sif
will use subfolder partitioning.2 to find the mesh and that only one processor (ParEnv % MyPe = 0 ) knows what to do with the above loop over the entire mesh. So all field variables are taken care of by one processor and the other processor does not need to do anything at all.
You can of course correct me if I am wrong in my assumptions
-
- Site Admin
- Posts: 4812
- Joined: 22 Aug 2009, 11:57
- Antispam: Yes
- Location: Espoo, Finland
- Contact:
Re: ElmerSolver_mpi stuck in a loop
Hi
In MPI all processes typically carry out their own task on their own piece of data. Communication must be done explicitly using MPI commands. Partition 0 roughly owns half of the mesh and partition 1 the rest.
Maybe you could add
Max Output Level = 20
Max Output Partition = 2
to get more data from both processes to see where the code freezes.
-Peter
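To see the point about each process owning its own piece of the mesh, a small diagnostic along these lines could be dropped into a user subroutine. This is only a sketch: the subroutine name ShowPartition is hypothetical, and it assumes the usual Elmer solver interface and that ParEnv is visible through DefUtils.

```fortran
SUBROUTINE ShowPartition( Model,Solver,dt,TransientSimulation )
  USE DefUtils
  IMPLICIT NONE
  TYPE(Solver_t) :: Solver
  TYPE(Model_t) :: Model
  REAL(KIND=dp) :: dt
  LOGICAL :: TransientSimulation

  ! Each MPI process only holds the nodes of its own partition,
  ! so a loop over the mesh nodes here covers the local part only.
  PRINT *, 'Process', ParEnv % MyPe, 'of', ParEnv % PEs, &
       'holds', Solver % Mesh % NumberOfNodes, 'nodes'
END SUBROUTINE ShowPartition
```

Run with mpirun -np 2, each process should report a different node count. The counts may add up to more than the serial mesh, since nodes on the partition interface can be present in both partitions.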
-
- Posts: 2237
- Joined: 25 Jan 2019, 01:28
- Antispam: Yes
Re: ElmerSolver_mpi stuck in a loop
It has been a while since I coded and compiled MPI programs, but I remember having to use an MPI compiler or set flags to get MPI to work for a program. I have not tested a user subroutine using elmerf90 with MPI. Perhaps elmerf90 handles it, or the technology has moved on from my previous experience.
I tried the same with one of my user subroutines compiled with elmerf90 and it worked fine with no issues, so the above does not appear to be the issue.
Last edited by kevinarden on 19 Mar 2021, 23:20, edited 1 time in total.
-
- Posts: 2237
- Joined: 25 Jan 2019, 01:28
- Antispam: Yes
Re: ElmerSolver_mpi stuck in a loop
If you want to share the mesh, subroutine, and sif file, I can do an independent check.
Re: ElmerSolver_mpi stuck in a loop
For case.sif, I added
Max Output Level = 20
Max Output Partition = 2
in the Simulation section
and with its
Solver 1
Equation = "potential"
Variable = -global Whatever
Exported Variable 1 = -global setflag
Exec Solver = Always
Procedure = "volt" "voltage"
End
Solver 2
Exec Condition = Equals setflag
Equation = "result vtu"
Procedure = "ResultOutputSolve" "ResultOutputSolver"
Output File Name = "parav"
Vtu Format = Logical True
Single Precision = Logical True ! double precision is the default
Scalar Field 1 = String Potential
Vector Field 1 = String Velocity
..........
End
and my volt.F90 now simply
setfgVar => VariableGet( Solver % Mesh % Variables, 'setflag' )
setfgVar % Values(1) = 1.0
IF( ParEnv % MyPe /= 0 )RETURN
setfgVar % Values(1) = -1.0
RETURN
the results are
UpdateDependentObjects: Part1: Updating objects depending on primary field in steady state
DerivateExportedVariables: Part1: Derivating variables, if any!
UpdateDependentObjects: Part0: Updating objects depending on primary field in steady state
DerivateExportedVariables: Part0: Derivating variables, if any!
---- now program is frozen
whereas
setfgVar => VariableGet( Solver % Mesh % Variables, 'setflag' )
setfgVar % Values(1) = -1.0
IF( ParEnv % MyPe /= 0 )RETURN
setfgVar % Values(1) = 1.0
RETURN
the results are
UpdateDependentObjects: Part1: Updating objects depending on primary field in steady state
DerivateExportedVariables: Part1: Derivating variables, if any!
UpdateDependentObjects: Part0: Updating objects depending on primary field in steady state
DerivateExportedVariables: Part0: Derivating variables, if any!
SetActiveElementsTable: Part0: Creating active element table for: result vtu
SetActiveElementsTable: Part0: Number of active elements found : 8029
---- now program is frozen
However if volt.F90 is reduced to
setfgVar => VariableGet( Solver % Mesh % Variables, 'setflag' )
setfgVar % Values(1) = 1.0
RETURN
then the program does not freeze.
Quite fantastic! You would think setflag is a single global variable, but it is as if there is a separate setflag variable for each processor.
Also I am not sure how easy it is to change volt.F90 to incorporate MPI communications of the sort I saw inside MainUtils.F90
Have a nice weekend
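The two frozen runs above are consistent with each process holding its own copy of setflag: the process whose flag ends up positive appears to enter the result output solver and then waits for a partner that skipped it. If the flag value really has to be decided on process 0 alone, one option is to broadcast it before storing it. The following is a sketch under assumptions, not the actual volt.F90: the local variable flag is hypothetical, and it assumes that ELMER_COMM_WORLD and the MPI constants are visible through DefUtils.

```fortran
SUBROUTINE voltage( Model,Solver,dt,TransientSimulation )
  USE DefUtils
  IMPLICIT NONE
  TYPE(Solver_t) :: Solver
  TYPE(Model_t) :: Model
  REAL(KIND=dp) :: dt
  LOGICAL :: TransientSimulation

  TYPE(Variable_t), POINTER :: setfgVar
  REAL(KIND=dp) :: flag      ! hypothetical local copy of the flag value
  INTEGER :: ierr

  setfgVar => VariableGet( Solver % Mesh % Variables, 'setflag' )

  ! Only process 0 decides the value; the others start from a default.
  flag = -1.0_dp
  IF( ParEnv % MyPe == 0 ) flag = 1.0_dp

  ! MPI_BCAST is a collective call: every process must reach it,
  ! so there must be no early RETURN before this point.
  IF( ParEnv % PEs > 1 ) THEN
    CALL MPI_BCAST( flag, 1, MPI_DOUBLE_PRECISION, 0, ELMER_COMM_WORLD, ierr )
  END IF

  ! Now every process stores the same value in its own copy of setflag.
  setfgVar % Values(1) = flag
END SUBROUTINE voltage
```

After the broadcast both processes hold the same flag, so they should make the same decision about whether to run the next solver.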
Re: ElmerSolver_mpi stuck in a loop
Good day kevinarden
I have attached a barebones mesh and program. Download all 3 files in the same folder and execute
ElmerGrid 1 2 rect.grd
ElmerGrid 2 2 rect -partdual -metiskway 2
elmerf90 volt.F90 -o volt.so
mpirun -np 2 ElmerSolver_mpi case.sif
within that folder.
The program will freeze almost right away. And of course if you comment out line
IF( ParEnv % MyPe /= 0 )RETURN
in volt.F90, the program runs normally.
These two lines in the simulation section of case.sif, as suggested by Peter, are useful in debugging.
Max Output Level = 20
Max Output Partition = 2
Have a nice end of weekend
Marc
-
- Posts: 2237
- Joined: 25 Jan 2019, 01:28
- Antispam: Yes
Re: ElmerSolver_mpi stuck in a loop
It happens on my system exactly as you describe.