I ran into a problem with the ElmerSolver_mpi VectorHelmholtz module on a high-performance computer that uses the SLURM workload manager. About a third of the job submissions of an identical simulation crash with the typical error message below.
I've also attached the mesh and .sif file of the standard bent waveguide example from the ElmerGUI tutorial, which causes this problem.
Does anyone have an idea what is going wrong and how it could be solved?
Best regards
ELMER SOLVER (v 8.2) STARTED AT: 2017/03/06 14:43:45
ELMER SOLVER (v 8.2) STARTED AT: 2017/03/06 14:43:45
ELMER SOLVER (v 8.2) STARTED AT: 2017/03/06 14:43:45
ELMER SOLVER (v 8.2) STARTED AT: 2017/03/06 14:43:45
ELMER SOLVER (v 8.2) STARTED AT: 2017/03/06 14:43:45
ELMER SOLVER (v 8.2) STARTED AT: 2017/03/06 14:43:45
ELMER SOLVER (v 8.2) STARTED AT: 2017/03/06 14:43:45
ELMER SOLVER (v 8.2) STARTED AT: 2017/03/06 14:43:45
ELMER SOLVER (v 8.2) STARTED AT: 2017/03/06 14:43:45
ELMER SOLVER (v 8.2) STARTED AT: 2017/03/06 14:43:45
ELMER SOLVER (v 8.2) STARTED AT: 2017/03/06 14:43:45
ELMER SOLVER (v 8.2) STARTED AT: 2017/03/06 14:43:45
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x2AF060D04367
#1 0x2AF060D0497E
#2 0x2AF06272491F
#3 0x2AF063726BA0
#4 0x2AF063724609
#5 0x2AF061F73D2C
#6 0x2AF061DDC4A2
#7 0x2AF061E4E28A
#8 0x2AF061E4039B
#9 0x2AF061E356F2
#10 0x2AF061F35E79
#11 0x2AF061DDA038
#12 0x2AF061DF9D79
#13 0x2AF061B4C707
#14 0x2AF05F40883E
#15 0x2AF05F4FC553
#16 0x401252 in MAIN__ at Solver.F90:69
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
srun: error: compute-a1-017: task 4: Segmentation fault (core dumped)
srun: Terminating job step 60962479.0
slurmstepd: error: *** STEP 60962479.0 ON compute-a1-017 CANCELLED AT 2017-03-06T14:43:47 ***
srun: Job step aborted: Waiting up to 122 seconds for job step to finish.
srun: error: compute-a1-017: tasks 0-3: Killed
srun: error: compute-a1-029: tasks 9-11: Killed
srun: error: compute-a1-019: tasks 5-8: Killed