Elmer built with MPI - MUMPS crashes on Debian

Clearly defined bug reports and their fixes
flowwolf
Posts: 7
Joined: 14 Dec 2016, 01:39

Elmer built with MPI - MUMPS crashes on Debian

Post by flowwolf » 08 May 2019, 00:31

Hi guys,

First, I would like to congratulate the authors on creating such professional software and releasing it to the public.

I have built Elmer from source on Debian with MPI and MUMPS, but it crashes (SIGSEGV) on the simple HelmholtzStructure2 example when called as:
'mpirun -np 2 ElmerSolver_mpi'

There is no crash when called as 'mpirun -np 1 ElmerSolver_mpi'; the serial version works too.

  • ElmerSolver: Version: 8.4 (Rev: 7d1e94b4, Compiled: 2019-05-07)
  • uname -r: Linux x3k30c-Azalia 4.9.0-3-rt-amd64 #1 SMP PREEMPT RT Debian 4.9.30-2+deb9u5 (2017-09-19) x86_64 GNU/Linux
  • lsb_release -a: Distributor ID: Debian; Description: Debian GNU/Linux 9.9 (stretch); Release: 9.9; Codename: stretch
  • [edit]: gcc (Debian 6.3.0-18) 6.3.0 20170516
cmake was called with:

Code:

cmake  -DWITH_OpenMP:BOOL=FALSE  -DWITH_MPI:BOOL=TRUE  -DWITH_Mumps:BOOL=TRUE  -DWITH_Hypre:BOOL=FALSE  \
       -DWITH_ELMERGUI:BOOL=TRUE  -DWITH_ELMERGUILOGGER:BOOL=TRUE \
       -DCMAKE_INSTALL_PREFIX=../install ../elmerfem
result:

Code:

-- The Fortran compiler identification is GNU 6.3.0
-- The C compiler identification is GNU 6.3.0
-- The CXX compiler identification is GNU 6.3.0
-- Check for working Fortran compiler: /usr/bin/f95
-- Check for working Fortran compiler: /usr/bin/f95  -- works
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Checking whether /usr/bin/f95 supports Fortran 90
-- Checking whether /usr/bin/f95 supports Fortran 90 -- yes
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Checking whether GFortran version >= 4.8 -- yes
-- Found MPI_C: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so
-- Found MPI_CXX: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so;/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so
-- Found MPI_Fortran: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempif08.so;/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempi_ignore_tkr.so;/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_mpifh.so;/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so
-- ------------------------------------------------
-- Mesh adaptation 2D/3D looking for [Mmg] tools
--   Mmg:           TRUE
--   Mmg_INC:       /usr/local/include
--   Mmg_LIB:      /usr/local/lib/libmmg.a
--   Mmg_LIBDIR:      /usr/local/lib
-- Compile MMG2DSolver/MMG3DSolver
-- ------------------------------------------------
-- ------------------------------------------------
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- A library with BLAS API found.
-- A library with BLAS API found.
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- A library with LAPACK API found.
-- Finding Mumps
-- Finding SCALAPACK
-- Checking if BLACS library is needed by SCALAPACK
-- Looking for Fortran blacs_gridinit
-- Looking for Fortran blacs_gridinit - not found
-- Checking if BLACS library is needed by SCALAPACK -- yes
-- Finding BLACS
-- A library with BLACS API found.
-- BLACS libraries: /usr/lib/libblacs-openmpi.so
-- Checking if Metis library is needed by Mumps
-- Checking if Metis library is needed by Mumps -- yes
-- Finding Metis
-- Checking if ParMetis library is needed by Mumps
-- Checking if ParMetis library is needed by Mumps -- yes
-- Finding ParMetis
-- A library with Mumps API found.
-- Mumps include dir: /usr/include
-- Mumps libraries: /usr/lib/libdmumps.so;/usr/lib/libmumps_common.so;/usr/lib/libpord.so;/usr/lib/libscalapack-openmpi.so;/usr/lib/libblacs-openmpi.so;/usr/lib/x86_64-linux-gnu/libmetis.so;/usr/lib/libparmetis.so
-- Checking whether /usr/bin/f95 supports PROCEDURE POINTER
-- Checking whether /usr/bin/f95 supports PROCEDURE POINTER -- yes
-- Checking whether /usr/bin/f95 supports CONTIGUOUS
-- Checking whether /usr/bin/f95 supports CONTIGUOUS -- yes
-- Checking whether /usr/bin/f95 supports EXECUTE_COMMAND_LINE
-- Checking whether /usr/bin/f95 supports EXECUTE_COMMAND_LINE -- yes
-- Looking for include file inttypes.h
-- Looking for include file inttypes.h - found
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of long
-- Check size of long - done
-- Found 116 modules from /media/h2/_src/elmerfem/fem/src/modules
--  ELMERSOLVER_RPATH_STRING_MOD $ORIGIN/../lib/elmersolver:/usr/local/lib
--  ELMERLIB_RPATH_STRING $ORIGIN/:/usr/local/lib
-- Skipping test PoissonDG with 16 procs
-- Skipping test WinkelPoissonMetisKwayDual with 16 procs
-- Skipping test WinkelPoissonMetisKwayNodal with 16 procs
-- Found 584 tests
-- Looking for execinfo.h
-- Looking for execinfo.h - found
-- Looking for getline
-- Looking for getline - found
-- checking for thread-local storage - found
--   Building ElmerGUI
-- ------------------------------------------------
-- ------------------------------------------------
-- Looking for Q_WS_X11
-- Looking for Q_WS_X11 - found
-- Looking for Q_WS_WIN
-- Looking for Q_WS_WIN - not found
-- Looking for Q_WS_QWS
-- Looking for Q_WS_QWS - not found
-- Looking for Q_WS_MAC
-- Looking for Q_WS_MAC - not found
-- Found Qt4: /usr/bin/qmake-qt4 (found version "4.8.7")
--   [ElmerGUI] Qt4:               TRUE
--   [ElmerGUI] Qt4_LIBRARIES:
-- ------------------------------------------------
-- ------------------------------------------------
CMake Warning (dev) at /usr/share/cmake-3.7/Modules/CheckCXXSymbolExists.cmake:35 (include):
  File /usr/share/cmake-3.7/Modules/CheckCXXSymbolExists.cmake includes
  /usr/share/cmake-3.7/Modules/CheckSymbolExists.cmake (found via
  CMAKE_MODULE_PATH) which shadows
  /usr/share/cmake-3.7/Modules/CheckSymbolExists.cmake.  This may cause
  errors later on .

  Policy CMP0017 is not set: Prefer files from the CMake module directory
  when including from there.  Run "cmake --help-policy CMP0017" for policy
  details.  Use the cmake_policy command to set the policy and suppress this
  warning.
Call Stack (most recent call first):
  /usr/share/cmake-3.7/Modules/FindQt4.cmake:334 (include)
  ElmerGUIlogger/CMakeLists.txt:4 (FIND_PACKAGE)
This warning is for project developers.  Use -Wno-dev to suppress it.

--   [ElmerGUIlogger] Qt4:               TRUE
--   [ElmerGUIlogger] Qt4_LIBRARIES:
-- ------------------------------------------------
CMake Warning at ElmerGUIlogger/CMakeLists.txt:28 (MESSAGE):
  QT_USE_FILE: /usr/share/cmake-3.7/Modules/UseQt4.cmake


-- ------------------------------------------------
--   BLAS library:   /usr/lib/libblas.so
--   LAPACK library: /usr/lib/liblapack.so;/usr/lib/libblas.so
-- ------------------------------------------------
--   Fortran compiler:        /usr/bin/f95
--   Fortran flags:            -O2 -g -DNDEBUG
-- ------------------------------------------------
--   C compiler:              /usr/bin/cc
--   C flags:                  -O2 -g -DNDEBUG
-- ------------------------------------------------
--   CXX compiler:            /usr/bin/c++
--   CXX flags:                -O2 -g -DNDEBUG
-- ------------------------------------------------
--   MPI Fortran:             TRUE
--   MPI Fortran compiler:    /usr/bin/mpif90
--   MPI Fortran flags:
--   MPI Fortran include dir: /usr/lib/x86_64-linux-gnu/openmpi/include;/usr/lib/x86_64-linux-gnu/openmpi/lib
--   MPI Fortran libraries:   /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempif08.so;/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempi_ignore_tkr.so;/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_mpifh.so;/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so
--   MPI Fortran link flags:
-- ------------------------------------------------
--   MPI C:             TRUE
--   MPI C compiler:    /usr/bin/mpicc
--   MPI C flags:
--   MPI C include dir: /usr/lib/x86_64-linux-gnu/openmpi/include/openmpi;/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent;/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include;/usr/lib/x86_64-linux-gnu/openmpi/include
--   MPI C libraries:   /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so
--   MPI C flags:
-- ------------------------------------------------
--   Mumps:             TRUE
--   Mumps include:     /usr/include
--   Mumps libraries:   /usr/lib/libdmumps.so;/usr/lib/libmumps_common.so;/usr/lib/libpord.so;/usr/lib/libscalapack-openmpi.so;/usr/lib/libblacs-openmpi.so;/usr/lib/x86_64-linux-gnu/libmetis.so;/usr/lib/libparmetis.so
-- ------------------------------------------------
--   Building ElmerGUI logger
-- ------------------------------------------------
-- ------------------------------------------------
--   Package filename: elmerfem-8.4-7d1e94b4-20190507_Linux-x86_64
--   Patch version: 8.4-7d1e94b4
-- Configuring done
-- Generating done
-- Build files have been written to: /media/h2/_src/elmerbuild_mpi

First, make stops when compiling ViewFactors, GebhardtFactors, and Solver_TGT, complaining that it can't find libscotch (attached as log1.txt and log2.txt).
However, I managed to work around this by appending '/usr/lib/libscotch-5.1.so' to the following files:
fem/src/CMakeFiles/ViewFactors.dir/link.txt
fem/src/CMakeFiles/GebhardtFactors.dir/link.txt
fem/src/CMakeFiles/Solver_TGT.dir/link.txt
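The manual edit above can also be scripted; a sketch, run from the build directory (the library path is the one from the stretch package, adjust if yours differs):

```shell
# Append libscotch to the link command of each failing target.
# Each link.txt holds the one-line link command CMake generated.
for t in ViewFactors GebhardtFactors Solver_TGT; do
  f="fem/src/CMakeFiles/${t}.dir/link.txt"
  if [ -f "$f" ]; then
    sed -i 's|$| /usr/lib/libscotch-5.1.so|' "$f"
  fi
done
```

Note that these files are regenerated whenever cmake reruns, so the loop has to be repeated after reconfiguring.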

MUMPS was installed from the Debian source package (https://packages.debian.org/source/stretch/mumps); it also needs libmetis and libparmetis, so:

Code:

dpkg -l |grep -E 'mumps|metis'
ii  libmetis-dev                                                5.1.0.dfsg-5+b2                             amd64        Serial Graph Partitioning and Fill-reducing Matrix Ordering. Header
ii  libmetis5:amd64                                             5.1.0.dfsg-5+b2                             amd64        Serial Graph Partitioning and Fill-reducing Matrix Ordering
ii  libmumps-4.10.0                                             4.10.0.dfsg-4+b2                            amd64        Direct linear systems solver - parallel shared libraries
ii  libmumps-dev                                                4.10.0.dfsg-4+b2                            amd64        Direct linear systems solver - parallel development files
ii  libparmetis-dev                                             4.0.3-4+b4                                  amd64        Parallel Graph Partitioning and Sparse Matrix Ordering Libs: Devel
ii  libparmetis4.0                                              4.0.3-4+b4                                  amd64        Parallel Graph Partitioning and Sparse Matrix Ordering Shared Libs

dpkg -l |grep libscotch
ii  libscotch-5.1                                               5.1.12b.dfsg-2.1                            amd64        programs and libraries for graph, mesh and hypergraph partitioning
ii  libscotch-dev                                               5.1.12b.dfsg-2.1                            amd64        programs and libraries for graph, mesh and hypergraph partitioning

Now make builds Elmer, but the resulting binary crashes on the simple HelmholtzStructure2 example:

Code:

ELMER SOLVER (v 8.4) STARTED AT: 2019/05/07 17:13:06
ELMER SOLVER (v 8.4) STARTED AT: 2019/05/07 17:13:06
ParCommInit: ParCommInit:  Initialize #PEs:            2
MAIN:
MAIN: =============================================================
MAIN: ElmerSolver finite element software, Welcome!
MAIN: This program is free software licensed under (L)GPL
 Initialize #PEs:            2
MAIN: Copyright 1st April 1995 - , CSC - IT Center for Science Ltd.
MAIN: Webpage http://www.csc.fi/elmer, Email elmeradm@csc.fi
MAIN: Version: 8.4 (Rev: 7d1e94b4, Compiled: 2019-05-07)
MAIN:  Running in parallel using 2 tasks.
MAIN:  Running with just one thread per task.
MAIN:  MUMPS library linked in.
MAIN: =============================================================
MAIN:
MAIN:
MAIN: -------------------------------------
MAIN: Reading Model: case.sif
LoadInputFile: Scanning input file: case.sif
LoadInputFile: Loading input file: case.sif
Model Input:  Unlisted keyword: [stress bodyforce 1 im] in section: [body force 1]
Loading user function library: [HelmholtzSolve]...[HelmholtzSolver_Init0]
Loading user function library: [SaveData]...[SaveScalars_Init0]
LoadMesh: Base mesh name: ./angle_in_halfcircle
LoadMesh: Elapsed REAL time:     0.0811 (s)
MAIN: -------------------------------------
Loading user function library: [StressSolve]...[StressSolver_Init]
Loading user function library: [StressSolve]...[StressSolver_bulk]
Loading user function library: [StressSolve]...[StressSolver]
OptimizeBandwidth: ---------------------------------------------------------
OptimizeBandwidth: Computing matrix structure for: stress analysis...done.
OptimizeBandwidth: Half bandwidth without optimization: 41
OptimizeBandwidth:
OptimizeBandwidth: Bandwidth Optimization ...done.
OptimizeBandwidth: Half bandwidth after optimization: 8
OptimizeBandwidth: ---------------------------------------------------------
Loading user function library: [HelmholtzSolve]...[HelmholtzSolver_Init]
Loading user function library: [HelmholtzSolve]...[HelmholtzSolver_bulk]
Loading user function library: [HelmholtzSolve]...[HelmholtzSolver]
OptimizeBandwidth: ---------------------------------------------------------
OptimizeBandwidth: Computing matrix structure for: helmholtz...done.
OptimizeBandwidth: Half bandwidth without optimization: 2313
OptimizeBandwidth:
OptimizeBandwidth: Bandwidth Optimization ...done.
OptimizeBandwidth: Half bandwidth after optimization: 67
OptimizeBandwidth: ---------------------------------------------------------
Loading user function library: [SaveData]...[SaveScalars_Init]
Loading user function library: [SaveData]...[SaveScalars_bulk]
Loading user function library: [SaveData]...[SaveScalars]
MAIN:
MAIN: -------------------------------------
MAIN:  Steady state iteration:            1
MAIN: -------------------------------------
MAIN:
SingleSolver: Attempting to call solver
SingleSolver: Solver Equation string is: stress analysis
StressSolve:
StressSolve: --------------------------------------------------
StressSolve: Solving displacements from linear elasticity model
StressSolve: --------------------------------------------------
StressSolve: Starting assembly...
StressSolve: Assembly:
StressSolve: Bulk assembly done
DefUtils::DefaultDirichletBCs: Setting Dirichlet boundary conditions
DefUtils::DefaultDirichletBCs: Dirichlet boundary conditions set
StressSolve: Set boundaries done
HarmonicSolve: Solving initially transient style system as harmonic one
OptimizeBandwidth: ---------------------------------------------------------
OptimizeBandwidth: Computing matrix structure for: stress analysis...done.
OptimizeBandwidth: Half bandwidth without optimization: 41
OptimizeBandwidth:
OptimizeBandwidth: Bandwidth Optimization ...done.
OptimizeBandwidth: Half bandwidth after optimization: 8
OptimizeBandwidth: ---------------------------------------------------------
HarmonicSolve: Frequency value:    0.600E+03

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f6740dbcd1d in ???
#1  0x7f6740dbbf7d in ???
#2  0x7f673f50e05f in ???
#3  0x7f6743d71f84 in setsinglepoint
        at /media/h2/_src/elmerfem/fem/src/SolverUtils.F90:5465
#4  0x7f6743dc3331 in setelementvalues
        at /media/h2/_src/elmerfem/fem/src/SolverUtils.F90:5312
#5  0x7f6743dcb74d in __solverutils_MOD_setdirichletboundaries
        at /media/h2/_src/elmerfem/fem/src/SolverUtils.F90:4623
#6  0x7f6743dde476 in __solverutils_MOD_solveharmonicsystem
        at /media/h2/_src/elmerfem/fem/src/SolverUtils.F90:13209
#7  0x7f6743dda60e in __solverutils_MOD_solvelinearsystem
        at /media/h2/_src/elmerfem/fem/src/SolverUtils.F90:11650
#8  0x7f6743dd85aa in __solverutils_MOD_solvesystem
        at /media/h2/_src/elmerfem/fem/src/SolverUtils.F90:12188
#9  0x7f6743f0fd98 in __defutils_MOD_defaultsolve
        at /media/h2/_src/elmerfem/fem/src/DefUtils.F90:3223
#10  0x7f671f9efdfe in stresssolver_
        at /media/h2/_src/elmerfem/fem/src/modules/StressSolve.F90:662
#11  0x7f6743de7047 in __mainutils_MOD_singlesolver
        at /media/h2/_src/elmerfem/fem/src/MainUtils.F90:5129
#12  0x7f6743dfc205 in __mainutils_MOD_solveractivate
        at /media/h2/_src/elmerfem/fem/src/MainUtils.F90:5365
#13  0x7f6743dfd65e in solvecoupled
        at /media/h2/_src/elmerfem/fem/src/MainUtils.F90:3067
#14  0x7f6743dff2d9 in __mainutils_MOD_solveequations
        at /media/h2/_src/elmerfem/fem/src/MainUtils.F90:2769
#15  0x7f6743fdc5d4 in execsimulation
        at /media/h2/_src/elmerfem/fem/src/ElmerSolver.F90:2396
#16  0x7f6743fdc5d4 in elmersolver_
        at /media/h2/_src/elmerfem/fem/src/ElmerSolver.F90:589
#17  0x561ae85832c8 in solver
        at /media/h2/_src/elmerfem/fem/src/Solver.F90:69
#18  0x561ae8582ffe in main
        at /media/h2/_src/elmerfem/fem/src/Solver.F90:34
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node x3k30c-Azalia exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

And here is the interesting thing: it doesn't just work when called as 'mpirun -np 1 ElmerSolver_mpi'; Elmer with MUMPS actually seems to use 2 cores instead of 1 even in this case. Elmer also works with MUMPS when built without MPI, and it still uses 2 cores. The ElmerSolver manual says that MUMPS doesn't work with the serial build, so what's going on?

Output of 'mpirun -np 1 ElmerSolver_mpi'

Code:

ELMER SOLVER (v 8.4) STARTED AT: 2019/05/07 23:16:28
ParCommInit:  Initialize #PEs:            1
MAIN: 
MAIN: =============================================================
MAIN: ElmerSolver finite element software, Welcome!
MAIN: This program is free software licensed under (L)GPL
MAIN: Copyright 1st April 1995 - , CSC - IT Center for Science Ltd.
MAIN: Webpage http://www.csc.fi/elmer, Email elmeradm@csc.fi
MAIN: Version: 8.4 (Rev: 7d1e94b4, Compiled: 2019-05-07)
MAIN:  Running one task without MPI parallelization.
MAIN:  Running in parallel with 2 threads per task.
MAIN:  HYPRE library linked in.
MAIN:  MUMPS library linked in.
MAIN: =============================================================
MAIN: 
MAIN: 
MAIN: -------------------------------------
MAIN: Reading Model: case.sif
LoadInputFile: Scanning input file: case.sif
LoadInputFile: Loading input file: case.sif
Model Input:  Unlisted keyword: [stress bodyforce 1 im] in section: [body force 1]
Loading user function library: [HelmholtzSolve]...[HelmholtzSolver_Init0]
Loading user function library: [SaveData]...[SaveScalars_Init0]
LoadMesh: Base mesh name: ./angle_in_halfcircle
LoadMesh: Elapsed REAL time:     0.0295 (s)
MAIN: -------------------------------------
Loading user function library: [StressSolve]...[StressSolver_Init]
Loading user function library: [StressSolve]...[StressSolver_bulk]
Loading user function library: [StressSolve]...[StressSolver]
OptimizeBandwidth: ---------------------------------------------------------
OptimizeBandwidth: Computing matrix structure for: stress analysis...done.
OptimizeBandwidth: Half bandwidth without optimization: 72
OptimizeBandwidth: 
OptimizeBandwidth: Bandwidth Optimization ...done.
OptimizeBandwidth: Half bandwidth after optimization: 12
OptimizeBandwidth: ---------------------------------------------------------
Loading user function library: [HelmholtzSolve]...[HelmholtzSolver_Init]
Loading user function library: [HelmholtzSolve]...[HelmholtzSolver_bulk]
Loading user function library: [HelmholtzSolve]...[HelmholtzSolver]
OptimizeBandwidth: ---------------------------------------------------------
OptimizeBandwidth: Computing matrix structure for: helmholtz...done.
OptimizeBandwidth: Half bandwidth without optimization: 4549
OptimizeBandwidth: 
OptimizeBandwidth: Bandwidth Optimization ...done.
OptimizeBandwidth: Half bandwidth after optimization: 175
OptimizeBandwidth: ---------------------------------------------------------
Loading user function library: [SaveData]...[SaveScalars_Init]
Loading user function library: [SaveData]...[SaveScalars_bulk]
Loading user function library: [SaveData]...[SaveScalars]
MAIN: 
MAIN: -------------------------------------
MAIN:  Steady state iteration:            1
MAIN: -------------------------------------
MAIN: 
SingleSolver: Attempting to call solver
SingleSolver: Solver Equation string is: stress analysis
StressSolve: 
StressSolve: --------------------------------------------------
StressSolve: Solving displacements from linear elasticity model
StressSolve: --------------------------------------------------
StressSolve: Starting assembly...
StressSolve: Assembly:
StressSolve: Bulk assembly done
DefUtils::DefaultDirichletBCs: Setting Dirichlet boundary conditions
DefUtils::DefaultDirichletBCs: Dirichlet boundary conditions set
StressSolve: Set boundaries done
HarmonicSolve: Solving initially transient style system as harmonic one
OptimizeBandwidth: ---------------------------------------------------------
OptimizeBandwidth: Computing matrix structure for: stress analysis...done.
OptimizeBandwidth: Half bandwidth without optimization: 72
OptimizeBandwidth: 
OptimizeBandwidth: Bandwidth Optimization ...done.
OptimizeBandwidth: Half bandwidth after optimization: 12
OptimizeBandwidth: ---------------------------------------------------------
HarmonicSolve: Frequency value:    0.600E+03
ComputeChange: NS (ITER=1) (NRM,RELC): ( 0.49318577E-05  2.0000000     ) :: stress analysis
StressSolver: All done
StressSolver: ------------------------------------------
Loading user function library: [StressSolve]...[StressSolver_post]
ComputeChange: SS (ITER=1) (NRM,RELC): (  0.0000000      0.0000000     ) :: stress analysis
SingleSolver: Attempting to call solver
SingleSolver: Solver Equation string is: helmholtz
HelmholtzSolve: 
HelmholtzSolve: -------------------------------------
HelmholtzSolve:  Helmholtz iteration           1
HelmholtzSolve:  Frequency (Hz):    600.00000000000000
HelmholtzSolve: -------------------------------------
HelmholtzSolve: 
HelmholtzSolve: Starting Assembly
HelmholtzSolve: Assembly:
HelmholtzSolve: Assembly done
DefUtils::DefaultDirichletBCs: Setting Dirichlet boundary conditions
DefUtils::DefaultDirichletBCs: Dirichlet boundary conditions set
ComputeChange: NS (ITER=1) (NRM,RELC): (  8.4633149      2.0000000     ) :: helmholtz
HelmholtzSolve: iter:    1 Assembly: (s)    0.09    0.09
HelmholtzSolve: iter:    1 Solve:    (s)    0.12    0.12
Loading user function library: [HelmholtzSolve]...[HelmholtzSolver_post]
ComputeChange: SS (ITER=1) (NRM,RELC): (  8.4633149      2.0000000     ) :: helmholtz
WARNING:: CompareToReferenceSolution: Solver 2 FAILED:  Norm = 8.46331487E+00  RefNorm = 3.54360520E+01
CompareToReferenceSolution: Relative Error to reference norm: 7.611665E-01
WARNING:: CompareToReferenceSolution: FAILED 1 tests out of 1!
ElmerSolver: *** Elmer Solver: ALL DONE ***
ElmerSolver: The end
SOLVER TOTAL TIME(CPU,REAL):         0.45        0.82
ELMER SOLVER FINISHED AT: 2019/05/07 23:16:29
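One detail worth noting in the log above: with '-np 1' the solver reports "Running in parallel with 2 threads per task". So the two busy cores in the single-rank case presumably come from thread-level (OpenMP/BLAS) parallelism, not from MPI or MUMPS. A sketch of how one could check this, assuming ElmerSolver and the underlying BLAS honour OMP_NUM_THREADS:

```shell
# Pin thread-level parallelism to a single thread per task, so that any
# multi-core usage which remains must come from MPI ranks
# (assumption: the solver and its BLAS respect OMP_NUM_THREADS)
export OMP_NUM_THREADS=1
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
# then re-run the single-rank case and watch core usage:
# mpirun -np 1 ElmerSolver_mpi
```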
I've read on this forum (viewtopic.php?t=4155) that MUMPS doesn't like Metis or ParMetis, so I added the following to elmerfem/CMakeLists.txt

(libmumps-ptscotch)

Code:

SET(Mumps_INCLUDE_DIR /usr/include)
SET(Mumps_LIBRARIES /usr/lib/libcmumps_ptscotch-4.10.0.so;/usr/lib/libdmumps_ptscotch-4.10.0.so;/usr/lib/libmumps_common_ptscotch-4.10.0.so;/usr/lib/libpord_ptscotch-4.10.0.so;/usr/lib/libsmumps_ptscotch-4.10.0.so;/usr/lib/libzmumps_ptscotch-4.10.0.so)
make now works without asking for libscotch, but the crash is still there.


I am not sure that I can download the source of MUMPS, because I don't study or work at an institute.
Does anyone know whether it's possible to build an MPI version this way, or is this considered a bug?

Currently, the only way for me to run Elmer in parallel is to run 2 instances, but that allocates twice as much memory.

Regards,
flowwolf
Attachments
log2.txt
log1.txt
Last edited by flowwolf on 17 Jun 2019, 18:43, edited 2 times in total.


Re: Elmer built with MPI - MUMPS crashes on Debian

Post by flowwolf » 08 May 2019, 18:52

I changed 'Linear System Direct Method' to Mumps in the HelmholtzStructure2 example, of course.
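For reference, the relevant keywords in the solver section of the sif file look like this (a minimal sketch; the solver numbering and all other keywords are as in the original example):

```
Linear System Solver = Direct
Linear System Direct Method = Mumps
```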


Re: Elmer built with MPI - MUMPS crashes on Debian

Post by flowwolf » 11 May 2019, 02:26

Update:
I still get the very same crash, at 'setsinglepoint' in SolverUtils.F90.

I managed to get the source code of the MUMPS package, so I compiled MUMPS and all of its requirements:

OpenBLAS-0.3.6 (with LAPACK)
scalapack-2.0.2
metis-5.1.0
parmetis-4.0.3
scotch_6.0.6

both static and shared. Furthermore, I compiled openmpi-4.0.1 and used its MPI wrappers to build Elmer. I had to make some modifications to CMakeLists.txt and to the above-mentioned link.txt files.

It works when -np is 1, but not with 2 (I have a dual-core AMDx2-270).
I tried both static and shared libraries, with and without a self-compiled openmpi.

Some .so files had to be copied to the $ELMER_HOME/lib directory. I can share the details of the compilation if needed (not all of these steps were straightforward). Next, I'll try some MPI example applications to see whether those work.
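As an aside, instead of copying the .so files into $ELMER_HOME/lib, the runtime loader can be pointed at them; a sketch, where the install prefix is an assumed example path:

```shell
# Make the self-compiled libraries visible at run time without copying them
export ELMER_HOME="$HOME/elmer/install"   # example prefix, adjust to yours
export LD_LIBRARY_PATH="/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
# then: mpirun -np 2 "$ELMER_HOME/bin/ElmerSolver_mpi"
```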


Re: Elmer built with MPI - MUMPS crashes on Debian

Post by flowwolf » 19 May 2019, 15:03

Update: MUMPS really is not available in the serial version; my apologies, that was my mistake.

I re-read the README and INSTALL files of the required packages and now understand the concepts (hybrid OpenMP + MPI programming) much better; besides Elmer, some of those packages can use both OpenMP and MPI. No luck so far.

Basic MPI test cases work fine, so now we're trying to run the solver on different machines to see whether it works.
A different version of gcc may come next.
