Heat Radiation issue with parallel runs

Numerical methods and mathematical models of Elmer
trystan
Posts: 5
Joined: 06 Feb 2021, 21:08
Antispam: Yes

Heat Radiation issue with parallel runs

Post by trystan »

Hi,
I'm working on heating through radiation between several bodies. So far I was able to produce satisfying results with ElmerGUI (9.0) using a single processor on Windows 10, but it took several hours. When I now try to run the very same model and settings on parallel processes, I run into a segmentation fault. From reading other forum posts, it seems to be an issue with either a system path not being found or with ElmerGrid, which might be resolved with additional options in the parallel settings window. As of now, however, I am not able to resolve the problem myself.

SolverLog:

Code: Select all

Starting program Elmergrid
Elmergrid reading in-line arguments
The mesh will be partitioned with Metis to 2 partitions.
Output will be saved to file D:/Users/Tristan/Desktop/Masterarbeit/Elmer/wafermit2Ringen.

Elmergrid loading data:
-----------------------
Loading mesh in ElmerSolver format from directory D:/Users/Tristan/Desktop/Masterarbeit/Elmer/wafermit2Ringen.
Loading header from mesh.header
Maximum elementtype index is: 504
Maximum number of nodes in element is: 4
Allocating for 8930 knots and 25701 elements.
Loading 8930 Elmer nodes from mesh.nodes
Loading 25701 bulk elements from mesh.elements
Loading 17856 boundary elements from mesh.boundary
Elmer mesh loaded successfully

Elmergrid creating and manipulating meshes:
-------------------------------------------

Elmergrid partitioning meshes:
------------------------------
Making a Metis partitioning for 25701 elements in 3-dimensions.
All elements are of type 504
Minimum number of linear nodes in elements 4
Requiring number of nodes in dual graph 2
Using all 8930 possible nodes in the Metis graph
Allocating mesh topology of size 102804
Starting graph partitioning METIS_PartMeshNodal.
 Runtime parameters:
   Objective type: METIS_OBJTYPE_CUT
   Coarsening type: METIS_CTYPE_SHEM
   Initial partitioning type: METIS_IPTYPE_METISRB
   Refinement type: METIS_RTYPE_GREEDY
   Perform a 2-hop matching: No
   Number of balancing constraints: 1
   Number of refinement iterations: 10
   Random number seed: -1
   Number of partitions: 2
   Number of cuts: 1
   User-supplied ufactor: 30
   Minimize connectivity: No
   Create contigous partitions: No
   Target partition weights: 
        0=[5.00e-001]   1=[5.00e-001]
   Allowed maximum load imbalance: 1.030 


 gk_mcore statistics
           coresize:       143016         nmops:         2048  cmop:      0
        num_callocs:           27   num_hallocs:            0
       size_callocs:       246408  size_hallocs:            0
        cur_callocs:            0   cur_hallocs:            0
        max_callocs:        84656   max_hallocs:            0
 nbrpool statistics
        nbrpoolsize:            0   nbrpoolcpos:            0
    nbrpoolreallocs:            0


Timing Information -------------------------------------------------
 Multilevel: 		   0.004
     Coarsening: 		   0.002
            Matching: 			   0.001
            Contract: 			   0.001
     Initial Partition: 	   0.001
     Uncoarsening: 		   0.001
          Refinement: 			   0.000
          Projection: 			   0.001
     Splitting: 		   0.000
********************************************************************

 gk_mcore statistics
           coresize:       143016         nmops:         2048  cmop:      0
        num_callocs:           53   num_hallocs:            0
       size_callocs:       296424  size_hallocs:            0
        cur_callocs:            0   cur_hallocs:            0
        max_callocs:       142912   max_hallocs:            0
 nbrpool statistics
        nbrpoolsize:         4284   nbrpoolcpos:         4034
    nbrpoolreallocs:            0

Finished graph partitioning METIS_PartMeshNodal.
Set the partition given by Metis for each node
Successfully made a Metis partition using the element mesh.
Optimizing the partitioning at boundaries.
Ownership of 0 parents was changed at BCs
Optimizing for 2 partitions
Creating a table showing all parenting partitions of nodes.
Nodes belong to 2 partitions in maximum
There are 148 shared nodes which is 1.66 % of all nodes.
The initial owner was not any of the elements for 0 nodes
Checking partitioning before optimization
Checking for partitioning
Information on partition bandwidth
Distribution of elements, nodes and shared nodes
     partition  elements   nodes      shared    
     1          13126      4468       0         
     2          12575      4462       148       
Average number of elements in partition 4.465e+003
Maximum deviation in ownership 6
Average deviation in ownership 3.000e+000
Average relative deviation 0.07 %
Checking for problematic sharings
Partitioning was not altered

Elmergrid saving data with method 2:
-------------------------------------
Saving Elmer mesh in partitioned format
Number of boundary nodes at the boundary: 8930
Number of additional interface nodes: 0
Created mesh directory: D:/Users/Tristan/Desktop/Masterarbeit/Elmer/wafermit2Ringen
Reusing existing subdirectory: partitioning.2
Saving mesh in parallel ElmerSolver format to directory D:/Users/Tristan/Desktop/Masterarbeit/Elmer/wafermit2Ringen/partitioning.2.
Nodes belong to 2 partitions in maximum
Saving mesh for 2 partitions
   part  elements   nodes      shared   bc elems
   1     13126      4468       0        8786    
   2     12575      4462       148      9070    
----------------------------------------------------------------------------------------------
   ave   12850.5    4465.0     74.0     8928.0   0.0     
Writing of partitioned mesh finished

Thank you for using Elmergrid!
Send bug reports and feature wishes to elmeradm@csc.fi

ELMER SOLVER (v 9.0) STARTED AT: 2021/02/18 11:10:55

ELMER SOLVER (v 9.0) STARTED AT: 2021/02/18 11:10:55
ParCommInit: 
ParCommInit: 
 Initialize #PEs:            2
 Initialize #PEs:            2
MAIN: 
MAIN: =============================================================
MAIN: ElmerSolver finite element software, Welcome!
MAIN: This program is free software licensed under (L)GPL
MAIN: Copyright 1st April 1995 - , CSC - IT Center for Science Ltd.
MAIN: Webpage http://www.csc.fi/elmer, Email elmeradm@csc.fi
MAIN: Version: 9.0 (Rev: Release, Compiled: 2021-02-10)
MAIN:  Running in parallel using 2 tasks.
MAIN:  Running with just one thread per task.
MAIN:  Lua interpreted linked in.
MAIN: =============================================================

LoadInputFile: Reading only "Run Control" section

MAIN: 
MAIN: 
MAIN: -------------------------------------
MAIN: Reading Model: case.sif

LoadInputFile: Scanning input file: case.sif
LoadInputFile: Scanning only size info
LoadInputFile: First time visiting
LoadInputFile: Reading base load of sif file

LoadInputFile: Loading input file: case.sif
LoadInputFile: Reading base load of sif file

LoadInputFile: Number of BCs: 2
LoadInputFile: Number of Body Forces: 0
LoadInputFile: Number of Initial Conditions: 1
LoadInputFile: Number of Materials: 2
LoadInputFile: Number of Equations: 1
LoadInputFile: Number of Solvers: 1
LoadInputFile: Number of Bodies: 3

ElmerAsciiMesh: Base mesh name: ./.

LoadMesh: Elapsed REAL time:     0.0660 (s)

MAIN: -------------------------------------
AddVtuOutputSolverHack: Adding ResultOutputSolver to write VTU output in file: case

RadiationFactors: ----------------------------------------------------
RadiationFactors: Computing radiation factors for heat transfer
RadiationFactors: ----------------------------------------------------

RadiationFactors: Total number of Radiation Surfaces 8786 out of 8786

RadiationFactors: Computing factors...


Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0xffffffff
#1  0xffffffff
#2  0xffffffff
#3  0xffffffff
#4  0xffffffff
#5  0xffffffff
#6  0xffffffff
#7  0xffffffff
#8  0xffffffff
#9  0xffffffff
#10  0xffffffff
#11  0xffffffff
#12  0xffffffff
#13  0xffffffff
#14  0xffffffff
#15  0xffffffff
#16  0xffffffff
#17  0xffffffff

job aborted:
[ranks] message

[0] process exited without calling finalize

[1] terminated

---- error analysis -----

[0] on TRISTAN-PC
ElmerSolver_mpi.exe ended prematurely and may have crashed. exit code 3

---- error analysis -----
Attachments
wafermit2Ringenfein.zip (msh file)
case.sif
para_setting_window.png
raback
Site Admin
Posts: 4828
Joined: 22 Aug 2009, 11:57
Antispam: Yes
Location: Espoo, Finland
Contact:

Re: Heat Radiation issue with parallel runs

Post by raback »

Hi

Well, radiation is a bit tricky to parallelize. There are tens of PDEs that operate successfully in parallel, but radiation does not really do that. The challenge is that the view factors basically couple all the elements that see each other. The default partitioning routine is not aware of this, and thus the computation fails. It probably fails even earlier, as I don't see the ViewFactors being computed.

The best one could do is to create a partitioning that honors the domain interfaces for radiation and ensures that the view factors get correctly treated in parallel. Creating a fully parallel view factor computation would require quite a bit of effort.

So, sorry. I don't think that you can do anything here with ElmerGUI.

EDIT: I checked that the parallel computation basically works if you partition the case such that it does not break any radiation connections. In the working example below, a special partitioning routine into 4 partitions is used that honors the couplings among boundaries {1,3,7}.

Code: Select all

ElmerGrid 2 2 mesh -partdual -connect 1 3 7  -metiskway 4
mpirun -np 4 ElmerSolver_mpi
It still seems best to compute the view factors in serial beforehand.
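
A minimal sketch of that workflow, assuming the mesh directory name ("mesh"), the connected boundaries (1, 3, 7) and the partition count (4) from the example above; whether the serially computed factors are actually reused by the parallel run may depend on the solver settings, so treat this as a starting point rather than a recipe:

Code: Select all

# 1. Run the case once in serial so the view factors are computed
#    (assumed to end up as files in the mesh directory):
ElmerSolver case.sif

# 2. Partition without breaking the radiation couplings, then run in parallel:
ElmerGrid 2 2 mesh -partdual -connect 1 3 7 -metiskway 4
mpirun -np 4 ElmerSolver_mpi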

-Peter
trystan
Posts: 5
Joined: 06 Feb 2021, 21:08
Antispam: Yes

Re: Heat Radiation issue with parallel runs

Post by trystan »

Hi,
Thank you for your input.
Whether I compute the view factors in serial beforehand or in parallel, I still get pretty much the same errors.
But since you say that parallel runs usually don't handle radiation that smoothly, and I'm planning to simulate even more structures, I think I'll stick to serial runs and try to make the heating element flat towards the to-be-heated surface instead of a torus. That way, fewer calculations should be necessary, right?
raback
Site Admin
Posts: 4828
Joined: 22 Aug 2009, 11:57
Antispam: Yes
Location: Espoo, Finland
Contact:

Re: Heat Radiation issue with parallel runs

Post by raback »

Hi

Did you use the partitioning command with the -connect flag? You can inspect the partitioning by outputting it in vtu format (output type 5).

The critical parameter is the number of boundary elements associated with the "diffuse gray" flag.
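
For inspecting this, a sketch assuming the boundary numbers and partition count from the earlier example (output type 5 writes vtu files in which the partitioning can be viewed, e.g. in ParaView; the -connect list should contain exactly the boundaries that carry the "diffuse gray" radiation flag in the .sif):

Code: Select all

ElmerGrid 2 5 mesh -partdual -connect 1 3 7 -metiskway 4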

-Peter