Elmer is so cool!
I am using the 64-bit Ubuntu Linux package for Elmer. When I run the parallel solver on the loaded elastic beam tutorial, it goes almost to completion and then I get the following output. I am using 4 processors on my dual quad-core Xeon machine. On a problem of this size that seems to go faster than using all 8, but I get the same error any time I use MPI.
StressSolve: -------------------------------------
StressSolve: DISPLACEMENT SOLVER ITERATION 5
StressSolve: -------------------------------------
StressSolve:
StressSolve: Starting assembly...
StressSolve: Assembly done
StressSolve: Set boundaries done
ERROR:: IterSolve: Failed convergence tolerances.
ComputeChange: NS (ITER=5) (NRM,RELC): ( 0.39235873E+20 0.14462802E-08 ) :: linear elasticity
StressSolve: Result Norm : 3.92358734871225876E+019
StressSolve: Relative Change : 1.44628017259738065E-009
OptimizeBandwidth: ---------------------------------------------------------
OptimizeBandwidth: Computing matrix structure for: calculate stresses...done.
OptimizeBandwidth: Half bandwidth without optimization: 265
OptimizeBandwidth:
OptimizeBandwidth: Bandwidth Optimization ...done.
OptimizeBandwidth: Half bandwidth after optimization: 520
OptimizeBandwidth: Bandwidth optimization rejected, using original ordering.
OptimizeBandwidth: ---------------------------------------------------------
[FlyWheel:13015] *** An error occurred in MPI_Allreduce
[FlyWheel:13015] *** on communicator MPI_COMM_WORLD
[FlyWheel:13015] *** MPI_ERR_COMM: invalid communicator
[FlyWheel:13015] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 13015 on
node FlyWheel exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[FlyWheel:12981] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[FlyWheel:12981] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
MPI error on Tutorial
Re: MPI error on Tutorial
Hi,
Could you please try unlimiting the stack size with "ulimit -s unlimited" before launching ElmerGUI?
Does your model complete with the serial solver?
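To confirm that the raised limit actually reaches a process started from the same shell, a small check like the following may help. It is only a sketch and not part of Elmer; it assumes a POSIX system, and the function names `stack_is_unlimited` and `print_stack_limit` are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if the soft stack limit of the calling
   process is unlimited, 0 if it is finite, and -1 if getrlimit() fails. */
int stack_is_unlimited(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    return rl.rlim_cur == RLIM_INFINITY ? 1 : 0;
}

/* Hypothetical helper: prints the soft stack limit in a readable form. */
void print_stack_limit(FILE *out)
{
    assert(out != NULL);
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        fprintf(out, "getrlimit failed\n");
        return;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        fprintf(out, "stack soft limit: unlimited\n");
    else
        fprintf(out, "stack soft limit: %llu bytes\n",
                (unsigned long long)rl.rlim_cur);
}
```

Calling `print_stack_limit(stdout)` from a small `main` and running the program in the same shell, after `ulimit -s unlimited`, should show whether child processes inherit the setting.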
Re: MPI error on Tutorial
Hi Mal,
Thanks for the quick reply. I have been poking around at it all day. The serial solver works and the results look OK. I had to run ElmerGUI from the shell for the ulimit to have an effect. I still get the same results, except that sometimes it only goes to 4 iterations. To get MPI to work initially I had to install openssh, and it asks me for my password to start it. I am wondering if a spawned process tries to reconnect or something.
I wrote a little program today to read a binary .stl file and output a text .stl file. ElmerGUI would not import a binary .stl file from Blender or SolidWorks. When I run this program to generate a text version (in the .stl text format) with a .stl extension, it opens just fine.
Here it is:
Enjoy!
Code:
// STL checker
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
float x;
float y;
float z;
} Float_Coord;
typedef struct {
Float_Coord normal;
Float_Coord point1;
Float_Coord point2;
Float_Coord point3;
unsigned short colour;
} STL_Facet;
#define HEADER_SIZE 80
#define MAX_HEADER_WORDS 10
int main(int argc, char* argv[])
{
char header[HEADER_SIZE + 1];
char *name[MAX_HEADER_WORDS];
unsigned long num_facets = 0;
unsigned long i = 0;
STL_Facet *facet = NULL;
FILE *fp = NULL;
FILE *outfp = NULL;
header[80] = 0;
if(argc < 2)
{
fprintf(stdout, "Usage %s infile [outfile]\n", argv[0]);
return 1; // nothing to read without an input file
}
fprintf(stdout, "Sizeof float %d Sizeof double %d\n",
(int)sizeof(float), (int)sizeof(double));
if((fp = fopen(argv[1], "rb")) != NULL)
{
fread(header, 80, 1, fp);
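name[i] = strtok(header, " ");
// Keep at most MAX_HEADER_WORDS - 1 words so name[] cannot overflow.
while(name[i] != NULL && i < MAX_HEADER_WORDS - 1)
{
i++;
name[i] = strtok(NULL, " ");
}
name[MAX_HEADER_WORDS - 1] = NULL; // the word list is always NULL-terminated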
fprintf(stdout, "HEADER:");
i = 0;
while(name[i] != NULL)
{
fprintf(stdout, " %s", name[i]);
i++;
}
fprintf(stdout, "\n");
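// The facet count is a 32-bit little-endian integer. num_facets was
// zero-initialised, so its upper bytes stay 0 on little-endian hosts.
if(fread(&num_facets, 4, 1, fp) != 1)
{
fprintf(stdout, "Could not read the facet count.\n");
fclose(fp);
exit(1);
}
fprintf(stdout, "Num Facets = %lu\n", num_facets);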
if((facet = (STL_Facet*)malloc(num_facets * sizeof(STL_Facet)))
== NULL)
{
fprintf(stdout, "Could not malloc %lu facets.\n", num_facets);
fclose(fp);
exit (0);
}
for(i = 0; i < num_facets; i++)
{
fprintf(stdout, "Facet: %lu\n", i);
fread(&facet[i].normal.x, sizeof(float) , 1, fp);
fread(&facet[i].normal.y, sizeof(float) , 1, fp);
fread(&facet[i].normal.z, sizeof(float) , 1, fp);
fprintf(stdout, "Normal: %f, %f, %f\n",
facet[i].normal.x, facet[i].normal.y, facet[i].normal.z);
fread(&facet[i].point1.x, sizeof(float) , 1, fp);
fread(&facet[i].point1.y, sizeof(float) , 1, fp);
fread(&facet[i].point1.z, sizeof(float) , 1, fp);
fprintf(stdout, "point1: %f, %f, %f\n",
facet[i].point1.x, facet[i].point1.y, facet[i].point1.z);
fread(&facet[i].point2.x, sizeof(float) , 1, fp);
fread(&facet[i].point2.y, sizeof(float) , 1, fp);
fread(&facet[i].point2.z, sizeof(float) , 1, fp);
fprintf(stdout, "point2: %f, %f, %f\n",
facet[i].point2.x, facet[i].point2.y, facet[i].point2.z);
fread(&facet[i].point3.x, sizeof(float) , 1, fp);
fread(&facet[i].point3.y, sizeof(float) , 1, fp);
fread(&facet[i].point3.z, sizeof(float) , 1, fp);
fprintf(stdout, "point3: %f, %f, %f\n",
facet[i].point3.x, facet[i].point3.y, facet[i].point3.z);
fread(&facet[i].colour, sizeof(unsigned short) , 1, fp);
fprintf(stdout, "Colour: %d\n", facet[i].colour);
}
fclose(fp);
} else
{
fprintf(stdout, "Could not open %s\n", argv[1]);
exit(0);
}
if(argc == 3) // There is an output file specified.
{
if((outfp = fopen(argv[2], "wb")) != NULL)
{
if(name[0] != NULL)
fprintf(outfp, "%s", name[0]);
i = 1;
while(name[i] != NULL)
{
fprintf(outfp, " %s", name[i]);
i++;
}
fprintf(outfp, "\n");
for(i = 0; i < num_facets; i++)
{
fprintf(outfp, "facet normal %f %f %f\n",
facet[i].normal.x, facet[i].normal.y, facet[i].normal.z);
fprintf(outfp, " outer loop\n");
fprintf(outfp, " vertex %f %f %f\n",
facet[i].point1.x, facet[i].point1.y, facet[i].point1.z);
fprintf(outfp, " vertex %f %f %f\n",
facet[i].point2.x, facet[i].point2.y, facet[i].point2.z);
fprintf(outfp, " vertex %f %f %f\n",
facet[i].point3.x, facet[i].point3.y, facet[i].point3.z);
fprintf(outfp, " endloop\n");
fprintf(outfp, "endfacet\n"); // the ASCII STL keyword is one word
}
if(name[0] != NULL)
fprintf(outfp, "end%s", name[0]);
i = 1;
while(name[i] != NULL)
{
fprintf(outfp, " %s", name[i]);
i++;
}
fprintf(outfp, "\n");
fclose(outfp);
} else
{
fprintf(stdout, "Could not open %s\n", argv[2]);
exit(0);
}
}
return 0;
}
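As a side note, one way to guess whether a .stl file is binary before converting it is to compare its size with the binary layout: an 80-byte header, a 4-byte little-endian facet count, and 50 bytes per facet. The sketch below is only an illustration under that assumption; the names `stl_read_u32_le` and `stl_size_matches_binary` are hypothetical and not taken from Elmer or from the program above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STL_HEADER_SIZE 80u
#define STL_FACET_SIZE  50u /* 12 floats (48 bytes) + 2-byte attribute */

/* Decode a 32-bit little-endian unsigned integer from raw bytes,
   independent of the host byte order and of sizeof(long). */
uint32_t stl_read_u32_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* first84 holds the first 84 bytes of the file (header + facet count).
   Returns 1 if file_size equals 84 + 50 * facet_count, otherwise 0. */
int stl_size_matches_binary(const unsigned char *first84, uint64_t file_size)
{
    assert(first84 != NULL);
    if (file_size < STL_HEADER_SIZE + 4u)
        return 0;
    uint32_t n = stl_read_u32_le(first84 + STL_HEADER_SIZE);
    return file_size == (uint64_t)STL_HEADER_SIZE + 4u
                        + (uint64_t)n * STL_FACET_SIZE;
}
```

A caller would read the first 84 bytes with `fread`, obtain the file size (for example with `fseek`/`ftell`), and only run the binary-to-text conversion when the check returns 1.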
Re: MPI error on Tutorial
FlyWheel wrote: I still get the same results, except that sometimes it only goes to 4 iterations. To get the MPI to work initially I had to install openssh, and it asks me for my password to start it. I am wondering if a spawned off process tries to reconnect or something.
Unfortunately I'm unable to reproduce the error you reported.
Did you build your MPI library from source or are you using precompiled packages provided by your Linux distribution?
This is my compilation script for a Debian based system with OpenMPI installed from the repository (libopenmpi-dev openmpi-bin):
Code:
#!/bin/sh -f
export CC=mpicc.openmpi
export CXX=mpic++.openmpi
export FC=mpif90.openmpi
export F77=mpif90.openmpi
export ELMER_HOME=/usr/local
modules="matc umfpack mathlibs elmergrid meshgen2d eio hutiter fem"
for m in $modules; do
cd $m
./configure --with-mpi=yes --with-mpi-dir=/usr --prefix=$ELMER_HOME
make clean
make
sudo make install
cd ..
done
Re: MPI error on Tutorial
Thanks for the great build script. I did not know that those MPI compilers existed, but there they were on my machine! The script worked great, but it took me a long time to get ElmerGUI itself to compile. I finally got all the Qt4 stuff sorted out. I still get the same error once it finally converges (about 14 iterations with MPI vs 3 or 4 with serial).
PS. I did not use the MPI compilers for ElmerGUI or ElmerGUIlogger. I just used the qmake stuff, which used my g++.