MPI error on Tutorial

FlyWheel
Posts: 4
Joined: 26 Sep 2009, 01:05

Post by FlyWheel »

Elmer is so cool! 8-)

I am using the 64-bit Ubuntu Linux package for Elmer. When I run the parallel solver on the loaded elastic beam tutorial, it goes almost to completion and then I get the output below. I am using 4 processes on my dual quad-core Xeon machine. On a problem of this size that seems to go faster than with all 8, but I get the same error any time I use MPI.

StressSolve: -------------------------------------
StressSolve: DISPLACEMENT SOLVER ITERATION 5
StressSolve: -------------------------------------
StressSolve:
StressSolve: Starting assembly...
StressSolve: Assembly done
StressSolve: Set boundaries done
ERROR:: IterSolve: Failed convergence tolerances.
ComputeChange: NS (ITER=5) (NRM,RELC): ( 0.39235873E+20 0.14462802E-08 ) :: linear elasticity
StressSolve: Result Norm : 3.92358734871225876E+019
StressSolve: Relative Change : 1.44628017259738065E-009
OptimizeBandwidth: ---------------------------------------------------------
OptimizeBandwidth: Computing matrix structure for: calculate stresses...done.
OptimizeBandwidth: Half bandwidth without optimization: 265
OptimizeBandwidth:
OptimizeBandwidth: Bandwidth Optimization ...done.
OptimizeBandwidth: Half bandwidth after optimization: 520
OptimizeBandwidth: Bandwidth optimization rejected, using original ordering.
OptimizeBandwidth: ---------------------------------------------------------
[FlyWheel:13015] *** An error occurred in MPI_Allreduce
[FlyWheel:13015] *** on communicator MPI_COMM_WORLD
[FlyWheel:13015] *** MPI_ERR_COMM: invalid communicator
[FlyWheel:13015] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 13015 on
node FlyWheel exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[FlyWheel:12981] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[FlyWheel:12981] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
mal
Site Admin
Posts: 54
Joined: 21 Aug 2009, 14:21

Re: MPI error on Tutorial

Post by mal »

Hi,

Could you please try unlimiting the stack size with "ulimit -s unlimited" before launching ElmerGUI?

Does your model complete with the serial solver?
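
To confirm that the `ulimit -s unlimited` setting is actually seen by processes started from a given shell, a small check along these lines may help. This is only a sketch, assuming a POSIX system that provides `getrlimit`; the function name `stack_limit_is_unlimited` is made up for illustration and is not part of Elmer.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper (not part of Elmer): reports whether the soft
 * stack limit inherited by this process is unlimited.
 * Returns 1 if unlimited, 0 if finite, -1 if the query failed. */
int stack_limit_is_unlimited(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;

    return (rl.rlim_cur == RLIM_INFINITY) ? 1 : 0;
}

/* Prints the result in a human-readable form. */
void print_stack_limit(FILE *out)
{
    int r = stack_limit_is_unlimited();

    assert(out != NULL);

    if (r < 0)
        fprintf(out, "Could not query the stack limit\n");
    else if (r == 1)
        fprintf(out, "Stack limit: unlimited\n");
    else
        fprintf(out, "Stack limit: finite\n");
}
```

Calling `print_stack_limit(stdout)` from a `main` that is compiled and started from the same shell as ElmerGUI shows the limit that child processes of that shell inherit. A program started from a desktop menu does not see a `ulimit` set in a terminal.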
FlyWheel
Posts: 4
Joined: 26 Sep 2009, 01:05

Re: MPI error on Tutorial

Post by FlyWheel »

Hi Mal,

Thanks for the quick reply. I have been poking around at it all day. The serial solver works and the results look OK. I had to run ElmerGUI from the shell for the ulimit to have an effect. I still get the same results, except that sometimes it only goes to 4 iterations. To get the MPI to work initially I had to install openssh, and it asks me for my password to start it. I am wondering if a spawned off process tries to reconnect or something.

I wrote a little program today to read a binary .stl file and output a text .stl file. ElmerGUI would not import a binary .stl file from Blender or SolidWorks. When I run this program to generate a text version (in the .stl text format) with a .stl extension, it opens just fine.

Here it is:
Enjoy! :ugeek:

Code:

// STL checker
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    float x;
    float y;
    float z;
} Float_Coord;

typedef struct {
    Float_Coord normal;
    Float_Coord point1;
    Float_Coord point2;
    Float_Coord point3;
    unsigned short colour;
} STL_Facet;

#define HEADER_SIZE      80
#define MAX_HEADER_WORDS 10

int main(int argc, char* argv[])
{
    char header[HEADER_SIZE + 1];
    char *name[MAX_HEADER_WORDS];
    unsigned long num_facets = 0;
    unsigned long i = 0;
    STL_Facet *facet = NULL;
    FILE *fp = NULL;
    FILE *outfp = NULL;

    header[80] = 0;

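    if(argc < 2)
    {
        // Without an input file there is nothing to do; stop here
        // instead of passing a NULL argv[1] to fopen().
        fprintf(stdout, "Usage %s infile [outfile]\n", argv[0]);
        return 1;
    }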

    fprintf(stdout, "Sizeof float %d Sizeof double %d\n",
            (int)sizeof(float), (int)sizeof(double));

    if((fp = fopen(argv[1], "rb")) != NULL)
    {
        fread(header, 80, 1, fp);

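        // Split the header into words, never writing past the end of name[].
        name[i] = strtok(header, " ");
        while(name[i] != NULL && i < MAX_HEADER_WORDS - 1)
        {
            i++;
            name[i] = strtok(NULL, " ");
        }
        name[i] = NULL; // Terminate the list even if the header had more words.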

        fprintf(stdout, "HEADER:");

        i = 0;
        while(name[i] != NULL)
        {
            fprintf(stdout, " %s", name[i]);
            i++;
        }

        fprintf(stdout, "\n");

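        // The facet count is a 4-byte little-endian integer. Reading it into
        // the zero-initialised num_facets assumes a little-endian host.
        fread(&num_facets, 4, 1, fp);
        fprintf(stdout, "Num Facets = %lu\n", num_facets);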

        if((facet = (STL_Facet*)malloc(num_facets * sizeof(STL_Facet)))
           == NULL)
        {
            fprintf(stdout, "Could not malloc %lu facets.\n", num_facets);
            fclose(fp);
            exit(1);
        }

        for(i = 0; i < num_facets; i++)
        {
            fprintf(stdout, "Facet: %lu\n", i);

            fread(&facet[i].normal.x, sizeof(float) , 1, fp);
            fread(&facet[i].normal.y, sizeof(float) , 1, fp);
            fread(&facet[i].normal.z, sizeof(float) , 1, fp);
            fprintf(stdout, "Normal: %f, %f, %f\n",
                facet[i].normal.x, facet[i].normal.y, facet[i].normal.z);

            fread(&facet[i].point1.x, sizeof(float) , 1, fp);
            fread(&facet[i].point1.y, sizeof(float) , 1, fp);
            fread(&facet[i].point1.z, sizeof(float) , 1, fp);
            fprintf(stdout, "point1: %f, %f, %f\n",
                facet[i].point1.x, facet[i].point1.y, facet[i].point1.z);

            fread(&facet[i].point2.x, sizeof(float) , 1, fp);
            fread(&facet[i].point2.y, sizeof(float) , 1, fp);
            fread(&facet[i].point2.z, sizeof(float) , 1, fp);
            fprintf(stdout, "point2: %f, %f, %f\n",
                facet[i].point2.x, facet[i].point2.y, facet[i].point2.z);

            fread(&facet[i].point3.x, sizeof(float) , 1, fp);
            fread(&facet[i].point3.y, sizeof(float) , 1, fp);
            fread(&facet[i].point3.z, sizeof(float) , 1, fp);
            fprintf(stdout, "point3: %f, %f, %f\n",
                facet[i].point3.x, facet[i].point3.y, facet[i].point3.z);

            fread(&facet[i].colour, sizeof(unsigned short) , 1, fp);
            fprintf(stdout, "Colour: %d\n", facet[i].colour);
        }
        fclose(fp);

    } else
    {
        fprintf(stdout, "Could not open %s\n", argv[1]);
        exit(0);
    }

    if(argc == 3) // There is an output file specified.
    {
        if((outfp = fopen(argv[2], "wb")) != NULL)
        {
            if(name[0] != NULL)
                fprintf(outfp, "%s", name[0]);

            i = 1;
            while(name[i] != NULL)
            {
                fprintf(outfp, " %s", name[i]);
                i++;
            }

            fprintf(outfp, "\n");

            for(i = 0; i < num_facets; i++)
            {
                fprintf(outfp, "facet normal %f %f %f\n",
                    facet[i].normal.x, facet[i].normal.y, facet[i].normal.z);

                fprintf(outfp, "  outer loop\n");

                fprintf(outfp, "    vertex %f %f %f\n",
                    facet[i].point1.x, facet[i].point1.y, facet[i].point1.z);
                fprintf(outfp, "    vertex %f %f %f\n",
                    facet[i].point2.x, facet[i].point2.y, facet[i].point2.z);
                fprintf(outfp, "    vertex %f %f %f\n",
                    facet[i].point3.x, facet[i].point3.y, facet[i].point3.z);

                fprintf(outfp, "  endloop\n");

                // The ASCII STL keyword is a single word.
                fprintf(outfp, "endfacet\n");
            }

            if(name[0] != NULL)
                fprintf(outfp, "end%s", name[0]);

            i = 1;
            while(name[i] != NULL)
            {
                fprintf(outfp, " %s", name[i]);
                i++;
            }

            fprintf(outfp, "\n");

            fclose(outfp);

        } else
        {
            fprintf(stdout, "Could not open %s\n", argv[2]);
            exit(0);
        }
    }

    return 0;
}
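
For reference, the binary layout that the program above reads is an 80-byte header, a 4-byte little-endian facet count, and 50 bytes per facet (twelve 4-byte floats plus a 2-byte attribute). The sketch below uses that layout to check whether a file is consistent with binary STL before converting it. The function names are hypothetical and it is not taken from ElmerGUI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STL_HEADER_BYTES 80u
#define STL_COUNT_BYTES   4u
#define STL_FACET_BYTES  50u  /* 12 floats (48 bytes) + 2-byte attribute */

/* Decode a 32-bit little-endian unsigned integer from 4 raw bytes,
 * independent of the byte order of the host. */
uint32_t stl_read_u32_le(const unsigned char *buf)
{
    assert(buf != NULL);
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}

/* Size in bytes that a binary STL file with num_facets facets must have. */
uint64_t stl_expected_size(uint32_t num_facets)
{
    return (uint64_t)STL_HEADER_BYTES + STL_COUNT_BYTES
         + (uint64_t)STL_FACET_BYTES * num_facets;
}

/* first84 holds the first 84 bytes of the file, file_size its total size.
 * Returns 1 if the size matches the facet count stored at bytes 80..83,
 * 0 otherwise. */
int stl_looks_binary(const unsigned char *first84, uint64_t file_size)
{
    assert(first84 != NULL);
    if (file_size < STL_HEADER_BYTES + STL_COUNT_BYTES)
        return 0;
    return stl_expected_size(stl_read_u32_le(first84 + STL_HEADER_BYTES))
           == file_size;
}
```

Decoding the count byte by byte avoids relying on the size or byte order of `unsigned long` on the host. A file whose size does not match the stored count is more likely ASCII STL or truncated.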
mal
Site Admin
Posts: 54
Joined: 21 Aug 2009, 14:21

Re: MPI error on Tutorial

Post by mal »

FlyWheel wrote: I still get the same results, except that sometimes it only goes to 4 iterations. To get the MPI to work initially I had to install openssh, and it asks me for my password to start it. I am wondering if a spawned off process tries to reconnect or something.
Unfortunately I'm unable to reproduce the error you reported.

Did you build your MPI library from source or are you using precompiled packages provided by your Linux distribution?

This is my compilation script for a Debian based system with OpenMPI installed from the repository (libopenmpi-dev openmpi-bin):

Code:

#!/bin/sh -f 

export CC=mpicc.openmpi
export CXX=mpic++.openmpi
export FC=mpif90.openmpi
export F77=mpif90.openmpi

export ELMER_HOME=/usr/local

modules="matc umfpack mathlibs elmergrid meshgen2d eio hutiter fem" 
for m in $modules; do
  cd "$m" || exit 1
  ./configure --with-mpi=yes --with-mpi-dir=/usr --prefix=$ELMER_HOME
  make clean
  make
  sudo make install
  cd .. 
done
FlyWheel
Posts: 4
Joined: 26 Sep 2009, 01:05

Re: MPI error on Tutorial

Post by FlyWheel »

Thanks for the great build script. I did not know that those MPI compilers existed, but there they were on my machine! :o The script worked great, but it took me a long time to get ElmerGUI itself to compile. I finally got all the Qt4 stuff sorted out. I still get the same error once it finally converges (about 14 iterations with MPI vs 3 or 4 with serial).

PS. I did not use the MPI compilers for ElmerGUI or ElmerGUIlogger. I just used qmake, which picked up my g++.