
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] mpi error?
From: Peter Kjellstrom (cap_at_[hidden])
Date: 2010-03-11 10:42:16


On Thursday 11 March 2010, Matthew MacManes wrote:
> Can anybody tell me if this is an error associated with openmpi, versus an
> issue with the program I am running (MRBAYES,
> https://sourceforge.net/projects/mrbayes/)
>
> We are trying to run a large simulated dataset using 1,000,000 bases
> divided up into 1000 genes, 5 taxa.. An error is occurring, but we are not
> sure why. We are using the MPI version of MRBAYES v3.2-cvs on a linux
> 16core 24GB RAM machine. It does not appear as if the program runs out of
> memory (max memory usage is 13gb). Maybe this is an OpenMPI problem and
> not related to MrBayes...
>
> See snippet of error message below. Can anybody give me any hints about the
> source of the problem?
>
> I am using OPENMPI version 1.4.1.
>
> ...
> Defining charset called gene997
> Defining charset called gene998
> Defining charset called gene999
> Defining charset called gene1000
> Defining partition called Genes
> [macmanes:02546] *** Process received signal ***
> [macmanes:02546] Signal: Segmentation fault (11)
> [macmanes:02546] Signal code: Address not mapped (1)
> [macmanes:02546] Failing at address: (nil)
> [macmanes:02546] [ 0] /lib/libpthread.so.0 [0x7ffd0f322190]
> [macmanes:02546] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 13 with PID 2546 on node macmanes exited
> on signal 11 (Segmentation fault).

One of the ranks got a "Segmentation fault". This would typically indicate a
problem with the application, not with MPI. Maybe you ran out of stack space?
(Check with ulimit -s.)
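The limit a rank actually runs with is the one it inherited when mpirun started it. That may not match what ulimit -s shows in your interactive shell. If you want to check it from inside the program, here is a minimal C sketch using the POSIX getrlimit(RLIMIT_STACK) call. The function names stack_limit_kib and print_stack_limits are hypothetical helpers for illustration. They are not part of MrBayes or Open MPI.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Query this process's stack size limit (the value `ulimit -s` reports).
 * Stores the soft and hard limits in KiB; -1 means "unlimited".
 * Returns 0 on success, -1 if getrlimit() fails. */
int stack_limit_kib(long long *soft_kib, long long *hard_kib)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *soft_kib = (rl.rlim_cur == RLIM_INFINITY) ? -1 : (long long)(rl.rlim_cur / 1024);
    *hard_kib = (rl.rlim_max == RLIM_INFINITY) ? -1 : (long long)(rl.rlim_max / 1024);
    return 0;
}

/* Print both limits; meant to be called once per rank at startup so you
 * see what each process inherited rather than what your shell shows. */
void print_stack_limits(FILE *out)
{
    long long soft, hard;
    if (stack_limit_kib(&soft, &hard) != 0) {
        perror("getrlimit");
        return;
    }
    if (soft < 0)
        fprintf(out, "stack soft limit: unlimited\n");
    else
        fprintf(out, "stack soft limit: %lld KiB\n", soft);
    if (hard < 0)
        fprintf(out, "stack hard limit: unlimited\n");
    else
        fprintf(out, "stack hard limit: %lld KiB\n", hard);
}
```

Calling print_stack_limits(stderr) early in main() on every rank would show whether the ranks run with a small stack. If they do, raising the soft limit (for example with ulimit -s unlimited in the shell that launches mpirun, if the hard limit allows it) is a cheap experiment.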

Have you tried a different/lower number of ranks?

/Peter