Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] SM init failures
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2009-03-26 17:25:16


Ralph Castain wrote:

> It looks like the SM revisions we inserted into 1.3.2 are a great
> detector for shared memory init failures - it segfaulted 143 times
> last night on IU's sif computer, 34 times on Sun/Linux, and 3 times
> on Sun/SunOS...almost every single time due to "Address not mapped"
> errors in the sm btl during init.
>
> Might be worth someone looking at the MTT output stack traces - this
> is something that now appears to be reproducible, and should be
> addressed before we release 1.3.2 as it seems far more likely to
> happen than in the past.

Okay. I looked at http://www.open-mpi.org/mtt/index.php?do_redir=973

If we start with the 3 Sun/SunOS failures (row #7), these seem to be
labeled 1.3.1 ("MPI Version"). So, not 1.3.2. And I don't know how to
make sense of the stack trace... there's an "mca_common_sm_mmap_init"
ftruncate problem and then stuff that apparently happens much later on.
How can this be?
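
For anyone following along: the ftruncate step is part of the usual
create/size/map sequence for a shared-memory backing file. Below is a
minimal sketch of that sequence using plain POSIX calls. It is not the
actual mca_common_sm_mmap_init code; the name map_backing_file and the
path and size passed to it are made up for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, NOT Open MPI code: create a backing file of
 * `size` bytes and map it shared. Returns NULL on any failure. */
void *map_backing_file(const char *path, size_t size)
{
    assert(path != NULL && size > 0);

    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        perror("open");
        return NULL;
    }

    /* Size the file before mapping. Touching mapped pages that lie
     * entirely past end-of-file typically raises a fault, so a failed
     * ftruncate must not be ignored. */
    if (ftruncate(fd, (off_t)size) != 0) {
        perror("ftruncate");
        close(fd);
        return NULL;
    }

    void *seg = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (seg == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    return seg;
}
```

A caller would check the return value for NULL before touching the
segment and release it with munmap when done. If any of the three steps
fails and the error is not propagated, a fault only shows up later at
first access.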

The Sun/Linux problems must be row #6. Yes? Again, the "MPI Version"
is labeled 1.3.1. Is that informative or misleading? Lots of the stacks
look like this is happening during MPI_Init. I tried running a code
that just does MPI_Init on similar configs and was unable to trigger
this problem.

How do I figure out the compiler used?

I need help reproducing this problem.