Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] SM init failures
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2009-03-26 12:07:49


Ralph Castain wrote:

> Hi folks

Er, perhaps pronounced "Eugene". :^(

> It looks like the SM revisions we inserted into 1.3.2 are a great
> detector for shared memory init failures

How delicately put! I appreciate the gentleness.

> - it segfaulted 143 times last night on IU's sif computer, 34 times
> on Sun/Linux, and 3 times on Sun/SunOS... almost every single time due
> to "Address not mapped" errors in the sm btl during init.

Any guess as to frequency or what it'd take for me to reproduce this? I
tried with 1.3.1... 200K times and no failures on np=8 MPI_Init() jobs.
I'm starting now with a single-queue version, but wouldn't be surprised
if, again, I can't turn anything up.

> Might be worth someone looking at the MTT output stack traces - this
> is something that now appears to be reproducible, and should be
> addressed before we release 1.3.2 as it seems far more likely to
> happen than in the past.

Great (in a weird way, I guess). Can you tell me how to look at the MTT
output stack traces? I found http://www.open-mpi.org/projects/mtt/ but
expect it'll take me a while to wade through that.