Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] MTT tests: segv's with sm on large messages
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-05-05 18:13:22


On May 5, 2009, at 6:01 PM, Eugene Loh wrote:

> You and Terry saw something that was occurring about 0.01% of the time
> during MPI_Init during add_procs. That does not seem to be what we are
> seeing here.
>

Right -- that's what I'm saying. It's different from the MPI_INIT
errors.

> But we have seen failures in 1.3.1 and 1.3.2 that look like the one
> here. They occur more like 1% of the time and can occur during MPI_Init
> *OR* later during a collective call. What we're looking at here seems
> to be related. E.g., see
> http://www.open-mpi.org/community/lists/devel/2009/03/5768.php
>

Good to see that we're agreeing.

Yes, I agree that this is not a new error, but it is worth fixing.
Cisco's MTT didn't run last night because there was no new trunk
tarball. I'll check Cisco's MTT tomorrow morning to see whether there
are any sm failures of this new flavor and, if so, how frequently
they're happening.

-- 
Jeff Squyres
Cisco Systems