Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] trunk hangs since r19010
From: Brad Benton (bradford.benton_at_[hidden])
Date: 2008-07-28 18:01:41


On Mon, Jul 28, 2008 at 12:08 PM, Terry Dontje <Terry.Dontje_at_[hidden]> wrote:

> Jeff Squyres wrote:
>
>> On Jul 28, 2008, at 12:03 PM, George Bosilca wrote:
>>
>>> Interesting. The self BTL is only used for local communications. I don't
>>> expect that any benchmark executes such communications, but apparently I
>>> was wrong. Please let me know the failing test; I will take a look this
>>> evening.
>>>
>>
>> FWIW, my manual tests of a simplistic "ring" program work for all
>> combinations (openib, openib+self, openib+self+sm). Shrug.
>>
>> But for OSU latency, I found that openib, openib+sm work, but
>> openib+sm+self hangs (same results whether the 2 procs are on the same node
>> or different nodes). There is no self communication in osu_latency, so
>> something else must be going on.
>>
> Is it something to do with the MPI_Barrier call? osu_latency uses
> MPI_Barrier and from rhc's email it sounds like his code does too.

I don't think it's an issue with MPI_Barrier(). I'm running into this
problem with srtest.c (one of the example programs from the mpich
distribution). It's a ring-type test with no barriers until the end, yet it
hangs on the very first Send/Recv pair from rank 0 to rank 1.
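
For reference, the exchange that hangs for me is essentially the following
pattern (a minimal sketch of a ring-style test, not the actual srtest.c
source; the token buffer and tag are illustrative):

    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, token = 0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* first hop: rank 0 -> rank 1; this is where the hang shows up */
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD, &status);
        } else {
            /* pass the token around the ring */
            MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &status);
            MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }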

In my case, openib and openib+sm work, but openib+self and openib+sm+self
hang.
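
For completeness, I'm selecting the BTL combinations with the btl mca
parameter, roughly along these lines (hostfile and paths omitted; the exact
command lines are illustrative):

    mpirun -np 2 --mca btl openib,sm ./srtest       # works
    mpirun -np 2 --mca btl openib,self ./srtest     # hangs
    mpirun -np 2 --mca btl openib,sm,self ./srtest  # hangs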

--brad

>
> --td
>