Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] trunk hangs since r19010
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-07-28 12:03:42


Interesting. The self BTL is only used for local communications (a
process sending to itself). I didn't expect any benchmark to execute
such communications, but apparently I was wrong. Please let me know
which test fails, and I will take a look this evening.

   Thanks,
     george.
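Ralph's suggestion in the quoted thread (detect an unusable BTL list and error out with a message) could look roughly like the C sketch below. It rests on an assumed reachability model taken from this thread, not from the Open MPI sources: "self" reaches only the sending process (as described above), "sm" reaches other processes on the same node, and "openib" reaches same-node and remote processes but not the sending process itself. The names `peer_kind`, `btl_reaches`, `list_reaches` and `check_btl_list` are hypothetical and are not Open MPI's BTL interfaces. A real check would use each component's own per-peer reachability information.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of a peer relative to the sending process. */
enum peer_kind { PEER_SELF, PEER_SAME_NODE, PEER_REMOTE_NODE };

static const char *peer_kind_name(enum peer_kind kind)
{
    switch (kind) {
    case PEER_SELF:        return "self (same process)";
    case PEER_SAME_NODE:   return "same-node";
    case PEER_REMOTE_NODE: return "remote-node";
    }
    return "unknown";
}

/* ASSUMED reachability model, not Open MPI's real BTL logic:
 * "self" reaches only the sending process, "sm" reaches other processes
 * on the same node, and "openib" reaches same-node and remote processes
 * but not the sending process itself. */
static bool btl_reaches(const char *btl, enum peer_kind kind)
{
    if (strcmp(btl, "self") == 0)   return kind == PEER_SELF;
    if (strcmp(btl, "sm") == 0)     return kind == PEER_SAME_NODE;
    if (strcmp(btl, "openib") == 0) return kind != PEER_SELF;
    return false; /* unknown components reach nothing in this model */
}

/* True if any component in a comma-separated list such as "openib,self"
 * can reach a peer of the given kind. */
static bool list_reaches(const char *btl_list, enum peer_kind kind)
{
    char buf[256];

    assert(btl_list != NULL);
    /* strtok modifies its argument, so work on a (possibly truncated) copy. */
    strncpy(buf, btl_list, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        if (btl_reaches(tok, kind)) {
            return true;
        }
    }
    return false;
}

/* Hypothetical startup check: returns 0 if every peer kind present in the
 * job is reachable by some listed component; otherwise prints an error and
 * returns -1 instead of starting with a transport set that cannot reach
 * every peer. */
static int check_btl_list(const char *btl_list,
                          const enum peer_kind *peers, size_t npeers)
{
    for (size_t i = 0; i < npeers; i++) {
        if (!list_reaches(btl_list, peers[i])) {
            fprintf(stderr,
                    "error: no component in btl list \"%s\" can reach %s peers\n",
                    btl_list, peer_kind_name(peers[i]));
            return -1;
        }
    }
    return 0;
}
```

Under this model, a job whose processes send to themselves would be rejected with "openib" alone or with "openib,sm", and accepted with "openib,self". Whether the real components behave this way is exactly what the thread is trying to pin down.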

On Jul 28, 2008, at 5:56 PM, Ralph Castain wrote:

> I just re-tested to confirm, and that is correct.
>
> -mca btl openib works
> -mca btl openib,self hangs
> -mca btl openib,sm works
>
>
> On Jul 28, 2008, at 9:49 AM, George Bosilca wrote:
>
>> I'm a little bit lost here. You're stating that openib,self doesn't
>> work while openib does? In other words that adding self to the BTL
>> leads to deadlocks?
>>
>> george.
>>
>> PS: Btw, openib on its own is not supposed to work at all, except
>> in the case where openib handles internal messages (where the
>> source and destination are the same process).
>>
>> On Jul 28, 2008, at 5:05 PM, Ralph Castain wrote:
>>
>>>
>>> On Jul 28, 2008, at 8:52 AM, Lenny Verkhovsky wrote:
>>>
>>>> Only openib works for me too,
>>>>
>>>> but Gleb once told me that it's illegal and that I always need
>>>> to use the self BTL.
>>>>
>>>
>>> Don't know, could be true. But if that is true, then we should
>>> check whether self is missing from the BTL list and, if so, error
>>> out with an appropriate message. Otherwise, how is a user supposed
>>> to know about this requirement?
>>>
>>>>
>>>>
>>>> On 7/28/08, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>>>>
>>>> FWIW, all my MTT runs are hanging as well.
>>>>
>>>>
>>>>
>>>> On Jul 28, 2008, at 10:37 AM, Brad Benton wrote:
>>>>
>>>> My experience is the same as Lenny's. I've tested on x86_64 and
>>>> ppc64 systems, and tests using --mca btl openib,self hang in all
>>>> cases.
>>>>
>>>> --brad
>>>>
>>>>
>>>> 2008/7/28 Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]>
>>>> I couldn't run either across different nodes or on the same node
>>>> with self,openib.
>>>>
>>>>
>>>>
>>>>
>>>> On 7/28/08, Ralph Castain <rhc_at_[hidden]> wrote:
>>>> I checked this out some more and I believe it is related to
>>>> ticket #1378. We lock up if sm is included in the BTLs, which is
>>>> what I had done in my test. If I exclude it with ^sm, I can run
>>>> fine.
>>>>
>>>>
>>>> On Jul 28, 2008, at 6:41 AM, Ralph Castain wrote:
>>>>
>>>> It could also be something new. Brad and I noticed on Friday that
>>>> IB was locking up as soon as we tried any cross-node
>>>> communication. We hadn't seen that before, and I at least haven't
>>>> explored it further; I planned to do so today.
>>>>
>>>>
>>>> On Jul 28, 2008, at 6:01 AM, Lenny Verkhovsky wrote:
>>>>
>>>> I believe it is.
>>>>
>>>> On 7/28/08, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>>>>
>>>> On Jul 28, 2008, at 7:51 AM, Jeff Squyres wrote:
>>>>
>>>> Is this related to r1378?
>>>>
>>>> Gah -- I meant #1378, meaning the "PML ob1 deadlock" ticket.
>>>>
>>>>
>>>>
>>>> On Jul 28, 2008, at 7:13 AM, Lenny Verkhovsky wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have been seeing tests (latency) hang since r19010.
>>>>
>>>>
>>>> Best Regards
>>>>
>>>> Lenny.
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>>
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems
>>>>
>>>>
>>>>


