Open MPI Development Mailing List Archives

This web mail archive is frozen.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI devel] trunk hangs since r19010
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-07-28 12:03:42


Interesting. The self BTL is only used for local communications
(where a process sends to itself). I didn't expect any benchmark to
execute such communications, but apparently I was wrong. Please let
me know which test is failing; I will take a look this evening.

   Thanks,
     george.

On Jul 28, 2008, at 5:56 PM, Ralph Castain wrote:

> I just re-tested to confirm, and that is correct.
>
> -mca btl openib works
> -mca btl openib,self hangs
> -mca btl openib,sm works
>
>
> On Jul 28, 2008, at 9:49 AM, George Bosilca wrote:
>
>> I'm a little bit lost here. You're stating that openib,self doesn't
>> work while openib does? In other words that adding self to the BTL
>> leads to deadlocks?
>>
>> george.
>>
>> PS: Btw, it is not supposed to work at all, except in the case
>> where openib handles internal messages (where the source and
>> destination are the same process).
>>
>> On Jul 28, 2008, at 5:05 PM, Ralph Castain wrote:
>>
>>>
>>> On Jul 28, 2008, at 8:52 AM, Lenny Verkhovsky wrote:
>>>
>>>> only openib works for me too,
>>>>
>>>> but Glebs said to me once that it's illegal and I always need to
>>>> use self btl.
>>>>
>>>
>>> Don't know - could be true. But if that is true, then we should
>>> check to see if that condition is met and error out - with an
>>> appropriate message - if so. Otherwise, how is a user supposed to
>>> know this condition?
>>>
>>>>
>>>>
>>>> On 7/28/08, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>>>> FWIW, all my MTT runs are hanging as well.
>>>>
>>>>
>>>>
>>>> On Jul 28, 2008, at 10:37 AM, Brad Benton wrote:
>>>>
>>>> My experience is the same as Lenny's. I've tested on x86_64 and
>>>> ppc64 systems and tests using --mca btl openib,self hang in all
>>>> cases.
>>>>
>>>> --brad
>>>>
>>>>
>>>> 2008/7/28 Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]>
>>>> I failed to run on different nodes or on the same node via
>>>> self,openib
>>>>
>>>>
>>>>
>>>>
>>>> On 7/28/08, Ralph Castain <rhc_at_[hidden]> wrote:
>>>> I checked this out some more and I believe it is related to
>>>> ticket #1378. We lock up if SM is included in the BTLs, which is
>>>> what I had done on my test. If I ^sm, I can run fine.
>>>>
>>>>
>>>> On Jul 28, 2008, at 6:41 AM, Ralph Castain wrote:
>>>>
>>>> It could also be something new. Brad and I noted on Fri that IB
>>>> was locking up as soon as we tried any cross-node communications.
>>>> Hadn't seen that before, and at least I haven't explored it
>>>> further - planned to do so today.
>>>>
>>>>
>>>> On Jul 28, 2008, at 6:01 AM, Lenny Verkhovsky wrote:
>>>>
>>>> I believe it is.
>>>>
>>>> On 7/28/08, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>>>> On Jul 28, 2008, at 7:51 AM, Jeff Squyres wrote:
>>>>
>>>> Is this related to r1378?
>>>>
>>>> Gah -- I meant #1378, meaning the "PML ob1 deadlock" ticket.
>>>>
>>>>
>>>>
>>>> On Jul 28, 2008, at 7:13 AM, Lenny Verkhovsky wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have been seeing tests (latency) hang since r19010.
>>>>
>>>>
>>>> Best Regards
>>>>
>>>> Lenny.
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>>
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems
>>>>

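Ralph's suggestion in the quoted thread (check whether the required
condition is met and error out with an appropriate message) could be
sketched roughly as follows. This is only an illustration, not Open MPI
code: the function names list_contains, btl_selected and check_self_btl
are hypothetical, the parsing assumes the usual comma-separated MCA list
syntax with a leading ^ marking an exclusion list, and whether a missing
"self" should really be treated as an error is exactly what the thread
leaves open.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if `name` appears as a whole token in
 * the comma-separated `list`, 0 otherwise. Whitespace is not trimmed. */
int list_contains(const char *list, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = list;

    while (*p != '\0') {
        size_t tok_len = strcspn(p, ",");   /* length of current token */
        if (tok_len == name_len && strncmp(p, name, name_len) == 0) {
            return 1;
        }
        p += tok_len;
        if (*p == ',') {
            p++;                            /* skip the separator */
        }
    }
    return 0;
}

/* Hypothetical helper: return 1 if component `name` is selected by a
 * BTL list such as "openib,self" or "^sm".
 * Assumptions: an unset or empty list selects every component, and a
 * leading '^' turns the whole list into an exclusion list. */
int btl_selected(const char *list, const char *name)
{
    assert(name != NULL);
    if (list == NULL || *list == '\0') {
        return 1;
    }
    if (*list == '^') {
        return !list_contains(list + 1, name);
    }
    return list_contains(list, name);
}

/* Hypothetical check: return 0 if "self" is selected; otherwise write
 * an explanatory message into `msg` and return -1. */
int check_self_btl(const char *list, char *msg, size_t msg_len)
{
    if (btl_selected(list, "self")) {
        return 0;
    }
    /* `list` is non-NULL here: a NULL list selects everything. */
    snprintf(msg, msg_len,
             "BTL list \"%s\" does not select \"self\", so a process "
             "cannot send messages to itself; add \"self\" to the list.",
             list);
    return -1;
}
```

Under these assumptions, of the three lists tested in the thread,
"openib" and "openib,sm" would be reported by check_self_btl, while
"openib,self" would pass. An exclusion list such as "^sm" also passes,
because it leaves "self" selected.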

