Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] rdma_connect() failure
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2008-10-05 14:45:59


Hi Jeff,

I tried to test the latest hg tree, but it fails from time to time.

It happens on different machines with different errors (see attached file).

It also fails when ib0 is set to slave mode due to bonding, but I am sure
that this happens "by design".

Lenny.

On 9/29/08, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>
> Annnnddd.... the pendulum swings back the other way now. :-)
>
> See the ticket for details: https://svn.open-mpi.org/trac/ompi/ticket/1540
>
> Short version: OMPI now just "figures it out" and does the right thing.
>
>
> On Sep 28, 2008, at 7:27 AM, Jeff Squyres wrote:
>
>> Actually, I thought about this one more, and I have concluded that we do
>> *not* want to do this (i.e., allow RDMA CM to send requests for port A from
>> port B). If we do this, then it would be possible that *all* traffic will go
>> the "wrong" way. More specifically, OMPI will not have direct control over
>> what traffic goes over what port -- and that would be Bad.
>>
>> So we'll still lookup the peer based on the address where the connect
>> request came from, and I'll eventually add a FAQ item about it (because IP
>> addressing is much more flexible than IB addressing, and netadmins may be
>> tempted to use a "flat" address space).
>>
>>
>>
>> On Sep 26, 2008, at 5:53 PM, Jeff Squyres wrote:
>>
>>> On Sep 26, 2008, at 5:45 PM, Jeff Squyres wrote:
>>>>
>>>> I actually spent all afternoon diagnosing something that I'll turn into
>>>> a FAQ entry (OMPI's RDMA CM TCP addressing requirements are stronger than
>>>> TCP's legal addressing rules). In short, OMPI needs the RDMA CM to
>>>> guarantee that requests to connect to port A come from port A. If you have
>>>> a "flat" network address space, RDMA CM may actually issue a connect request
>>>> for port A from port B. This causes OMPI to get confused because it will
>>>> not find the right BTL openib endpoint to connect to.
>>>>
>>>
>>>
>>> And... crap. We can fix this one, too.
>>>
>>> Right now, we use the IP address from the incoming RDMA CM event ID to
>>> determine who the caller is. But we could easily embed the IP address
>>> (i.e., endpoint designator) in the private data in the event so that the
>>> peer can look at *that* address to identify who the peer is (rather than the
>>> address embedded in the event ID).
>>>
>>> This is actually what the IB CM CPC does, IIRC.
>>>
>>> Blah. This is also not hard, but it's another task for later. :-)
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>>>
>>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>>
>
> --
> Jeff Squyres
> Cisco Systems
>
>
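
The two peer-identification strategies discussed in the quoted messages can be sketched in plain C. This is only an illustration under stated assumptions, not Open MPI's actual code: endpoint_t, connect_private_data_t, ipv4(), find_endpoint(), lookup_by_source() and lookup_by_private_data() are all hypothetical names, and addresses are modeled as plain 32-bit integers rather than real RDMA CM structures. In a real RDMA CM program the payload would presumably travel in the private_data field of struct rdma_conn_param.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical endpoint record: one per peer port we expect to talk to. */
typedef struct {
    uint32_t peer_ipv4;  /* IPv4 address of the peer's port */
    const char *name;    /* label used only for illustration */
} endpoint_t;

/* Hypothetical private-data payload embedded in the connect request:
 * the address of the port the sender intends the connection to be for. */
typedef struct {
    uint32_t intended_ipv4;
} connect_private_data_t;

/* Pack four octets into one 32-bit address value. */
uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Linear search of the endpoint table; returns NULL when nothing matches. */
const endpoint_t *find_endpoint(const endpoint_t *eps, size_t n, uint32_t addr)
{
    assert(eps != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (eps[i].peer_ipv4 == addr) {
            return &eps[i];
        }
    }
    return NULL;
}

/* Strategy 1: identify the peer by the source address of the request.
 * On a "flat" address space the request for port A may arrive from
 * port B, so this can select the wrong endpoint (or none at all). */
const endpoint_t *lookup_by_source(const endpoint_t *eps, size_t n,
                                   uint32_t request_source_ipv4)
{
    return find_endpoint(eps, n, request_source_ipv4);
}

/* Strategy 2: identify the peer by the address the sender embedded in
 * the private data, ignoring which port the request physically left from. */
const endpoint_t *lookup_by_private_data(const endpoint_t *eps, size_t n,
                                         const connect_private_data_t *pd)
{
    return find_endpoint(eps, n, pd->intended_ipv4);
}
```

With a table holding port A (10.0.0.1) and port B (10.0.0.2), a request meant for port A that leaves from port B resolves to the port B entry under lookup_by_source(), while lookup_by_private_data() with intended_ipv4 set to 10.0.0.1 resolves to the port A entry. The sketch only shows the lookup difference; it does not address the traffic-control concern raised in the Sep 28 message.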