
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Communitcation between OpenMPI and ClusterTools
From: Terry Dontje (Terry.Dontje_at_[hidden])
Date: 2008-07-29 15:23:12


Terry Dontje wrote:
>>
>> Date: Tue, 29 Jul 2008 14:19:14 -0400
>> From: "Alexander Shabarshin" <ashabarshin_at_[hidden]>
>> Subject: Re: [OMPI users] Communitcation between OpenMPI and
>> ClusterTools
>> To: <users_at_[hidden]>
>> Message-ID: <00b701c8f1a7$9c24f7c0$c8afcea7_at_Shabarshin>
>> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
>> reply-type=response
>>
>> Hello
>>
>>
>>>>>> One idea that comes to mind is whether the two nodes are on the
>>>>>> same subnet. If they are not on the same subnet, I think there is
>>>>>> a bug in which the TCP BTL will recuse itself from communications
>>>>>> between the two nodes.
>>>>>>
>>
>>
>>>> You are right: the subnets are different, but the routes are set up
>>>> correctly and everything like ping, ssh, etc. works OK between
>>>> them.
>>>>
>>
>>
>>> It isn't a routing problem, though; it is how the TCP BTL in Open MPI
>>> decides which interfaces the nodes can communicate over (completely
>>> out of the hands of the TCP stack and lower).
>>>
>>
>> Do you know when it will be fixed in an official Open MPI release?
>> Is a patch available or something?
>>
> Well, this problem is captured in ticket 972
> (https://svn.open-mpi.org/trac/ompi/ticket/972). There is a question
> as to whether this ticket has been fixed or not (that is, whether the
> code was actually put back). Sun's experience with the trunk, the 1.3
> branch, and the CT8 EA2 release seems to be that you can now run jobs
> across subnets, but we (Sun) are not completely
>
I guess I should have ended with "mumble..mumble" :-)
Now for the rest of the sentence:

... sure whether the support is truly in there or we just got lucky in
how our setup was configured.

--td
> FWIW, it looks like that code has had a lot of changes in it between
> 1.2 and 1.3.
>
> --td
>
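
To illustrate the kind of same-subnet check discussed above, here is a
minimal sketch in Python. It is not the Open MPI TCP BTL code; the
function names, the addresses, and the pairing rule are assumptions
made up for illustration. It only shows why two hosts that can route
to each other (ping and ssh work) could still end up with no usable
interface pair if the selection logic insists on a shared subnet.

```python
import ipaddress


def same_subnet(addr_a, addr_b):
    """Return True if two 'ip/prefix' strings fall in the same network.

    Hypothetical helper: this mimics a subnet-only reachability test,
    which ignores any routes configured between different subnets.
    """
    net_a = ipaddress.ip_interface(addr_a).network
    net_b = ipaddress.ip_interface(addr_b).network
    return net_a == net_b


def usable_pairs(local_ifaces, remote_ifaces):
    """List (local, remote) address pairs that share a subnet.

    Assumption for illustration: a pair is considered usable only when
    both addresses are in the same network. An empty result would mean
    no interface is selected, even if the hosts are routable.
    """
    return [
        (local, remote)
        for local in local_ifaces
        for remote in remote_ifaces
        if same_subnet(local, remote)
    ]


if __name__ == "__main__":
    # Example addresses are made up; substitute your own interfaces.
    print(usable_pairs(["192.168.1.10/24"], ["192.168.1.20/24"]))
    print(usable_pairs(["192.168.1.10/24"], ["10.0.0.5/24"]))
```

With the made-up addresses above, the first call finds one pair and the
second finds none, which corresponds to the cross-subnet case described
in this thread.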