
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Communication between OpenMPI and ClusterTools
From: Terry Dontje (Terry.Dontje_at_[hidden])
Date: 2008-07-30 07:15:04


One last note to close this out. After some discussion on the
developers list, it was pointed out that this problem was fixed by new
code in the trunk and the 1.3 branch. So my statement below, that the
trunk, the 1.3 branch, and CT8 EA2 support nodes on different subnets,
can be made stronger: we really do expect this to work.

--td
Terry Dontje wrote:
> Terry Dontje wrote:
>>>
>>> Date: Tue, 29 Jul 2008 14:19:14 -0400
>>> From: "Alexander Shabarshin" <ashabarshin_at_[hidden]>
>>> Subject: Re: [OMPI users] Communication between OpenMPI and
>>> ClusterTools
>>> To: <users_at_[hidden]>
>>>
>>> Hello
>>>
>>>
>>>>>>> One idea that comes to mind: are the two nodes on the same
>>>>>>> subnet? If they are not on the same subnet, I think there is a
>>>>>>> bug in which the TCP BTL will recuse itself from communications
>>>>>>> between the two nodes.
>>>>>>>
>>>
>>>
>>>>> You are right: the subnets are different, but routes are set up
>>>>> correctly and everything (ping, ssh, etc.) works OK between them.
>>>>>
>>>
>>>
>>>> But it isn't a routing problem; it's how the TCP BTL in Open MPI
>>>> decides which interface the nodes can communicate over (completely
>>>> out of the hands of the TCP stack and the layers below it).
>>>>
>>>
>>> Do you know when it will be fixed in an official Open MPI release?
>>> Is a patch available?
>>>
>> Well, this problem is captured in ticket 972
>> (https://svn.open-mpi.org/trac/ompi/ticket/972). There is a question
>> as to whether this ticket has been fixed (that is, whether the code
>> was actually put back). Sun's experience with the trunk, the 1.3
>> branch, and the CT8 EA2 release seems to be that you can now run jobs
>> across subnets, but we (Sun) are not completely
>>
> I guess I should have ended with "mumble..mumble" :-)
> Now for the rest of the sentence:
>
> ... sure whether the support is truly in there or we just got lucky in
> how our setup was configured.
>
> --td
>> FWIW, it looks like that code has had a lot of changes in it between
>> 1.2 and 1.3.
>>
>> --td
>>