
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Communitcation between OpenMPI and ClusterTools
From: Alexander Shabarshin (ashabarshin_at_[hidden])
Date: 2008-07-30 10:27:40


OK, thanks!

Is it possible to fix it somehow directly in the 1.2.x codebase?

----- Original Message -----
From: "Terry Dontje" <Terry.Dontje_at_[hidden]>
To: <users_at_[hidden]>
Sent: Wednesday, July 30, 2008 7:15 AM
Subject: Re: [OMPI users] Communitcation between OpenMPI and ClusterTools

> One last note to close this out. After some discussion on the
> developers list it was pointed out that this problem was fixed with new
> code in the trunk and 1.3 branch. So my statement below about the trunk,
> 1.3 and CT8 EA2 supporting nodes on different subnets can be made
> stronger: we really do expect this to work.
>
> --td
> Terry Dontje wrote:
>> Terry Dontje wrote:
>>>>
>>>> Date: Tue, 29 Jul 2008 14:19:14 -0400
>>>> From: "Alexander Shabarshin" <ashabarshin_at_[hidden]>
>>>> Subject: Re: [OMPI users] Communitcation between OpenMPI and
>>>> ClusterTools
>>>> To: <users_at_[hidden]>
>>>> Message-ID: <00b701c8f1a7$9c24f7c0$c8afcea7_at_Shabarshin>
>>>> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
>>>> reply-type=response
>>>>
>>>> Hello
>>>>
>>>>
>>>>>>>> One idea comes to mind is whether the two nodes are on the
>>>>>>>> same subnet? If they are not on the same subnet I think
>>>>>>>> there is a bug in which the TCP BTL will recuse itself
>>>>>>>> from communications between the two nodes.
>>>>>>>>
>>>>
>>>>
>>>>>> you are right - subnets are different, but routes set up
>>>>>> correctly and everything like ping, ssh etc. are working OK
>>>>>> between them
>>>>>>
>>>>
>>>>
>>>>> But it isn't a routing problem but how the tcp btl in Open MPI
>>>>> decides which interface the nodes can communicate with
>>>>> (completely out of the hands of the TCP stack and lower).
>>>>>
>>>>
>>>> Do you know when it can be fixed in official OpenMPI?
>>>> Is patch available or something?
>>>>
>>> Well this problem is captured in ticket 972
>>> (https://svn.open-mpi.org/trac/ompi/ticket/972). There is a question
>>> as to whether this ticket has been fixed or not (that is, was the code
>>> actually put back). Sun's experience with the Trunk, 1.3 branch and
>>> CT8 EA2 release seems to be that you now can run jobs across subnets
>>> but we (Sun) are not completely
>>>
>> I guess I should have ended with "mumble..mumble" :-)
>> Now for the rest of the sentence:
>>
>> ... sure whether the support is truly in there or we just got lucky in
>> how our setup was configured.
>>
>> --td
>>> FWIW, it looks like that code has had a lot of changes in it between
>>> 1.2 and 1.3.
>>>
>>> --td
>>>
>>
>>
>
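For anyone who wants to check the same-subnet condition discussed in this thread on their own nodes, here is a minimal sketch in Python. It only compares network prefixes; it is not the interface-selection logic of the TCP BTL, and the function name, addresses and prefix lengths are made up for illustration.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, prefix_len: int) -> bool:
    """Return True if both addresses fall in the same network
    for the given prefix length (hypothetical helper, not Open MPI code)."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_len}").network
    return net_a == net_b


# Example addresses are assumptions, chosen only to show both outcomes.
print(same_subnet("192.168.1.10", "192.168.1.20", 24))  # same /24 network
print(same_subnet("192.168.1.10", "192.168.2.20", 24))  # different /24 networks
```

Two nodes can fail this check and still ping each other through a router, which matches the situation described above: working routes do not by themselves mean the two interfaces share a subnet.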