Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-09-22 13:32:14


We really should discuss that as a group first. Quite a bit of the
code required to actually support multi-cluster operation has been
removed.

Our operational model, agreed to quite a while ago, is that
mpirun can -only- extend over a single "cell". You can connect/accept
multiple mpiruns that are sitting on different cells, but you cannot
execute a single mpirun across multiple cells.
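
As an illustration only, the single-cell rule can be sketched in C as a check over a job's node list. This is not ORTE code: the type sketch_node_t, its cell field, and the function sketch_single_cell are hypothetical names made up for this sketch, and it assumes each node carries a cell name as a plain string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, pared-down stand-in for a node record (NOT orte_node_t). */
typedef struct {
    const char *name;  /* node hostname */
    const char *cell;  /* name of the cell the node resides on; NULL if unknown */
} sketch_node_t;

/*
 * Returns true when every node in the list resides on the same named cell,
 * i.e. when a single mpirun could span the list under the single-cell rule.
 * A node with an unknown (NULL) cell is rejected; an empty list passes.
 */
static bool sketch_single_cell(const sketch_node_t *nodes, size_t count)
{
    assert(nodes != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (nodes[i].cell == NULL) {
            return false;  /* cannot prove the node is on the same cell */
        }
        if (strcmp(nodes[i].cell, nodes[0].cell) != 0) {
            return false;  /* node list spans more than one cell */
        }
    }
    return true;
}
```

Under this sketch, a node list that fails the check would have to be split across separate mpirun invocations, one per cell, which could then be joined with connect/accept.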

Please keep this on your own development branch for now. Bringing it
into the trunk will require discussion, since it changes the operating
model and has significant code consequences for abnormal termination,
comm_spawn, etc.

Thanks
Ralph

On Sep 22, 2008, at 11:26 AM, Richard Graham wrote:

> This check-in was in error - I had not realized that the checkout was
> from the 1.3 branch, so we will fix this and put these into the trunk
> (1.4). We are going to bring in some limited multi-cluster support -
> limited is the operative word.
>
> Rich
>
>
> On 9/22/08 12:50 PM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>
>> I notice that Ken Matney (the committer) is not on the devel list; I
>> added him explicitly to the CC line.
>>
>> Ken: please see below.
>>
>>
>> On Sep 22, 2008, at 12:46 PM, Ralph Castain wrote:
>>
>>> Whoa! We made a decision NOT to support multi-cluster apps in OMPI
>>> over a year ago!
>>>
>>> Please remove this from 1.3 - we should discuss if/when this would
>>> even be allowed in the trunk.
>>>
>>> Thanks
>>> Ralph
>>>
>>> On Sep 22, 2008, at 10:35 AM, matney_at_[hidden] wrote:
>>>
>>>> Author: matney
>>>> Date: 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008)
>>>> New Revision: 19600
>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/19600
>>>>
>>>> Log:
>>>> Added member to orte_node_t to enable multi-cluster jobs in ALPS
>>>> scheduled systems (like Cray XT).
>>>>
>>>> Text files modified:
>>>> branches/v1.3/orte/runtime/orte_globals.h | 4 ++++
>>>> 1 files changed, 4 insertions(+), 0 deletions(-)
>>>>
>>>> Modified: branches/v1.3/orte/runtime/orte_globals.h
>>>> ===================================================================
>>>> --- branches/v1.3/orte/runtime/orte_globals.h (original)
>>>> +++ branches/v1.3/orte/runtime/orte_globals.h 2008-09-22 12:35:54 EDT (Mon, 22 Sep 2008)
>>>> @@ -222,6 +222,10 @@
>>>> /** Username on this node, if specified */
>>>> char *username;
>>>> char *slot_list;
>>>> + /** Clustername (machine name of cluster) on which this node
>>>> + resides. ALPS scheduled systems need this to enable
>>>> + multi-cluster support. */
>>>> + char *clustername;
>>>> } orte_node_t;
>>>> ORTE_DECLSPEC OBJ_CLASS_DECLARATION(orte_node_t);
>>>>
>>>> _______________________________________________
>>>> svn mailing list
>>>> svn_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>