
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] ORTE process name,, nodeid..
From: Shipman, Galen M. (gshipman_at_[hidden])
Date: 2007-11-22 14:11:35


I really don't have the background to discuss ORTE architectural decisions, so
I will leave this to others.

What I need is the following:

The ability to support Shared Memory on CNL. This requires knowing if a proc
is local and a mechanism to solve the SM init race condition. Here are the
constraints as I see them:

- Don't want to #if the code
- Need to support this in 1.3 and probably 1.2 depending on release of 1.3
- This shouldn't be a "hack" in 1.3
- 1.2 may be a bit more of a hack, as we are talking about a backport with a
much shorter maintenance time-frame than 1.3

Looks to me like we need a conference call to discuss this. Would sometime
next week work?

Happy Thanksgiving all! I'm off to eat entirely too much..

- Galen

On 11/19/07 10:32 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:

>
>
>
> On 11/19/07 6:20 PM, "Tim Prins" <tprins_at_[hidden]> wrote:
>
>> On Monday 19 November 2007 09:42:21 am Ralph H Castain wrote:
>> <snip>
>>> An alternative solution might be to incorporate the modex in the new OMPI
>>> framework I was about to create anyway. This framework was intended to deal
>>> with publish/lookup of OMPI data to support a variety of methods.
>>> Originally, we had intended only to include support there for things
>>> specifically related to MPI_Publish etc., but there is no reason we
>>> couldn't generalize it to support the general exchange of process "how to
>>> connect to me" info and include a modex API in it. I was figuring we would
>>> need two immediate components in it anyway: an ORTE one for when we have
>>> full ORTE support in the system, and a CNOS one that would...well, I guess
>>> just bark and say "you can't do publish/lookup on a Cray". It would be
>>> simple to add the modex stuff there, and makes some logical sense as well.
>> I think this approach is fundamentally flawed. Our frameworks are designed to
>> abstract out something, to allow for multiple implementations. However, doing
>> this would put two completely different things (the modex and the MPI
>> pub/sub) together in one framework. While this may be convenient for the
>> cray, it would be very inconvenient for someone who wanted to do the MPI
>> pub/sub via an LDAP server (as has been discussed). The key here is that MPI
>> pub/sub is for very small amounts of data, accessed infrequently and in a
>> non-performance-critical manner, whereas the modex is for rather large
>> amounts of information (on big jobs) that has to be exchanged efficiently.
>
> Actually, several people talked about this before we proposed it and came to
> a different conclusion. The modex is in essence a "here's how to talk to me"
> communication, which is the same intent as publish/lookup. I agree that the
> volume of data involved is different. However, we are -not- proposing to use
> the same mechanism for the two (modex vs. pub/lookup).
>
> The proposal was based on the fact that the publish/lookup and modex
> effectively use similar mechanisms - i.e., the orte component would use the
> RML as the underlying communication mechanism. In contrast, the cray
> component has alternative non-RML based mechanisms for both systems.
>
> Things like the LDAP server pose an interesting challenge. In that case, the
> publish/lookup cannot use the RML as LDAP has no understanding of that comm
> mode. The modex, however, might - and might not - use that mechanism.
> Accordingly, the plan was to provide base functions that use RML for any
> component that can and wants to do so. This is identical to the approach we
> use throughout the code base.
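
The "base functions" arrangement described above can be illustrated with a minimal sketch. Everything named here (pubsub_component_t, base_modex_over_rml, native_modex, run_modex) is hypothetical and is not the actual OMPI framework API; the RML and the native Cray mechanism are replaced by stand-in functions that only count calls. The idea shown is that a component either points its modex entry at the shared base function or supplies its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical modex entry point; a real one would take buffers, peers, etc. */
typedef int (*modex_fn_t)(void);

/* Hypothetical component descriptor for the proposed framework. */
typedef struct {
    const char *name;
    modex_fn_t  modex;
} pubsub_component_t;

static int base_modex_calls = 0;
static int native_modex_calls = 0;

/* Stand-in for a base function that would use the RML to exchange data. */
static int base_modex_over_rml(void)
{
    ++base_modex_calls;
    return 0;
}

/* Stand-in for an environment-specific exchange that does not use the RML. */
static int native_modex(void)
{
    ++native_modex_calls;
    return 0;
}

/* A component that can and wants to use the RML reuses the base function. */
static const pubsub_component_t orte_component = { "orte", base_modex_over_rml };

/* A component with a native mechanism supplies its own function instead. */
static const pubsub_component_t cray_component = { "cray", native_modex };

/* Callers go through the component and never test for the platform. */
static int run_modex(const pubsub_component_t *component)
{
    assert(component != NULL && component->modex != NULL);
    return component->modex();
}
```

Calling run_modex(&orte_component) would go through the shared base path, while run_modex(&cray_component) would take the native one; no #if is needed at the call site.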
>
> However, we do need the modex in a framework somewhere as we will need to
> modify it to support tight integration with various environments. I cannot
> see doing every tight integration with yet another RSL component as the code
> duplication gets absurd - there isn't enough difference to support it. I
> also, though, don't want to be forced to use the same modex in every case if
> the native environment can provide an alternative method - having the modex
> in the framework solves that problem.
>
> So I guess I don't grok the issue here - what is wrong with having a modex
> API in the pub/sub framework??? Other than causing you some additional merge
> issues within RSL, I fail to understand why this is a problem.
>
>
>>
>> Before anyone misunderstands, I am *not* proposing that we add a modex
>> framework to ompi. Rather, I think this is a case where the RSL could make
>> things really easy.
>>
>> The RSL defines a process attribute system. One of the original ideas (later
>> retracted, but now that I think about it I may re-add it) was to have some
>> predefined attribute keys, that the runtime would set so we could look up
>> information about any process.
>>
>> So in the case of the cray, the rsl_init function would query to get all the
>> info it wants, and then populate the info into its (local) process attribute
>> data store.
>>
>> In other systems each process would set the information in rsl_init and it
>> would be exchanged in the normal modex method.
>>
>> Then, the information would be looked up (locally) using the 'get' function
>> on both platforms.
>>
>> Simple, eh?
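
The attribute idea above could look roughly like the following sketch. It is not the RSL API: attr_set, attr_get, init_from_bulk_table and init_self are hypothetical names, the store is a fixed-size local array, and the modex exchange that would distribute values on non-Cray systems is left out entirely. It only illustrates that both init paths end in the same local 'get'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ATTRS 64

/* One (process, key) -> value entry in the local attribute store. */
typedef struct {
    unsigned    proc;   /* hypothetical process identifier (e.g. a rank) */
    const char *key;    /* predefined key such as "nodeid"; must outlive the store */
    unsigned    value;
} attr_t;

static attr_t store[MAX_ATTRS];
static size_t store_len = 0;

/* Set or overwrite an attribute; returns 0 on success, -1 if the store is full. */
static int attr_set(unsigned proc, const char *key, unsigned value)
{
    for (size_t i = 0; i < store_len; ++i) {
        if (store[i].proc == proc && strcmp(store[i].key, key) == 0) {
            store[i].value = value;
            return 0;
        }
    }
    if (store_len == MAX_ATTRS) {
        return -1;
    }
    store[store_len].proc = proc;
    store[store_len].key = key;
    store[store_len].value = value;
    ++store_len;
    return 0;
}

/* Purely local lookup; returns 0 and fills *value, or -1 if not found. */
static int attr_get(unsigned proc, const char *key, unsigned *value)
{
    for (size_t i = 0; i < store_len; ++i) {
        if (store[i].proc == proc && strcmp(store[i].key, key) == 0) {
            *value = store[i].value;
            return 0;
        }
    }
    return -1;
}

/* Cray-like init: the runtime already knows every process's nodeid. */
static int init_from_bulk_table(const unsigned *nids, size_t nprocs)
{
    for (size_t i = 0; i < nprocs; ++i) {
        if (attr_set((unsigned)i, "nodeid", nids[i]) != 0) {
            return -1;
        }
    }
    return 0;
}

/* Other systems: each process records only its own value (exchange omitted). */
static int init_self(unsigned my_proc, unsigned my_nid)
{
    return attr_set(my_proc, "nodeid", my_nid);
}
```

After either init path, a caller would use attr_get(peer, "nodeid", &nid) without knowing how the value arrived.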
>
> Maybe - and maybe not. The devil is always in the details. My concerns with
> the RSL have been documented and wildly misunderstood. I still fail to see
> the overall advantage as it seems we get different explanations every time
> we ask. But I'll set that aside here.
>
> FWIW: The publish/lookup interface was specifically required to support both
> local and remote data storage operations, though that doesn't really apply
> to the modex.
>
>>
>> As an alternative to this, I think we could apply these same ideas into a
>> specialized ORTE system, but it would not be as clean, and would tie our
>> system closer to ORTE. I am not going to argue whether this is good or bad,
>> but I am just mentioning it as a consequence.
>
> My concern right now is that doing it in RSL means (as we chatted about
> offline) integrating RSL into the OMPI trunk NOW - either directly or as
> part of the orte revision branch. This will certainly delay getting the ORTE
> revision done, maybe by as much as 3 months or more (IMHO). I will contact
> LANL management to seek their input on this matter, but I doubt they will be
> supportive as such a delay will cause LANL to miss several critical
> RoadRunner milestones - which would almost certainly negatively impact our
> RoadRunner commercial partners as well.
>
> Alternatively, I suppose we could just fork the code base at this time, and
> I'll complete the orte revisions on a LANL server. I hate to do this,
> though, as it means someone (LANL, IBM, Voltaire, some combination, or
> whomever) will be left with the problem of dealing with either re-merging
> the branches or supporting a split code. I only offer it as an option we
> could consider, if necessary.
>
> Given those potential consequences, it would really help to have some
> substantive reason -why- the framework is unacceptable. I grok that you feel
> the RSL offers a possibly better alternative, but why does that mean we
> shouldn't do the framework now and worry about that if/when the RSL is
> proposed for production?
>
>>
>> Tim
>>
>>>
>>> If that makes sense, we can implement the latter approach on the branch
>>> where we are doing the next major ORTE revision - that's where I was going
>>> to create the new framework anyway.
>>>
>>> Ralph
>>>
>>> On 11/16/07 10:24 PM, "Shipman, Galen M." <gshipman_at_[hidden]> wrote:
>>>> I am doing some work on Cray's CNL to support shared memory. To support
>>>> shared memory I need to know if processes are local or remote. For other
>>>> systems we simply use the modex in ompi_proc_get_info to get the proc's
>>>> nodeid. When using CNOS I don't need the modex to get a remote process's
>>>> nodeid. In fact, I can obtain every process's pid and nodeid (nid/pid)
>>>> via a single CNOS call.
>>>>
>>>> I have explored a couple of ways to populate the proc structures on the
>>>> CRAY. One involves using #if's to do special things in
>>>> ompi_proc_get_info. I don't like this. The second method involves adding
>>>> a CNOS nameserver and modifying the orte_process_name_t to include the
>>>> orte_nodeid_t so that the nameserver can populate all the info if it can.
>>>> Prior to this change, the orte_nodeid_t was in ompi_proc_t, which doesn't
>>>> make any sense to me: it is an orte-level concept, yet it is only
>>>> accessible on the ompi side. I also don't like adding orte_nodeid_t to
>>>> orte_process_name_t, as it really doesn't have anything to do with a
>>>> name. I think it makes more sense to have an orte_proc_t, something
>>>> like the following structure:
>>>>
>>>>
>>>>
>>>> struct orte_process_name_t {
>>>>     orte_jobid_t jobid; /**< Job number */
>>>>     orte_vpid_t vpid; /**< Process number */
>>>> };
>>>>
>>>> struct orte_proc_t {
>>>>     opal_list_item_t super;
>>>>     orte_process_name_t proc_name;
>>>>     /** "nodeid" on which the proc resides */
>>>>     orte_nodeid_t nid;
>>>> };
>>>>
>>>> struct ompi_proc_t {
>>>>     orte_proc_t base;
>>>>     ..... Etc .....
>>>>
>>>> };
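
To make the locality use case concrete, here is a minimal sketch built on the proposed layout. The typedefs are stand-ins (the real ORTE types are not shown in this thread), opal_list_item_t is left out so the example stays self-contained, and fill_procs_from_nid_map and proc_is_local are hypothetical helpers. The nid table stands in for whatever a single bulk query would return; no actual CNOS call is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in typedefs; the real ORTE definitions may differ. */
typedef uint32_t orte_jobid_t;
typedef uint32_t orte_vpid_t;
typedef uint32_t orte_nodeid_t;

typedef struct {
    orte_jobid_t jobid;   /* Job number */
    orte_vpid_t  vpid;    /* Process number */
} orte_process_name_t;

/* Proposed ORTE-level proc: the name plus the node it resides on. */
typedef struct {
    orte_process_name_t proc_name;
    orte_nodeid_t       nid;
} orte_proc_t;

/* Hypothetical helper: populate procs from a nodeid table indexed by vpid. */
static void fill_procs_from_nid_map(orte_proc_t *procs,
                                    const orte_nodeid_t *nid_map,
                                    size_t nprocs,
                                    orte_jobid_t jobid)
{
    assert(procs != NULL && nid_map != NULL);
    for (size_t i = 0; i < nprocs; ++i) {
        procs[i].proc_name.jobid = jobid;
        procs[i].proc_name.vpid = (orte_vpid_t)i;
        procs[i].nid = nid_map[i];
    }
}

/* Two procs are treated as local to each other when they share a nodeid. */
static bool proc_is_local(const orte_proc_t *me, const orte_proc_t *peer)
{
    return me->nid == peer->nid;
}
```

With the nodeid carried in orte_proc_t, a shared memory component could call something like proc_is_local(me, peer) without caring whether the value came from the modex or from a platform query.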
>>>>
>>>>
>>>> I know there is some talk about removing the process names. I'm not sure
>>>> how that fits in here, but this is what makes sense to me given the
>>>> current architecture. Any thoughts here?
>>>>
>>>>
>>>> - Galen
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel