Open MPI Development Mailing List Archives

From: Ralph Castain (rhc_at_[hidden])
Date: 2007-01-29 20:57:22


On further thought, perhaps I should be clearer. If you are saying that you
need to read the hostfile to display the cluster *before* the user actually
submits a job for execution, then fine - go ahead and call rds.query.

What I'm trying to communicate to you is that you need to call setup_job
when you are launching the resulting application. If you want, you could do
the following:

1. call orte_rds.query(ORTE_JOBID_INVALID) to get your host info. Note that
only a hostfile will be read here - so if you are in (for example) a bproc
environment, you won't get any node info at this point.

2. when you are ready to launch the app, call orte_rmgr.spawn with an
attribute list that contains ORTE_RMGR_SPAWN_FLOW with a value of
ORTE_RMGR_SETUP | ORTE_RMGR_ALLOC | ORTE_RMGR_MAP | ORTE_RMGR_SETUP_TRIGS |
ORTE_RMGR_LAUNCH. This will tell spawn to do everything *except* rds.query
so you avoid re-entering the hostfile info.
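The flag composition in step 2 can be sketched as below. This is only an illustration: the DEMO_STAGE_* names and their bit values are hypothetical stand-ins for the ORTE_RMGR_* flow flags and are not taken from the Open MPI headers, and the helper functions are not part of ORTE. The point is simply that OR-ing together every stage except the query stage yields a mask that skips rds.query.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the ORTE_RMGR_* flow flags. The bit values
 * are illustrative only and are NOT the values used by Open MPI. */
enum demo_stage {
    DEMO_STAGE_SETUP       = 0x01,
    DEMO_STAGE_QUERY       = 0x02,  /* stands in for the rds.query step */
    DEMO_STAGE_ALLOC       = 0x04,
    DEMO_STAGE_MAP         = 0x08,
    DEMO_STAGE_SETUP_TRIGS = 0x10,
    DEMO_STAGE_LAUNCH      = 0x20
};

/* Build a flow mask with every stage enabled except the query stage. */
uint8_t demo_flow_without_query(void)
{
    return (uint8_t)(DEMO_STAGE_SETUP | DEMO_STAGE_ALLOC | DEMO_STAGE_MAP |
                     DEMO_STAGE_SETUP_TRIGS | DEMO_STAGE_LAUNCH);
}

/* Return 1 if the given stage is enabled in the mask, 0 otherwise. */
int demo_stage_enabled(uint8_t flow, enum demo_stage stage)
{
    return (flow & (uint8_t)stage) != 0;
}

/* Print which stages a hypothetical spawn call would run for this mask. */
void demo_print_flow(uint8_t flow)
{
    static const struct { enum demo_stage stage; const char *name; } table[] = {
        { DEMO_STAGE_SETUP,       "setup"       },
        { DEMO_STAGE_QUERY,       "query"       },
        { DEMO_STAGE_ALLOC,       "alloc"       },
        { DEMO_STAGE_MAP,         "map"         },
        { DEMO_STAGE_SETUP_TRIGS, "setup_trigs" },
        { DEMO_STAGE_LAUNCH,      "launch"      }
    };
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        printf("%-12s %s\n", table[i].name,
               demo_stage_enabled(flow, table[i].stage) ? "run" : "skip");
    }
}
```

In real code the resulting value would be placed in the attribute list under ORTE_RMGR_SPAWN_FLOW, using the actual ORTE_RMGR_* constants rather than these stand-ins.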

Unfortunately, if you want to see node info prior to launch on anything
other than a hostfile, we really don't have a way to do that right now. The
ORTE 2.0 design allows for it, but we haven't implemented that yet -
probably a few months away.

Hope that helps
Ralph

On 1/29/07 6:45 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:

>
> On 1/29/07 5:57 PM, "Greg Watson" <gwatson_at_[hidden]> wrote:
>
>> Ralph,
>>
>> On Jan 29, 2007, at 11:10 AM, Ralph H Castain wrote:
>>
>>>
>>> On 1/29/07 10:20 AM, "Greg Watson" <gwatson_at_[hidden]> wrote:
>>>
>>>>
>>>> No, we have always called query() first, just after orte_init().
>>>> Since query() has never required a job id before, this used to work.
>>>> I think the call was required to kick the SOH into action, but I'm
>>>> not sure if it was needed for any other purpose.
>>>
>>> Query has nothing to do with the SOH - the only time you would "need" it
>>> would be if you are reading a hostfile. Otherwise, it doesn't do anything
>>> at the moment.
>>>
>>>
>>> Not calling setup_job would be risky, in my opinion...
>>
>> We've had this discussion before. We *need* to read the hostfile
>> before calling setup_job() because we have to populate the registry
>> with node information. If you're saying that this is now no longer
>> possible, then I'd respectfully ask that this functionality be
>> restored before you release 1.2. If there is some other way to
>> achieve this, then please let me know. We've been doing this ever
>> since 1.0 and in the alpha and beta versions of 1.2.
>
> I think you don't understand what setup_job does. Setup_job has four
> arguments:
>
> (a) an array of app_context objects that contain the application to be
> launched
>
> (b) the number of elements in that array
>
> (c) a pointer to a location where the jobid for this job is to be returned;
> and
>
> (d) a list of attributes that allows the caller to "fine-tune" behavior
>
> With that info, setup_job will:
>
> (a) create a new jobid for your application; and
>
> (b) store the app_context info in appropriate places in the registry
>
> And that is *all* setup_job does - it simply gets a jobid and initializes some
> important info in the registry. It never looks at node information, nor does
> it in any way impact node info.
>
> Calling rds.query after rmgr.setup_job is how we always do it. In truth, the
> precise ordering of those two operations is immaterial as they have absolutely
> nothing in common. However, we always do it in the described order so that
> rds.query can have a valid jobid. As I said, at the moment rds.query doesn't
> actually use the jobid, though that will change at some point in the future.
>
> Although it isn't *absolutely* necessary, I would still suggest that you call
> rmgr.setup_job before calling rds.query to ensure that any subsequent
> operations have all the info they require to function correctly. You can see
> the progression we use in orte/mca/rmgr/urm/rmgr_urm.c - I believe you will
> find it helpful to follow that logic.
>
> Alternatively, if you want, you can simply repeatedly call orte_rmgr.spawn and
> use the attributes I built for you to step your way through the standard
> launch. As you probably recall, I gave you the ability to specify - at a very
> atomistic level - exactly which steps in the spawn process were to be
> implemented at each call into rmgr.spawn. You can look at the referenced file
> to see the attribute for each step in the procedure.
>
>
>>
>>>
>>>
>>>>
>>>> Are there likely to be further API changes before the release
>>>> version? We are trying to release PTP, but I think this is impossible
>>>> until your APIs stabilize.
>>>
>>> None planned, other than what I mentioned above. If you want to support
>>> Open MPI 1.2, you may need a slight phase shift, though, so you can see
>>> the final release.
>>
>> Please explain "phase shift".
>>
>> Greg

------ End of Forwarded Message