
Open MPI Development Mailing List Archives


From: Ralph H Castain (rhc_at_[hidden])
Date: 2007-01-29 13:10:01


On 1/29/07 10:20 AM, "Greg Watson" <gwatson_at_[hidden]> wrote:

>
> On Jan 29, 2007, at 6:47 AM, Ralph H Castain wrote:
>
>>
>>
>>
>> On 1/27/07 9:37 AM, "Greg Watson" <gwatson_at_[hidden]> wrote:
>>
>>> There are two more interfaces that have changed:
>>>
>>> 1. orte_rds.query() now takes a job id, whereas in 1.2b1 it didn't
>>> take any arguments. I seem to remember that I call this to kick orted
>>> into action, but I'm not sure of the implications of not calling it.
>>> In any case, I don't have a job id when I call it, so what do I pass
>>> to get the old behavior?
>>
>> For now, you can just use ORTE_JOBID_INVALID (defined in
>> orte/mca/ns/ns_types.h).
>>
>> However, your question raises a flag. You should be calling
>> orte_rmgr.setup_job before you call the RDS, and that function
>> returns the jobid for your job. Failing to call setup_job first may
>> cause other parts of the code base to fail, as they expect certain
>> data to be set up in the registry by setup_job.
>>
>> If you do call setup_job first, then just pass the returned jobid
>> along to rds.query.
>
> No, we have always called query() first, just after orte_init().
> Since query() has never required a job id before, this used to work.
> I think the call was required to kick the SOH into action, but I'm
> not sure if it was needed for any other purpose.

Query has nothing to do with the SOH - the only time you would "need" it
would be if you are reading a hostfile. Otherwise, it doesn't do anything at
the moment.

Not calling setup_job would be risky, in my opinion...

>
>>
>>>
>>> 2. orte_pls.terminate_job() now takes a list of attributes in
>>> addition to a job id. What are the attributes for, and what happens
>>> if I pass a NULL here? Do I need to create an empty attribute list?
>>>
>>
>> You can always pass a NULL to any function looking for attributes -
>> the system knows how to handle that situation.
>>
>> What you should pass here depends upon what you are trying to do.
>> If you just want to terminate a specific job, then you can just pass
>> a NULL. However, if you want to terminate the specified job AND any
>> "children" that were dynamically spawned by that job, then you need
>> to pass the ORTE_NS_INCLUDE_DESCENDANTS attribute - something like
>> the following code snippet (pulled from orterun) would work:
>>
>> #include "opal/class/opal_list.h"
>>
>> #include "orte/mca/pls/pls.h"
>> #include "orte/mca/rmgr/rmgr.h"
>> #include "orte/mca/ns/ns_types.h"
>> #include "orte/runtime/params.h"
>>
>> /* jobid is the job to terminate, e.g. as returned by setup_job */
>> opal_list_t attrs;
>> opal_list_item_t *item;
>> int ret;
>>
>> /* build an attribute list asking to include descendant jobs */
>> OBJ_CONSTRUCT(&attrs, opal_list_t);
>> orte_rmgr.add_attribute(&attrs, ORTE_NS_INCLUDE_DESCENDANTS,
>>                         ORTE_UNDEF, NULL, ORTE_RMGR_ATTR_OVERRIDE);
>> ret = orte_pls.terminate_job(jobid, &orte_abort_timeout, &attrs);
>>
>> /* release each attribute, then destruct the list itself */
>> while (NULL != (item = opal_list_remove_first(&attrs))) {
>>     OBJ_RELEASE(item);
>> }
>> OBJ_DESTRUCT(&attrs);
>>
>>
>> Please note that the orte_pls.terminate_job API in 1.2 will undergo
>> a change in the next few days (it is already changed in the trunk).
>> The change, included in the code snippet above, adds a timeout
>> capability to have the function "give up" if the job doesn't
>> terminate within the specified time. The parameter given above
>> references the orte-wide default value (adjustable via MCA param),
>> but you can give it anything you like - a NULL for the timeout param
>> means don't time out, so we'll try until you order us to quit.
>>
>
> Is this going to be in "1.2b4", or some other version? The previous
> API changes mean that PTP will no longer work with pre-1.2b3
> versions. It sounds like this is going to cause a similar issue.

I don't know if the Open MPI folks plan on rolling another 1.2 beta or just
doing the full release. If they do roll another beta, I would expect these
changes to be in it, though that depends on their timing (not in my hands).

>
> Are there likely to be further API changes before the release
> version? We are trying to release PTP, but I think this is impossible
> until your APIs stabilize.

None planned, other than what I mentioned above. If you want to support Open
MPI 1.2, you may need a slight phase shift, though, so you can see the final
release.

>
> What about orte_ns.free_name()?

Just do a "free" - the name structures are not OPAL objects, so there is no
need for a special API to free them.
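
As a rough illustration of that point, here is a minimal, self-contained
sketch. Everything in it is an assumption made for illustration:
example_name_t, example_name_create and example_name_release are
hypothetical stand-ins, not ORTE types or functions, and the three fields
are only meant to suggest a plain-data name. The sketch shows that a
structure with no object header can be released with a plain free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a process-name structure: plain data with
 * no object header, so no constructor or destructor is involved. */
typedef struct {
    uint32_t cellid;
    uint32_t jobid;
    uint32_t vpid;
} example_name_t;

/* Allocate a name on the heap; returns NULL if malloc fails. */
example_name_t *example_name_create(uint32_t cellid, uint32_t jobid,
                                    uint32_t vpid)
{
    example_name_t *name = malloc(sizeof(*name));
    if (NULL == name) {
        return NULL;
    }
    name->cellid = cellid;
    name->jobid = jobid;
    name->vpid = vpid;
    return name;
}

/* Release a name with plain free() and clear the caller's pointer so it
 * cannot be freed twice by accident. */
void example_name_release(example_name_t **name)
{
    assert(NULL != name);   /* caller must pass a valid pointer-to-pointer */
    if (NULL != *name) {
        free(*name);
        *name = NULL;
    }
}
```

A caller would create a name with example_name_create(0, 1, 2), use it,
and then call example_name_release(&name). Clearing the pointer is a
defensive choice of this sketch only, not something the framework requires.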

>
> Thanks,
>
> Greg
>