Open MPI Development Mailing List Archives

From: Ralph H Castain (rhc_at_[hidden])
Date: 2007-01-29 13:10:01


On 1/29/07 10:20 AM, "Greg Watson" <gwatson_at_[hidden]> wrote:

>
> On Jan 29, 2007, at 6:47 AM, Ralph H Castain wrote:
>
>>
>>
>>
>> On 1/27/07 9:37 AM, "Greg Watson" <gwatson_at_[hidden]> wrote:
>>
>>> There are two more interfaces that have changed:
>>>
>>> 1. orte_rds.query() now takes a job id, whereas in 1.2b1 it didn't
>>> take any arguments. I seem to remember that I call this to kick orted
>>> into action, but I'm not sure of the implications of not calling it.
>>> In any case, I don't have a job id when I call it, so what do I pass
>>> to get the old behavior?
>>
>> For now, you can just use ORTE_JOBID_INVALID (defined in
>> orte/mca/ns/ns_types.h).
>>
>> However, your question raises a flag. You should be calling
>> orte_rmgr.setup_job before you call the RDS, and that function returns the
>> jobid for your job. Failing to call setup_job first may cause other parts of
>> the code base to fail as they are expecting certain data to be set up in the
>> registry by setup_job.
>>
>> If you do call setup_job first, then just pass the returned jobid along to
>> rds.query.
>
> No, we have always called query() first, just after orte_init().
> Since query() has never required a job id before, this used to work.
> I think the call was required to kick the SOH into action, but I'm
> not sure if it was needed for any other purpose.

Query has nothing to do with the SOH - the only time you would "need" it
would be if you are reading a hostfile. Otherwise, it doesn't do anything at
the moment.

Not calling setup_job would be risky, in my opinion...
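
To make the ordering concrete, here is a minimal, self-contained sketch of "set up the job first, then query with the jobid it returns". Everything prefixed with demo_ is a hypothetical stand-in invented for illustration: these are not the real orte_rmgr.setup_job or orte_rds.query signatures, and DEMO_JOBID_INVALID merely plays the role of ORTE_JOBID_INVALID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins: NOT the real ORTE types or signatures. */
typedef int demo_jobid_t;
#define DEMO_JOBID_INVALID (-1)   /* plays the role of ORTE_JOBID_INVALID */

/* Stand-in for orte_rmgr.setup_job: hands back a jobid for the new job. */
static int demo_setup_job(demo_jobid_t *jobid)
{
    assert(NULL != jobid);
    *jobid = 1;                   /* arbitrary demo value */
    return 0;
}

/* Stand-in for orte_rds.query: accepts a real jobid or the placeholder. */
static int demo_rds_query(demo_jobid_t jobid)
{
    if (DEMO_JOBID_INVALID == jobid) {
        printf("query: no job set up yet, using the placeholder jobid\n");
    } else {
        printf("query: using jobid %d\n", jobid);
    }
    return 0;
}

/* Recommended order: set up the job, then query with the returned jobid. */
static int demo_query_after_setup(demo_jobid_t *jobid_out)
{
    demo_jobid_t jobid = DEMO_JOBID_INVALID;
    int rc;

    assert(NULL != jobid_out);
    rc = demo_setup_job(&jobid);
    if (0 != rc) {
        return rc;                /* do not query if setup failed */
    }
    rc = demo_rds_query(jobid);
    if (0 == rc) {
        *jobid_out = jobid;
    }
    return rc;
}
```

A caller that has no job yet would call demo_rds_query(DEMO_JOBID_INVALID) directly, which mirrors the interim workaround; the demo_query_after_setup path mirrors the recommended order.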

>
>>
>>>
>>> 2. orte_pls.terminate_job() now takes a list of attributes in
>>> addition to a job id. What are the attributes for, and what happens
>>> if I pass a NULL here? Do I need to create an empty attribute list?
>>>
>>
>> You can always pass a NULL to any function looking for attributes - the
>> system knows how to handle that situation.
>>
>> What you should pass here depends upon what you are trying to do. If you
>> just want to terminate a specific job, then you can just pass a NULL.
>> However, if you want to terminate the specified job AND any "children" that
>> were dynamically spawned by that job, then you need to pass the
>> ORTE_NS_INCLUDE_DESCENDANTS attribute - something like the following code
>> snippet (pulled from orterun) would work:
>>
>> #include "opal/class/opal_list.h"
>>
>> #include "orte/mca/pls/pls.h"
>> #include "orte/mca/rmgr/rmgr.h"
>> #include "orte/mca/ns/ns_types.h"
>> #include "orte/runtime/params.h"
>>
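>> opal_list_t attrs;
>> opal_list_item_t *item;
>> int ret;
>>
>> /* build an attribute list asking for descendants to be included */
>> OBJ_CONSTRUCT(&attrs, opal_list_t);
>> orte_rmgr.add_attribute(&attrs, ORTE_NS_INCLUDE_DESCENDANTS, ORTE_UNDEF,
>>                         NULL, ORTE_RMGR_ATTR_OVERRIDE);
>>
>> /* terminate the job and its descendants, giving up after the timeout */
>> ret = orte_pls.terminate_job(jobid, &orte_abort_timeout, &attrs);
>>
>> /* release the list items, then the list itself */
>> while (NULL != (item = opal_list_remove_first(&attrs))) {
>>     OBJ_RELEASE(item);
>> }
>> OBJ_DESTRUCT(&attrs);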
>>
>>
>> Please note that the orte_pls.terminate_job API in 1.2 will undergo a change
>> in the next few days (it already is changed in the trunk). The change,
>> included in the code snippet above, adds a timeout capability to have the
>> function "give up" if the job doesn't terminate within the specified time.
>> The parameter given above references the orte-wide default value (adjustable
>> via MCA param), but you can give it anything you like - a NULL for the
>> timeout param means don't timeout so we'll try until you order us to quit.
>>
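
The timeout semantics described in the quoted text above (a NULL timeout means "never give up") can be illustrated with a small stand-alone sketch. It assumes the timeout is passed as a pointer to a struct timeval, which this thread does not state, and it replaces the real launcher with a hypothetical demo_job_t that reports termination after a fixed number of polls. Time is simulated at one millisecond per poll so the example stays deterministic. None of the demo_ names exist in ORTE.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>   /* struct timeval */

#define DEMO_SUCCESS  0
#define DEMO_TIMEOUT  (-1)

/* Hypothetical job: reports "terminated" once its poll counter hits zero. */
typedef struct {
    int polls_remaining;
} demo_job_t;

static int demo_job_terminated(demo_job_t *job)
{
    if (job->polls_remaining > 0) {
        job->polls_remaining--;
    }
    return 0 == job->polls_remaining;
}

/*
 * Poll until the job terminates. Each unsuccessful poll costs one
 * simulated millisecond. A NULL timeout means: keep trying forever.
 */
static int demo_terminate_job(demo_job_t *job, const struct timeval *timeout)
{
    long elapsed_ms = 0;
    long limit_ms = 0;

    assert(NULL != job);
    if (NULL != timeout) {
        limit_ms = (long)timeout->tv_sec * 1000L
                 + (long)timeout->tv_usec / 1000L;
    }
    while (!demo_job_terminated(job)) {
        elapsed_ms++;
        if (NULL != timeout && elapsed_ms >= limit_ms) {
            return DEMO_TIMEOUT;   /* give up, job is still running */
        }
    }
    return DEMO_SUCCESS;
}
```

With a job that needs five polls, a 2 ms timeout makes demo_terminate_job return DEMO_TIMEOUT, while passing NULL lets it run to completion and return DEMO_SUCCESS.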
>
> Is this going to be in "1.2b4", or some other version? The previous
> API changes mean that PTP will no longer work with pre-1.2b3
> versions. It sounds like this is going to cause a similar issue.

I don't know if the Open MPI folks plan on rolling another 1.2 beta or going
straight to the full release. If they do roll another beta, I would expect
these changes to be in it, though that depends on their timing (not in my hands).

>
> Are there likely to be further API changes before the release
> version? We are trying to release PTP, but I think this is impossible
> until your APIs stabilize.

None planned, other than what I mentioned above. If you want to support Open
MPI 1.2, you may need a slight phase shift, though, so you can see the final
release.

>
> What about orte_ns.free_name()?

Just do a "free" - the name structures are not OPAL objects, so there is no
need for a special API to free them.
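
As a rough illustration of "just do a free": the sketch below uses a hypothetical demo_name_t as a stand-in for a process name, on the assumption that the structure is a plain heap allocation with no reference counting. The field names and helper functions are invented for the example and are not the ORTE definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a process name: a plain struct, not an OPAL object. */
typedef struct {
    unsigned long jobid;
    unsigned long vpid;
} demo_name_t;

/* Allocate a name on the heap; returns NULL if the allocation fails. */
static demo_name_t *demo_create_name(unsigned long jobid, unsigned long vpid)
{
    demo_name_t *name = malloc(sizeof(*name));
    if (NULL != name) {
        name->jobid = jobid;
        name->vpid = vpid;
    }
    return name;
}

/* Plain free() is enough; clear the caller's pointer to avoid reuse. */
static void demo_free_name(demo_name_t **name)
{
    assert(NULL != name);
    free(*name);      /* free(NULL) is a harmless no-op */
    *name = NULL;
}
```

Clearing the pointer after free is a defensive habit rather than anything this thread requires.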

>
> Thanks,
>
> Greg