What is the exact purpose of the process name?


On 11/17/07 5:27 PM, "Shipman, Galen M." <gshipman@ornl.gov> wrote:

I am doing some work on Cray's CNL to support shared memory. To support
shared memory I need to know whether processes are local or remote. For other
systems we simply use the modex in ompi_proc_get_info to get the proc's
nodeid. When using CNOS I don't need the modex to get a remote process's
nodeid. In fact, I can obtain every process's pid and nodeid (nid/pid) via a
single CNOS call.

I have explored a couple of ways to populate the proc structures on the
Cray. One involves using #ifs to do special things in ompi_proc_get_info. I
don't like this. The second method involves adding a CNOS nameserver and
modifying orte_process_name_t to include the orte_nodeid_t so that the
nameserver can populate all the info if it can. Prior to this change, the
orte_nodeid_t was in ompi_proc_t, which doesn't make any sense to me: it is
an orte-level concept, yet it is only accessible on the ompi side. I also
don't like adding orte_nodeid_t to orte_process_name_t, as it really doesn't
have anything to do with a name. I think it makes more sense to have an
orte_proc_t, something like the following structure:

struct orte_process_name_t {
    orte_jobid_t jobid;     /**< Job number */
    orte_vpid_t vpid;       /**< Process number */
};

struct orte_proc_t {
    opal_list_item_t super;
    orte_process_name_t proc_name;
    /** "nodeid" on which the proc resides */
    orte_nodeid_t nid;
};

struct ompi_proc_t {
    orte_proc_t base;
    /* ..... Etc ..... */
};
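
A minimal sketch of how the layout above could be used to decide whether a
peer is local, under stated assumptions: the typedefs are stand-ins rather
than the real ORTE definitions, the opal_list_item_t member is left out so
the fragment compiles on its own, and fill_procs_from_nid_map and
proc_is_local are hypothetical helper names. The nid_map array stands in for
whatever the single CNOS call returns; no actual CNOS function is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in typedefs (assumption): the real ORTE types may differ. */
typedef uint32_t orte_jobid_t;
typedef uint32_t orte_vpid_t;
typedef uint32_t orte_nodeid_t;

typedef struct {
    orte_jobid_t jobid;   /* Job number */
    orte_vpid_t  vpid;    /* Process number */
} orte_process_name_t;

/* opal_list_item_t super is omitted here to keep the sketch self-contained. */
typedef struct {
    orte_process_name_t proc_name;
    orte_nodeid_t       nid;   /* "nodeid" on which the proc resides */
} orte_proc_t;

/* Hypothetical helper: fill n proc entries from a per-rank nodeid table,
 * assuming entry i of nid_map describes the process with vpid i. */
static void fill_procs_from_nid_map(orte_proc_t *procs,
                                    const orte_nodeid_t *nid_map,
                                    size_t n, orte_jobid_t jobid)
{
    assert(procs != NULL && nid_map != NULL);
    for (size_t i = 0; i < n; ++i) {
        procs[i].proc_name.jobid = jobid;
        procs[i].proc_name.vpid  = (orte_vpid_t)i;
        procs[i].nid             = nid_map[i];
    }
}

/* Hypothetical helper: a peer is treated as local (a shared memory
 * candidate) when it resides on the same node as the caller. */
static int proc_is_local(const orte_proc_t *proc, orte_nodeid_t my_nid)
{
    return proc->nid == my_nid;
}
```

With this split, the name stays a pure (jobid, vpid) pair, and the locality
check needs only the nid stored next to it in orte_proc_t, so no modex
exchange is involved when the node table is already available.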

I know there is some talk about removing the process names. I'm not sure how
that fits in here, but this is what makes sense to me given the current
architecture. Any thoughts here?

- Galen
