BTW, just to be clear: you don't have to write any code to compute these values, or to reset the job structures before restarting a process. This has already been done.

Recomputing local and node ranks is done in orte/mca/rmaps/base/rmaps_base_support_fns.c in a function called orte_rmaps_base_update_local_ranks.

Resetting the job and proc structures for restarting a process is done in orte/mca/plm/base/plm_base_rsh_support.c in a function called orte_plm_base_reset_job.

The restart logic was in the orte/mca/errmgr/orcm module, but I moved that out of the devel trunk recently as we needed to do some orcm-specific things in it. However, I can (and probably should) restore it under a different name if that would help.

Ralph


On Apr 7, 2010, at 10:15 PM, Ralph Castain wrote:

The local rank of a process is computed by looking at all processes on a node from that job. The lowest MPI rank process on that node from that job is given local-rank=0. All processes on the node are given local-ranks in ascending order according to their MPI rank.

The node rank is computed the same way, except that we look at all processes on the node, spanning all MPI jobs.

Consider this example. Suppose we have an MPI application that launches 3 processes on each of two nodes, with ranks assigned on a bynode round-robin basis. Thus, the MPI rank mapping looks like this:

node0: rank 0, 2, 4
node1: rank 1, 3, 5

The local ranks would look like this:

Node     MPI Rank   Local Rank
node0    0          0
node0    2          1
node0    4          2

node1    1          0
node1    3          1
node1    5          2

Since we only have one job, the node rank of each process would be identical to its local rank.  Now suppose that application does a comm_spawn that launches two processes on node0. The local ranks of the new processes would be 0,1 reflecting their relative position within that job. However, their node ranks would be 3,4 because of the processes already on the node.
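
The numbering rule described above can be sketched in a few lines of C. This is only an illustration of the rule, not the code in orte_rmaps_base_update_local_ranks: the proc_t record, the assign_ranks function, and the MAX_NODES limit are all hypothetical names made up for this example. It assumes the process list is already sorted by job (in launch order, with job indices >= 0) and then by MPI rank.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 64  /* illustrative limit, not an ORTE constant */

/* Hypothetical per-process record; not the real ORTE proc structure. */
typedef struct {
    int      job;         /* job index (>= 0); jobs appear in launch order */
    int      mpi_rank;    /* rank within the job */
    int32_t  node;        /* index of the node hosting the process */
    uint16_t local_rank;  /* output: position among same-job procs on the node */
    uint16_t node_rank;   /* output: position among all procs on the node */
} proc_t;

/*
 * Assign local and node ranks in one pass.
 * Assumes procs[] is sorted by (job, mpi_rank), so walking the array visits
 * each job in launch order and each job's procs in ascending rank order.
 */
static void assign_ranks(proc_t *procs, size_t nprocs)
{
    uint16_t next_local[MAX_NODES] = {0};  /* per node, restarts for each job */
    uint16_t next_node[MAX_NODES]  = {0};  /* per node, spans all jobs */
    int current_job = -1;

    for (size_t i = 0; i < nprocs; i++) {
        proc_t *p = &procs[i];
        assert(p->node >= 0 && p->node < MAX_NODES);

        if (p->job != current_job) {
            /* New job: local ranks start again at 0 on every node. */
            for (size_t n = 0; n < MAX_NODES; n++) {
                next_local[n] = 0;
            }
            current_job = p->job;
        }
        p->local_rank = next_local[p->node]++;
        p->node_rank  = next_node[p->node]++;
    }
}
```

Feeding it the six processes of the example followed by the two spawned processes on node0 reproduces the numbers above: the spawned processes get local ranks 0,1 and node ranks 3,4.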

We use these values when assigning static ports and processor affinity. Other than that, they have no meaning.

HTH
Ralph



On Apr 7, 2010, at 7:16 PM, luyang dong wrote:

Dear teachers,
In orte_globals.h, there is this data structure:
typedef struct {
    /* index to node */
    int32_t node;
    /* local rank */
    orte_local_rank_t local_rank;
    /* node rank */
    orte_node_rank_t node_rank;
} orte_pmap_t;
I do not understand exactly what local_rank and node_rank mean. Is local_rank similar to the rank defined in the MPI specification? Can you help me? My goal is to implement process migration in Open MPI, so I urgently want to understand the procedure for launching processes.
