On Jul 28, 2014, at 1:09 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>> 2. If we keep it, I don't remember offhand what the difference is between node_rank and local_rank. The one we want is the 0-based index rank of this process *on this server*. E.g., on a 2-server job, each with 16 slots, the first process on each server will be <foo>_rank 0, the second process on each server will be <foo>_rank 1, etc. That's the one we want. If it's node_rank and not local_rank, ok.
> "local rank" is the relative rank of that proc on that server within its own job, not across all jobs on that server. Hence, "local rank" is not unique if multiple jobs are running on a server (e.g., as a result of comm_spawn).
> "node rank" is the relative rank of that proc on that server, looking across all jobs. It is therefore unique regardless of the number of jobs running on a server.
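To make the two definitions above concrete, here is a toy Python model. It is only a sketch: the function name assign_ranks, the (job, node) tuples, and the assumption that ranks are handed out in launch order are all made up for illustration and do not reflect how the runtime actually computes these values.

```python
from collections import defaultdict

def assign_ranks(procs):
    """Toy model (hypothetical): procs is a list of (job, node) tuples in
    launch order. Returns a list of (job, node, local_rank, node_rank)."""
    next_local = defaultdict(int)  # next local rank, keyed by (job, node)
    next_node = defaultdict(int)   # next node rank, keyed by node only
    out = []
    for job, node in procs:
        out.append((job, node, next_local[(job, node)], next_node[node]))
        next_local[(job, node)] += 1  # counts only procs of this job on this node
        next_node[node] += 1          # counts procs of every job on this node
    return out

# Two jobs sharing server0 (e.g., job2 was started via comm_spawn).
procs = [
    ("job1", "server0"), ("job1", "server0"), ("job1", "server1"),
    ("job2", "server0"), ("job2", "server0"),
]
ranks = assign_ranks(procs)
for job, node, local_rank, node_rank in ranks:
    print(job, node, "local_rank =", local_rank, "node_rank =", node_rank)
```

In this model the first proc of job1 and the first proc of job2 on server0 both get local rank 0, while their node ranks differ, which is the non-uniqueness described above.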
I probably picked "local_rank" because I was thinking of repeatability.
...but you could probably make the same "repeatability" argument for "node_rank", especially in the presence of multiple jobs on the same server (assuming no oversubscription, which would throw all possibility of repeatability out the window).
So I'm not sure what the Right answer is here. :-\