Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Locality info
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-10-19 19:16:01


On Oct 19, 2011, at 5:05 PM, George Bosilca wrote:

> Wonderful!!! We've been waiting for such functionality for a while.

My pleasure :-)

>
> I do have some questions/remarks related to this patch.
>
> What is the my_node_rank in the orte_proc_info_t structure?

The node rank is a local ranking of procs on a node, starting with 0 for the lowest vpid on the node and counting up from there. It was normally passed in the environment and picked up by the ess components so it could be used to select a static port during oob init, if static ports were specified.
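A minimal C sketch of the ranking rule just described. The proc_t record, its node_id field, and compute_node_ranks() are hypothetical illustrations, not ORTE structures or functions. The sketch assumes unique vpids within a single job and ignores the cross-job case that the mapper handles.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified proc record: NOT the ORTE data structure. */
typedef struct {
    uint32_t vpid;      /* rank of the proc within its job */
    uint32_t node_id;   /* hypothetical index of the node it was mapped to */
    uint32_t node_rank; /* output: local rank among procs on that node */
} proc_t;

/* Assign node ranks: on each node, the proc with the lowest vpid gets 0,
 * the next-lowest gets 1, and so on. O(n^2) for clarity, not speed. */
static void compute_node_ranks(proc_t *procs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t rank = 0;
        for (size_t j = 0; j < n; j++) {
            /* count procs on the same node with a lower vpid */
            if (procs[j].node_id == procs[i].node_id &&
                procs[j].vpid < procs[i].vpid) {
                rank++;
            }
        }
        procs[i].node_rank = rank;
    }
}
```

With five procs alternating between two nodes (vpids 0, 2, 4 on one node and 1, 3 on the other), the node ranks come out as 0, 1, 2 and 0, 1 respectively.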

I moved it to a more general place solely because I wanted to move a bunch of replicated code into ess/base instead of having it in nearly every module. I debated putting it in ess/base.h instead, but since other places in the code might also want it, I figured I'd make it more globally available.

If it turns out nobody needs it, we can move it back into just the ess.

> Is there any difference between using the my_node_rank field and the vpid part of my_daemon?

Yes - my_daemon refers to the local daemon. The node rank refers solely to the relative ranking of application procs on the node.

> What is the correct way of determining whether two processes are at the same remote location: comparing their daemon vpids or their node_ranks?

Daemon vpid
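To illustrate why the daemon vpid, not the node_rank, is the right key: node ranks restart at 0 on every node, so two procs on different nodes can have the same node_rank. The sketch below assumes one daemon per node; peer_info_t and on_same_node() are hypothetical names, not ORTE API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of what a proc knows about a peer (not an ORTE type). */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
    uint32_t daemon_vpid; /* vpid of the daemon hosting this proc */
    uint32_t node_rank;   /* local rank on its node: NOT unique across nodes */
} peer_info_t;

/* Two procs share a node iff they are hosted by the same daemon,
 * assuming exactly one daemon per node. */
static bool on_same_node(const peer_info_t *a, const peer_info_t *b)
{
    return a->daemon_vpid == b->daemon_vpid;
}
```

Comparing node_rank instead would wrongly report that the node-rank-0 procs on every node are co-located.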

> How does the node_rank change with respect to dynamic process management when new daemons join?

This is where node_rank comes into play. The mapper sees across jobs that are sharing nodes, so the mapper currently is responsible for computing the node_rank of a proc. This info gets transmitted to all daemons, including new dynamically started ones, in the launch msg. So everyone always has a picture of the node_rank for every proc.

>
> The flag OPAL_PROC_ON_L*CACHE is only set for local processes, if I understand your last email correctly?

Yes - all the locality flags refer only to the location of another process relative to you, you being an app process. As I said, though, this can easily be extended to return the relative locality of two procs on a remote node, if that would be of use.
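One way the bitmask described in the announcement could be computed for a local peer, sketched under stated assumptions. The LOC_ON_* flag values, placement_t, and compute_locality() are hypothetical; the real OPAL_PROC_ON_* constants and the ORTE implementation may differ. Each bound proc is assumed to be bound to a single hardware thread, with node-unique logical ids for every enclosing resource. An unbound proc on either side is treated as sharing everything, and in this sketch a peer on another node gets an empty mask, consistent with the flags describing only locality relative to you.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the real OPAL_PROC_ON_* constants may differ. */
typedef uint16_t locality_t;
#define LOC_ON_NODE     0x0001
#define LOC_ON_NUMA     0x0002
#define LOC_ON_SOCKET   0x0004
#define LOC_ON_L3CACHE  0x0008
#define LOC_ON_L2CACHE  0x0010
#define LOC_ON_L1CACHE  0x0020
#define LOC_ON_CORE     0x0040
#define LOC_ON_HWTHREAD 0x0080
#define LOC_ALL         0x00FF

/* Hypothetical placement record: hosting daemon plus, if bound, the
 * node-unique logical id of each resource enclosing the bound hwthread. */
typedef struct {
    uint32_t daemon_vpid;
    bool     bound;
    int      numa, socket, l3, l2, l1, core, hwthread;
} placement_t;

static locality_t compute_locality(const placement_t *me, const placement_t *peer)
{
    if (me->daemon_vpid != peer->daemon_vpid)
        return 0;        /* different node: nothing shared */
    if (!me->bound || !peer->bound)
        return LOC_ALL;  /* unbound: may run anywhere on the node */

    locality_t loc = LOC_ON_NODE;
    if (me->numa == peer->numa)         loc |= LOC_ON_NUMA;
    if (me->socket == peer->socket)     loc |= LOC_ON_SOCKET;
    if (me->l3 == peer->l3)             loc |= LOC_ON_L3CACHE;
    if (me->l2 == peer->l2)             loc |= LOC_ON_L2CACHE;
    if (me->l1 == peer->l1)             loc |= LOC_ON_L1CACHE;
    if (me->core == peer->core)         loc |= LOC_ON_CORE;
    if (me->hwthread == peer->hwthread) loc |= LOC_ON_HWTHREAD;
    return loc;
}
```

Extending this to two procs on a remote node would only require having both procs' placement records, which is the extra information the thread mentions would need to be passed around.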

>
> I guess proc_flags in proc.h should be opal_paffinity_locality_t to match the flags on the ORTE level?

My bad - I thought I had changed it? If not, it certainly needs to be...

>
> A more high-level remark: the fact that the locality information is automatically packed and exchanged during the grpcomm modex call seems a little weird (do the upper levels have a say in it?). I would not have thought that grpcomm (which, according to the grpcomm.h header file, is a framework providing communication services that span entire jobs or collections of processes) is the place to put it.

I agree - I wasn't entirely sure where to put it, frankly. It needs to be somewhere that both direct launch and mpirun-launched apps can see it. Could go in the MPI layer, I suppose.

Suggestions welcome!

>
> Thanks,
> george.
>
>
> On Oct 19, 2011, at 16:28 , Ralph Castain wrote:
>
>> Hi folks
>>
>> For those of you who don't follow the commits...
>>
>> I just committed (r25323) an extension of the orte_ess.proc_get_locality function that allows a process to get its relative resource usage with any other proc in the job. In other words, you can provide a process name to the function, and the returned bitmask tells you if you share a node, numa, socket, caches (by level), core, and hyperthread with that process.
>>
>> If you are on the same node and unbound, of course, you share all of those. However, if you are bound, then this can help tell you if you are on a common numa node, sharing an L1 cache, etc. Might be handy.
>>
>> I implemented the underlying functionality so that we can further extend it to tell you the relative resource location of two procs on a remote node. If that someday becomes of interest, it would be relatively easy to do - but would require passing more info around. Hence, I've allowed for it, but not implemented it until there is some identified need.
>>
>> Locality info is available any time after the modex completes during MPI_Init, and is supported in all launch environments (except cnos, for now), whether the app is launched by mpirun or direct-launched. In other words, pretty much always.
>>
>> Hope it proves helpful in your work
>> Ralph
>>
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>