WHAT: Should we make the job size (i.e., initial number of procs) available in OPAL?
WHY: At least 2 BTLs are using this info (*more below)
WHERE: usnic and ugni
TIMEOUT: there have already been some inflammatory emails about this; let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
This is an open question. We *have* the information at the time that the BTLs are initialized: do we allow that information to go down to OPAL?
Ralph added this info down in OPAL in r32355, but George reverted it in r32361.
Points for: YES, WE SHOULD
+++ 2 BTLs were using it (usnic, ugni)
+++ Other RTE job-related info is already in OPAL (num local ranks, local rank)
Points for: NO, WE SHOULD NOT
--- What exactly is this number (e.g., num currently-connected procs?), and when is it updated?
--- We need to precisely delineate what belongs in OPAL vs. above-OPAL
FWIW: here's how ompi_process_info.num_procs was used before the BTL move down to OPAL:
- usnic: for a minor latency optimization / sizing of a shared receive buffer queue length, and for the initial size of a peer lookup hash
- ugni: to determine the size of the per-peer buffers used for send/recv communication
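To make those two uses concrete, here is a minimal C sketch of the kind of sizing a BTL might do with a job-size hint. This is NOT the actual usnic or ugni code: every function name, the fallback of 64 entries, and the convention that 0 means "job size unknown" are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round n up to the next power of two.
 * Stops doubling before the shift could overflow size_t. */
static size_t next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n && p <= SIZE_MAX / 2) {
        p <<= 1;
    }
    return p;
}

/* Hypothetical: initial size of a peer lookup hash.
 * Assumption: num_procs == 0 means the job size is not known, in
 * which case we fall back to an arbitrary default and let the hash
 * grow on demand. */
static size_t peer_hash_initial_size(size_t num_procs)
{
    const size_t fallback = 64; /* arbitrary illustrative default */
    if (0 == num_procs) {
        return fallback;
    }
    return next_pow2(num_procs);
}

/* Hypothetical: split a fixed memory budget evenly across peers,
 * never going below min_bytes per peer. With an unknown job size
 * (num_procs == 0) only the minimum is used. */
static size_t per_peer_buffer_bytes(size_t total_budget,
                                    size_t num_procs,
                                    size_t min_bytes)
{
    size_t share;

    assert(min_bytes > 0);
    if (0 == num_procs) {
        return min_bytes;
    }
    share = total_budget / num_procs;
    return (share < min_bytes) ? min_bytes : share;
}
```

Note that both helpers still work without the hint; they just start from a default instead of a job-sized value. That is why the semantics of the number (the first "NO" point above) matter: whether it is an exact count or only a sizing hint decides whether such fallbacks are acceptable.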