
Open MPI Development Mailing List Archives


Subject: [OMPI devel] RFC: job size info in OPAL
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-07-30 18:00:18


WHAT: Should we make the job size (i.e., initial number of procs) available in OPAL?

WHY: At least 2 BTLs are using this info (*more below)

WHERE: usnic and ugni

TIMEOUT: There have already been some inflammatory emails about this; let's discuss it on next Tuesday's teleconf: Tue, 5 Aug 2014

MORE DETAIL:

This is an open question. We *have* the information at the time that the BTLs are initialized: do we allow that information to go down to OPAL?

Ralph added this info down in OPAL in r32355, but George reverted it in r32361.

Points for: YES, WE SHOULD
+++ 2 BTLs were using it (usnic, ugni)
+++ Other RTE job-related info is already in OPAL (number of local ranks, local rank)

Points for: NO, WE SHOULD NOT
--- What exactly is this number (e.g., num currently-connected procs?), and when is it updated?
--- We need to precisely delineate what belongs in OPAL vs. above-OPAL

FWIW: here's how ompi_process_info.num_procs was used before the BTLs moved down to OPAL:

- usnic: for a minor latency optimization (sizing the length of a shared receive buffer queue), and for the initial size of a peer lookup hash
- ugni: to determine the size of the per-peer buffers used for send/recv communication
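To make these two usage patterns concrete, here is a minimal C sketch under stated assumptions. It is NOT the actual usnic or ugni code: the names DEFAULT_PEER_HASH_SIZE, peer_hash_initial_size(), and per_peer_buffer_bytes() are hypothetical, and a job size of 0 stands in for "not available at the OPAL level". Note that it treats the job size as a one-time hint read at BTL init, which is exactly the "what is this number, and when is it updated?" question raised above.

```c
#include <stddef.h>

/* Hypothetical fallback used when the job size is unknown (0). */
#define DEFAULT_PEER_HASH_SIZE 1024

/* Round n up to the next power of two (minimum 1). */
static size_t next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

/* Hypothetical: initial size of a peer lookup hash. Uses the initial
 * job size as a sizing hint if known, else a fixed default. */
static size_t peer_hash_initial_size(size_t job_size)
{
    if (job_size == 0) {
        return DEFAULT_PEER_HASH_SIZE;
    }
    return next_pow2(job_size);
}

/* Hypothetical: split a fixed buffer budget evenly across the expected
 * peers (every proc except ourselves), clamped to a per-peer minimum. */
static size_t per_peer_buffer_bytes(size_t total_bytes, size_t job_size,
                                    size_t min_bytes)
{
    size_t peers = (job_size > 1) ? job_size - 1 : 1;
    size_t share = total_bytes / peers;
    return (share < min_bytes) ? min_bytes : share;
}
```

For example, with a 1 MiB budget, a 5-proc job, and a 4 KiB floor, each of the 4 peers gets 262144 bytes; if the job later grows (e.g., via dynamic process spawning), neither value is revisited, so they only remain sensible if the number is understood as the *initial* job size.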

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/