
Open MPI Development Mailing List Archives



Subject: [OMPI devel] RFC: job size info in OPAL
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-07-30 18:00:18


WHAT: Should we make the job size (i.e., initial number of procs) available in OPAL?

WHY: At least 2 BTLs are using this info (more detail below)

WHERE: usnic and ugni

TIMEOUT: There have already been some inflammatory emails about this; let's discuss it on next Tuesday's teleconf: Tue, 5 Aug 2014

MORE DETAIL:

This is an open question. We *have* the information at the time that the BTLs are initialized: do we allow that information to go down to OPAL?

Ralph added this info down in OPAL in r32355, but George reverted it in r32361.

Points for: YES, WE SHOULD
+++ 2 BTLs were using it (usnic, ugni)
+++ Other RTE job-related info is already in OPAL (number of local ranks, local rank)

Points for: NO, WE SHOULD NOT
--- What exactly is this number (e.g., num currently-connected procs?), and when is it updated?
--- We need to precisely delineate what belongs in OPAL vs. above-OPAL
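To make the question concrete, here is a minimal sketch of what the "YES" option could look like. It assumes a plain global struct in OPAL that the RTE fills in before any BTL is initialized. All names (`example_opal_job_info_t`, `example_opal_set_job_info`, `example_opal_initial_job_size`) and the "0 means unknown" convention are hypothetical. They do not reproduce what r32355 actually added.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical OPAL-level job information; names are illustrative only. */
typedef struct {
    uint32_t local_rank;      /* already available in OPAL (per the RFC) */
    uint32_t num_local_ranks; /* already available in OPAL (per the RFC) */
    uint32_t num_procs;       /* proposed: initial job size, 0 = unknown */
} example_opal_job_info_t;

/* One global instance, filled in by the RTE layer before any BTL is
 * initialized. Zero-initialized, so num_procs starts out "unknown". */
static example_opal_job_info_t example_opal_job_info;

/* Called once by the RTE during startup. The value is a snapshot of the
 * initial job size; nothing here updates it if processes are added later
 * (e.g., via dynamic process creation). */
void example_opal_set_job_info(uint32_t local_rank, uint32_t num_local_ranks,
                               uint32_t num_procs)
{
    assert(local_rank < num_local_ranks);
    assert(num_procs == 0 || num_local_ranks <= num_procs);
    example_opal_job_info.local_rank = local_rank;
    example_opal_job_info.num_local_ranks = num_local_ranks;
    example_opal_job_info.num_procs = num_procs;
}

/* Accessor for BTLs: the initial job size, or `fallback` if the RTE
 * did not provide one. */
uint32_t example_opal_initial_job_size(uint32_t fallback)
{
    return example_opal_job_info.num_procs != 0
               ? example_opal_job_info.num_procs
               : fallback;
}
```

With the fallback, a BTL can still start when the RTE cannot supply the value. Whether the number should ever be refreshed after startup is the first "NO" point above. This sketch deliberately treats it as an initial snapshot only.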

FWIW: here's how ompi_process_info.num_procs was used before the BTLs moved down to OPAL:

- usnic: for a minor latency optimization / sizing of a shared receive buffer queue length, and for the initial size of a peer lookup hash
- ugni: to determine the size of the per-peer buffers used for send/recv communication
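The two kinds of use listed above can be illustrated with a hedged C sketch. It does not reflect the actual usnic or ugni code. `peer_table_t`, its functions, `per_peer_buffer_bytes`, and the even-split-with-clamping policy are all hypothetical. The job size is treated purely as an initial-size hint. The peer table still grows if more peers show up than expected, which sidesteps the "when is it updated?" concern raised above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical peer-lookup table: open addressing with linear probing,
 * keyed by a nonzero 64-bit peer ID (0 marks an empty slot). */
typedef struct {
    uint64_t *keys;
    void **vals;
    size_t capacity; /* always a power of two */
    size_t count;
} peer_table_t;

static void peer_table_fini(peer_table_t *t)
{
    free(t->keys);
    free(t->vals);
    t->keys = NULL;
    t->vals = NULL;
}

/* Allocate an empty table with `capacity` slots (a power of two). */
static int peer_table_alloc(peer_table_t *t, size_t capacity)
{
    t->capacity = capacity;
    t->count = 0;
    t->keys = calloc(capacity, sizeof *t->keys);
    t->vals = calloc(capacity, sizeof *t->vals);
    if (t->keys == NULL || t->vals == NULL) {
        peer_table_fini(t);
        return -1;
    }
    return 0;
}

/* Job size is only a hint: size for a load factor <= 1/2, grow later. */
static int peer_table_init(peer_table_t *t, size_t expected_peers)
{
    size_t cap = 16;
    while (cap < expected_peers * 2) {
        cap <<= 1;
    }
    return peer_table_alloc(t, cap);
}

/* Slot holding `key`, or the empty slot where it belongs (Fibonacci hash). */
static size_t find_slot(const peer_table_t *t, uint64_t key)
{
    size_t mask = t->capacity - 1;
    size_t i = (size_t)((key * 0x9E3779B97F4A7C15ULL) >> 32) & mask;
    while (t->keys[i] != 0 && t->keys[i] != key) {
        i = (i + 1) & mask;
    }
    return i;
}

static int peer_table_put(peer_table_t *t, uint64_t key, void *val)
{
    size_t i;
    assert(key != 0);
    /* Double the table if this insert would exceed load factor 1/2,
     * e.g. because more peers connected than the initial job size. */
    if ((t->count + 1) * 2 > t->capacity) {
        peer_table_t bigger;
        if (peer_table_alloc(&bigger, t->capacity * 2) != 0) {
            return -1;
        }
        for (i = 0; i < t->capacity; i++) {
            if (t->keys[i] != 0) { /* rehash; never triggers another grow */
                peer_table_put(&bigger, t->keys[i], t->vals[i]);
            }
        }
        peer_table_fini(t);
        *t = bigger;
    }
    i = find_slot(t, key);
    if (t->keys[i] == 0) { /* peer not seen before */
        t->keys[i] = key;
        t->count++;
    }
    t->vals[i] = val;
    return 0;
}

static void *peer_table_get(const peer_table_t *t, uint64_t key)
{
    size_t i = find_slot(t, key);
    return t->keys[i] == key ? t->vals[i] : NULL;
}

/* Hypothetical ugni-style policy: split a fixed memory budget evenly
 * across the expected peers, clamped to [min_bytes, max_bytes]. */
static size_t per_peer_buffer_bytes(size_t budget, size_t expected_peers,
                                    size_t min_bytes, size_t max_bytes)
{
    size_t share = expected_peers ? budget / expected_peers : max_bytes;
    return share < min_bytes ? min_bytes : share > max_bytes ? max_bytes : share;
}
```

For example, with a hypothetical 1 MiB budget and 64 expected peers, each peer would get 16 KiB. A wrong job size only costs rehashing (table) or memory efficiency (buffers), not correctness. That distinction might matter when deciding how precisely the OPAL-level number must be defined.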

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/