The maximum number of peer processes that may be added over the course
of the job will suffice: either the world size or the universe size. This is a
reasonable piece of information to expect the upper layers to provide to
the communication layer.
Providing this information is also no more intrusive than providing
information like the number of local ranks.
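As a rough illustration of the kind of sizing this enables, here is a minimal sketch. It is not existing Open MPI code: the `btl_init_hints_t` struct, its fields, and the `mailbox_bytes_per_peer` helper are hypothetical names, and the budget and clamp values are assumptions. The idea is only that a BTL handed an upper bound on the number of peers can divide a fixed memory budget among them instead of working out the job size on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hints an upper layer could hand to a BTL at init time.
 * These names are illustrative and do not exist in Open MPI. */
typedef struct {
    size_t max_peers;        /* upper bound on peers over the life of the job */
    size_t num_local_peers;  /* peers on the same node */
} btl_init_hints_t;

/* Split a fixed memory budget evenly across the maximum number of peers,
 * clamping the per-peer mailbox size to [min_bytes, max_bytes]. */
static size_t mailbox_bytes_per_peer(const btl_init_hints_t *hints,
                                     size_t budget_bytes,
                                     size_t min_bytes,
                                     size_t max_bytes)
{
    assert(hints != NULL && min_bytes <= max_bytes);

    /* No bound supplied: fall back to the smallest mailbox. */
    if (hints->max_peers == 0) {
        return min_bytes;
    }

    size_t per_peer = budget_bytes / hints->max_peers;
    if (per_peer < min_bytes) {
        per_peer = min_bytes;
    }
    if (per_peer > max_bytes) {
        per_peer = max_bytes;
    }
    return per_peer;
}
```

With a 1 MiB budget and a clamp of 512 to 8192 bytes, a bound of 1024 peers gives 1024 bytes per peer, while a small job of 16 peers is capped at 8192 bytes.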
On Thu, Jul 31, 2014 at 11:09:24AM -0400, George Bosilca wrote:
> What is your definition of "global job size"?
> On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard <howardp_at_[hidden]> wrote:
> > Hi Folks,
> > I think given the way we want to use the btl's in lower levels like opal,
> > it is pretty disgusting for a btl to need to figure out on its own something
> > like a "global job size". That's not its business. Can't we add some attributes
> > to the component's initialization method that provides hints for how to
> > allocate resources it needs to provide its functionality?
> > I'll see if there's something clever that can be done in ugni for now.
> > I can always add in a hack to probe the apps placement info file and
> > scale the smsg blocks by number of nodes rather than number of ranks.
> > Howard
> > -----Original Message-----
> > From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Nathan Hjelm
> > Sent: Thursday, July 31, 2014 8:58 AM
> > To: Open MPI Developers
> > Subject: Re: [OMPI devel] RFC: job size info in OPAL
> > +2^10000000
> > This information is absolutely necessary at this point. If someone has a better solution they can provide it as an alternative RFC. Until then this is how it should be done... Otherwise we lose uGNI support on the trunk. Because we ARE NOT going to remove the mailbox size optimization.
> > -Nathan
> > On Wed, Jul 30, 2014 at 10:00:18PM +0000, Jeff Squyres (jsquyres) wrote:
> >> WHAT: Should we make the job size (i.e., initial number of procs) available in OPAL?
> >> WHY: At least 2 BTLs are using this info (*more below)
> >> WHERE: usnic and ugni
> >> TIMEOUT: there's already been some inflammatory emails about this;
> >> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
> >> MORE DETAIL:
> >> This is an open question. We *have* the information at the time that the BTLs are initialized: do we allow that information to go down to OPAL?
> >> Ralph added this info down in OPAL in r32355, but George reverted it in r32361.
> >> Points for: YES, WE SHOULD
> >> +++ 2 BTLs were using it (usnic, ugni)
> >> +++ Other RTE job-related info are already in OPAL (num local ranks, local rank)
> >> Points for: NO, WE SHOULD NOT
> >> --- What exactly is this number (e.g., num currently-connected procs?), and when is it updated?
> >> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
> >> FWIW: here's how ompi_process_info.num_procs was used before the BTL move down to OPAL:
> >> - usnic: for a minor latency optimization / sizing of a shared receive
> >> buffer queue length, and for the initial size of a peer lookup hash
> >> - ugni: to determine the size of the per-peer buffers used for
> >> send/recv communication
> >> --
> >> Jeff Squyres
> >> jsquyres_at_[hidden]
> >> For corporate legal information go to:
> >> http://www.cisco.com/web/about/doing_business/legal/cri/
> >> _______________________________________________
> >> devel mailing list
> >> devel_at_[hidden]
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php