
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] RFC: job size info in OPAL
From: Nathan Hjelm (hjelmn_at_[hidden])
Date: 2014-07-31 12:14:04


This approach will work now but we need to start thinking about how we
want to support multiple simultaneous btl users. Does each user call
add_procs with a single module (or set of modules) or does each user
call btl_component_init and get their own module? If we do the latter
then it might make sense to add a max_procs argument to the
btl_component_init. Keep in mind we need to change the
btl_component_init interface anyway because the threading arguments no
longer make sense in their current form.

-Nathan

On Thu, Jul 31, 2014 at 09:04:09AM -0700, Ralph Castain wrote:
> Like I said, why don't we just do the following:
>
> > I'd like to suggest an alternative solution. A BTL can exploit whatever data it wants, but should first test if the data is available. If the data is *required*, then the BTL gracefully disqualifies itself. If the data is *desirable* for optimization, then the BTL writer (if they choose) can include an alternate path that doesn't do the optimization if the data isn't available.
>
> Seems like this should resolve the disagreement in a way that meets everyone's need. It basically is an attribute approach, but not requiring modification of the BTL interface.
>
>
> On Jul 31, 2014, at 8:26 AM, Pritchard Jr., Howard <howardp_at_[hidden]> wrote:
>
> > Hi George,
> >
> > It's the ompi_process_info.num_procs value, which seems to have been an
> > object of some contention yesterday.
> >
> > The ugni use of this is cloned off of the way I designed the mpich netmod.
> > Leveraging the size of the job was an easy way to scale the mailbox size.
> >
> > If I'd been asked to make the netmod work in the kind of context where it
> > appears we may eventually want to use BTLs - not just within ompi but for
> > other things - I'd have worked with Darius (if he were still in the mpich
> > world) on changing the netmod initialization method to allow an optional
> > attributes struct to be passed into the init method, giving hints about
> > how many connections may need to be established, etc.
> >
> > For the GNI BTL - the way it's currently designed - if you are only expecting
> > to use it for a limited number of connections, then you want to initialize
> > with big mailboxes (IB folks can think of these as many large buffers posted
> > as RX WQEs). But for very large jobs, with a possibly highly connected
> > communication pattern, you want very small mailboxes.
> >
> > Howard
> >
> >
> > -----Original Message-----
> > From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of George Bosilca
> > Sent: Thursday, July 31, 2014 9:09 AM
> > To: Open MPI Developers
> > Subject: Re: [OMPI devel] RFC: job size info in OPAL
> >
> > What is your definition of "global job size"?
> >
> > George.
> >
> > On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard <howardp_at_[hidden]> wrote:
> >
> >> Hi Folks,
> >>
> >> I think given the way we want to use the btl's in lower levels like
> >> opal, it is pretty disgusting for a btl to need to figure out on its
> >> own something like a "global job size". That's not its business.
> >> Can't we add some attributes to the component's initialization method
> >> that provide hints for how to allocate the resources it needs to provide its functionality?
> >>
> >> I'll see if there's something clever that can be done in ugni for now.
> >> I can always add in a hack to probe the apps placement info file and
> >> scale the smsg blocks by number of nodes rather than number of ranks.
> >>
> >> Howard
> >>
> >>
> >> -----Original Message-----
> >> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Nathan
> >> Hjelm
> >> Sent: Thursday, July 31, 2014 8:58 AM
> >> To: Open MPI Developers
> >> Subject: Re: [OMPI devel] RFC: job size info in OPAL
> >>
> >>
> >> +2^10000000
> >>
> >> This information is absolutely necessary at this point. If someone has a better solution they can provide it as an alternative RFC. Until then this is how it should be done... Otherwise we lose uGNI support on the trunk. Because we ARE NOT going to remove the mailbox size optimization.
> >>
> >> -Nathan
> >>
> >> On Wed, Jul 30, 2014 at 10:00:18PM +0000, Jeff Squyres (jsquyres) wrote:
> >>> WHAT: Should we make the job size (i.e., initial number of procs) available in OPAL?
> >>>
> >>> WHY: At least 2 BTLs are using this info (*more below)
> >>>
> >>> WHERE: usnic and ugni
> >>>
> >>> TIMEOUT: there's already been some inflammatory emails about this;
> >>> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
> >>>
> >>> MORE DETAIL:
> >>>
> >>> This is an open question. We *have* the information at the time that the BTLs are initialized: do we allow that information to go down to OPAL?
> >>>
> >>> Ralph added this info down in OPAL in r32355, but George reverted it in r32361.
> >>>
> >>> Points for: YES, WE SHOULD
> >>> +++ 2 BTLs were using it (usnic, ugni)
> >>> +++ Other RTE job-related info is already in OPAL (num local ranks, local rank)
> >>>
> >>> Points for: NO, WE SHOULD NOT
> >>> --- What exactly is this number (e.g., num currently-connected procs?), and when is it updated?
> >>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
> >>>
> >>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move down to OPAL:
> >>>
> >>> - usnic: for a minor latency optimization / sizing of a shared
> >>> receive buffer queue length, and for the initial size of a peer
> >>> lookup hash
> >>> - ugni: to determine the size of the per-peer buffers used for
> >>> send/recv communication
> >>>
> >>> --
> >>> Jeff Squyres
> >>> jsquyres_at_[hidden]
> >>> For corporate legal information go to:
> >>> http://www.cisco.com/web/about/doing_business/legal/cri/
> >>>
> >>> _______________________________________________
> >>> devel mailing list
> >>> devel_at_[hidden]
> >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>> Link to this post:
> >>> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php
> >
>


