
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] SDP support for OPEN-MPI
From: Lenny Verkhovsky (lennyb_at_[hidden])
Date: 2008-01-15 03:57:30


> -----Original Message-----
> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
> Behalf Of Jeff Squyres
> Sent: Tuesday, January 15, 2008 6:13 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] SDP support for OPEN-MPI
>
> On Jan 13, 2008, at 8:19 AM, Lenny Verkhovsky wrote:
>
> > > What I meant was try to open an SDP socket. If it fails because SDP
> > > is not supported / available to that peer, then open a regular
> > > socket. So you should still always have only 1 socket open to a
> > > peer (not 2).
> > Yes, but since the listener side doesn't know on which socket to
> > expect a message, it will need both sockets to be open.
> >
>
> Ah, you meant the listener socket -- not 2 sockets to each peer. Ok,
> fair enough. Opening up one more listener socket in each process is
> no big deal (IMO).
I thought that on a large cluster this could be a problem.

> > > > If one of the machines does not support SDP, the user will get an
> > > > error.
> > >
> > > Well, that's one way to go, but it's certainly less friendly. It
> > > means that the entire MPI job has to support SDP -- including
> > > mpirun. What about clusters that do not have IB on the head node?
> > >
> > They can use OOB over IP sockets and the BTL over SDP; it should work.
> >
>
> Yes, I'm fine with this -- IIRC, my point was that if SDP is not
> available (and the user didn't explicitly ask for it), then it should
> not be an error.
>
> > > >> Perhaps a more general approach would be to [perhaps additionally]
> > > >> provide an MCA param to allow the user to specify the AF_* value?
> > > >> (AF_INET_SDP is a standardized value, right? I.e., will it be the
> > > >> same on all Linux variants [and someday Solaris]?)
> > > > I didn't find any standard for it; it seems to be "randomly"
> > > > selected, since originally it was 26 and was changed to 27 due to
> > > > a conflict with the kernel's defines.
> > >
> > > This might make an even stronger case for having an MCA param for it
> > > -- if the AF_INET_SDP value is so broken that it's effectively
> > > random, it may be necessary to override it on some platforms
> > > (especially in light of binary OMPI and OFED distributions that may
> > > not match).
> > >
> > If we are talking about passing the AF_INET_SDP value only, then:
> > 1. Passing this value as an MCA parameter will not require any changes
> > to the SDP code.
> > 2. Hopefully, in the future the AF_INET_SDP value can be obtained from
> > libc, and the value will be configured automatically.
> > 3. If we are talking about the AF_INET value in general (IPv4, IPv6,
> > SDP), then by making it constant with an MCA parameter we are limiting
> > ourselves to one protocol only, without being able to fail over or use
> > different protocols for different needs (i.e. SDP for OOB and IPv4 for
> > the BTL).
> >
>
> I'm not sure what you mean. The AF_INET values for v4 and v6 are
> constantly compiled into OMPI via whatever values they are in the
> system header files. They're standardized values, right?
Yes.
>
> My understanding of what you were saying was that AF_INET_SDP is *not*
> standardized such that it may actually be different values on
> different systems. Hence, an MPI app could be otherwise portable but
> have a wrong value for AF_INET_SDP compiled into its code.
>
> Are you saying something else?
I thought you were talking about passing the general AF_INET value
(IPv4, IPv6, SDP, ...).
I do think that AF_INET_SDP will be standardized eventually; in the
meantime it will be a constant value on all systems.
Passing AF_INET_SDP as a parameter will not reduce code changes, but it
will lower flexibility (using it for the BTL and OOB independently).
>
> > > >> Patrick's got a good point: is there a reason not to do this?
> > > >> (LD_PRELOAD and the like) Is it problematic with the remote
> > > >> orted's?
> > > > Yes, it's problematic with remote orted's, and it is not really as
> > > > transparent as you might think, since we can't pass environment
> > > > variables to the orted's during runtime.
> > >
> > > I think this depends on your environment. If you're not using rsh
> > > (which you shouldn't be for a large cluster, which is where SDP
> > > would matter most, right?), the resource manager typically copies
> > > the environment out to the cluster nodes. So an LD_PRELOAD value
> > > should be set for the orteds as well.
> > >
> > > I agree that it's problematic for rsh, but that might also be
> > > solvable (with some limits; there's only so many characters that we
> > > can pass on the command line -- we did investigate having a wrapper
> > > to the orted at one point to accept environment variables and then
> > > launch the orted, but this was so problematic / klunky that we
> > > abandoned the idea).
> > >
> > Using LD_PRELOAD will not allow us to use SDP and IP separately, i.e.
> > SDP for OOB and IP for a BTL.
> >
>
> Why would you want to do that? I would think that the biggest win
> here would be SDP for OOB -- the heck with the BTL. The BTL was just
> done for completeness (right?); if you have OpenFabrics support, you
> should be using the verbs BTL.
>
> Perhaps I don't understand exactly what you are proposing. I was
> under the impression that you were going after a common case: mpirun
> and the MPI jobs are running on back-end compute nodes where all of
> them support SDP (although the other case of mpirun running on the
> head node without SDP and all the MPI processes are running on back-
> end nodes with SDP is also not-uncommon...). Are you thinking of
> something else, or are you looking for more flexibility?
I am just looking for more flexibility for the end user.

>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel