
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] SDP support for OPEN-MPI
From: Lenny Verkhovsky (lennyb_at_[hidden])
Date: 2008-01-08 07:45:25


Hi all,

Thanks for the responses.

> -----Original Message-----
> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
> Behalf Of Jeff Squyres
> Sent: Wednesday, January 02, 2008 4:08 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] SDP support for OPEN-MPI
>
> On Jan 1, 2008, at 1:11 PM, Andrew Friedley wrote:
>
> >>> We would like to add SDP support for OPENMPI.
>
> I have a few points -- this is the first:
>
> I would do this patch slightly differently. I prefer to have as few
> #if's as possible, so I'd do it to always have the struct members and
> logic for the MCA-enable/disable of SDP support, but only actually
> enable it if HAVE_DECL_AF_INET_SDP. Hence, the number of #if's is
> dramatically reduced -- you only need to #if the parts of the code
> that actually try to use AF_INET_SDP (etc.).
>
> I'd also ditch the --enable-sdp; I think configure can figure that
> stuff out by itself without an --enable switch. Perhaps if people
> really want the ability to turn SDP off at configure time, --disable-
> sdp could be useful. But that might not be too useful.
Unfortunately, AF_INET_SDP is not defined in glibc, and there is no
easy way to check for it at configure time; each application that uses
SDP defines AF_INET_SDP in its own headers.
Since a user may compile on a machine without SDP support, and to
minimize the number of #if's, we can always compile the code with SDP
support enabled.
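To illustrate, here is a minimal sketch of what such a self-contained
definition looks like (the function name is hypothetical, not code from
the actual patch):

```c
#include <sys/socket.h>

/* AF_INET_SDP is not defined by glibc, so (as discussed above) the
 * application must define it itself.  27 is the value used by current
 * OFED stacks; it was originally 26. */
#ifndef AF_INET_SDP
#define AF_INET_SDP 27
#endif

/* An SDP socket is created exactly like a TCP socket; only the address
 * family differs, and addressing still uses sockaddr_in. */
int open_sdp_socket(void)
{
    return socket(AF_INET_SDP, SOCK_STREAM, 0);
}
```

On a host without an SDP stack loaded, the socket() call simply fails
with EAFNOSUPPORT, which is what makes a runtime check (rather than a
configure-time one) workable.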
>
> Don't forget that we always have the "bool" type available; you can
> use that for logicals (instead of int).
>
> I'd also add another MCA param that is read-only that indicates
> whether SDP support was compiled in or not (i.e.,
> HAVE_DECL_AF_INET_SDP is 1, and therefore there was a value for
> AF_INET_SDP). This will allow you to query ompi_info and see if your
> OMPI was configured for SDP support.
>
> That way, you can have a consistent set of MCA params for the TCP
> components regardless of platform. I think that's somewhat
> important. To be user-friendly, I'd also emit a warning if someone
> tries to enable SDP support and it's not available. Note that SDP
> could be unavailable for multiple reasons:
>
> - wasn't available at compile time
> - isn't available for the peer IP address that was used
>
> Hence, if HAVE_DECL_AF_INET_SDP==1 and using AF_INET_SDP fails to that
> peer, it might be desirable to try to fail over to using
> AF_INET_something_else. I'm still technically on vacation :-), so I
> didn't look *too* closely at your patch, but I think you're doing that
> (failing over if AF_INET_SDP doesn't work because of EAFNOSUPPORT),
> which is good.
This is actually not implemented yet.
Supporting failover requires opening AF_INET sockets in addition to the
SDP sockets, which can cause problems on large clusters.
If one of the machines does not support SDP, the user will get an error.
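For reference, the failover Jeff describes could be sketched roughly as
follows (the function name is hypothetical; this is not the patch code):

```c
#include <sys/socket.h>

#ifndef AF_INET_SDP
#define AF_INET_SDP 27          /* not in glibc; apps define it themselves */
#endif

/* Open a stream socket, preferring SDP but falling back to plain
 * AF_INET when the SDP socket cannot be created (typically errno is
 * EAFNOSUPPORT when no SDP stack is loaded).  The address family
 * actually used is returned through *family_used. */
int open_stream_socket(int try_sdp, int *family_used)
{
    int fd;

    if (try_sdp) {
        fd = socket(AF_INET_SDP, SOCK_STREAM, 0);
        if (fd >= 0) {
            *family_used = AF_INET_SDP;
            return fd;
        }
        /* SDP unavailable on this host: fall through to ordinary TCP */
    }
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd >= 0)
        *family_used = AF_INET;
    return fd;
}
```

The scaling concern above is that doing this per-peer means every
process may hold both an SDP and an AF_INET listening socket.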
>
> I would think the following would apply:
>
> - Error (or warning?): user requests SDP and HAVE_DECL_AF_INET_SDP is 0
> - Error (or warning?): user requests SDP and HAVE_DECL_AF_INET_SDP is
> 1, but using AF_INET_SDP failed
> - Not an error: user does not request SDP, but HAVE_DECL_AF_INET_SDP
> is 1 and AF_INET_SDP works
> - Not an error: user does not request SDP, but HAVE_DECL_AF_INET_SDP
> is 1 and AF_INET_SDP does not work, but is able to fail over to
> AF_INET_something_else
>
> With all this, the support is still somewhat inconsistent -- you could
> be using an OMPI that has HAVE_DECL_AF_INET_SDP==0, but you're running
> on a system that has SDP available.
>
> Perhaps a more general approach would be to [perhaps additionally]
> provide an MCA param to allow the user to specify the AF_* value?
> (AF_INET_SDP is a standardized value, right? I.e., will it be the
> same on all Linux variants [and someday Solaris]?)
I didn't find any standard for it; the value seems to have been
"randomly" selected, since originally it was 26 and it was changed to 27
due to a conflict with the kernel's defines.
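The warning/error matrix Jeff lists above could be sketched as a small
decision function (all names here are hypothetical, not OMPI code):

```c
/* Sketch of the SDP warning/error policy quoted earlier in this
 * thread.  compiled_in corresponds to HAVE_DECL_AF_INET_SDP;
 * sdp_works / fallback_works are runtime outcomes. */
typedef enum { SDP_OK, SDP_ERROR } sdp_status_t;

sdp_status_t sdp_policy(int user_requested_sdp, int compiled_in,
                        int sdp_works, int fallback_works)
{
    if (user_requested_sdp && !compiled_in)
        return SDP_ERROR;   /* requested, but support was compiled out */
    if (user_requested_sdp && !sdp_works)
        return SDP_ERROR;   /* requested, but AF_INET_SDP failed */
    if (!user_requested_sdp && !sdp_works && !fallback_works)
        return SDP_ERROR;   /* nothing usable at all */
    return SDP_OK;          /* includes silent failover to AF_INET */
}
```

Whether the first two cases should be hard errors or just warnings is
exactly the open question in the quoted list.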
>
> >>> SDP can be used to accelerate job start ( oob over sdp ) and IPoIB
> >>> performance.
> >>
> >> I fail to see the reason to pollute the TCP btl with IB-specific
> >> SDP stuff.
> >>
> >> For the oob, this is arguable, but doesn't SDP allow for
> >> *transparent*
> >> socket replacement at runtime ? In this case, why not use this
> >> mechanism
> >> and keep the code clean ?
>
> Patrick's got a good point: is there a reason not to do this?
> (LD_PRELOAD and the like) Is it problematic with the remote orted's?
Yes, it's problematic with remote orted's, and it is not as transparent
as you might think.
Since we can't pass environment variables to the orted's at runtime, we
must preload the SDP library in each remote environment (i.e. bashrc).
This causes all applications to use SDP instead of AF_INET, which means
you can't choose a specific protocol for a specific application: you use
either SDP or AF_INET for everything.
SDP can also be enabled via the appropriate
/usr/local/ofed/etc/libsdp.conf configuration, but a regular user
usually has no access to it.
(http://www.cisco.com/univercd/cc/td/doc/product/svbu/ofed/ofed_1_1/ofed_ug/sdp.htm#wp952927)

>
> > Furthermore, why would a user choose to use SDP and TCP/IPoIB when
> > the OpenIB BTL is available using the native verbs interface? FWIW,
> > this same sort of question gets asked of the uDAPL BTL -- the answer
> > there being that the uDAPL BTL runs in places the OpenIB BTL does
> > not. Is this true here as well?
>
>
> Andrew's got a good point here, too -- accelerating the TCP BTL with
> SDP seems kinda pointless. I'm guessing that you did it because it
> was just about the same work as was done in the TCP OOB (for which we
> have no corresponding verbs interface). Is that right?
Indeed. But it also seems that SDP has lower overhead than VERBS in some
cases.

Tests with Sandia's overlapping benchmark
http://www.cs.sandia.gov/smb/overhead.html#mozTocId316713

VERBS results
msgsize iterations   iter_t   work_t overhead   base_t avail(%)
      0       1000   16.892   15.309    1.583    7.029     77.5
      2       1000   16.852   15.332    1.520    7.144     78.7
      4       1000   16.932   15.312    1.620    7.128     77.3
      8       1000   16.985   15.319    1.666    7.182     76.8
     16       1000   16.886   15.297    1.589    7.219     78.0
     32       1000   16.988   15.311    1.677    7.251     76.9
     64       1000   16.944   15.299    1.645    7.457     77.9

SDP results
msgsize iterations   iter_t   work_t overhead   base_t avail(%)
      0       1000  134.902  128.089    6.813   54.691     87.5
      2       1000  135.064  128.196    6.868   55.283     87.6
      4       1000  135.031  128.356    6.675   55.039     87.9
      8       1000  130.460  125.908    4.552   52.010     91.2
     16       1000  135.432  128.694    6.738   55.615     87.9
     32       1000  135.228  128.494    6.734   55.627     87.9
     64       1000  135.470  128.540    6.930   56.583     87.8

IPoIB results
msgsize iterations   iter_t   work_t overhead   base_t avail(%)
      0       1000  252.953  247.053    5.900  119.977     95.1
      2       1000  253.336  247.285    6.051  121.573     95.0
      4       1000  254.147  247.041    7.106  122.110     94.2
      8       1000  254.613  248.011    6.602  121.840     94.6
     16       1000  255.662  247.952    7.710  124.738     93.8
     32       1000  255.569  248.057    7.512  127.095     94.1
     64       1000  255.867  248.308    7.559  132.858     94.3

>
> --
> Jeff Squyres
> Cisco Systems
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel