Open MPI Development Mailing List Archives

Subject: [OMPI devel] initial SCTP BTL commit comments?
From: Brad Penoff (penoff_at_[hidden])
Date: 2007-11-09 22:47:31


Greetings Open MPI developers,

Karol Mroz and I at UBC have been working on a BTL component for SCTP.
In our own internal testing the BTL has stabilized, so we are hoping
to commit it to ompi-trunk. Before doing so, though, we wanted to get
some feedback from the community. In particular, we are curious
whether there are any objections to putting an initial version in the
trunk, initially with an ompi_ignore file. The SCTP BTL component
stands alone completely. So what we're wondering is:

Any objections to us committing an SCTP BTL to ompi-trunk if it has
the ompi_ignore file in it first?

I'll try to describe this new SCTP BTL a little. Feel free to
write back if you have any questions.

For starters, SCTP is an IP-based transport protocol. There are
kernel-based implementations on most major operating systems. The
best implementation seems to be the FreeBSD stack (now included by
default in FreeBSD 7), but the Linux one (lksctp.sf.net) has been
getting better and is currently a module in the vanilla kernel. These
are the only two stacks that we have tested on so far; we've been
able to run a handful of our own tests in addition to the OSU, NAS,
and Intel benchmarks. At present, our autoconf rules only build the
component on these two platforms. We've also conformed to the Open
MPI coding standards as outlined on the wiki.

For fault tolerance purposes, SCTP connections (termed "associations")
can be made aware of multiple interfaces on the endpoints by binding
to more than one interface (for performance, the CMT extension uses
this multihoming feature to stripe data). SCTP also supports several
different socket APIs. As with TCP, there can be a one-to-one socket
per connection. Alternatively, as with UDP, a single one-to-many
socket can be used for all connections. The SCTP BTL can use either
socket style, depending on the value of the btl_sctp_if_11 MCA
option. When this value is 1, the one-to-one socket is used and, as
in the TCP BTL, there are as many BTL component modules as there are
network cards specified with if_include and friends. By default, this
value is 0, which means that a single one-to-many socket is used; here
only one BTL module is used and SCTP itself handles, within that one
socket, all the network cards specified with if_include, etc.
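
To make the two socket styles concrete, here is a minimal sketch in C
of how the btl_sctp_if_11 value could select the socket type. The
helper names sctp_socket_type and sctp_open are hypothetical and are
not taken from the BTL source; the only assumption about SCTP itself
is the usual sockets convention that SOCK_STREAM gives a one-to-one
socket and SOCK_SEQPACKET gives a one-to-many socket.

```c
#include <assert.h>
#include <netinet/in.h>   /* IPPROTO_SCTP */
#include <sys/socket.h>   /* socket(), SOCK_STREAM, SOCK_SEQPACKET */

/* Hypothetical helper: map the btl_sctp_if_11 value to a socket type.
 * 1 -> one-to-one style (TCP-like), 0 -> one-to-many style (UDP-like). */
int sctp_socket_type(int if_11)
{
    assert(if_11 == 0 || if_11 == 1);
    return if_11 ? SOCK_STREAM : SOCK_SEQPACKET;
}

/* Hypothetical helper: open an SCTP socket of the selected style.
 * Returns the descriptor, or -1 if the kernel has no SCTP support. */
int sctp_open(int if_11)
{
    return socket(AF_INET, sctp_socket_type(if_11), IPPROTO_SCTP);
}
```

With the one-to-one style a process would call something like
sctp_open(1) once per connection, whereas with the one-to-many style a
single sctp_open(0) descriptor would serve every peer.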

Currently, both the one-to-one and the one-to-many styles make use of
the event library offered by Open MPI. The callback functions for the
one-to-many style, however, are different, since multiple endpoints
may be interested in the events that poll returns. For now we use
these specialized callback functions, but in the future we hope to
explore the potential benefits of a btl_progress function,
particularly for the one-to-many style.
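
To illustrate why the one-to-many callbacks differ: every peer shares
one socket, so each message read from it has to be routed to the
endpoint that owns it. Below is a minimal sketch in C, assuming the
routing key is the SCTP association ID delivered with each message.
All struct and function names here are hypothetical and are not taken
from the BTL source; bytes_seen merely stands in for real per-endpoint
receive state.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENDPOINTS 8

/* Hypothetical endpoint record: one per peer on the shared socket. */
struct endpoint {
    int assoc_id;    /* SCTP association ID identifying the peer */
    int bytes_seen;  /* stand-in for per-endpoint receive state */
};

struct endpoint_table {
    struct endpoint eps[MAX_ENDPOINTS];
    size_t count;
};

/* Register a peer. Returns 0 on success, -1 if the table is full. */
int table_add(struct endpoint_table *t, int assoc_id)
{
    assert(t != NULL);
    if (t->count == MAX_ENDPOINTS)
        return -1;
    t->eps[t->count].assoc_id = assoc_id;
    t->eps[t->count].bytes_seen = 0;
    t->count++;
    return 0;
}

/* Find the endpoint that owns an association, or NULL if unknown. */
struct endpoint *table_find(struct endpoint_table *t, int assoc_id)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (t->eps[i].assoc_id == assoc_id)
            return &t->eps[i];
    return NULL;
}

/* Called once per message read from the shared socket: route the
 * message to its owner. Returns 0 if delivered, -1 if no endpoint
 * is registered for that association. */
int dispatch(struct endpoint_table *t, int assoc_id, int nbytes)
{
    struct endpoint *ep = table_find(t, assoc_id);
    if (ep == NULL)
        return -1;
    ep->bytes_seen += nbytes;
    return 0;
}
```

In the one-to-one style this lookup is unnecessary, because each
socket descriptor already corresponds to exactly one peer.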

At a high level, that's an overview of the SCTP BTL component. The
current design does not make use of the SCTP multistreaming feature;
that is the intent of a future MTL, where we would have access to MPI
information (like the context and tag). The question here is whether
I can go ahead and commit, initially with the proper ignore files.
Any comments/suggestions/feedback?

Thanks!
brad