Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] NIC Failover and Message Stripping of Open MPI
From: Lirong Jian (lirong.misc_at_[hidden])
Date: 2012-10-25 20:25:48


Thanks, guys.

I will check the code of OB1 more carefully. Thanks.

Best,
Lirong

Message: 7
> Date: Thu, 25 Oct 2012 10:55:51 -0700
> From: Ralph Castain <rhc_at_[hidden]>
> Subject: Re: [OMPI devel] NIC Failover and Message Stripping of Open
> MPI.
> To: Open MPI Developers <devel_at_[hidden]>
> Message-ID: <B1A13D1B-02A2-4E67-B0CD-FA924538D458_at_[hidden]>
> Content-Type: text/plain; charset="us-ascii"
>
> Just an FYI - I asked a similar question recently and got the following
> answer from Rolf:
>
> > In my case, it was specific to openib only and it required you to be
> > running with two or more IB rails.
> > Then, if one of them failed, we just shut it down, and continued with
> > the working ones.
> > You could only get use of the failing rail if it was fixed and a new job
> > was started.
> >
> > To get this to work, I created a new PML called bfo. I also had to make
> > some changes in the openib BTL.
> > By default, none of the code is configured in. There is a README in the
> > PML bfo directory that actually does quite a good job explaining what
> > I did.
>
> The bfo module is included in the 1.6 series, and in the upcoming 1.7
> series. Can't say anything as to its state of repair.
>
>
> On Oct 25, 2012, at 10:41 AM, George Bosilca <bosilca_at_[hidden]> wrote:
>
> >
> > On Oct 25, 2012, at 17:54 , Lirong Jian <lirong.misc_at_[hidden]> wrote:
> >
> >> Hi folks,
> >>
> >> Sorry to bother you guys, but I have some questions about Open MPI and
> >> really want your help.
> >>
> >> There are some papers (e.g., [1, 2, 3], although they are rather dated)
> >> mentioning that Open MPI supports NIC failover and message striping
> >> over multiple NICs. However, when I read the source code of
> >> openmpi-1.6.2, I couldn't find any component named DR or TEG (which are
> >> mentioned in those papers and are supposed to support NIC failover and
> >> message striping). So my question is:
> >>
> >> Does the 1.6.2 release of Open MPI support these two features? If so,
> >> which part of the code implements them?
> >
> > Lirong,
> >
> > As you noticed, the papers are quite old and dusty.
> >
> > Due to a lack of interest from the community, the DR PML has been retired
> > from our stable releases. In other words, no stable Open MPI version
> > supports network failover. However, the code is still available in the
> > trunk, but there is no guarantee it still does what it was designed for.
> >
> > TEG has been replaced with OB1, which is our current network management
> > layer. It does striping over multiple NICs (identical or not) by default.
> >
> > george.
> >
> >>
> >> Many thanks in advance.
> >>
> >> P.S. I am new to this area. My questions may be simple, even naive,
> >> but your help would be highly appreciated.
> >>
> >> Best,
> >> Lirong
> >>
> >>
> >> [1] Network Fault Tolerance in Open MPI.
> >> [2] Open MPI: A High Performance, Flexible Implementation of MPI
> >> Point-to-Point Communications.
> >> [3] TEG: A High-Performance, Scalable, Multi-network, Point-to-Point,
> >> Communications Methodology.
> >> _______________________________________________
> >> devel mailing list
> >> devel_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
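
For readers new to the idea discussed in this thread, here is a rough sketch of what striping a message over several NICs, and dropping a failed rail, might look like. It is a toy illustration only and is not taken from the OB1 or bfo source: the function name `stripe`, the bandwidth-proportional split, and the `alive` flags are all assumptions made for the example.

```python
def stripe(message_len, bandwidths, alive=None):
    """Split message_len bytes across NICs in proportion to bandwidth.

    bandwidths: relative bandwidth of each NIC (hypothetical units).
    alive:      optional list of booleans; a NIC marked False gets no bytes,
                which mimics shutting down a failed rail.
    Returns one (offset, length) pair per NIC.
    """
    if alive is None:
        alive = [True] * len(bandwidths)

    # A failed NIC contributes zero weight to the split.
    usable = [bw if up else 0 for bw, up in zip(bandwidths, alive)]
    total = sum(usable)
    if total <= 0:
        raise ValueError("no usable NIC left")

    # Integer share per NIC, proportional to its usable bandwidth.
    shares = [message_len * bw // total for bw in usable]

    # Bytes lost to integer rounding go to the fastest usable NIC.
    leftover = message_len - sum(shares)
    fastest = max(range(len(usable)), key=lambda i: usable[i])
    shares[fastest] += leftover

    # Turn the shares into contiguous (offset, length) fragments.
    fragments = []
    offset = 0
    for share in shares:
        fragments.append((offset, share))
        offset += share
    return fragments
```

Calling `stripe(1000, [10, 40])` assigns 200 bytes to the slower NIC and 800 to the faster one; passing `alive=[True, False]` sends the whole message over the remaining NIC. The real components have to handle much more (fragment scheduling, acknowledgements, retransmission of in-flight data), so the README in the PML bfo directory mentioned above remains the place to look for the actual design.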