Open MPI Development Mailing List Archives

This web mail archive is frozen.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI devel] NIC Failover and Message Stripping of Open MPI
From: Lirong Jian (lirong.misc_at_[hidden])
Date: 2012-10-25 20:25:48


Thanks, guys.

I will check the code of OB1 more carefully. Thanks.

Best,
Lirong

> Date: Thu, 25 Oct 2012 10:55:51 -0700
> From: Ralph Castain <rhc_at_[hidden]>
> Subject: Re: [OMPI devel] NIC Failover and Message Stripping of Open MPI.
> To: Open MPI Developers <devel_at_[hidden]>
>
> Just an FYI - I asked a similar question recently and got the following
> answer from Rolf:
>
> > In my case, it was specific to openib only and it required you to be running with two or more IB rails.
> > Then, if one of them failed, we just shut it down, and continued with the working ones.
> > You could only get use of the failing rail if it was fixed and a new job was started.
> >
> > To get this to work, I created a new PML called bfo. I also had to make some changes in the openib BTL.
> > By default, none of the code is configured in. There is a README in the PML bfo directory that actually does quite a good job explaining what I did.
>
> The bfo module is included in the 1.6 series, and in the upcoming 1.7
> series. Can't say anything as to its state of repair.
>
>
> On Oct 25, 2012, at 10:41 AM, George Bosilca <bosilca_at_[hidden]> wrote:
>
> >
> > On Oct 25, 2012, at 17:54 , Lirong Jian <lirong.misc_at_[hidden]> wrote:
> >
> >> Hi folks,
> >>
> >> Sorry to bother you guys, but I have some questions about Open MPI and really want your help.
> >>
> >> There are some papers (e.g., [1, 2, 3], although they are rather old) mentioning that Open MPI supports NIC failover and message striping over multiple NICs. However, when I read the source code of openmpi-1.6.2, I couldn't find any component named DR or TEG (which are mentioned in those papers and are supposed to support NIC failover and message striping). So my question is:
> >>
> >> Does the 1.6.2 release of Open MPI support these two features? If so, which part of the code implements them?
> >
> > Lirong,
> >
> > As you noticed, the papers are quite old and dusty.
> >
> > Due to a lack of interest from the community, the DR PML has been retired from our stable releases. In other words, no stable Open MPI version supports network failover. However, the code is still available in the trunk, but there is no guarantee it still does what it was designed for.
> >
> > TEG has been replaced with OB1, which is our current network management layer. It does striping over multiple NICs (identical or not) by default.
> >
> > george.
> >
> >>
> >> Many thanks in advance.
> >>
> >> P.S. I am a newbie in this domain. My questions may be simple, even naive, but your help would be highly appreciated.
> >>
> >> Best,
> >> Lirong
> >>
> >>
> >> [1] Network Fault Tolerance in Open MPI.
> >> [2] Open MPI: A High Performance, Flexible Implementation of MPI Point-to-Point Communications.
> >> [3] TEG: A High-Performance, Scalable, Multi-network, Point-to-Point, Communications Methodology.
> >> _______________________________________________
> >> devel mailing list
> >> devel_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
>