Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] fault tolerance in open mpi
From: Durga Choudhury (dpchoudh_at_[hidden])
Date: 2009-07-09 11:00:19

Although I probably have the least experience with this topic on this
list, I will take a shot; more experienced people, please correct me:

The MPI standard specifies communication mechanisms, not fault tolerance
at any level. You can get network fault tolerance at the IP level by
setting up 'equal-cost multipath' (ECMP) routes: install two equally
capable NICs that reach the same destination and modify the kernel
routing table to use both; the kernel will then load-balance across
them dynamically. At the MAC level, you can achieve the same effect by
trunking (bonding) multiple network cards.
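
To make the multipath route concrete, here is a minimal sketch, assuming a Linux host with the iproute2 `ip` tool. It only builds and prints the command line rather than running it (installing the route needs root). The helper name `ecmp_route_argv`, the addresses, and the interface names are all hypothetical, chosen for illustration.

```python
def ecmp_route_argv(destination, next_hops):
    """Build the iproute2 argv for an equal-cost multipath route.

    next_hops is a list of (gateway, device) pairs. Every hop gets
    weight 1 so the kernel treats all paths as equal cost.
    """
    if len(next_hops) < 2:
        raise ValueError("ECMP needs at least two next hops")
    argv = ["ip", "route", "add", destination]
    for gateway, device in next_hops:
        argv += ["nexthop", "via", gateway, "dev", device, "weight", "1"]
    return argv


if __name__ == "__main__":
    # Hypothetical subnet, gateways, and NIC names; substitute your own.
    hops = [("192.168.1.1", "eth0"), ("192.168.2.1", "eth1")]
    print(" ".join(ecmp_route_argv("10.0.0.0/24", hops)))
```

The printed line can be reviewed and then run as root. How traffic is actually spread across the two hops depends on the kernel version and its multipath settings.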

You can achieve process-level fault tolerance with a checkpointing
scheme such as BLCR, which has been tested to work with Open MPI (and
with other processes as well).
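
As a rough sketch of what a checkpoint/restart cycle looks like, the snippet below assembles the three command lines involved. It assumes an Open MPI build with checkpoint/restart support and BLCR installed. The `ft-enable-cr` parameter set, the `ompi-checkpoint` and `ompi-restart` tools, and the snapshot naming pattern are given as I understand them from the Open MPI 1.3-series checkpoint/restart tooling, so please verify them against your installation. The helper name `checkpoint_workflow`, the application path, and the PID are hypothetical.

```python
def checkpoint_workflow(app, nprocs, mpirun_pid):
    """Return the command lines for launch, checkpoint, and restart.

    mpirun_pid is the PID of the running mpirun process, which is what
    the checkpoint tool is pointed at. The snapshot name below is the
    assumed default pattern; check what your installation writes.
    """
    snapshot = "ompi_global_snapshot_%d.ckpt" % mpirun_pid
    return [
        # 1. Launch the job with checkpoint/restart enabled.
        ["mpirun", "-am", "ft-enable-cr", "-np", str(nprocs), app],
        # 2. From another shell, checkpoint the running job.
        ["ompi-checkpoint", str(mpirun_pid)],
        # 3. Later, restart the job from the saved global snapshot.
        ["ompi-restart", snapshot],
    ]


if __name__ == "__main__":
    # Hypothetical application and PID, for illustration only.
    for argv in checkpoint_workflow("./my_mpi_app", 4, 12345):
        print(" ".join(argv))
```

The commands are only printed, not executed, so each step can be checked against the local Open MPI documentation before use.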


On Thu, Jul 9, 2009 at 4:57 AM, vipin kumar <vipinkumar41_at_[hidden]> wrote:
> Hi all,
> I want to know whether Open MPI supports network and process fault
> tolerance. If there is an example demonstrating these features, that
> would be best.
> Regards,
> --
> Vipin K.
> Research Engineer,
> C-DOTB, India