Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] fault tolerance in open mpi
From: jody (jody.xha_at_[hidden])
Date: 2009-08-03 09:21:11


Hi

I guess "task farming" could give you some of the fault tolerance you want:
a master process distributes tasks to idle slave processes. However, this
only works if the slave processes don't need to communicate with each other.
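
The master loop described above can be sketched roughly as follows. This is
plain Python that only simulates the scheduling logic inside one process; it
is not MPI code. The names farm, WorkerDied and square_unless_broken are
hypothetical and made up for illustration. How a real master notices that a
slave has died depends on the MPI implementation and its error handling, and
is not shown here.

```python
from collections import deque


class WorkerDied(Exception):
    """Hypothetical signal that a worker failed while handling a task."""


def farm(tasks, workers, work):
    """Hand each task to an idle worker; requeue the task if that worker dies."""
    pending = deque(tasks)    # tasks not yet completed
    alive = deque(workers)    # healthy workers, used round-robin
    results = {}
    while pending:
        if not alive:
            raise RuntimeError("all workers failed")
        task = pending.popleft()
        worker = alive.popleft()
        try:
            results[task] = work(worker, task)
        except WorkerDied:
            # Drop the dead worker and give its task to someone else later.
            pending.append(task)
            continue
        alive.append(worker)  # worker is idle again
    return results


def square_unless_broken(worker, task):
    """Toy workload: worker "w1" always dies, the others square the task."""
    if worker == "w1":
        raise WorkerDied(worker)
    return task * task


results = farm(range(6), ["w0", "w1", "w2"], square_unless_broken)
```

Because the tasks are independent, losing "w1" only costs one retry: every
task still gets a result from one of the remaining workers.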

Jody

On Mon, Aug 3, 2009 at 1:24 PM, vipin kumar<vipinkumar41_at_[hidden]> wrote:
> Hi all,
>
> Thanks Durga for your reply.
>
> Jeff, you once wrote a Mandelbrot set program to demonstrate fault tolerance
> in LAM/MPI, i.e. killing any slave process doesn't affect the others. That is
> exactly the behaviour I am looking for in Open MPI. I tried, but had no luck.
> Can you please tell me how to write such programs in Open MPI?
>
> Thanks in advance.
>
> Regards,
> On Thu, Jul 9, 2009 at 8:30 PM, Durga Choudhury <dpchoudh_at_[hidden]> wrote:
>>
>> Although I have perhaps the least experience on the topic in this
>> list, I will take a shot; more experienced people, please correct me:
>>
>> The MPI standard specifies communication mechanisms, not fault tolerance at
>> any level. You may achieve network fault tolerance at the IP level by
>> implementing 'equal-cost multipath' routes (that is, two equally capable
>> NICs connecting to the same destination, with the kernel routing table
>> modified to use both; the kernel will then load-balance dynamically).
>> At the MAC level, you can achieve the same effect by trunking multiple
>> network cards.
>>
>> You can achieve process-level fault tolerance with a checkpointing
>> scheme such as BLCR, which has been tested to work with Open MPI (and
>> other processes as well).
>>
>> Durga
>>
>> On Thu, Jul 9, 2009 at 4:57 AM, vipin kumar<vipinkumar41_at_[hidden]> wrote:
>> >
>> > Hi all,
>> >
>> > I want to know whether Open MPI supports network and process fault
>> > tolerance. An example demonstrating these features would be ideal.
>> >
>> > Regards,
>> > --
>> > Vipin K.
>> > Research Engineer,
>> > C-DOTB, India
>> >
>> > _______________________________________________
>> > users mailing list
>> > users_at_[hidden]
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >
>
>
>
> --
> Vipin K.
> Research Engineer,
> C-DOTB, India