Is that kind of approach possible within an MPI framework? Perhaps a
grid approach would be better. More experienced people, please speak up.
(The reason I ask is that I too am interested in solving that kind of
problem, where an individual blade of a blade server fails and
correcting for that failure on the fly is better than taking
checkpoints and restarting the whole process excluding the failed blade.)
On Mon, Aug 3, 2009 at 9:21 AM, jody<jody.xha_at_[hidden]> wrote:
> I guess "task-farming" could give you a certain amount of the kind of
> fault tolerance you want
> (i.e. a master process distributes tasks to idle slave processes;
> however, this will only work if the slave processes don't need to
> communicate with each other).
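The task-farming idea described above can be sketched as a small single-process simulation. This is not MPI code and not the LAM/MPI Mandelbrot example mentioned elsewhere in the thread; it only illustrates the bookkeeping a master needs in order to survive a dead worker. All names here (`farm`, `make_worker`, `WorkerDied`) are hypothetical, and a worker "crash" is simulated by raising an exception.

```python
from collections import deque


class WorkerDied(Exception):
    """Raised by a simulated worker to stand in for a crashed slave process."""


def farm(tasks, workers):
    """Hand tasks to idle workers; requeue a task if its worker dies.

    tasks   -- iterable of (task_id, payload) pairs
    workers -- dict mapping worker name -> callable(payload) -> result
    Returns (results, dead): results maps task_id -> result,
    dead lists the names of workers that failed.
    """
    pending = deque(tasks)
    alive = deque(workers)  # names of workers that can still take work
    results, dead = {}, []
    while pending:
        if not alive:
            raise RuntimeError("all workers failed; %d tasks left" % len(pending))
        task_id, payload = pending.popleft()
        name = alive.popleft()
        try:
            results[task_id] = workers[name](payload)
        except WorkerDied:
            dead.append(name)                       # never schedule it again
            pending.appendleft((task_id, payload))  # hand the task to another worker
        else:
            alive.append(name)                      # worker is idle again
    return results, dead


def make_worker(fail_on=None):
    """Build a worker that squares its input and 'dies' on call number fail_on."""
    calls = 0

    def work(x):
        nonlocal calls
        calls += 1
        if fail_on is not None and calls == fail_on:
            raise WorkerDied()
        return x * x

    return work


if __name__ == "__main__":
    workers = {"w0": make_worker(), "w1": make_worker(fail_on=2), "w2": make_worker()}
    results, dead = farm([(i, i) for i in range(8)], workers)
    print(results, dead)
```

The sketch works only because the tasks are independent: the master can hand a lost task to any surviving worker without coordinating with the others, which is the restriction noted above. In a real MPI job the master would also need a way to detect the failure without the whole job aborting, which depends on the MPI implementation.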
> On Mon, Aug 3, 2009 at 1:24 PM, vipin kumar<vipinkumar41_at_[hidden]> wrote:
>> Hi all,
>> Thanks, Durga, for your reply.
>> Jeff, you once wrote code for the Mandelbrot set to demonstrate fault
>> tolerance in LAM/MPI, i.e. killing any slave process doesn't
>> affect the others. That is exactly the behaviour I am looking for in
>> Open MPI. I tried, but had no luck. Can you please tell me how to write
>> such programs in Open MPI?
>> Thanks in advance.
>> On Thu, Jul 9, 2009 at 8:30 PM, Durga Choudhury <dpchoudh_at_[hidden]> wrote:
>>> Although I have perhaps the least experience on the topic in this
>>> list, I will take a shot; more experienced people, please correct me:
>>> The MPI standard specifies communication mechanisms, not fault
>>> tolerance at any level. You may achieve network fault tolerance at the
>>> IP level by implementing 'equal-cost multipath' routes (which means
>>> having two equally capable NICs connected to the same destination and
>>> modifying the kernel routing table to use both; the kernel will
>>> dynamically load balance). At the MAC level, you can achieve the same
>>> effect by trunking multiple network cards.
>>> You can achieve process-level fault tolerance with a checkpointing
>>> scheme such as BLCR, which has been tested to work with Open MPI (and
>>> with other processes as well).
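The checkpoint-and-restart idea mentioned above can be illustrated at the application level with a minimal sketch. This is not how BLCR works (BLCR checkpoints whole processes at the system level) and it uses no Open MPI tooling; it only shows the general pattern of periodically saving state and resuming from the last saved state after a crash. The function names, the `every` interval, and the `crash_at` parameter are assumptions made for the illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename it over path."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path, n_steps, every=10, crash_at=None):
    """Sum 0..n_steps-1, saving a checkpoint every `every` steps.

    crash_at simulates a failure by raising just before that step runs.
    """
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < n_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated crash")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

After a crash, calling `run` again with the same path resumes from the last checkpoint, so only the work done since that checkpoint is repeated. The cost is that the whole computation stops and restarts, which is the trade-off against on-the-fly recovery raised earlier in the thread.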
>>> On Thu, Jul 9, 2009 at 4:57 AM, vipin kumar<vipinkumar41_at_[hidden]> wrote:
>>> > Hi all,
>>> > I want to know whether Open MPI supports network and process fault
>>> > tolerance or not. If there is an example demonstrating these
>>> > features, that would be best.
>>> > Regards,
>>> > --
>>> > Vipin K.
>>> > Research Engineer,
>>> > C-DOTB, India
>>> > _______________________________________________
>>> > users mailing list
>>> > users_at_[hidden]
>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Vipin K.
>> Research Engineer,
>> C-DOTB, India