Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Detecting Node Failure
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-06-20 18:18:48


Not at present, no.

But you might want to look at a fork of the OMPI code base that was exploring fault resilience issues:

    http://fault-tolerance.org/

On Jun 20, 2013, at 5:57 PM, Andreas Schäfer <gentryx_at_[hidden]> wrote:

> On 14:59 Thu 20 Jun , Ralph Castain wrote:
>> It should detect and abort - what version are you using?
>
> Would it be possible to call MPI_Comm_disconnect() when the
> communicator in question is an intercommunicator, without having OMPI abort?
>
> I'm asking because if we had a robust way to dynamically
> connect and disconnect nodes, then we could build
> fault-resilient apps on top of that.
>
> Best
> -Andreas
>
>
> --
> ==========================================================
> Andreas Schäfer
> HPC and Grid Computing
> Chair of Computer Science 3
> Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
> +49 9131 85-27910
> PGP/GPG key via keyserver
> http://www.libgeodecomp.org
> ==========================================================

-- 
Jeff Squyres
jsquyres_at_[hidden]