Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Beginner's question: how to avoid a running mpi job hang if host or network failed or orted deamon killed?
From: Jerome BENOIT (ml.jgmbenoit_at_[hidden])
Date: 2009-04-01 07:09:34


Is there a firewall somewhere?

Jerome

Guanyinzhu wrote:
> Hi!
> I'm using Open MPI 1.3 on ten nodes connected by Gigabit Ethernet,
> running Red Hat Linux x86_64.
>
> I ran a test: I killed the orted process, and the job hung for a long
> time (2 to 3 hours, after which I killed the job myself).
>
> I have the following questions:
>
> When the network or a host fails, or the orted daemon is killed by
> accident, how long does it take the running MPI job to notice and exit?
>
> Does Open MPI support a heartbeat mechanism, or is there another way
> to detect the failure quickly so that the MPI job does not hang?
>
>
> Thanks a lot!
>