Open MPI User's Mailing List Archives

Subject: [OMPI users] Beginner's question: how to avoid a running mpi job hang if host or network failed or orted deamon killed?
From: Guanyinzhu (buptzhugy_at_[hidden])
Date: 2009-04-01 05:27:49


Hi!
  I'm using Open MPI 1.3 on ten nodes connected by Gigabit Ethernet, running Red Hat Linux x86_64.

 

I ran a test: I killed the orted process, and the job hung for a long time (2 to 3 hours, after which I killed the job myself).

 

I have the following questions:

    

     When the network fails, a host fails, or the orted daemon is killed by accident, how long does it take for the running MPI job to notice and exit?

     

     Does Open MPI support a heartbeat mechanism, or how else can I quickly detect the failure so that the MPI job does not hang?

 

 

Thanks a lot!

 
