Hi! 
I'm using Open MPI 1.3 on ten nodes connected by Gigabit Ethernet, running Red Hat Linux x86_64.

 
I ran a simple test: I killed the orted process, and the job hung for a long time (it hung for 2 to 3 hours before I killed the job myself).
 
I have the following questions:
   
     When the network fails, a host fails, or the orted daemon is killed by accident, how long does it take for the running MPI job to notice and exit?
    
     Does Open MPI support a heartbeat mechanism? If not, how can I detect the failure quickly so that the MPI job does not hang?
 
 
Thanks a lot!
 

