
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Beginner's question: how to avoid a running mpi job hang if host or network failed or orted deamon killed?
From: Guanyinzhu (buptzhugy_at_[hidden])
Date: 2009-04-01 08:07:12


I mean that I killed the orted daemon process while the MPI job was running, but the MPI job hung and did not notice that one of its ranks had failed.
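
Until the failure is detected inside the job itself, one workaround for the hang described here is to bound the job's wall-clock time from outside. The following is only a minimal sketch under stated assumptions, not an Open MPI feature: the function name `run_with_deadline` and its parameters are hypothetical, and it watches only the launcher process on the local node.

```python
import subprocess


def run_with_deadline(cmd, deadline_s, grace_s=5.0):
    """Run cmd and return its exit code, or None if it was stopped at the deadline.

    Hypothetical helper: it only watches the local launcher process and
    knows nothing about the state of individual MPI ranks.
    """
    proc = subprocess.Popen(cmd)
    try:
        # Normal case: the job finishes before the deadline.
        return proc.wait(timeout=deadline_s)
    except subprocess.TimeoutExpired:
        # Ask politely first so the launcher gets a chance to clean up.
        proc.terminate()
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            # Still alive after the grace period: force it down.
            proc.kill()
            proc.wait()
        return None
```

It could be called as `run_with_deadline(["mpirun", "-np", "10", "./my_app"], deadline_s=3600)`, where `./my_app` is a placeholder for the real program. This does not detect the failed rank any faster; it only limits how long a hung job can occupy the nodes, and processes on remote hosts may still need to be cleaned up separately.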

> Date: Wed, 1 Apr 2009 19:09:34 +0800
> From: ml.jgmbenoit_at_[hidden]
> To: users_at_[hidden]
> Subject: Re: [OMPI users] Beginner's question: how to avoid a running mpi job hang if host or network failed or orted deamon killed?
>
> Is there a firewall somewhere?
>
> Jerome
>
> Guanyinzhu wrote:
> > Hi!
> > I'm using Open MPI 1.3 on ten nodes connected by Gigabit Ethernet,
> > running Red Hat Linux x86_64.
> >
> > I ran a test like this: I killed the orted process, and the job hung
> > for a long time (it hung for 2~3 hours, then I killed the job).
> >
> > I have the following questions:
> >
> > When the network fails, a host fails, or the orted daemon is killed by
> > accident, how long does it take the running MPI job to notice and exit?
> >
> > Does Open MPI support a heartbeat mechanism, or how else could I quickly
> > detect the failure to avoid the MPI job hanging?
> >
> >
> > Thanks a lot!
> >
> >
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
