Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Display in terminal of error message using throw std::runtime_error on distant node...
From: Olivier Riff (oliriff_at_[hidden])
Date: 2010-09-24 08:48:08


That answer already makes sense. I understand that this is far from a
trivial issue. I have seen other recent threads about "running on
crashed nodes", and I gather that the Open MPI team is working hard on it.
We will wait, and will be glad to test the first versions when they are
released (I understand that will take some time).

Thanks for this quick reply,

Olivier

2010/9/24 Jeff Squyres <jsquyres_at_[hidden]>

> Open MPI's fault tolerance is still somewhat rudimentary; it's a complex
> topic within the entire scope of MPI. There has been much research into MPI
> and fault tolerance over the years; the MPI Forum itself is grappling with
> terms and definitions that make sense. It's by no means a "solved" problem.
>
> It's unfortunately unsurprising that Open MPI may hang in the case of a
> node crash. I wish that I had a better answer for you, but I don't. :-\
>
>
> On Sep 24, 2010, at 3:36 AM, Olivier Riff wrote:
>
> > Hello,
> >
> > My question concerns the display of the error message generated by a
> > throw std::runtime_error("Explicit error message").
> > I am launching an Open MPI program from a terminal on several machines
> > using:
> > mpirun -v -machinefile MyMachineFile.txt MyProgram
> > I am wondering why I cannot see an error message displayed on the
> > terminal when one of my remote nodes (meaning not the node where the
> > terminal is used) crashes. I was expecting that the following try/catch
> > would also produce output in the terminal:
> > try {...My code where a crash happens... }
> > catch (...)
> > {
> > throw std::runtime_error( "Explicit error message" );
> > }
> >
> > Generally, my problem is that one of the nodes crashes and the whole
> > application waits forever for data from this node. Nothing is displayed
> > on the terminal to indicate that the node has crashed or to give useful
> > information about the nature of the crash.
> >
> > ( I don't think this information is relevant here, but just in case: I
> > am using Open MPI 1.4.2 on a Mandriva 2008 system )
> >
> > Thanks in advance for any help/info/advice.
> >
> > Olivier
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/