Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] Display in terminal of error message using throw std::runtime_error on distant node...
From: Olivier Riff (oliriff_at_[hidden])
Date: 2010-09-24 08:48:08

That is already an answer that makes sense. I understand that this is really
not a trivial issue. I have seen other recent threads about "running on
crashed nodes", and that the Open MPI team is working hard on it. We will
wait and be glad to test the first versions when they are released (I
understand that this will take some time).

Thanks for this quick reply,


2010/9/24 Jeff Squyres <jsquyres_at_[hidden]>

> Open MPI's fault tolerance is still somewhat rudimentary; it's a complex
> topic within the entire scope of MPI. There has been much research into MPI
> and fault tolerance over the years; the MPI Forum itself is grappling with
> terms and definitions that make sense. It's by no means a "solved" problem.
> It's unfortunately unsurprising that Open MPI may hang in the case of a
> node crash. I wish that I had a better answer for you, but I don't. :-\
>
> On Sep 24, 2010, at 3:36 AM, Olivier Riff wrote:
>
> > Hello,
> >
> > My question concerns the display of the error message generated by a
> > throw std::runtime_error("Explicit error message").
> > I am launching an Open MPI program from a terminal on several machines
> > using:
> > mpirun -v -machinefile MyMachineFile.txt MyProgram
> > I am wondering why I cannot see an error message displayed on the
> > terminal when one of my distant nodes (meaning not the node where the
> > terminal is used) crashes. I was expecting that the following try/catch
> > would also generate a display in the terminal:
> > try { ...My code where a crash happens... }
> > catch (...)
> > {
> > throw std::runtime_error( "Explicit error message" );
> > }
> >
> > Generally, my problem is that one of the nodes crashes and the global
> > application waits forever for data from this node. Nothing is displayed
> > on the terminal indicating that the node has crashed or giving useful
> > information about the nature of the crash.
> >
> > (I don't think this information is relevant here, but just in case: I
> > am using Open MPI 1.4.2 on a Mandriva 2008 system.)
> >
> > Thanks in advance for any help/info/advice.
> >
> > Olivier
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> >
> --
> Jeff Squyres
> jsquyres_at_[hidden]