Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Display in terminal of error message using throw std::runtime_error on distant node...
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-09-24 08:16:55

Open MPI's fault tolerance is still somewhat rudimentary; it's a complex topic within the entire scope of MPI. There has been much research into MPI and fault tolerance over the years; the MPI Forum itself is grappling with terms and definitions that make sense. It's by no means a "solved" problem.

It's unfortunately unsurprising that Open MPI may hang in the case of a node crash. I wish that I had a better answer for you, but I don't. :-\
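That said, one common mitigation for the specific case of an uncaught C++ exception (as opposed to a hard node failure) can be sketched as follows. This is only a sketch under assumptions, not a description of how Open MPI itself handles the situation: `run_guarded`, `work`, and `abort_job` are hypothetical names introduced for illustration. The idea is to catch the exception at the top level of each rank, print its message to stderr (which mpirun normally forwards to the launching terminal), and then abort the whole job with `MPI_Abort` so the other ranks do not wait forever. The abort call is passed in as a callback so the helper itself compiles without MPI.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <ostream>
#include <stdexcept>

// Hypothetical helper (not part of Open MPI).
// Runs `work`; if it throws, writes the message to `err` and calls
// `abort_job` with a non-zero code. Returns 0 on success, 1 on failure.
//
// In a real MPI program, `abort_job` would be expected to wrap
// MPI_Abort(MPI_COMM_WORLD, code), which does not normally return.
int run_guarded(int rank,
                const std::function<void()>& work,
                const std::function<void(int)>& abort_job,
                std::ostream& err = std::cerr)
{
    try {
        work();
        return 0;
    } catch (const std::exception& e) {
        // std::endl flushes, so the text is sent before the job is aborted.
        err << "[rank " << rank << "] fatal: " << e.what() << std::endl;
    } catch (...) {
        err << "[rank " << rank << "] fatal: unknown exception" << std::endl;
    }
    abort_job(1);
    return 1;
}

// Usage sketch inside main(), after MPI_Init and MPI_Comm_rank
// (requires <mpi.h>; do_work is a placeholder for the application code):
//
//   int rc = run_guarded(rank,
//                        [&] { do_work(); },
//                        [](int code) { MPI_Abort(MPI_COMM_WORLD, code); });
```

This only helps when the failing process is still alive enough to run its catch handler. If the node itself goes down, or the process is killed by a signal, the handler never runs and the hang described above can still occur.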

On Sep 24, 2010, at 3:36 AM, Olivier Riff wrote:

> Hello,
> My question concerns the display of the error message generated by throw std::runtime_error("Explicit error message").
> I am launching an Open MPI program from a terminal on several machines using:
> mpirun -v -machinefile MyMachineFile.txt MyProgram.
> I am wondering why no error message is displayed on the terminal when one of my remote nodes (meaning not the node where the terminal is used) crashes. I was expecting that the following try/catch would also produce output in the terminal:
> try { ...My code where a crash happens... }
> catch (...)
> {
> throw std::runtime_error( "Explicit error message" );
> }
> In general, my problem is that one of the nodes crashes and the whole application waits forever for data from that node. Nothing is displayed on the terminal to indicate that the node has crashed, nor is there any useful information about the nature of the crash.
> (I don't think this information is relevant here, but just in case: I am using Open MPI 1.4.2 on a Mandriva 2008 system.)
> Thanks in advance for any help/info/advice.
> Olivier

Jeff Squyres