Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] MPI Forum question?
From: Larry Baker (baker_at_[hidden])
Date: 2010-04-29 19:07:13


I don't know if there is any standard ordering of non-zero exit status
codes. If there is, another option would be to return the largest (or
smallest) value, whichever end of the ordering marks the most serious
exit status.
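
To make the candidate policies concrete, here is a minimal Python sketch of
the rank-based options from the quoted message and the value-based option
above. It is not how mpirun is implemented: the function names and the
rank -> exit status dictionary are hypothetical, and it assumes every process
terminated normally with a non-negative integer status.

```python
from typing import Dict, List

# Hypothetical input: maps MPI rank -> exit status of that process.
# Assumes all processes terminated normally and statuses are >= 0.
Statuses = Dict[int, int]


def lowest_rank_nonzero(statuses: Statuses) -> int:
    """Status of the lowest rank that returned non-zero, or 0 if none did."""
    for rank in sorted(statuses):
        if statuses[rank] != 0:
            return statuses[rank]
    return 0


def highest_rank_nonzero(statuses: Statuses) -> int:
    """Status of the highest rank that returned non-zero, or 0 if none did."""
    for rank in sorted(statuses, reverse=True):
        if statuses[rank] != 0:
            return statuses[rank]
    return 0


def largest_status(statuses: Statuses) -> int:
    """Largest status, for the case where larger means more serious."""
    return max(statuses.values(), default=0)


def smallest_nonzero_status(statuses: Statuses) -> int:
    """Smallest non-zero status, for the case where smaller means more serious."""
    nonzero = [s for s in statuses.values() if s != 0]
    return min(nonzero, default=0)


def histogram(statuses: Statuses) -> Dict[int, List[int]]:
    """Group ranks by exit status, with ranks listed in ascending order."""
    groups: Dict[int, List[int]] = {}
    for rank in sorted(statuses):
        groups.setdefault(statuses[rank], []).append(rank)
    return groups
```

For example, with statuses {0: 0, 1: 3, 2: 0, 3: 1}, the lowest-rank policy
yields 3, the highest-rank policy yields 1, the largest-value policy yields 3,
and the smallest-non-zero policy yields 1. The policies disagree even on a
four-process job, which is why a single defined behavior matters.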

Larry Baker
US Geological Survey

On Apr 29, 2010, at 3:52 PM, Ralph Castain wrote:

> I ran into something this week that I think may require
> consideration by the MPI Forum. Specifically, Rolf found a problem
> in their MTT runs where the tests expect mpirun to return a non-zero
> exit status because one or more application processes did so, even
> though all application procs terminate normally.
>
> I jury-rigged a simple algorithm that has mpirun return the exit
> status of the lowest rank that returned non-zero in the case where
> the job terminated normally. We still return the exit code of the
> first process to abnormally terminate (i.e., the process that is
> first reported to the HNP - may not be the first process that
> aborted).
>
> However, it raises the question - what is the actual behavior
> supposed to be in the case where all procs terminate normally, but
> some may return (possibly different) non-zero codes?
>
> I asked a few MPI users, and got a different answer from every one
> of them. The only consistent response I got was that the MPI
> standard doesn't say what should happen (can someone confirm that?).
>
> Here is a sampling of the responses:
>
> 1. return the exit status of the lowest rank that returned non-zero
>    (which I implemented for now to silence Rolf's MTT problem)
> 2. return the exit status of the highest rank that returned non-zero
> 3. print out a histogram of exit statuses
>    - ranks 0-9: 0
>    - ranks 10-21,110: 1
>    - ranks 22-35,40-51: 2
>    ...
> 4. print out ALL the exit statuses
> 5. ignore it - mpirun's exit code should only reflect OMPI
>    internals. It is the app developer's responsibility to properly
>    deal with non-zero exit conditions (e.g., by calling MPI_Abort).
>
> When I circled back around with these alternatives, I got the
> expected answer: everyone felt that all of them were good, and
> wanted a command-line option to select the behavior for their job.
> They also noted that --xml should cause any of them to output in a
> defined XML format.
>
> As I told Rolf, I honestly don't care what we do in this case. All I
> ask for is a clearly defined behavior so I don't get yanked in
> multiple directions, constantly circling around from one solution to
> the next.
>
> So if the MPI standard doesn't specify this behavior, could someone
> involved in the Forum -please- get it to address this??
>
> In the interim, what do -we- think it should do?
>
> Thanks
> Ralph