Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] MPI Forum question?
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-04-30 10:12:40


On Apr 30, 2010, at 6:15 AM, Jeff Squyres wrote:

> On Apr 30, 2010, at 5:59 AM, N.M. Maclaren wrote:
>
>> MPI quite rightly does not specify this, because the matter is very system-
>> dependent, and it is not possible to return the exit code (or display it)
>> in all environments. Sorry, but that is reality.
>
> Correct -- MPI intentionally does not say what happens after MPI_FINALIZE. MPI intentionally doesn't even specify much about how to start an MPI job (just like Fortran, actually).

Frankly, I disagree - I think the standard can and should say something explicit about this situation. It doesn't have to say how we implement it, but it should clearly explain to users what they should expect to see at the end of an MPI job.

Guess the real issue is: is the standard written for the general community, or solely for MPI implementers? If the latter, then saying nothing is fine. If the former, then it needs to clearly say something about this question.

>
>> The last paragraph of the specification of MPI_Finalize makes it clear
>> that it is the USER'S responsibility to return an exit code to the system
>> for process 0, and that what happens for other ones is undefined. Or
>> fairly clear - it could be stated in so many words, rather than being
>> implicit in the requirement on implementors.
>
> I don't think that's quite feasible, because the user doesn't directly control what mpirun returns. So (many) implementations *have* to choose something from their job start agent (mpirun or mpiexec or whatever).
>
> I think OMPI's behavior of returning 0 from mpirun if and only if all processes call MPI_FINALIZE successfully *and* return 0 is good. Return arbitrary nonzero if some process aborts (calling MPI_ABORT, not calling MPI_INIT, not calling MPI_FINALIZE, or otherwise). Return any of the individual MPI processes' non-zero exit status if all call MPI_FINALIZE but some (or all) don't return an exit status of 0 (I don't have a strong opinion about which one to return -- e.g., the *first* one to return a non-zero exit value, the *highest* or *lowest* non-zero exit status, ...etc.).
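
The policy described in the quoted paragraph above can be sketched roughly as follows. This is only an illustration of the decision rule, not OMPI's actual mpirun code: the names ProcResult, job_exit_code, and ABNORMAL_EXIT are hypothetical, the value 1 stands in for the "arbitrary nonzero" code, and choosing the first non-zero status in rank order is just one of the options listed.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical stand-in for the "arbitrary nonzero" code; not an OMPI constant.
ABNORMAL_EXIT = 1


@dataclass
class ProcResult:
    """What the launcher is assumed to know about one MPI process."""
    called_init: bool
    called_finalize: bool
    called_abort: bool
    exit_status: int


def job_exit_code(procs: Iterable[ProcResult]) -> int:
    """Return the exit code the job launcher would report for the whole job."""
    procs = list(procs)

    # Case 1: some process aborted, or never called MPI_INIT / MPI_FINALIZE.
    for p in procs:
        if p.called_abort or not p.called_init or not p.called_finalize:
            return ABNORMAL_EXIT

    # Case 2: everyone finalized, but at least one process exited non-zero.
    # Assumption: report the first non-zero status in rank order.
    for p in procs:
        if p.exit_status != 0:
            return p.exit_status

    # Case 3: all processes finalized and returned 0.
    return 0
```

Swapping the second loop for max() or min() over the non-zero statuses would give the "highest" or "lowest" variants instead.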

If that's the case, then I think the standard needs clearer language. My admittedly non-scientific poll indicates that users seem to think there is some expected behavior, and were surprised by the question.

So while the developer community may think it is okay as things stand, it was clear from my limited conversations that users all think something else is supposed to happen.

Just my $0.0002. As I said at the start of this thread, I don't care what solution we adopt for OMPI.

However, I -do- insist that there be a formal specification of OMPI's behavior - not the current "whatever you want" approach. Otherwise, I will continue to be hit with these ad hoc requests that it behave the way someone thinks it should, with no recourse to some defined behavior accepted by this community.

>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel