On Apr 14, 2011, at 5:33 AM, Jeff Squyres wrote:
> On Apr 14, 2011, at 4:02 AM, N.M. Maclaren wrote:
>> ... It's hopeless, and whatever you do will be wrong for many
>> people. ...
> I think that sums it up pretty well. :-)
> It does seem a little strange that the scenario you describe somewhat implies that one process is calling MPI_Finalize loooong before the others do. Specifically, the user is concerned with tying up resources after one process has called Finalize -- which implies that the others may continue on for a while. It's not invalid, of course, but it is a little unusual.
I'm finding it more common than we thought. Note that I didn't say that one process called MPI_Finalize before the others. In this case, they call it fairly close together, but the individual processes continue running for quite some time, or until they determine that something is wrong and exit with non-zero status.
> I see two possibilities here:
> 1. have the user delay calling MPI_Finalize in the application until it can do the test that indicates that the rest of the job should be aborted (i.e., so that it can still call MPI_Abort if it wants to). Don't forget that an implementation is allowed to block in MPI_Finalize until all processes call MPI_Finalize, anyway.
> 2. add an MCA param and/or orterun CLI option to abort a job if an MPI process terminates after MPI_Finalize with a nonzero exit status.
I figure the second option is the best one. My point was just that we abort the job if someone calls "abort". However, if a process exits with a nonzero status to indicate that something is wrong, we ignore it.
Not that big a deal - the param was my option too. Just thought I'd raise it to the group since it had never been discussed.
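To make the param idea concrete, here is a minimal sketch in C of the check the launcher could apply. Everything in it is an assumption for illustration: proc_state_t, nonzero_exit_after_finalize, and the abort_on_nonzero flag are hypothetical names, not existing ORTE code or an existing MCA param. It covers only the case discussed here (a process that called MPI_Finalize and later exited with a nonzero status) and says nothing about how other kinds of termination are handled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process record a launcher might keep.
 * This is an illustration, not ORTE's actual data structure. */
typedef struct {
    bool finalized;   /* process called MPI_Finalize */
    bool exited;      /* process has terminated */
    int  exit_status; /* meaningful only when exited is true */
} proc_state_t;

/* Returns true if the job should be aborted because some process
 * terminated with a nonzero status after calling MPI_Finalize.
 * abort_on_nonzero stands in for the proposed param / CLI option
 * (hypothetical); when it is false, the current behavior is kept
 * and such exits are ignored. */
bool nonzero_exit_after_finalize(const proc_state_t *procs, size_t n,
                                 bool abort_on_nonzero)
{
    if (!abort_on_nonzero) {
        return false; /* option disabled: ignore post-Finalize exit codes */
    }
    for (size_t i = 0; i < n; ++i) {
        if (procs[i].finalized && procs[i].exited &&
            procs[i].exit_status != 0) {
            return true; /* "something is wrong" was signaled after Finalize */
        }
    }
    return false;
}
```

With the flag off, the function always returns false, which matches what we do today. With it on, a single post-Finalize nonzero exit is enough to take down the rest of the job.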
> Just my $0.02. :-)
> Jeff Squyres