Point well made, Nick. In other words, irrespective of OS or language, are we citing the need for "application correcting code" from OpenMPI (relocate and/or retry), similar to ECC in memory?


On Thu, 2011-04-14 at 14:31 +0100, N.M. Maclaren wrote:
On Apr 14 2011, Ralph Castain wrote:
>>> ...  It's hopeless, and whatever you do will be wrong for many
>>> people.  ...
>> I think that sums it up pretty well.  :-)
>> It does seem a little strange that the scenario you describe somewhat 
>> implies that one process is calling MPI_Finalize loooong before the 
>> others do. Specifically, the user is concerned with tying up resources 
>> after one process has called Finalize -- which implies that the others 
>> may continue on for a while. It's not invalid, of course, but it is a 
>> little unusual.
> I'm finding it more common than we thought. Note that I didn't say that 
> one process called MPI_Finalize before the others. In this case, they 
> call it fairly close together, but the individual processes continue 
> running for quite some time, or until they determine that something is 
> wrong and exit with non-zero status.

Nobody is denying that it is common.  Now, what happens when you encounter
a language or compiler that uses return codes for mere warnings (e.g.
ignored IEEE 754 flags, as stated to be desirable by LIA-1)?  Bang!

Remember that C is not the universe and many languages use MPI via the
C interface, but do not let C control their model.

Nick Maclaren.

Kenneth A. Lloyd
CEO - Director of Systems Science
Watt Systems Technologies Inc.