Okay, this should finally be fixed. See the commit message for r23045 for an explanation.

It really wasn't anything in the cited changeset that caused the problem. The root cause is that $#@$ abort file we dropped in the session dir to indicate that you called MPI_Abort, as opposed to trying to clean up thoroughly. It's been biting us in the butt for years; finally removed it.


On Apr 26, 2010, at 12:58 PM, Rolf vandeVaart wrote:

The ibm/final test does not call MPI_Abort directly. It calls MPI_Barrier after MPI_Finalize has been called, which is a no-no. This is detected, and eventually the library calls ompi_mpi_abort(). This is very similar to MPI_Abort(), which ultimately calls ompi_mpi_abort() as well. So I guess I am saying that, for all intents and purposes, it calls MPI_Abort.

Rolf

On 04/26/10 14:41, Ralph Castain wrote:
I'll try to keep it in mind as I continue the errmgr work. I gather these tests all call MPI_Abort?


On Apr 26, 2010, at 12:31 PM, Rolf vandeVaart wrote:

  
With our MTT testing we have noticed a problem that has cropped up in the trunk.  There are some tests that are supposed to return a non-zero status because they are getting errors, but are instead returning 0.  This problem does not exist in r23022 but does exist in r23023.

One can use the ibm/final test to reproduce the problem.  An example of a passing case followed by a failing case is shown below.

Ralph, do you want me to open a ticket on this? Or do you just want to take a look? I am asking you since you did the r23023 commit.

Rolf


TRUNK VERSION r23022:
[rolfv@burl-ct-x2200-6 environment]$ mpirun -np 1 -mca btl sm,self final
**************************************************************************
This test should generate a message about MPI is either not initialized or
has already been finialized.
ERRORS ARE EXPECTED AND NORMAL IN THIS PROGRAM!!
**************************************************************************
*** The MPI_Barrier() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[burl-ct-x2200-6:6072] Abort after MPI_FINALIZE completed successfully; not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
[rolfv@burl-ct-x2200-6 environment]$ echo $status
1
[rolfv@burl-ct-x2200-6 environment]$


TRUNK VERSION r23023:
[rolfv@burl-ct-x2200-6 environment]$ mpirun -np 1 -mca btl sm,self final
**************************************************************************
This test should generate a message about MPI is either not initialized or
has already been finialized.
ERRORS ARE EXPECTED AND NORMAL IN THIS PROGRAM!!
**************************************************************************
*** The MPI_Barrier() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[burl-ct-x2200-6:4089] Abort after MPI_FINALIZE completed successfully; not able to guarantee that all other processes were killed!
[rolfv@burl-ct-x2200-6 environment]$ echo $status
0
[rolfv@burl-ct-x2200-6 environment]$

_______________________________________________
devel mailing list
devel@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
    

