
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] r23023 change to trunk causes problems with exit value
From: Rolf vandeVaart (rolf.vandevaart_at_[hidden])
Date: 2010-04-27 11:15:46


Ralph, did you get a chance to run the ibm/final test to see if these
changes fixed the problem? I just rebuilt the trunk and tried it and I
still get an exit status of 0 back. I will run it again to make sure I
have not made a mistake.

Rolf
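
For anyone reproducing this outside MTT, the check boils down to "run the test, then confirm the exit status is non-zero". A minimal Bourne-shell sketch of that check is below. The function name expect_nonzero_exit is made up for illustration and is not part of MTT or Open MPI; the mpirun line in the trailing comment is the command from the quoted transcript further down and assumes the ibm/final binary is in the current directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of MTT or Open MPI): succeeds only when the
# wrapped command fails, which is what an "errors are expected" test requires.
expect_nonzero_exit() {
    "$@"            # run the command exactly as given
    rc=$?           # capture its exit status before anything overwrites it
    if [ "$rc" -eq 0 ]; then
        echo "FAIL: '$*' exited 0, expected a non-zero status"
        return 1
    fi
    echo "PASS: '$*' exited $rc"
    return 0
}

# Example use (assumes mpirun and the ibm/final binary are available):
#   expect_nonzero_exit mpirun -np 1 -mca btl sm,self final
```

Note that the transcript below uses "echo $status", which is the csh/tcsh spelling; in a Bourne-style shell the equivalent is "echo $?".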

On 04/26/10 23:43, Ralph Castain wrote:
> Okay, this should finally be fixed. See the commit message for r23045
> for an explanation.
>
> It really wasn't anything in the cited changeset that caused the
> problem. The root cause is that $#@$ abort file we dropped in the
> session dir to indicate you called MPI_Abort vs trying to thoroughly
> clean up. Been biting us in the butt for years - finally removed it.
>
>
> On Apr 26, 2010, at 12:58 PM, Rolf vandeVaart wrote:
>
>> The ibm/final test does not call MPI_Abort directly. It is calling
>> MPI_Barrier after MPI_Finalize is called, which is a no-no. This is
>> detected and eventually the library calls ompi_mpi_abort(). This is
>> very similar to MPI_Abort(), which ultimately calls ompi_mpi_abort() as
>> well. So, I guess I am saying for all intents and purposes, it calls
>> MPI_Abort.
>>
>> Rolf
>>
>> On 04/26/10 14:41, Ralph Castain wrote:
>>> I'll try to keep it in mind as I continue the errmgr work. I gather these tests all call MPI_Abort?
>>>
>>>
>>> On Apr 26, 2010, at 12:31 PM, Rolf vandeVaart wrote:
>>>
>>>
>>>> With our MTT testing we have noticed a problem that has cropped up in the trunk. There are some tests that are supposed to return a non-zero status because they are getting errors, but are instead returning 0. This problem does not exist in r23022 but does exist in r23023.
>>>>
>>>> One can use the ibm/final test to reproduce the problem. An example of a passing case followed by a failing case is shown below.
>>>>
>>>> Ralph, do you want me to open a ticket on this? Or do you just want to take a look? I am asking you since you did the r23023 commit.
>>>>
>>>> Rolf
>>>>
>>>>
>>>> TRUNK VERSION r23022:
>>>> [rolfv_at_burl-ct-x2200-6 environment]$ mpirun -np 1 -mca btl sm,self final
>>>> **************************************************************************
>>>> This test should generate a message about MPI is either not initialized or
>>>> has already been finialized.
>>>> ERRORS ARE EXPECTED AND NORMAL IN THIS PROGRAM!!
>>>> **************************************************************************
>>>> *** The MPI_Barrier() function was called after MPI_FINALIZE was invoked.
>>>> *** This is disallowed by the MPI standard.
>>>> *** Your MPI job will now abort.
>>>> [burl-ct-x2200-6:6072] Abort after MPI_FINALIZE completed successfully; not able to guarantee that all other processes were killed!
>>>> --------------------------------------------------------------------------
>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>> that caused that situation.
>>>> --------------------------------------------------------------------------
>>>> [rolfv_at_burl-ct-x2200-6 environment]$ echo $status
>>>> 1
>>>> [rolfv_at_burl-ct-x2200-6 environment]$
>>>>
>>>>
>>>> TRUNK VERSION r23023:
>>>> [rolfv_at_burl-ct-x2200-6 environment]$ mpirun -np 1 -mca btl sm,self final
>>>> **************************************************************************
>>>> This test should generate a message about MPI is either not initialized or
>>>> has already been finialized.
>>>> ERRORS ARE EXPECTED AND NORMAL IN THIS PROGRAM!!
>>>> **************************************************************************
>>>> *** The MPI_Barrier() function was called after MPI_FINALIZE was invoked.
>>>> *** This is disallowed by the MPI standard.
>>>> *** Your MPI job will now abort.
>>>> [burl-ct-x2200-6:4089] Abort after MPI_FINALIZE completed successfully; not able to guarantee that all other processes were killed!
>>>> [rolfv_at_burl-ct-x2200-6 environment]$ echo $status
>>>> 0
>>>> [rolfv_at_burl-ct-x2200-6 environment]$
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>