
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r30860 - in trunk/ompi/mca: btl/usnic rte
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-02-27 16:52:00


Just to clarify my point, since the 1.7 branch was mentioned in this thread. I didn't worry about USNIC calling abort because, as Jeff pointed out, we do so in other places. However, I do believe that we shouldn't be doing so (including in orte) because it isn't the role of a library to abort the process. We should report errors upward to the app and let it decide how to respond.

That said, I know we initially did it because we hit places where we couldn't propagate an error code (e.g., in a void routine called by the event lib). I've been working on resolving that in orte, but it still isn't complete.

I figure we should do the same in the MPI layer, recognizing that it will take time to complete.
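
To illustrate the void-routine problem, here is a minimal sketch of one way a callback could record a failure for later reporting instead of calling abort. All names here (sketch_endpoint_t, sketch_event_cb, sketch_progress, the SKETCH_* codes) are hypothetical and are not taken from orte, the MPI layer, or the event lib; the callback signature is only assumed to resemble a typical event-library callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch; not Open MPI constants. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_UNREACH = -1 };

/* Hypothetical state shared between the void callback and its caller. */
typedef struct {
    int pending_error;        /* first error recorded, or SKETCH_SUCCESS */
    const char *error_source; /* who recorded it, for diagnostic output */
} sketch_endpoint_t;

/*
 * A void callback, as an event library might invoke it. It cannot return
 * a status, so rather than calling abort() it records the failure.
 * This sketch unconditionally simulates a failure.
 */
static void sketch_event_cb(int fd, short flags, void *arg)
{
    sketch_endpoint_t *ep = (sketch_endpoint_t *)arg;

    (void)fd;
    (void)flags;
    assert(NULL != ep);

    /* Keep only the first error so later ones do not overwrite it. */
    if (SKETCH_SUCCESS == ep->pending_error) {
        ep->pending_error = SKETCH_ERR_UNREACH;
        ep->error_source = "sketch_event_cb";
    }
}

/*
 * A non-void entry point that the upper layer calls. It hands any
 * recorded error upward and clears it, so the app decides what to do.
 */
static int sketch_progress(sketch_endpoint_t *ep)
{
    int rc = ep->pending_error;

    ep->pending_error = SKETCH_SUCCESS;
    return rc;
}
```

The caller would check the return value of sketch_progress() and pass it up the stack. This sketch ignores thread safety: if the callback runs in a progress thread, the shared state would need a lock or atomics.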

On Feb 27, 2014, at 1:48 PM, Rolf vandeVaart <rvandevaart_at_[hidden]> wrote:

> It could. I added that argument 4 years ago to support my failover work with the BFO. It was a way for a BTL to pass a string back to the PML identifying itself, so that verbose output would show what was happening.
>
>> -----Original Message-----
>> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Jeff Squyres
>> (jsquyres)
>> Sent: Thursday, February 27, 2014 4:22 PM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r30860 - in
>> trunk/ompi/mca: btl/usnic rte
>>
>> Speaking of which, shouldn't the OB1 error handler send the error message
>> string that it received as the 4th param to ompi_rte_abort() so that it can be
>> printed out?
>>
>>
>> Index: ompi/mca/pml/ob1/pml_ob1.c
>> ===================================================================
>> --- ompi/mca/pml/ob1/pml_ob1.c (revision 30877)
>> +++ ompi/mca/pml/ob1/pml_ob1.c (working copy)
>> @@ -780,7 +780,7 @@
>> return;
>> }
>> #endif /* OPAL_CUDA_SUPPORT */
>> - ompi_rte_abort(-1, NULL);
>> + ompi_rte_abort(-1, btlinfo);
>> }
>>
>> #if OPAL_ENABLE_FT_CR == 0
>>
>>
>>
>> On Feb 27, 2014, at 1:12 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]>
>> wrote:
>>
>>> FWIW, the following BTLs all have calls to abort() or ompi_rte_abort() within them:
>>>
>>> - usnic
>>> - openib
>>> - portals4
>>> - the btl base itself
>>>
>>>
>>> On Feb 27, 2014, at 7:16 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>
>>>>> The majority of places we call abort in this commit is actually down in a progress thread. We didn't think it was safe to call the PML error function in a progress thread -- is that incorrect?
>>>>
>>>> If not, then we probably should create some mechanism for doing so. I agree with George that we shouldn't call abort inside a library
>>>
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>