Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25431
From: Tim Mattox (timattox_at_[hidden])
Date: 2011-11-06 09:42:03


Sorry for disturbing a hornet's nest about using assert() inside OMPI.
I agree that assert is great for development, and in certain specific
situations. I must have incorrectly remembered that there was an
effort to remove all forms of "exit" from the OMPI library if at all possible,
and I equate "assert()" with exit.

On Fri, Nov 4, 2011 at 9:15 PM, Graham, Richard L. <rlgraham_at_[hidden]> wrote:
> I agree with Brian
>
>
>
> Sent with Good (www.good.com)
>
>
>  -----Original Message-----
> From:   Barrett, Brian W [mailto:bwbarre_at_[hidden]]
> Sent:   Friday, November 04, 2011 07:46 PM Eastern Standard Time
> To:     Open MPI Developers
> Cc:     Christopher Yeoh
> Subject:        Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25431
>
> Why not?  I use asserts all the time in OMPI code.  A quick grep in
> ompi/mca says I'm not alone.  There are a whole bunch of places where I
> "know" a fact, such as a pointer never being NULL or consistency checks
> between two values.  These don't need to run in production; I've
> theoretically tested the crap out of the code before then.  But they are
> really useful when developing and debugging later.
>
> Just my $0.02.
>
> Brian
>
> On 11/4/11 5:18 PM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>
>>+1 -- we shouldn't be using assert().
>>
>>
>>On Nov 4, 2011, at 2:15 PM, Tim Mattox wrote:
>>
>>> I doubt you want to be using assert() here or anywhere in the
>>> OMPI library. If the rem_info.rem_qps pointer is NULL, I would
>>> think you want to do something other than just die.
>>> And, in a production build, if it was to ever be null, things would
>>> eventually crash right?  The ompi library should strive to never
>>>crash...
>>> maybe print an error message and abort, but flat out crashing
>>> seems wrong.
>>>
>>> I've been away from active OMPI development for awhile now so I
>>> might be remembering this incorrectly, but isn't there some
>>> policy/SOP on not using assert in the OMPI library code?
>>>
>>> Please excuse me while I return to being a silent/lurking ghost
>>> of an OMPI developer.
>>>
>>> On Thu, Nov 3, 2011 at 8:15 PM,  <cyeoh_at_[hidden]> wrote:
>>>> Author: cyeoh
>>>> Date: 2011-11-03 20:15:08 EDT (Thu, 03 Nov 2011)
>>>> New Revision: 25431
>>>> URL: hxxps://svn.open-mpi.org/trac/ompi/changeset/25431
>>>>
>>>> Log:
>>>> Removes pointless memmove which because of a previous memcpy will
>>>>always
>>>> have identical source and destination pointers. See #2871
>>>> Plugs a couple of minor memory leaks related to remote qp info
>>>>
>>>>
>>>> Text files modified:
>>>>   trunk/ompi/mca/btl/openib/btl_openib_endpoint.c            |     3
>>>>+++
>>>>   trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c |    22
>>>>++++++++++++----------
>>>>   2 files changed, 15 insertions(+), 10 deletions(-)
>>>>
>>>> Modified: trunk/ompi/mca/btl/openib/btl_openib_endpoint.c
>>>>
>>>>========================================================================
>>>>======
>>>> --- trunk/ompi/mca/btl/openib/btl_openib_endpoint.c     (original)
>>>> +++ trunk/ompi/mca/btl/openib/btl_openib_endpoint.c     2011-11-03
>>>>20:15:08 EDT (Thu, 03 Nov 2011)
>>>> @@ -452,6 +452,9 @@
>>>>     free(endpoint->qps);
>>>>     endpoint->qps = NULL;
>>>>
>>>> +    free(endpoint->rem_info.rem_qps);
>>>> +    free(endpoint->rem_info.rem_srqs);
>>>> +
>>>>     /* unregister xrc recv qp */
>>>>  #if HAVE_XRC
>>>>     if (0 != endpoint->xrc_recv_qp_num) {
>>>>
>>>> Modified: trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c
>>>>
>>>>========================================================================
>>>>======
>>>> --- trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c
>>>>(original)
>>>> +++ trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c
>>>>2011-11-03 20:15:08 EDT (Thu, 03 Nov 2011)
>>>> @@ -272,19 +272,14 @@
>>>>  static int set_remote_info(mca_btl_base_endpoint_t* endpoint,
>>>>                            mca_btl_openib_rem_info_t* rem_info)
>>>>  {
>>>> +    /* Free up the memory pointed to by rem_qps before overwriting
>>>>the pointer
>>>> +       in the following memcpy */
>>>> +    free(endpoint->rem_info.rem_qps);
>>>> +
>>>>     /* copy the rem_info stuff */
>>>>     memcpy(&((mca_btl_openib_endpoint_t*) endpoint)->rem_info,
>>>>            rem_info, sizeof(mca_btl_openib_rem_info_t));
>>>>
>>>> -    /* copy over the rem qp info */
>>>> -    /* per #2871, changed this from memcpy() to memmove() to handle
>>>> -     * the case of overlapping (or same) src/dest addresses.
>>>> -     * However, we still *should* figure out why the src and dest
>>>> -     * addresses are sometimes the same. */
>>>> -    memmove(endpoint->rem_info.rem_qps,
>>>> -           rem_info->rem_qps, sizeof(mca_btl_openib_rem_qp_info_t) *
>>>> -           mca_btl_openib_component.num_qps);
>>>> -
>>>>     BTL_VERBOSE(("Setting QP info,  LID = %d",
>>>>endpoint->rem_info.rem_lid));
>>>>     return OMPI_SUCCESS;
>>>>  }
>>>> @@ -671,7 +666,12 @@
>>>>     uint8_t message_type;
>>>>     bool master;
>>>>
>>>> -    /* start by unpacking data first so we know who is knocking at
>>>> +    /* We later memcpy this whole structure. Make sure
>>>> +       that all the parameters are initialized, especially
>>>> +       the pointers */
>>>> +    memset(&rem_info,0, sizeof(rem_info));
>>>> +
>>>> +   /* start by unpacking data first so we know who is knocking at
>>>>        our door */
>>>>     BTL_VERBOSE(("unpacking %d of %d\n", cnt, OPAL_UINT8));
>>>>     rc = opal_dss.unpack(buffer, &message_type, &cnt, OPAL_UINT8);
>>>> @@ -857,6 +857,7 @@
>>>>                to CONNECTING, and then reply with our QP
>>>>                information */
>>>>             if (master) {
>>>> +               assert(rem_info.rem_qps != NULL);
>>>>                 rc = reply_start_connect(ib_endpoint, &rem_info);
>>>>             } else {
>>>>                 rc =
>>>>oob_module_start_connect(ib_endpoint->endpoint_local_cpc,
>>>> @@ -879,6 +880,7 @@
>>>>             break;
>>>>
>>>>         case MCA_BTL_IB_CONNECTING :
>>>> +           assert(rem_info.rem_qps != NULL);
>>>>             set_remote_info(ib_endpoint, &rem_info);
>>>>             if (OMPI_SUCCESS != (rc = qp_connect_all(ib_endpoint))) {
>>>>                 BTL_ERROR(("endpoint connect error: %d", rc));
>>>> _______________________________________________
>>>> svn-full mailing list
>>>> svn-full_at_[hidden]
>>>> hxxp://www.open-mpi.org/mailman/listinfo.cgi/svn-full
>>>>
>>>
>>>
>>>
>>> --
>>> Tim Mattox, Ph.D. - hxxp://homepage.mac.com/tmattox/
>>>  timattox_at_[hidden] || tmattox_at_[hidden]
>>>     I'm a bright... hxxp://www.the-brights.net/
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> hxxp://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>>--
>>Jeff Squyres
>>jsquyres_at_[hidden]
>>For corporate legal information go to:
>>hxxp://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>>_______________________________________________
>>devel mailing list
>>devel_at_[hidden]
>>hxxp://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>
>
> --
>  Brian W. Barrett
>  Dept. 1423: Scalable System Software
>  Sandia National Laboratories
>
>
>
>
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> hxxp://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>

-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 timattox_at_[hidden] || tmattox_at_[hidden]
    I'm a bright... http://www.the-brights.net/