Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |  

This web mail archive is frozen.

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Click here to be taken to the new web archives of this list; it includes all the mails that are in this frozen archive plus all new mails that have been sent to the list since it was migrated to the new archives.

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-08-25 09:10:24


David Maxwell has finally gotten the Coverity automation stuff
working to download our nightly trunk tarballs and run them through
their tool. I was skimming through the results this morning and
noticed one that looked odd to me: in osc_rdma_component.c:619, we have:

619 if (NULL == datatype) {
620 opal_output(ompi_osc_base_output,
621 "Error recreating datatype.
Aborting.");
622 ompi_mpi_abort(module->m_comm, 1, false);
623 }

And then shortly after line 623, we use/dereference datatype.
Coverity marked this as "possible NULL dereference".

"Hah", I thought, "Coverity doesn't realize that ompi_mpi_abort()
will always abort, so this is a false positive."

Just for the heckuvit, I checked: ompi_mpi_abort does *not* guarantee
to abort. Indeed, there are two cases where it may actually return:

1. We have logic in ompi_mpi_abort to prevent recursive invocation
(ompi_mpi_abort.c:60):

     /* Protection for recursive invocation */
     if (have_been_invoked) {
         return OMPI_SUCCESS;
     }
     have_been_invoked = true;

I added this check in r13354, but I don't remember the exact case in
which it was happening. :-\ I'm wondering if we should loop forever
(perhaps over sleep or something) instead of returning...

Do anyone know how it could be that ompi_mpi_abort() could be invoked
*recursively*? I can imagine in a THREAD_MULTIPLE scenario that
multiple threads could call ompi_mpi_abort (which is I *think* is why
I added that logic). In that case, do we have enough protection, or
do we really need to use atomic operations to test have_been_invoked?

2. In several places, we call orte_errmgr.error_detected(). I think
that we will *always* have the orte "proxy" errmgr component loaded
in MPI processes, in which case errmgr.error_detected() will always
call exit(), so I think we're ok there.

-- 
Jeff Squyres
Cisco Systems