Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-08-25 09:10:24

David Maxwell has finally gotten the Coverity automation stuff
working to download our nightly trunk tarballs and run them through
their tool. I was skimming through the results this morning and
noticed one that looked odd to me: in osc_rdma_component.c:619, we have:

619 if (NULL == datatype) {
620 opal_output(ompi_osc_base_output,
621 "Error recreating datatype.
622 ompi_mpi_abort(module->m_comm, 1, false);
623 }

And then shortly after line 623, we use/dereference datatype.
Coverity marked this as "possible NULL dereference".

"Hah", I thought, "Coverity doesn't realize that ompi_mpi_abort()
will always abort, so this is a false positive."

Just for the heckuvit, I checked: ompi_mpi_abort does *not* guarantee
to abort. Indeed, there are two cases where it may actually return:

1. We have logic in ompi_mpi_abort to prevent recursive invocation

     /* Protection for recursive invocation */
     if (have_been_invoked) {
         return OMPI_SUCCESS;
     have_been_invoked = true;

I added this check in r13354, but I don't remember the exact case in
which it was happening. :-\ I'm wondering if we should loop forever
(perhaps over sleep or something) instead of returning...

Do anyone know how it could be that ompi_mpi_abort() could be invoked
*recursively*? I can imagine in a THREAD_MULTIPLE scenario that
multiple threads could call ompi_mpi_abort (which is I *think* is why
I added that logic). In that case, do we have enough protection, or
do we really need to use atomic operations to test have_been_invoked?

2. In several places, we call orte_errmgr.error_detected(). I think
that we will *always* have the orte "proxy" errmgr component loaded
in MPI processes, in which case errmgr.error_detected() will always
call exit(), so I think we're ok there.

Jeff Squyres
Cisco Systems