Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r17766
From: Tim Mattox (timattox_at_[hidden])
Date: 2008-03-06 15:34:50


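This still has a race condition: the test of "terminating" and the
assignment to it are two separate steps, so two callers can both pass
the check before either one sets the flag. This can be dealt with
using the opal_atomic routines.
See below.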

On Thu, Mar 6, 2008 at 2:35 PM, <rhc_at_[hidden]> wrote:
> Author: rhc
> Date: 2008-03-06 14:35:57 EST (Thu, 06 Mar 2008)
> New Revision: 17766
> URL: https://svn.open-mpi.org/trac/ompi/changeset/17766
>
> Log:
> Fix a race condition - ensure we don't call terminate in orterun more than once, even if the timeout fires while we are doing so
[snip]
> Modified: trunk/orte/tools/orterun/orterun.c
> ==============================================================================
> --- trunk/orte/tools/orterun/orterun.c (original)
> +++ trunk/orte/tools/orterun/orterun.c 2008-03-06 14:35:57 EST (Thu, 06 Mar 2008)
> @@ -112,14 +112,15 @@
> static bool want_prefix_by_default = (bool) ORTE_WANT_ORTERUN_PREFIX_BY_DEFAULT;
> static opal_event_t *orterun_event, *orteds_exit_event;
> static char *ompi_server=NULL;
> +static bool terminating=false;
>
[snip]
> @@ -644,6 +638,12 @@
> orte_proc_t **procs;
> orte_vpid_t i;
>
> + /* flag that we are here to avoid doing it twice */
> + if (terminating) {
> + return;
> + }
> + terminating = true;
> +
[snip]

I think this race condition should be dealt with like this:

#include "opal/sys/atomic.h"

static opal_atomic_lock_t terminating = OPAL_ATOMIC_UNLOCKED;

...

if (opal_atomic_trylock(&terminating)) { /* returns 1 if already locked */
    return;
}
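
For readers without the OPAL headers at hand, the same idea can be sketched in standard C11. This is only an illustration under stated assumptions, not the orterun code: atomic_flag from <stdatomic.h> stands in for opal_atomic_lock_t, and the names terminate_once and terminate_runs are hypothetical. The point it shows is that the test and the set happen as one atomic step, so only the first caller gets past the guard.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the "terminating" guard; starts out clear. */
static atomic_flag terminating = ATOMIC_FLAG_INIT;

/* Hypothetical counter, used only to observe how often the body runs. */
static int terminate_runs = 0;

/* Hypothetical stand-in for the terminate routine in orterun. */
static void terminate_once(void)
{
    /* atomic_flag_test_and_set() sets the flag and returns its previous
     * value in a single atomic step; true means someone got here first. */
    if (atomic_flag_test_and_set(&terminating)) {
        return;
    }

    /* Only the first caller reaches this point. */
    terminate_runs++;
}
```

Calling terminate_once() a second time, whether from the timeout handler or from the normal exit path, returns immediately without touching terminate_runs. The flag is never cleared here, which matches the intent of terminating only once per run.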

-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmattox_at_[hidden] || timattox_at_[hidden]
    I'm a bright... http://www.the-brights.net/