This change still has a race condition, which can be dealt with using
the opal_atomic lock primitives.
See below.
On Thu, Mar 6, 2008 at 2:35 PM, <rhc_at_[hidden]> wrote:
> Author: rhc
> Date: 2008-03-06 14:35:57 EST (Thu, 06 Mar 2008)
> New Revision: 17766
> URL: https://svn.open-mpi.org/trac/ompi/changeset/17766
>
> Log:
> Fix a race condition - ensure we don't call terminate in orterun more than once, even if the timeout fires while we are doing so
[snip]
> Modified: trunk/orte/tools/orterun/orterun.c
> ==============================================================================
> --- trunk/orte/tools/orterun/orterun.c (original)
> +++ trunk/orte/tools/orterun/orterun.c 2008-03-06 14:35:57 EST (Thu, 06 Mar 2008)
> @@ -112,14 +112,15 @@
> static bool want_prefix_by_default = (bool) ORTE_WANT_ORTERUN_PREFIX_BY_DEFAULT;
> static opal_event_t *orterun_event, *orteds_exit_event;
> static char *ompi_server=NULL;
> +static bool terminating=false;
>
[snip]
> @@ -644,6 +638,12 @@
> orte_proc_t **procs;
> orte_vpid_t i;
>
> + /* flag that we are here to avoid doing it twice */
> + if (terminating) {
> + return;
> + }
> + terminating = true;
> +
[snip]
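The test of "terminating" and the assignment to it are separate steps, so
two callers can both see false before either one sets it to true.
I think this race condition should be dealt with like this: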
#include "opal/sys/atomic.h"
static opal_atomic_lock_t terminating = OPAL_ATOMIC_UNLOCKED;
...
if (opal_atomic_trylock(&terminating)) { /* returns 1 if already locked */
    return;
}
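
For anyone who wants to try the idea outside the Open MPI tree, here is a
minimal sketch of the same "first caller wins" guard using C11 atomic_flag
as a stand-in for opal_atomic_trylock. This is not the orterun code: the
names begin_terminate, terminate_job and terminate_calls are made up for
illustration, and the real cleanup is reduced to a counter so the effect
can be observed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the "terminating" guard; not Open MPI code. */
static atomic_flag terminating = ATOMIC_FLAG_INIT;

/* Hypothetical counter so the effect of the guard is observable. */
static int terminate_calls = 0;

/* Returns true only for the single caller that sets the flag first. */
static bool begin_terminate(void)
{
    /* atomic_flag_test_and_set returns the previous value of the flag:
     * false means we set it first, true means it was already set. */
    return !atomic_flag_test_and_set(&terminating);
}

/* Sketch of a terminate routine that must run its body at most once. */
static void terminate_job(void)
{
    if (!begin_terminate()) {
        return; /* someone else is already terminating */
    }
    terminate_calls++; /* the real cleanup would go here */
}
```

Because the test and the set happen in one atomic operation, a second
entry (for example from a timeout firing mid-terminate) returns
immediately instead of slipping in between the check and the assignment.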
--
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
tmattox_at_[hidden] || timattox_at_[hidden]
I'm a bright... http://www.the-brights.net/