Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r17766
From: Ralph H Castain (rhc_at_[hidden])
Date: 2008-03-06 16:37:47


Thanks Tim - good suggestion! Had to modify your proposed code a tad to get
it to compile and work, but it is definitely a cleaner solution.

Ralph

On 3/6/08 1:34 PM, "Tim Mattox" <timattox_at_[hidden]> wrote:

> This still has a race condition... which can be dealt with using
> opal_atomic stuff.
> See below.
>
> On Thu, Mar 6, 2008 at 2:35 PM, <rhc_at_[hidden]> wrote:
>> Author: rhc
>> Date: 2008-03-06 14:35:57 EST (Thu, 06 Mar 2008)
>> New Revision: 17766
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/17766
>>
>> Log:
>> Fix a race condition - ensure we don't call terminate in orterun more than
>> once, even if the timeout fires while we are doing so
> [snip]
>> Modified: trunk/orte/tools/orterun/orterun.c
>>
>>
>> ==============================================================================
>> --- trunk/orte/tools/orterun/orterun.c (original)
>> +++ trunk/orte/tools/orterun/orterun.c 2008-03-06 14:35:57 EST (Thu, 06 Mar 2008)
>> @@ -112,14 +112,15 @@
>> static bool want_prefix_by_default = (bool) ORTE_WANT_ORTERUN_PREFIX_BY_DEFAULT;
>> static opal_event_t *orterun_event, *orteds_exit_event;
>> static char *ompi_server=NULL;
>> +static bool terminating=false;
>>
> [snip]
>> @@ -644,6 +638,12 @@
>> orte_proc_t **procs;
>> orte_vpid_t i;
>>
>> +    /* flag that we are here to avoid doing it twice */
>> +    if (terminating) {
>> +        return;
>> +    }
>> +    terminating = true;
>> +
> [snip]
>
> I think this race condition should be dealt with like this:
>
> #include "opal/sys/atomic.h"
>
> static opal_atomic_lock_t terminating = OPAL_ATOMIC_UNLOCKED;
>
> ...
>
> if (opal_atomic_trylock(&terminating)) { /* returns 1 if already locked */
>     return;
> }
>
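
For readers following the thread: the plain bool guard in r17766 still leaves a window between the "if (terminating)" test and the "terminating = true" store, so two callers can both get past the check. Below is a minimal sketch of the same once-only idea using standard C11 atomic_flag instead of opal_atomic_lock_t. This is not the code that was committed; terminate_once() and terminate_runs are hypothetical names used only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for orterun's "terminating" guard. */
static atomic_flag terminating = ATOMIC_FLAG_INIT;

/* Illustration only: counts how often the guarded body actually ran. */
static int terminate_runs = 0;

/* Returns true if this call did the termination work,
 * false if an earlier caller had already claimed it. */
static bool terminate_once(void)
{
    /* atomic_flag_test_and_set() sets the flag and returns its previous
     * value as one indivisible step, so only the first caller sees false. */
    if (atomic_flag_test_and_set(&terminating)) {
        return false;
    }

    terminate_runs++;   /* the real cleanup would go here */
    return true;
}
```

The test and the set happen in a single atomic operation, which is what the trylock suggestion above relies on as well: whichever caller arrives first wins, and every later caller returns immediately.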