Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

From: Gleb Natapov (glebn_at_[hidden])
Date: 2007-09-11 11:05:21


On Tue, Sep 11, 2007 at 10:54:25AM -0400, George Bosilca wrote:
> We don't want to prevent two thread from entering the code is same time.
> The algorithm you cited support this case. There is only one moment that is
Are you sure it support this case? There is a global var mask_in_use
that prevent multiple access.

> critical. The local selection of the next available cid. And this is what
> we try to protect there. If after the first run, the collective call do not
> manage to figure out the correct next_cid then we will execute the while
> loop again. And then this condition make sense, as only the thread running
> on the smallest communicator cid will continue. This insure that it will
> pickup the smallest next available cid, and then it's reduce operation will
> succeed. The other threads will wait until the selection of the next
> available cid is unlocked.
>
> Without the code you removed we face a deadlock situation. Multiple threads
> will pick different next_cid on each process and thy will never succeed
> with the reduce operation. And this is what we're trying to avoid with the
> test.
OK. I think now I get the idea behind this test. I'll restore it and
leave ompi_comm_unregister_cid() fix in place. Is this OK?

>
> george.
>
> On Sep 11, 2007, at 10:34 AM, Gleb Natapov wrote:
>
>> On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote:
>>> Gleb,
>>>
>>> This patch is not correct. The code preventing the registration of the
>>> same
>>> communicator twice is later in the code (same file in the function
>>> ompi_comm_register_cid line 326). Once the function
>>> ompi_comm_register_cid
>> I saw this code and the comment. The problem is not with the same
>> communicator but with different communicators.
>>
>>> is called, we know that each communicator only handle one "communicator
>>> creation" function at the same time. Therefore, we want to give priority
>>> to
>>> the smallest com_id, which is what happens in the code you removed.
>> The code I removed was doing it wrongly. I.e the algorithm sometimes is
>> executed
>> for different communicators simultaneously by different threads. Think
>> about the case where the function is running for cid 1 and then another
>> thread runs it for cid 0. cid 0 will proceed although the function is
>> executed on another CPU. And this is not something theoretical, that
>> is happening with sun's thread test suit mpi_coll test case.
>>
>>>
>>> Without the condition in the ompi_comm_register_cid (each communicator
>>> only
>>> get registered once) your comment make sense. However, with the condition
>>> your patch allow a dead end situation, while 2 processes try to create
>>> communicators in multiple threads, and they will never succeed, simply
>>> because they will not order the creation based on the com_id.
>> If the algorithm is really prone to deadlock in case it is concurrently
>> executed for several different communicators (I haven't check this),
>> then we may want to fix original code to really prevent two threads to
>> enter the function, but then I don't see the reason for all those
>> complications with ompi_comm_register_cid()/ompi_comm_unregister_cid()
>> The algorithm described here:
>> http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm&hl=en&ct=clnk&cd=2
>> in section 5.3 works without it and we can do something similar.
>>
>>>
>>> george.
>>>
>>>
>>>
>>> On Sep 11, 2007, at 9:23 AM, gleb_at_[hidden] wrote:
>>>
>>>> Author: gleb
>>>> Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
>>>> New Revision: 16088
>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/16088
>>>>
>>>> Log:
>>>> The code tries to prevent itself from running for more then one
>>>> communicator
>>>> simultaneously, but is doing it incorrectly. If the function is running
>>>> already
>>>> for one communicator and it is called from another thread for other
>>>> communicator
>>>> with lower cid the check comm->c_contextid != ompi_comm_lowest_cid()
>>>> will fail and the function will be executed for two different
>>>> communicators by
>>>> two threads simultaneously. There is nothing in the algorithm that
>>>> prevent
>>>> it
>>>> from been running simultaneously for different communicators as far as I
>>>> can see,
>>>> but ompi_comm_unregister_cid() assumes that it is always called for a
>>>> communicator
>>>> with the lowest cid and this is not always the case. This patch removes
>>>> bogus
>>>> lowest cid check and fix ompi_comm_register_cid() to properly remove cid
>>>> from
>>>> the list.
>>>>
>>>> Text files modified:
>>>> trunk/ompi/communicator/comm_cid.c | 24 ++++++++++++------------
>>>> 1 files changed, 12 insertions(+), 12 deletions(-)
>>>>
>>>> Modified: trunk/ompi/communicator/comm_cid.c
>>>> ==============================================================================
>>>> --- trunk/ompi/communicator/comm_cid.c (original)
>>>> +++ trunk/ompi/communicator/comm_cid.c 2007-09-11 09:23:46 EDT (Tue, 11
>>>> Sep 2007)
>>>> @@ -11,6 +11,7 @@
>>>> * All rights reserved.
>>>> * Copyright (c) 2006-2007 University of Houston. All rights reserved.
>>>> * Copyright (c) 2007 Cisco, Inc. All rights reserved.
>>>> + * Copyright (c) 2007 Voltaire All rights reserved.
>>>> * $COPYRIGHT$
>>>> *
>>>> * Additional copyrights may follow
>>>> @@ -170,15 +171,6 @@
>>>> * This is the real algorithm described in the doc
>>>> */
>>>>
>>>> - OPAL_THREAD_LOCK(&ompi_cid_lock);
>>>> - if (comm->c_contextid != ompi_comm_lowest_cid() ) {
>>>> - /* if not lowest cid, we do not continue, but sleep and
>>>> try again */
>>>> - OPAL_THREAD_UNLOCK(&ompi_cid_lock);
>>>> - continue;
>>>> - }
>>>> - OPAL_THREAD_UNLOCK(&ompi_cid_lock);
>>>> -
>>>> -
>>>> for (i=start; i < mca_pml.pml_max_contextid ; i++) {
>>>>
>>>> flag=ompi_pointer_array_test_and_set_item(&ompi_mpi_communicators,
>>>> i, comm);
>>>> @@ -365,10 +357,18 @@
>>>>
>>>> static int ompi_comm_unregister_cid (uint32_t cid)
>>>> {
>>>> - ompi_comm_reg_t *regcom=NULL;
>>>> - opal_list_item_t
>>>> *item=opal_list_remove_first(&ompi_registered_comms);
>>>> + ompi_comm_reg_t *regcom;
>>>> + opal_list_item_t *item;
>>>>
>>>> - regcom = (ompi_comm_reg_t *) item;
>>>> + for (item = opal_list_get_first(&ompi_registered_comms);
>>>> + item != opal_list_get_end(&ompi_registered_comms);
>>>> + item = opal_list_get_next(item)) {
>>>> + regcom = (ompi_comm_reg_t *)item;
>>>> + if(regcom->cid == cid) {
>>>> + opal_list_remove_item(&ompi_registered_comms, item);
>>>> + break;
>>>> + }
>>>> + }
>>>> OBJ_RELEASE(regcom);
>>>> return OMPI_SUCCESS;
>>>> }
>>>> _______________________________________________
>>>> svn-full mailing list
>>>> svn-full_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
>>>
>>
>>
>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>> --
>> Gleb.
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>

> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
			Gleb.