
Open MPI Development Mailing List Archives


From: Gleb Natapov (glebn_at_[hidden])
Date: 2007-09-11 11:35:41


On Tue, Sep 11, 2007 at 11:30:53AM -0400, George Bosilca wrote:
>
> On Sep 11, 2007, at 11:05 AM, Gleb Natapov wrote:
>
>> On Tue, Sep 11, 2007 at 10:54:25AM -0400, George Bosilca wrote:
>>> We don't want to prevent two threads from entering the code at the
>>> same time. The algorithm you cited supports this case. There is only
>>> one moment that is
>> Are you sure it supports this case? There is a global variable
>> mask_in_use that prevents multiple access.
>
> I'm unable to find the mask_in_use global variable. Where is it?
I thought by "the algorithm you cited" you meant the algorithm described
in the link I provided. There is a mask_in_use global variable there
that IMO ensures the algorithm is executed for only one communicator at
a time.

>
> george.
>
>>
>>> critical: the local selection of the next available cid. And this is
>>> what we try to protect there. If after the first run the collective
>>> call does not manage to figure out the correct next_cid, then we will
>>> execute the while loop again. And then this condition makes sense, as
>>> only the thread running on the communicator with the smallest cid
>>> will continue. This ensures that it will pick up the smallest next
>>> available cid, and then its reduce operation will succeed. The other
>>> threads will wait until the selection of the next available cid is
>>> unlocked.
>>>
>>> Without the code you removed we face a deadlock situation. Multiple
>>> threads will pick a different next_cid on each process and they will
>>> never succeed with the reduce operation. And this is what we're
>>> trying to avoid with the test.
>> OK, I think now I get the idea behind this test. I'll restore it and
>> leave the ompi_comm_unregister_cid() fix in place. Is this OK?
>>
>>>
>>> george.
>>>
>>> On Sep 11, 2007, at 10:34 AM, Gleb Natapov wrote:
>>>
>>>> On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote:
>>>>> Gleb,
>>>>>
>>>>> This patch is not correct. The code preventing the registration of
>>>>> the same communicator twice is later in the code (same file, in the
>>>>> function ompi_comm_register_cid, line 326). Once the function
>>>>> ompi_comm_register_cid
>>>> I saw this code and the comment. The problem is not with the same
>>>> communicator but with different communicators.
>>>>
>>>>> is called, we know that each communicator only handles one
>>>>> "communicator creation" function at a time. Therefore, we want to
>>>>> give priority to the smallest com_id, which is what happens in the
>>>>> code you removed.
>>>> The code I removed was doing it wrongly, i.e. the algorithm is
>>>> sometimes executed for different communicators simultaneously by
>>>> different threads. Think about the case where the function is
>>>> running for cid 1 and then another thread runs it for cid 0: cid 0
>>>> will proceed although the function is already executing on another
>>>> CPU. And this is not something theoretical; it happens with Sun's
>>>> thread test suite's mpi_coll test case.
>>>>
>>>>>
>>>>> Without the condition in ompi_comm_register_cid (each communicator
>>>>> only gets registered once) your comment makes sense. However, with
>>>>> the condition, your patch allows a deadlock situation where two
>>>>> processes try to create communicators in multiple threads and will
>>>>> never succeed, simply because they will not order the creation
>>>>> based on the com_id.
>>>> If the algorithm is really prone to deadlock when it is concurrently
>>>> executed for several different communicators (I haven't checked
>>>> this), then we may want to fix the original code to really prevent
>>>> two threads from entering the function, but then I don't see the
>>>> reason for all those complications with
>>>> ompi_comm_register_cid()/ompi_comm_unregister_cid(). The algorithm
>>>> described in section 5.3 here:
>>>> http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm&hl=en&ct=clnk&cd=2
>>>> works without it and we can do something similar.
>>>>
>>>>>
>>>>> george.
>>>>>
>>>>>
>>>>>
>>>>> On Sep 11, 2007, at 9:23 AM, gleb_at_[hidden] wrote:
>>>>>
>>>>>> Author: gleb
>>>>>> Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
>>>>>> New Revision: 16088
>>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/16088
>>>>>>
>>>>>> Log:
>>>>>> The code tries to prevent itself from running for more than one
>>>>>> communicator simultaneously, but is doing it incorrectly. If the
>>>>>> function is already running for one communicator and it is called
>>>>>> from another thread for another communicator with a lower cid, the
>>>>>> check comm->c_contextid != ompi_comm_lowest_cid() will fail and
>>>>>> the function will be executed for two different communicators by
>>>>>> two threads simultaneously. There is nothing in the algorithm that
>>>>>> prevents it from running simultaneously for different
>>>>>> communicators as far as I can see, but ompi_comm_unregister_cid()
>>>>>> assumes that it is always called for the communicator with the
>>>>>> lowest cid, and this is not always the case. This patch removes
>>>>>> the bogus lowest-cid check and fixes ompi_comm_unregister_cid() to
>>>>>> properly remove the cid from the list.
>>>>>>
>>>>>> Text files modified:
>>>>>> trunk/ompi/communicator/comm_cid.c | 24 ++++++++++++------------
>>>>>> 1 files changed, 12 insertions(+), 12 deletions(-)
>>>>>>
>>>>>> Modified: trunk/ompi/communicator/comm_cid.c
>>>>>> ==============================================================================
>>>>>> --- trunk/ompi/communicator/comm_cid.c (original)
>>>>>> +++ trunk/ompi/communicator/comm_cid.c 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
>>>>>> @@ -11,6 +11,7 @@
>>>>>> * All rights reserved.
>>>>>> * Copyright (c) 2006-2007 University of Houston. All rights
>>>>>> reserved.
>>>>>> * Copyright (c) 2007 Cisco, Inc. All rights reserved.
>>>>>> + * Copyright (c) 2007 Voltaire All rights reserved.
>>>>>> * $COPYRIGHT$
>>>>>> *
>>>>>> * Additional copyrights may follow
>>>>>> @@ -170,15 +171,6 @@
>>>>>> * This is the real algorithm described in the doc
>>>>>> */
>>>>>>
>>>>>> - OPAL_THREAD_LOCK(&ompi_cid_lock);
>>>>>> - if (comm->c_contextid != ompi_comm_lowest_cid() ) {
>>>>>> - /* if not lowest cid, we do not continue, but sleep and try again */
>>>>>> - OPAL_THREAD_UNLOCK(&ompi_cid_lock);
>>>>>> - continue;
>>>>>> - }
>>>>>> - OPAL_THREAD_UNLOCK(&ompi_cid_lock);
>>>>>> -
>>>>>> -
>>>>>> for (i=start; i < mca_pml.pml_max_contextid ; i++) {
>>>>>>
>>>>>> flag=ompi_pointer_array_test_and_set_item(&ompi_mpi_communicators, i, comm);
>>>>>> @@ -365,10 +357,18 @@
>>>>>>
>>>>>> static int ompi_comm_unregister_cid (uint32_t cid)
>>>>>> {
>>>>>> - ompi_comm_reg_t *regcom=NULL;
>>>>>> - opal_list_item_t *item=opal_list_remove_first(&ompi_registered_comms);
>>>>>> + ompi_comm_reg_t *regcom;
>>>>>> + opal_list_item_t *item;
>>>>>>
>>>>>> - regcom = (ompi_comm_reg_t *) item;
>>>>>> + for (item = opal_list_get_first(&ompi_registered_comms);
>>>>>> + item != opal_list_get_end(&ompi_registered_comms);
>>>>>> + item = opal_list_get_next(item)) {
>>>>>> + regcom = (ompi_comm_reg_t *)item;
>>>>>> + if(regcom->cid == cid) {
>>>>>> + opal_list_remove_item(&ompi_registered_comms, item);
>>>>>> + break;
>>>>>> + }
>>>>>> + }
>>>>>> OBJ_RELEASE(regcom);
>>>>>> return OMPI_SUCCESS;
>>>>>> }
>>>>>> _______________________________________________
>>>>>> svn-full mailing list
>>>>>> svn-full_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
>>>>>
>>>>
>>>>
>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>> --
>>>> Gleb.
>>>
>>
>>
>>
>>
>> --
>> Gleb.
>


--
			Gleb.