Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

From: George Bosilca (bosilca_at_[hidden])
Date: 2007-09-11 10:14:30


Gleb,

This patch is not correct. The code preventing the registration of
the same communicator twice is later in the code (same file in the
function ompi_comm_register_cid line 326). Once the function
ompi_comm_register_cid is called, we know that each communicator only
handle one "communicator creation" function at the same time.
Therefore, we want to give priority to the smallest com_id, which is
what happens in the code you removed.

Without the condition in the ompi_comm_register_cid (each
communicator only get registered once) your comment make sense.
However, with the condition your patch allow a dead end situation,
while 2 processes try to create communicators in multiple threads,
and they will never succeed, simply because they will not order the
creation based on the com_id.

   george.

On Sep 11, 2007, at 9:23 AM, gleb_at_[hidden] wrote:

> Author: gleb
> Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
> New Revision: 16088
> URL: https://svn.open-mpi.org/trac/ompi/changeset/16088
>
> Log:
> The code tries to prevent itself from running for more then one
> communicator
> simultaneously, but is doing it incorrectly. If the function is
> running already
> for one communicator and it is called from another thread for other
> communicator
> with lower cid the check comm->c_contextid != ompi_comm_lowest_cid()
> will fail and the function will be executed for two different
> communicators by
> two threads simultaneously. There is nothing in the algorithm that
> prevent it
> from been running simultaneously for different communicators as far
> as I can see,
> but ompi_comm_unregister_cid() assumes that it is always called for
> a communicator
> with the lowest cid and this is not always the case. This patch
> removes bogus
> lowest cid check and fix ompi_comm_register_cid() to properly
> remove cid from
> the list.
>
> Text files modified:
> trunk/ompi/communicator/comm_cid.c | 24 ++++++++++++------------
> 1 files changed, 12 insertions(+), 12 deletions(-)
>
> Modified: trunk/ompi/communicator/comm_cid.c
> ======================================================================
> ========
> --- trunk/ompi/communicator/comm_cid.c (original)
> +++ trunk/ompi/communicator/comm_cid.c 2007-09-11 09:23:46 EDT
> (Tue, 11 Sep 2007)
> @@ -11,6 +11,7 @@
> * All rights reserved.
> * Copyright (c) 2006-2007 University of Houston. All rights
> reserved.
> * Copyright (c) 2007 Cisco, Inc. All rights reserved.
> + * Copyright (c) 2007 Voltaire All rights reserved.
> * $COPYRIGHT$
> *
> * Additional copyrights may follow
> @@ -170,15 +171,6 @@
> * This is the real algorithm described in the doc
> */
>
> - OPAL_THREAD_LOCK(&ompi_cid_lock);
> - if (comm->c_contextid != ompi_comm_lowest_cid() ) {
> - /* if not lowest cid, we do not continue, but
> sleep and try again */
> - OPAL_THREAD_UNLOCK(&ompi_cid_lock);
> - continue;
> - }
> - OPAL_THREAD_UNLOCK(&ompi_cid_lock);
> -
> -
> for (i=start; i < mca_pml.pml_max_contextid ; i++) {
> flag=ompi_pointer_array_test_and_set_item
> (&ompi_mpi_communicators,
> i, comm);
> @@ -365,10 +357,18 @@
>
> static int ompi_comm_unregister_cid (uint32_t cid)
> {
> - ompi_comm_reg_t *regcom=NULL;
> - opal_list_item_t *item=opal_list_remove_first
> (&ompi_registered_comms);
> + ompi_comm_reg_t *regcom;
> + opal_list_item_t *item;
>
> - regcom = (ompi_comm_reg_t *) item;
> + for (item = opal_list_get_first(&ompi_registered_comms);
> + item != opal_list_get_end(&ompi_registered_comms);
> + item = opal_list_get_next(item)) {
> + regcom = (ompi_comm_reg_t *)item;
> + if(regcom->cid == cid) {
> + opal_list_remove_item(&ompi_registered_comms, item);
> + break;
> + }
> + }
> OBJ_RELEASE(regcom);
> return OMPI_SUCCESS;
> }
> _______________________________________________
> svn-full mailing list
> svn-full_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full



  • application/pkcs7-signature attachment: smime.p7s