Open MPI Development Mailing List Archives

Subject: [OMPI devel] Duplicated modex issue.
From: Victor Kocheganov (victor.kocheganov_at_[hidden])
Date: 2012-12-20 09:57:06


Hi.

I am having trouble understanding the logic of /ompi_mpi_init()/. Could
you please tell me if you have any idea what causes the following behavior?

If I understand correctly, there is a block in /ompi_mpi_init()/ that
exchanges process information between processes (call this block
'modex'):

         coll = OBJ_NEW(orte_grpcomm_collective_t);
         coll->id = orte_process_info.peer_modex;
         if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
             error = "orte_grpcomm_modex failed";
             goto error;
         }
         /* wait for modex to complete - this may be moved anywhere in mpi_init
          * so long as it occurs prior to calling a function that needs
          * the modex info!
          */
         while (coll->active) {
             opal_progress(); /* block in progress pending events */
         }
         OBJ_RELEASE(coll);

Several instructions later there is a block that synchronizes the
processes (call this block 'barrier'):

         coll = OBJ_NEW(orte_grpcomm_collective_t);
         coll->id = orte_process_info.peer_init_barrier;
         if (ORTE_SUCCESS != (ret = orte_grpcomm.barrier(coll))) {
             error = "orte_grpcomm_barrier failed";
             goto error;
         }
         /* wait for barrier to complete */
         while (coll->active) {
             opal_progress(); /* block in progress pending events */
         }
         OBJ_RELEASE(coll);

So, initially /ompi_mpi_init()/ has the following structure:

    ...
    'modex' block;
    ...
    'barrier' block;
    ...

I ran several experiments with this code, and the following one is of
interest: if I add a sequence of two additional blocks, 'barrier' and
'modex', right after the original 'modex' block, then /ompi_mpi_init()/
hangs in /opal_progress()/ of the last 'modex' block.

    ...
    'modex' block;
    'barrier' block;
    'modex' block; <- hangs
    ...
    'barrier' block;
    ...
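
One way to confirm which wait loop never returns is to bound the polling
loop and report when the limit is reached. The following is only a minimal,
self-contained sketch of that idea and is not Open MPI code:
fake_collective_t, fake_progress() and wait_bounded() are hypothetical
stand-ins for orte_grpcomm_collective_t, opal_progress() and the
'while (coll->active)' loop, and the polls_until_done field exists only to
simulate a collective that does or does not complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for orte_grpcomm_collective_t. Only the fields the
 * wait loop needs are modeled; polls_until_done is a stub-only knob. */
typedef struct {
    int  id;                /* collective id, as in coll->id */
    bool active;            /* true until the collective completes */
    long polls_until_done;  /* > 0: completes after that many polls;
                             * <= 0: never completes (simulated hang) */
} fake_collective_t;

/* Hypothetical stand-in for opal_progress(). The real function takes no
 * arguments; the stub takes the collective so it can mark it complete. */
static void fake_progress(fake_collective_t *coll)
{
    if (coll->polls_until_done > 0 && --coll->polls_until_done == 0) {
        coll->active = false;
    }
}

/* Same shape as 'while (coll->active) { opal_progress(); }', but gives up
 * after max_polls iterations instead of spinning forever.
 * Returns the number of polls used, or -1 if the limit was reached while
 * the collective was still active. */
static long wait_bounded(fake_collective_t *coll, long max_polls)
{
    long polls = 0;

    assert(coll != NULL);
    while (coll->active) {
        if (polls >= max_polls) {
            return -1;  /* still active: report a suspected hang */
        }
        fake_progress(coll);
        ++polls;
    }
    return polls;
}
```

In the experiment above, each 'modex' and 'barrier' wait loop would be given
such a bound (plus a message naming the block and coll->id when -1 is
returned), so the block that stalls is identified without attaching a
debugger. The bound is a diagnostic aid only; it does not explain or fix the
hang.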

Thanks,
Victor Kocheganov.