Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Duplicated modex issue.
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-12-20 10:48:49


Absolutely, it will hang: the collective object passed into any grpcomm operation (modex or barrier) may only be used once, and any attempt to reuse it will fail.

On Dec 20, 2012, at 6:57 AM, Victor Kocheganov <victor.kocheganov_at_[hidden]> wrote:

> Hi.
>
> I'm having trouble understanding the logic of ompi_mpi_init(). Could you tell me whether you have any idea what causes the following behavior?
>
> If I understand correctly, ompi_mpi_init() contains a block that exchanges process information between processes (call this block 'modex'):
> coll = OBJ_NEW(orte_grpcomm_collective_t);
> coll->id = orte_process_info.peer_modex;
> if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
>     error = "orte_grpcomm_modex failed";
>     goto error;
> }
> /* wait for modex to complete - this may be moved anywhere in mpi_init
>  * so long as it occurs prior to calling a function that needs
>  * the modex info!
>  */
> while (coll->active) {
>     opal_progress();  /* block in progress pending events */
> }
> OBJ_RELEASE(coll);
> A few statements later there is a block that synchronizes the processes (call this block 'barrier'):
> coll = OBJ_NEW(orte_grpcomm_collective_t);
> coll->id = orte_process_info.peer_init_barrier;
> if (ORTE_SUCCESS != (ret = orte_grpcomm.barrier(coll))) {
>     error = "orte_grpcomm_barrier failed";
>     goto error;
> }
> /* wait for barrier to complete */
> while (coll->active) {
>     opal_progress();  /* block in progress pending events */
> }
> OBJ_RELEASE(coll);
> So, originally ompi_mpi_init() has the following structure:
> ...
> 'modex' block;
> ...
> 'barrier' block;
> ...
> I ran several experiments with this code, and one of them is interesting: if I add two more blocks, 'barrier' and then 'modex', right after the first 'modex' block, ompi_mpi_init() hangs in opal_progress() inside the last 'modex' block:
> ...
> 'modex' block;
> 'barrier' block;
> 'modex' block; <- hangs
> ...
> 'barrier' block;
> ...
> Thanks,
> Victor Kocheganov.