On Dec 20, 2012, at 8:29 AM, Victor Kocheganov <victor.kocheganov@itseez.com> wrote:

Thanks for the fast answer, Ralph.

In my example I use different collective objects. That is, in every block I mentioned I call coll = OBJ_NEW(orte_grpcomm_collective_t);
and OBJ_RELEASE(coll);, so each grpcomm operation uses its own collective object.

How are the procs getting the collective id for those new calls? They all have to match.
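
To illustrate why the ids have to match, here is a minimal toy model in C. It is not ORTE code: the toy_collective_t type and the toy_collective_* functions are hypothetical names made up for this sketch, and it assumes only that a collective counts as complete once every proc has contributed under the same id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PROCS 8

/* Hypothetical toy tracker, NOT an ORTE type: records which procs have
 * contributed to the collective identified by `id`. */
typedef struct {
    int    id;
    size_t nprocs;
    bool   arrived[TOY_MAX_PROCS];
} toy_collective_t;

static void toy_collective_init(toy_collective_t *c, int id, size_t nprocs)
{
    assert(c != NULL);
    assert(nprocs > 0 && nprocs <= TOY_MAX_PROCS);
    c->id = id;
    c->nprocs = nprocs;
    for (size_t i = 0; i < TOY_MAX_PROCS; i++) {
        c->arrived[i] = false;
    }
}

/* A contribution is accepted only if it carries the tracker's id.
 * Returns true if it was accepted, false if it was ignored. */
static bool toy_collective_contribute(toy_collective_t *c, size_t rank, int id)
{
    if (rank >= c->nprocs || id != c->id) {
        return false;
    }
    c->arrived[rank] = true;
    return true;
}

/* The collective stays "active" until every proc has contributed. */
static bool toy_collective_active(const toy_collective_t *c)
{
    for (size_t i = 0; i < c->nprocs; i++) {
        if (!c->arrived[i]) {
            return true;
        }
    }
    return false;
}
```

In this model, if even one proc contributes under a different id, toy_collective_active() keeps returning true, so a loop of the form while (coll->active) would never exit.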



On Thu, Dec 20, 2012 at 7:48 PM, Ralph Castain <rhc@open-mpi.org> wrote:
Absolutely it will hang, as the collective object passed into any grpcomm operation (modex or barrier) may only be used once; any attempt to reuse it will fail.
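
The single-use rule can be sketched with a small stand-alone guard in C. This is an illustration only, not how grpcomm is implemented: toy_once_coll_t, toy_once_start(), toy_once_complete() and the TOY_* codes are hypothetical names, and returning an error on reuse is a choice made for this sketch so that the misuse is visible instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes for this sketch, not ORTE constants. */
enum { TOY_SUCCESS = 0, TOY_ERR_REUSED = -1 };

/* Hypothetical single-use collective handle, NOT an ORTE type. */
typedef struct {
    int  id;      /* collective id shared by all participating procs */
    bool active;  /* true while the operation is still in flight */
    bool used;    /* set on first start and never cleared */
} toy_once_coll_t;

/* Start an operation on the handle; a second start is rejected. */
static int toy_once_start(toy_once_coll_t *coll)
{
    assert(coll != NULL);
    if (coll->used) {
        return TOY_ERR_REUSED;  /* handle was already consumed */
    }
    coll->used = true;
    coll->active = true;
    return TOY_SUCCESS;
}

/* Mark the operation finished; the handle stays consumed. */
static void toy_once_complete(toy_once_coll_t *coll)
{
    assert(coll != NULL);
    coll->active = false;
}
```

A caller in this model would create a fresh handle for every operation and release it once active becomes false.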


On Dec 20, 2012, at 6:57 AM, Victor Kocheganov <victor.kocheganov@itseez.com> wrote:

Hi.

I am having trouble understanding the logic of ompi_mpi_init(). Could you please tell me if you have any idea what causes the following behavior?

If I understand correctly, there is a block in the ompi_mpi_init() function that exchanges proc information between processes (call this block 'modex'):
    coll = OBJ_NEW(orte_grpcomm_collective_t);
    coll->id = orte_process_info.peer_modex;
    if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
        error = "orte_grpcomm_modex failed";
        goto error;
    }
    /* wait for modex to complete - this may be moved anywhere in mpi_init
     * so long as it occurs prior to calling a function that needs
     * the modex info!
     */
    while (coll->active) {
        opal_progress();  /* block in progress pending events */
    }
    OBJ_RELEASE(coll);
and several instructions later there is a block for process synchronization (call this block 'barrier'):
    coll = OBJ_NEW(orte_grpcomm_collective_t);
    coll->id = orte_process_info.peer_init_barrier;
    if (ORTE_SUCCESS != (ret = orte_grpcomm.barrier(coll))) {
        error = "orte_grpcomm_barrier failed";
        goto error;
    }
    /* wait for barrier to complete */
    while (coll->active) {
        opal_progress();  /* block in progress pending events */
    }
    OBJ_RELEASE(coll);
So, initially ompi_mpi_init() has the following structure:
...
'modex' block;
...
'barrier' block;
...
I ran several experiments with this code, and the following one is of interest: if I add a sequence of two additional blocks, 'barrier' and 'modex', right after the 'modex' block, then ompi_mpi_init() hangs in opal_progress() of the last 'modex' block.
...
'modex' block;
'barrier' block;
'modex' block; <- hangs
...
'barrier' block;
...
Thanks,
Victor Kocheganov.
_______________________________________________
devel mailing list
devel@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

