On Wed, May 28, 2014 at 8:31 PM, Jeff Squyres (jsquyres)
> To be totally clear: MPI says it is erroneous for only some (not all)
processes in a communicator to call MPI_COMM_FREE. So if that's the real
problem, then the discussion about why the parent(s) is(are) trying to
contact the children is moot -- the test is erroneous, and erroneous
application behavior is undefined.
This is definetly what happens : only some tasks call MPI_Comm_free()
i will commit my changes and the initially reported issue is solved :-)
about the "bonus points" :
v1.8 does not have this issue
i digged it and bottom line, the parent (who did not call MPI_Comm_free
unlike the children) calls ompi_dpm_base_dyn_finalize, which tries to isend
the already exited tasks.
bottom line, in pml_ob1_sendreq.h line 450
mca_bml_base_btl_array_get_size(&endpoint->btl_eager) = 0
nothing is sent but isend is reported successful
mca_bml_base_btl_array_get_size(&endpoint->btl_eager) = 1
and then try to send the message => BOUM
i found various things that seem counter intuitive to me and will summarize
all this tomorrow.