i could not find anything wrong with loop_spawn, unless i am missing
something obvious :
from mtt http://mtt.open-mpi.org/index.php?do_redir=2196
all tests run this month (both trunk and v1.8) failed (timeout) and there
was no error message such as
dpm_base_disconnect_init: error -12 in isend to process 1
loop_spawn tries to spawn 2000 tasks in 10 minutes.
my system is not fast enough to achieve this so the iteration count is
/* if time exceeded, then bump iteration count to the end */
the test would succeed in 10 minutes and a few seconds (required to
complete the last spawn and MPI_Finalize())
the slurm timeout is set to 10 minutes exactly, so the job is aborted
before it has time to finish (and i believe it would have finished
you can either increase the slurm timeout (10min30s looks good to me),
decrease nseconds (570 looks good to me) in loop_spawn.c or run
mpirun ... dynamic/loop_spawn <nseconds>
where nseconds is "a bit less" than 600 seconds (once again, 570 looks good
did i miss something ?
On Wed, May 28, 2014 at 12:53 PM, Gilles Gouaillardet <
> On 2014/05/28 12:10, Ralph Castain wrote:
> > my understanding is that there are two ways of seeing things :
> > a) the "R-way" : the problem is the parent should not try to communicate
> > to already exited processes
> > b) the "J-way" : the problem is the children should have waited either
> > in MPI_Comm_free() or MPI_Finalize()
> > I don't think you can use option (b) - we can't have the children
> > lingering around for the parent to call finalize, if I'm understanding you
> you understood me correctly.
> once again, i did not start investigating loop_spawn.
> in the case of intercomm_create, we would not run into this if the
> application had explicitly called MPI_Comm_free in the parent.
> so in this case *only*, and as explained by Jeff, b) could be an option
> to make OpenMPI happy.
> (to be blunt : if the user is not happy with children lingering around,
> he can explicitly call MPI_Comm_free before calling MPI_Comm_disconnect)
> i will start investigating loop_spawn from now