Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite
From: Gilles Gouaillardet (gilles.gouaillardet_at_[hidden])
Date: 2014-05-28 02:18:26


i could not find anything wrong with loop_spawn, unless i am missing
something obvious :

from mtt :

all tests run this month (both trunk and v1.8) failed with a timeout, and
there was no error message such as
dpm_base_disconnect_init: error -12 in isend to process 1

loop_spawn tries to spawn 2000 tasks in 10 minutes.
my system is not fast enough to achieve this, so the iteration count is
bumped to the end once the time limit is exceeded :
/* if time exceeded, then bump iteration count to the end */

the test would succeed after 10 minutes and a few seconds (the extra time
is required to complete the last spawn and MPI_Finalize())

the slurm timeout is set to exactly 10 minutes, so the job is aborted
before it has time to finish (and i believe it would have finished)

you can either increase the slurm timeout (10min30s looks good to me),
decrease nseconds (570 looks good to me) in loop_spawn.c or run
mpirun ... dynamic/loop_spawn <nseconds>
where nseconds is "a bit less" than 600 seconds (once again, 570 looks good
to me)
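
the timing argument above can be sketched with a small model. this is not the actual loop_spawn.c code, only an illustration under stated assumptions: every spawn takes a fixed time, the time check happens before each spawn, and the names loop_end_ms and finishes_before_timeout are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, NOT the real loop_spawn.c logic.
 * Assumes each spawn takes a fixed spawn_ms milliseconds and that the
 * elapsed time is checked before starting each spawn.
 * Returns the elapsed time (in ms) when the loop stops. */
long loop_end_ms(int max_iters, long nseconds, long spawn_ms)
{
    assert(spawn_ms > 0);
    long limit_ms = nseconds * 1000L;
    long elapsed_ms = 0;

    for (int i = 0; i < max_iters; i++) {
        if (elapsed_ms >= limit_ms) {
            /* time exceeded: stop iterating (the "bump to the end" case) */
            break;
        }
        /* a spawn started just before the limit still runs to completion */
        elapsed_ms += spawn_ms;
    }
    return elapsed_ms;
}

/* True if the loop plus an assumed finalize cost ends strictly before the
 * batch scheduler timeout. 2000 is the iteration count from the text;
 * finalize_ms is an assumed cost for the final MPI_Finalize(). */
bool finishes_before_timeout(long nseconds, long spawn_ms,
                             long finalize_ms, long timeout_seconds)
{
    long total_ms = loop_end_ms(2000, nseconds, spawn_ms) + finalize_ms;
    return total_ms < timeout_seconds * 1000L;
}
```

with an assumed 700 ms per spawn and 2 s for finalize, nseconds = 600 ends the loop at 600.6 s and the job at 602.6 s, which is past a 600 s timeout, while nseconds = 570 ends the job at 572.5 s. on a system fast enough to do all 2000 spawns in time, the limit never triggers.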

did i miss something ?



On Wed, May 28, 2014 at 12:53 PM, Gilles Gouaillardet <
gilles.gouaillardet_at_[hidden]> wrote:

> Ralph,
> On 2014/05/28 12:10, Ralph Castain wrote:
> > my understanding is that there are two ways of seeing things :
> > a) the "R-way" : the problem is the parent should not try to communicate
> > to already exited processes
> > b) the "J-way" : the problem is the children should have waited either
> > in MPI_Comm_free() or MPI_Finalize()
> > I don't think you can use option (b) - we can't have the children
> > lingering around for the parent to call finalize, if I'm understanding you
> > correctly.
> you understood me correctly.
> once again, i did not start investigating loop_spawn.
> in the case of intercomm_create, we would not run into this if the
> application had explicitly called MPI_Comm_free in the parent.
> so in this case *only*, and as explained by Jeff, b) could be an option
> to make OpenMPI happy.
> (to be blunt : if the user is not happy with children lingering around,
> he can explicitly call MPI_Comm_free before calling MPI_Comm_disconnect)
> i will start investigating loop_spawn from now
> Cheers,
> Gilles