Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-05-28 08:33:51


On May 28, 2014, at 4:45 AM, Gilles Gouaillardet <gilles.gouaillardet_at_[hidden]> wrote:

> Jeff,
>
> On Wed, May 28, 2014 at 8:31 PM, Jeff Squyres (jsquyres)
> > To be totally clear: MPI says it is erroneous for only some (not all) processes in a communicator to call MPI_COMM_FREE. So if that's the real problem, then the discussion about why the parent(s) is(are) trying to contact the children is moot -- the test is erroneous, and erroneous application behavior is undefined.
>
> This is definitely what happens: only some tasks call MPI_Comm_free()

Really? I don't see how that can happen in loop_spawn - every process is clearly calling comm_free. Or are you referring to the intercomm_create test?

> i will commit my changes and the initially reported issue is solved :-)
>
>
>
> about the "bonus points" :
>
> v1.8 does not have this issue
>
> I dug into it and, bottom line, the parent (which, unlike the children, did not call MPI_Comm_free)

I see the parent doing it in every loop:

    MPI_Init( &argc, &argv);

    for (iter = 0; iter < 1000; ++iter) {
        MPI_Comm_spawn(EXE_TEST, NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &comm, &err);
        printf("parent: MPI_Comm_spawn #%d return : %d\n", iter, err);

        MPI_Intercomm_merge(comm, 0, &merged);
        MPI_Comm_rank(merged, &rank);
        MPI_Comm_size(merged, &size);
        printf("parent: MPI_Comm_spawn #%d rank %d, size %d\n",
               iter, rank, size);
        MPI_Comm_free(&merged);
    }

    MPI_Finalize();

I suspect that you are talking about intercomm_create, hence my confusion.

> calls ompi_dpm_base_dyn_finalize, which tries to isend to the already exited tasks.
>
>
> bottom line, in pml_ob1_sendreq.h line 450
>
> with v1.8
> mca_bml_base_btl_array_get_size(&endpoint->btl_eager) = 0
> nothing is sent but isend is reported successful
>
> with trunk
> mca_bml_base_btl_array_get_size(&endpoint->btl_eager) = 1
> and then it tries to send the message => BOOM
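
The difference described above can be illustrated with a toy model. This is only a sketch under stated assumptions: toy_endpoint_t, toy_isend and the TOY_* status codes are hypothetical names made up for illustration, and none of this is the actual code in pml_ob1_sendreq.h. It models just the reported behavior: when the eager BTL array is empty, nothing is sent and success is returned; when it has one entry, the send is attempted and fails because the peer has already exited.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a peer endpoint; NOT Open MPI's real structure. */
typedef struct {
    size_t n_eager_btls;  /* models the size of the eager BTL array */
    bool   peer_alive;    /* false once the remote task has exited */
} toy_endpoint_t;

/* Hypothetical status codes for this sketch only. */
typedef enum { TOY_SUCCESS = 0, TOY_ERR_UNREACH = -1 } toy_status_t;

/*
 * Toy model of the reported behavior:
 *  - no eager transport: nothing is sent, yet success is reported
 *  - at least one eager transport: the send is attempted and fails
 *    if the peer is gone
 * *attempted is set to 1 only when a send was actually tried.
 */
toy_status_t toy_isend(const toy_endpoint_t *ep, int *attempted)
{
    *attempted = 0;
    if (ep->n_eager_btls == 0) {
        return TOY_SUCCESS;          /* silently skipped */
    }
    *attempted = 1;
    return ep->peer_alive ? TOY_SUCCESS : TOY_ERR_UNREACH;
}
```

With an exited peer, an endpoint of { 0, false } models the v1.8 observation (success, no send attempted) and { 1, false } models the trunk observation (send attempted, error).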
>
> I found various things that seem counterintuitive to me and will summarize all this tomorrow.
>
> Cheers,
>
> Gilles
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/05/14884.php