Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-05-28 08:33:51


On May 28, 2014, at 4:45 AM, Gilles Gouaillardet <gilles.gouaillardet_at_[hidden]> wrote:

> Jeff,
>
> On Wed, May 28, 2014 at 8:31 PM, Jeff Squyres (jsquyres) wrote:
> > To be totally clear: MPI says it is erroneous for only some (not all) processes in a communicator to call MPI_COMM_FREE. So if that's the real problem, then the discussion about why the parent(s) is(are) trying to contact the children is moot -- the test is erroneous, and erroneous application behavior is undefined.
>
> This is definitely what happens: only some tasks call MPI_Comm_free()

Really? I don't see how that can happen in loop_spawn - every process is clearly calling comm_free. Or are you referring to the intercomm_create test?

> i will commit my changes and the initially reported issue is solved :-)
>
>
>
> about the "bonus points" :
>
> v1.8 does not have this issue
>
> i dug into it and, bottom line, the parent (which did not call MPI_Comm_free, unlike the children)

I see the parent doing it in every loop:

    MPI_Init( &argc, &argv);

    for (iter = 0; iter < 1000; ++iter) {
        MPI_Comm_spawn(EXE_TEST, NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &comm, &err);
        printf("parent: MPI_Comm_spawn #%d return : %d\n", iter, err);

        MPI_Intercomm_merge(comm, 0, &merged);
        MPI_Comm_rank(merged, &rank);
        MPI_Comm_size(merged, &size);
        printf("parent: MPI_Comm_spawn #%d rank %d, size %d\n",
               iter, rank, size);
        MPI_Comm_free(&merged);
    }

    MPI_Finalize();

I suspect that you are talking about intercomm_create, hence my confusion.

> calls ompi_dpm_base_dyn_finalize, which tries to isend to the already exited tasks.
>
>
> bottom line, in pml_ob1_sendreq.h line 450
>
> with v1.8
> mca_bml_base_btl_array_get_size(&endpoint->btl_eager) = 0
> nothing is sent but isend is reported successful
>
> with trunk
> mca_bml_base_btl_array_get_size(&endpoint->btl_eager) = 1
> and then it tries to send the message => BOOM
>
> i found various things that seem counterintuitive to me and will summarize all of this tomorrow.
>
> Cheers,
>
> Gilles
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/05/14884.php
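
The v1.8 versus trunk difference that Gilles reports at pml_ob1_sendreq.h line 450 can be pictured with a small toy model. This is not Open MPI code: `toy_endpoint`, `toy_send_start`, and the `TOY_*` return codes are hypothetical names made up for illustration, and returning an error code for the trunk case is a modeling choice (the thread only says the send goes "BOOM"). The sketch assumes only what the thread states: when the eager BTL array is empty, nothing is sent but success is reported, and when it holds one entry, a send to an already exited task is attempted.

```c
#include <stddef.h>

/* Hypothetical stand-in for an endpoint; NOT the real ob1/bml structures. */
typedef struct {
    size_t eager_count; /* models mca_bml_base_btl_array_get_size(&endpoint->btl_eager) */
    int    peer_alive;  /* 0 once the remote task has exited */
} toy_endpoint;

enum { TOY_SUCCESS = 0, TOY_ERR_UNREACH = -1 };

/* Toy version of starting a send. *attempts counts how many times
 * a transmission was actually attempted. */
int toy_send_start(const toy_endpoint *ep, int *attempts)
{
    if (ep->eager_count == 0) {
        /* v1.8 observation: no eager BTL, so nothing is sent,
         * yet the isend is reported as successful. */
        return TOY_SUCCESS;
    }

    /* trunk observation: one eager BTL is present, so a send is attempted. */
    ++*attempts;

    /* Sending to a task that has already exited cannot succeed. */
    return ep->peer_alive ? TOY_SUCCESS : TOY_ERR_UNREACH;
}
```

In this model an endpoint with `eager_count == 0` and an exited peer reports success without attempting anything, which would hide the late send from ompi_dpm_base_dyn_finalize, while the same peer with `eager_count == 1` exposes it.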