Open MPI Development Mailing List Archives

From: Gleb Natapov (glebn_at_[hidden])
Date: 2007-09-18 10:20:04


On Tue, Sep 18, 2007 at 09:44:42AM -0400, George Bosilca wrote:
> The setup of a communicator includes, as its last stage, a collective
> communication. As a result, some of the nodes can exit the collective
> before the others and can therefore start sending messages on the new
> communicator [while some of the other nodes are still waiting for the
> collective to complete]. This leads to a situation where a node receives
> a message for a communicator that it is still building.
>
> There is a bug filed in Trac about this. In FT-MPI we temporarily put these
> messages in an internal queue, and deliver them to the right communicator
> only once this communicator is completely created.
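The FT-MPI workaround described in the quote (park early messages, deliver them once the communicator is complete) can be sketched as below. This is a minimal single-threaded sketch, not FT-MPI or Open MPI code: the names `frag_t`, `on_fragment()`, `comm_activated()`, the fixed-size tables, and the assumption that every incoming fragment carries its target cid are all hypothetical, chosen only to illustrate the idea.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_CID 16
#define MAX_PENDING 64

/* Hypothetical fragment: only the target context id (cid) and a payload. */
typedef struct {
    int cid;
    int payload;
} frag_t;

/* Hypothetical per-process state. */
static int comm_ready[MAX_CID];      /* 1 once communicator `cid` is fully built */
static int delivered[MAX_CID];       /* fragments handed to each communicator */
static frag_t pending[MAX_PENDING];  /* early arrivals, kept in arrival order */
static int npending = 0;

static void deliver(const frag_t *f) {
    /* A real implementation would pass f to the communicator's matching
       logic; this sketch only counts deliveries. */
    delivered[f->cid]++;
}

/* Called for every incoming fragment. Returns 0 on success, -1 if the
   pending queue is full. */
static int on_fragment(frag_t f) {
    assert(f.cid >= 0 && f.cid < MAX_CID);
    if (comm_ready[f.cid]) {
        deliver(&f);
        return 0;
    }
    if (npending == MAX_PENDING) {
        fprintf(stderr, "pending queue full, cannot park cid %d\n", f.cid);
        return -1;
    }
    pending[npending++] = f;  /* park it until the communicator exists */
    return 0;
}

/* Called once communicator `cid` is completely created: mark it ready and
   flush its parked fragments in arrival order, keeping the others queued. */
static void comm_activated(int cid) {
    assert(cid >= 0 && cid < MAX_CID);
    comm_ready[cid] = 1;
    int kept = 0;
    for (int i = 0; i < npending; i++) {
        if (pending[i].cid == cid)
            deliver(&pending[i]);
        else
            pending[kept++] = pending[i];
    }
    npending = kept;
}
```

In this sketch the receive path calls `on_fragment()` for every arrival and the communicator-creation code calls `comm_activated()` as its very last step, so a fragment for cid 3 that arrives early waits in `pending` until `comm_activated(3)` delivers it.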
In the ompi_comm_nextcid() function there is this code for the thread_multiple
case:

 /* for synchronization purposes, avoids receiving fragments for
    a communicator id, which might not yet be known. For single-threaded
    scenarios, this call is in ompi_comm_activate, for multi-threaded
    scenarios, it has to be already here ( before releasing another
    thread into the cid-allocation loop ) */
 (allredfnct)(&response, &glresponse, 1, MPI_MIN, comm, bridgecomm,
                     local_leader, remote_leader, send_first );

This collective is executed on the old communicator after the new cid has
been set up. Isn't that enough to solve the problem? Some ranks may leave
this collective call earlier than others, but none can leave it before
all ranks have entered it, and at that point the new communicator already
exists in all of them. Am I missing something?
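The ordering argument above can be checked with a small thread model. This is a hypothetical sketch, not Open MPI code: each POSIX thread stands in for a rank, a shared array stands in for each rank's local registration of the new cid, and a hand-rolled counting barrier stands in for the `allredfnct` call on the old communicator (an allreduce, like a barrier, cannot return on any rank before every rank has entered it). The names `run_model()`, `old_comm_allreduce()` and `cid_known` are made up for illustration. It models only the thread_multiple path quoted above, where the cid is registered before the collective; it does not model the single-threaded path, where the comment says the call lives in ompi_comm_activate.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define NRANKS 4

static int cid_known[NRANKS];  /* cid_known[r] == 1: rank r registered the new cid */
static int violations;         /* sends that found a peer without the cid */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_in = PTHREAD_COND_INITIALIZER;
static int entered;            /* ranks that have entered the collective */

/* Stand-in for the allreduce on the OLD communicator: no rank returns
   until all NRANKS ranks have entered. */
static void old_comm_allreduce(void) {
    pthread_mutex_lock(&lock);
    entered++;
    if (entered == NRANKS)
        pthread_cond_broadcast(&all_in);
    while (entered < NRANKS)
        pthread_cond_wait(&all_in, &lock);
    pthread_mutex_unlock(&lock);
}

static void *rank_main(void *arg) {
    int me = (int)(long)arg;

    /* Step 1: register the new cid locally (done before the collective). */
    pthread_mutex_lock(&lock);
    cid_known[me] = 1;
    pthread_mutex_unlock(&lock);

    /* Step 2: the synchronizing collective on the old communicator. */
    old_comm_allreduce();

    /* Step 3: this rank may leave early and start sending on the new
       communicator; count any destination that does not know the cid yet. */
    pthread_mutex_lock(&lock);
    for (int peer = 0; peer < NRANKS; peer++)
        if (!cid_known[peer])
            violations++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Runs one communicator creation and returns the number of violations. */
static int run_model(void) {
    pthread_t t[NRANKS];
    memset(cid_known, 0, sizeof cid_known);
    violations = 0;
    entered = 0;
    for (long r = 0; r < NRANKS; r++) {
        int rc = pthread_create(&t[r], NULL, rank_main, (void *)r);
        assert(rc == 0);
    }
    for (int r = 0; r < NRANKS; r++)
        pthread_join(t[r], NULL);
    return violations;
}
```

With registration placed before the collective, `run_model()` returns 0 on every thread schedule. Moving the `cid_known[me] = 1` step after `old_comm_allreduce()` makes a nonzero count possible, which is the kind of window described in the quoted message.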

>
> george.
>
> On Sep 18, 2007, at 9:06 AM, Gleb Natapov wrote:
>
>> George,
>>
>> In the comment you are saying that "a message for a not yet existing
>> communicator can happen". Can you explain in what situation it can
>> happen?
>>
>> Thanks,
>>
>> --
>> Gleb.
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>

--
			Gleb.