Open MPI Development Mailing List Archives

From: George Bosilca (bosilca_at_[hidden])
Date: 2007-09-18 10:57:38


More information about this can be found in trac ticket #1127
(https://svn.open-mpi.org/trac/ompi/ticket/1127).

   george.

On Sep 18, 2007, at 10:20 AM, Gleb Natapov wrote:

> On Tue, Sep 18, 2007 at 09:44:42AM -0400, George Bosilca wrote:
>> The setup of a communicator includes, as a last stage, a collective
>> communication. As a result, some of the nodes can exit the collective
>> before the others and can therefore start sending messages using this
>> communicator [while some of the other nodes are still waiting for the
>> collective to complete]. This leads to a situation where a node
>> receives a message for a communicator that it is still building.
>>
>> There is a bug filed in trac about this. In FT-MPI we temporarily put
>> these messages in an internal queue, and deliver them to the right
>> communicator only once this communicator is completely created.
> In the ompi_comm_nextcid() function there is this code for the
> thread_multiple case:
>
> /* for synchronization purposes, avoids receiving fragments for
>    a communicator id, which might not yet been known. For
>    single-threaded scenarios, this call is in ompi_comm_activate,
>    for multi-threaded scenarios, it has to be already here
>    ( before releasing another thread into the cid-allocation loop ) */
> (allredfnct)(&response, &glresponse, 1, MPI_MIN, comm, bridgecomm,
>              local_leader, remote_leader, send_first );
>
> This collective is executed on the old communicator after the new cid
> has been set up. Is this not enough to solve the problem? Some ranks may
> leave this collective call earlier than others, but none can leave it
> before all ranks have entered it, and at that stage the new communicator
> already exists in all of them. Am I missing something?
>
>
>>
>> george.
>>
>> On Sep 18, 2007, at 9:06 AM, Gleb Natapov wrote:
>>
>>> George,
>>>
>>> In the comment you are saying that "a message for a not yet existing
>>> communicator can happen". Can you explain in what situation it can
>>> happen?
>>>
>>> Thanks,
>>>
>>> --
>>> Gleb.
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
>
> --
> Gleb.
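
The FT-MPI approach mentioned in the thread (park messages that arrive
for a communicator that is not yet completely created, then deliver them
once it is) might look roughly like the sketch below. This is only an
illustration under stated assumptions: engine_t, frag_t, engine_recv(),
engine_activate() and MAX_CID are hypothetical names made up for this
example and are not taken from the Open MPI or FT-MPI sources, and a
"fragment" is reduced to a cid plus an integer payload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_CID 64              /* hypothetical bound on communicator ids */

/* Hypothetical fragment: only the fields this sketch needs. */
typedef struct frag {
    int cid;                    /* communicator id the fragment targets */
    int payload;                /* stand-in for the message contents */
    struct frag *next;
} frag_t;

typedef struct {
    bool active[MAX_CID];       /* true once the communicator is complete */
    int delivered[MAX_CID];     /* fragments delivered per cid */
    int last_payload[MAX_CID];  /* most recently delivered payload per cid */
    frag_t *pending_head;       /* FIFO of fragments for unknown cids */
    frag_t *pending_tail;
} engine_t;

void engine_init(engine_t *e)
{
    *e = (engine_t){0};
}

/* Hand a fragment to an already active communicator. */
static void deliver(engine_t *e, int cid, int payload)
{
    assert(cid >= 0 && cid < MAX_CID && e->active[cid]);
    e->delivered[cid]++;
    e->last_payload[cid] = payload;
}

/* Returns 0 if delivered at once, 1 if queued, -1 on error. */
int engine_recv(engine_t *e, int cid, int payload)
{
    if (cid < 0 || cid >= MAX_CID)
        return -1;
    if (e->active[cid]) {
        deliver(e, cid, payload);
        return 0;
    }
    /* Communicator not complete yet: append to the pending queue. */
    frag_t *f = malloc(sizeof *f);
    if (f == NULL)
        return -1;
    f->cid = cid;
    f->payload = payload;
    f->next = NULL;
    if (e->pending_tail != NULL)
        e->pending_tail->next = f;
    else
        e->pending_head = f;
    e->pending_tail = f;
    return 1;
}

/* Mark cid as complete and flush its queued fragments in arrival order.
   Returns the number of fragments flushed, or -1 on error. */
int engine_activate(engine_t *e, int cid)
{
    if (cid < 0 || cid >= MAX_CID)
        return -1;
    e->active[cid] = true;

    int flushed = 0;
    frag_t *last_kept = NULL;
    frag_t **link = &e->pending_head;
    while (*link != NULL) {
        frag_t *f = *link;
        if (f->cid == cid) {
            *link = f->next;    /* unlink, deliver, release */
            deliver(e, cid, f->payload);
            free(f);
            flushed++;
        } else {
            last_kept = f;      /* stays queued for another cid */
            link = &f->next;
        }
    }
    e->pending_tail = last_kept;
    return flushed;
}
```

In this sketch a receive path would call engine_recv() for every
incoming fragment, and the communicator setup code would call
engine_activate() only after its final collective has completed locally.
A single shared queue keeps arrival order per cid at the cost of a
linear scan on activation; a per-cid list would avoid the scan.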
