Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Simple program (103 lines) makes Open-1.4.3 hang
From: George Bosilca (bosilca_at_[hidden])
Date: 2010-11-23 18:03:04


If you know the maximum size of the receives, I would take a different approach. Post a few persistent receives and manage them in a circular buffer. Instead of doing an MPI_Iprobe, use MPI_Test on the current head of your circular buffer. Once you have used the data from that receive, just call MPI_Start on the request again.

This approach will minimize the number of unexpected messages and drain the connections faster. Moreover, at the end it is very easy to MPI_Cancel all the receives that have not yet been matched.

  george.

On Nov 23, 2010, at 17:43 , Sébastien Boisvert wrote:

> On Tuesday, November 23, 2010 at 17:38 -0500, George Bosilca wrote:
>> The eager size reported by ompi_info includes the Open MPI internal headers. They are anywhere between 20 and 64 bytes long (potentially more for some particular networks), so what Eugene suggested was a safe boundary.
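
The header arithmetic above can be sketched in a few lines of C. The 20 and 64 byte header sizes are the ones quoted in this thread; the function names are hypothetical, and the inclusive comparison is an assumption about how the limit is applied, not something taken from the Open MPI source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Header sizes quoted in this thread; some networks may need more. */
static const size_t HDR_MIN_BYTES = 20;
static const size_t HDR_MAX_BYTES = 64;

/* Assumption: a message is sent eagerly when payload plus internal header
 * is at most the eager limit reported by ompi_info. */
bool fits_eager(size_t payload, size_t header, size_t eager_limit)
{
    return payload + header <= eager_limit;
}

/* Largest payload that stays eager even with the largest quoted header. */
size_t safe_eager_payload(size_t eager_limit)
{
    return eager_limit > HDR_MAX_BYTES ? eager_limit - HDR_MAX_BYTES : 0;
}
```

Under these assumptions, a 4096-byte payload does not fit a 4096-byte limit even with the smallest header, while safe_eager_payload(4096) gives 4032, so the 4000 bytes Eugene suggested sits inside that bound.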
>
> I see.
>
>>
>> Moreover, eager send can improve performance if and only if the matching receives are already posted on the peer. If not, the data will become unexpected, and there will be one additional memcpy.
>
> So it won't improve performance in my application (Ray,
> http://denovoassembler.sf.net) because I use MPI_Iprobe to check for
> incoming messages, which means no receive (MPI_Recv) is ever posted
> before the matching send (MPI_Isend).
>
> Thanks, this thread is very informative for me!
>
>>
>> george.
>>
>> On Nov 23, 2010, at 17:29 , Sébastien Boisvert wrote:
>>
>>> On Tuesday, November 23, 2010 at 16:07 -0500, Eugene Loh wrote:
>>>> Sébastien Boisvert wrote:
>>>>
>>>>> Now I can describe the cases.
>>>>>
>>>>>
>>>> The test cases can all be explained by the test requiring eager messages
>>>> (something that test4096.cpp does not require).
>>>>
>>>>> Case 1: 30 MPI ranks, message size is 4096 bytes
>>>>>
>>>>> File: mpirun-np-30-Program-4096.txt
>>>>> Outcome: It hangs -- I killed the poor thing after 30 seconds or so.
>>>>>
>>>>>
>>>> 4096 is rendezvous. For eager, try 4000 or lower.
>>>
>>> According to ompi_info, the threshold is 4096, not 4000, right?
>>>
>>> (Open-MPI 1.4.3)
>>> [sboisver12_at_colosse1 ~]$ ompi_info -a|less
>>> MCA btl: parameter "btl_sm_eager_limit" (current value: "4096", data source: default value)
>>>          Maximum size (in bytes) of "short" messages (must be >= 1).
>>>
>>>
>>> "btl_sm_eager_limit: Below this size, messages are sent "eagerly" --
>>> that is, a sender attempts to write its entire message to shared buffers
>>> without waiting for a receiver to be ready. Above this size, a sender
>>> will only write the first part of a message, then wait for the receiver
>>> to acknowledge its ready before continuing. Eager sends can improve
>>> performance by decoupling senders from receivers."
>>>
>>>
>>>
>>> source:
>>> http://www.open-mpi.org/faq/?category=sm#more-sm
>>>
>>>
>>> The FAQ should say "Below or equal to this size" instead of "Below
>>> this size", to match what ompi_info says. ;)
>>>
>>>
>>>
>>>
>>> As Mr. George Bosilca put it:
>>>
>>> "__should__ is not correct, __might__ is a better verb to describe the
>>> most "common" behavior for small messages. The problem comes from the
>>> fact that in each communicator the FIFO ordering is required by the MPI
>>> standard. As soon as there is any congestion, MPI_Send will block even
>>> for small messages (and this independent on the underlying network)
>>> until all the pending packets have been delivered."
>>>
>>> source:
>>> http://www.open-mpi.org/community/lists/devel/2010/11/8696.php
>>>
>>>
>>>
>>>>
>>>>> Case 2: 30 MPI ranks, message size is 1 byte
>>>>>
>>>>> File: mpirun-np-30-Program-1.txt.gz
>>>>> Outcome: It runs just fine.
>>>>>
>>>>>
>>>> 1 byte is eager.
>>>
>>> I agree.
>>>
>>>>
>>>>> Case 3: 2 MPI ranks, message size is 4096 bytes
>>>>>
>>>>> File: mpirun-np-2-Program-4096.txt
>>>>> Outcome: It hangs -- I killed the poor thing after 30 seconds or so.
>>>>>
>>>>>
>>>> Same as Case 1.
>>>>
>>>>> Case 4: 30 MPI ranks, message size is 4096 bytes, shared memory is
>>>>> disabled
>>>>>
>>>>> File: mpirun-mca-btl-^sm-np-30-Program-4096.txt.gz
>>>>> Outcome: It runs just fine.
>>>>>
>>>>>
>>>> Eager limit for TCP is 65536 (perhaps less some overhead). So, these
>>>> messages are eager.
>>>
>>> I agree.
>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>
>
> --
> Mr. Sébastien Boisvert
> PhD student in physiology-endocrinology at Université Laval
> Scholar of the Canadian Institutes of Health Research
> Professor Jacques Corbeil's team
>
> Centre de recherche en infectiologie de l'Université Laval
> Room R-61B
> 2705, boulevard Laurier
> Québec, Québec
> Canada G1V 4G2
> Telephone: 418 525 4444 46342
>
> Email: SEB_at_[hidden]
> Web: http://boisvert.info
>
> "Innovation comes only from an assault on the unknown" -Sydney Brenner