
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times
From: Oliver Geisler (openmpi_at_[hidden])
Date: 2010-04-06 18:04:18


On 4/6/2010 2:54 PM, Jeff Squyres wrote:
> Sorry for the delay -- I just replied on the user list -- I think the first thing to do is to establish baseline networking performance and see if that is out of whack. If the underlying network is bad, then MPI performance will also be bad.
>
>

Using NetPIPE to compare TCP and MPI communication, I get the
following results:

TCP is much faster than MPI, by a factor of approx. 12.
E.g., a packet of 4096 bytes is delivered in
97.11 usec with NPtcp and
15338.98 usec with NPmpi,
or, for a packet size of 262kb:
0.05268801 sec with NPtcp
0.00254560 sec with NPmpi
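
For reference, the ratios implied by the figures exactly as quoted can be worked out with a few lines of Python. This is only a sketch: the dictionary layout and the name mpi_over_tcp are illustrative, and the timings are copied verbatim from the lines above without any correction.

```python
# Timings copied verbatim from the NetPIPE figures quoted above, in seconds.
timings = {
    "4096 bytes": {"NPtcp": 97.11e-6, "NPmpi": 15338.98e-6},
    "262kb": {"NPtcp": 0.05268801, "NPmpi": 0.00254560},
}


def mpi_over_tcp(entry):
    """Return NPmpi time divided by NPtcp time (> 1 means NPmpi took longer)."""
    return entry["NPmpi"] / entry["NPtcp"]


ratios = {size: mpi_over_tcp(entry) for size, entry in timings.items()}

for size, ratio in ratios.items():
    print(f"{size}: NPmpi/NPtcp = {ratio:.2f}")
```

A ratio above 1 means NPmpi took longer than NPtcp for that packet size; a ratio below 1 means the opposite.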

Furthermore, our benchmark, when started with "--mca btl tcp,self", runs
with short communication times, even with kernel 2.6.33.1.
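
One way to narrow this down might be to run the same program once per transport and compare the timings. The sketch below only builds and prints the command lines, so it runs without an MPI installation. The helper name build_mpirun_cmd, the process count of 2 and the ./NPmpi path are assumptions for illustration; "--mca btl" and "-np" are the usual mpirun options, and "sm" is the shared memory BTL name in the 1.x series.

```python
import shlex


def build_mpirun_cmd(btl_list, nprocs, program):
    """Build an mpirun argv that restricts Open MPI to the given BTL components."""
    return ["mpirun", "--mca", "btl", ",".join(btl_list),
            "-np", str(nprocs), program]


# Hypothetical comparison set: TCP only vs. shared memory only, on one node.
candidates = {
    "tcp": build_mpirun_cmd(["tcp", "self"], 2, "./NPmpi"),
    "sm": build_mpirun_cmd(["sm", "self"], 2, "./NPmpi"),
}

for label, cmd in candidates.items():
    # Printed rather than executed, so nothing here depends on mpirun existing.
    print(label, "->", " ".join(shlex.quote(arg) for arg in cmd))
```

If the "sm" run is slow and the "tcp" run is fast on the same node, that would point at the shared memory path rather than at the network.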

Is there a way to see what type of communication is actually selected?

Can anybody imagine why shared memory leads to these problems?
Kernel configuration?

Thanks, Jeff, for insisting upon testing network performance.
Thanks all others, too ;-)

oli

> On Apr 6, 2010, at 11:51 AM, Oliver Geisler wrote:
>
>> On 4/6/2010 10:11 AM, Rainer Keller wrote:
>>> Hello Oliver,
>>> Hmm, this is really a teaser...
>>> I haven't seen such drastic behavior, and haven't read of any on the list.
>>>
>>> One thing, however, that might interfere is process binding.
>>> Could you make sure that processes are not bound to cores (default in 1.4.1)
>>> by running mpirun with --bind-to-none?
>>>
>>
>> I have tried version 1.4.1. Using default settings, I watched processes
>> switching from core to core in "top" (with "f" + "j"). Then I tried
>> --bind-to-core and, explicitly, --bind-to-none. All with the same result:
>> ~20% CPU wait and much longer overall computation times.
>>
>> Thanks for the idea ...
>> Every input is helpful.
>>
>> Oli
>>
>>
>>> Just an idea...
>>>
>>> Regards,
>>> Rainer
>>>
>>> On Tuesday 06 April 2010 10:07:35 am Oliver Geisler wrote:
>>>> Hello Devel-List,
>>>>
>>>> I am at a bit of a loss with this matter. I already posted to the
>>>> users list. In case you don't read the users list, I am posting here as well.
>>>>
>>>> This is the original posting:
>>>>
>>>> http://www.open-mpi.org/community/lists/users/2010/03/12474.php
>>>>
>>>> Short:
>>>> Switching from kernel 2.6.23 to 2.6.24 (and up), using openmpi 1.2.7-rc2
>>>> (outdated, I know, but it is in Debian stable, and 1.4.1 gives the same results),
>>>> increases communication times between processes (essentially between one
>>>> master and several slave processes). This happens regardless of whether the
>>>> processes are local only or communicate over Ethernet.
>>>>
>>>> Has anybody witnessed such behavior?
>>>>
>>>> Any ideas what I should test for?
>>>>
>>>> What additional information should I provide?
>>>>
>>>> Thanks for your time
>>>>
>>>> oli
>>>>
>>>
>>
>>
>> --
>> This message has been scanned for viruses and
>> dangerous content by MailScanner, and is
>> believed to be clean.
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
>
