Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times
From: Oliver Geisler (openmpi_at_[hidden])
Date: 2010-04-06 16:29:58


On 4/6/2010 2:54 PM, Jeff Squyres wrote:
> Sorry for the delay -- I just replied on the user list -- I think the first thing to do is to establish baseline networking performance and see if that is out of whack. If the underlying network is bad, then MPI performance will also be bad.
>

That could make sense. Kernel 2.6.24 seems to have introduced a major
change in the modules for Intel PCI-Express network cards.
Does Open MPI use TCP communication even if all processes are on the
same local node?
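
One way to get the kind of baseline suggested above, independent of MPI,
might be a small TCP ping-pong over the loopback interface, run once under
each kernel. The following is only a minimal sketch under stated
assumptions: Python is available on the node, and the names measure_rtt,
echo_server and recv_exact, the message size, and the message count are all
hypothetical choices for illustration. It does not reproduce how Open MPI
itself moves data.

```python
import socket
import threading
import time


def recv_exact(sock, size):
    """Read exactly `size` bytes from `sock` (TCP may deliver partial reads)."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_server(listener, n_messages, msg_size):
    """Accept one connection and echo back n_messages fixed-size messages."""
    conn, _ = listener.accept()
    with conn:
        # Disable Nagle's algorithm so small messages are sent immediately.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(n_messages):
            conn.sendall(recv_exact(conn, msg_size))


def measure_rtt(host="127.0.0.1", n_messages=1000, msg_size=64):
    """Return a list of TCP round-trip times (seconds) for a local ping-pong."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(
        target=echo_server, args=(listener, n_messages, msg_size)
    )
    server.start()

    payload = b"x" * msg_size
    rtts = []
    with socket.create_connection((host, port)) as client:
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(n_messages):
            start = time.perf_counter()
            client.sendall(payload)
            recv_exact(client, msg_size)
            rtts.append(time.perf_counter() - start)

    server.join()
    listener.close()
    return rtts


if __name__ == "__main__":
    rtts = sorted(measure_rtt())
    print("median RTT: %.1f us" % (rtts[len(rtts) // 2] * 1e6))
    print("max RTT:    %.1f us" % (rtts[-1] * 1e6))
```

If the median or maximum round-trip time differs clearly between 2.6.23 and
2.6.24 for this loopback test, the problem would likely sit below MPI. To
test the ethernet path instead, the echo side would have to run on a second
node, which this sketch does not cover.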

>
> On Apr 6, 2010, at 11:51 AM, Oliver Geisler wrote:
>
>> On 4/6/2010 10:11 AM, Rainer Keller wrote:
>>> Hello Oliver,
>>> Hmm, this is really a teaser...
>>> I haven't seen such drastic behavior, and haven't read of any on the list.
>>>
>>> One thing that might interfere, however, is process binding.
>>> Could you make sure that processes are not bound to cores (default in 1.4.1)
>>> by running mpirun with --bind-to-none?
>>>
>>
>> I tried version 1.4.1 with default settings and watched processes
>> switch from core to core in "top" (with "f" + "j"). Then I tried
>> --bind-to-core and, explicitly, --bind-to-none. All gave the same result:
>> ~20% CPU wait and much longer overall computation times.
>>
>> Thanks for the idea ...
>> Every input is helpful.
>>
>> Oli
>>
>>
>>> Just an idea...
>>>
>>> Regards,
>>> Rainer
>>>
>>> On Tuesday 06 April 2010 10:07:35 am Oliver Geisler wrote:
>>>> Hello Devel-List,
>>>>
>>>> I am somewhat at a loss with this matter. I already posted to the
>>>> users list; in case you don't read it, I am posting here as well.
>>>>
>>>> This is the original posting:
>>>>
>>>> http://www.open-mpi.org/community/lists/users/2010/03/12474.php
>>>>
>>>> Short:
>>>> Switching from kernel 2.6.23 to 2.6.24 (and up), using Open MPI 1.2.7-rc2
>>>> (outdated, I know, but it is in Debian stable, and 1.4.1 gives the same
>>>> results), increases communication times between processes (essentially
>>>> between one master and several slave processes). This happens regardless
>>>> of whether the processes are local only or communicate over ethernet.
>>>>
>>>> Has anybody seen such behavior?
>>>>
>>>> Any ideas what I should test for?
>>>>
>>>> What additional information should I provide?
>>>>
>>>> Thanks for your time
>>>>
>>>> oli
>>>>
>>>
>>
>>
>> --
>> This message has been scanned for viruses and
>> dangerous content by MailScanner, and is
>> believed to be clean.
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
>
