Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times
From: Oliver Geisler (openmpi_at_[hidden])
Date: 2010-04-07 09:21:30


On 4/6/2010 5:09 PM, Jeff Squyres wrote:
> On Apr 6, 2010, at 6:04 PM, Oliver Geisler wrote:
>
>> Further our benchmark started with "--mca btl tcp,self" runs with short
>> communication times, even using kernel 2.6.33.1
>
> I'm not sure what this statement means (^^). Can you explain?
>
We first noticed the problem when we upgraded our hardware and
therefore had to upgrade the kernel in order to get the network cards
running.
I used a typical in-house application that we run on the cluster to
benchmark the old hardware against the new. There I saw a performance
drop instead of the expected increase.
Searching for the cause, we found that the pure computation time per
data packet improved as expected on the faster hardware, but the
communication times between the master and the slave processes
increased considerably.
We then narrowed the problem down to kernel versions newer than 2.6.23
(we cannot use 2.6.23 itself, because it does not yet support the
network cards).
Now, running the program with the mpirun option "--mca btl tcp,self", I
get short communication times (and overall completion times as
expected), even on a new node with kernel version 2.6.33.1.

>> Is there a way to see what type of communication is actually selected?
>
> If "--mca btl tcp,self" is used, then TCP sockets are used for non-self communications (i.e., communications with peer MPI processes, regardless of location).
>
>> Can anybody imagine why shared memory leads to these problems?
>
> I'm not sure I understand this -- if "--mca btl tcp,self", shared memory is not used...?
>
When I use "--mca btl sm,self", I get the issue, so my guess is that it
has something to do with shared memory.
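
A minimal sketch of how this comparison could be scripted. It assumes
that mpirun is on the PATH and that the benchmark binary is ./NPmpi; the
process count of 2 and the helper names are illustrative, not taken from
the thread. It times the same program under each BTL list and can add
the btl_base_verbose MCA parameter, which makes Open MPI log which BTL
components it opens and selects.

```python
import shlex
import subprocess
import time

# BTL lists mentioned in this thread.
BTL_SETTINGS = ["tcp,self", "sm,self", "self,sm,tcp"]


def build_command(btl, program="./NPmpi", nprocs=2, verbose=False):
    """Return the mpirun argument list for one BTL setting.

    program and nprocs are placeholders (assumptions), not thread values.
    """
    cmd = ["mpirun", "-np", str(nprocs), "--mca", "btl", btl]
    if verbose:
        # Ask Open MPI to log BTL component selection.
        cmd += ["--mca", "btl_base_verbose", "30"]
    cmd.append(program)
    return cmd


def time_run(cmd):
    """Run one command and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def main():
    """Run the benchmark once per BTL setting and print the timings."""
    for btl in BTL_SETTINGS:
        cmd = build_command(btl, verbose=True)
        print(shlex.join(cmd))
        print(f"  {time_run(cmd):.2f} s")
```

Calling main() on a node with both kernels available in turn would give
one wall-clock figure per BTL list; a large gap between "tcp,self" and
"sm,self" would point at the shared-memory path.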

> ....re-reading your email, I'm wondering: did you run the NPmpi process with "--mca btl tcp,sm,self" (or no --mca btl param)? That might explain some of my confusion, above.
>
I ran NPmpi without an explicit "--mca btl" option, so it should pick up
the default from /etc/openmpi/openmpi-mca-params.conf, which contains:
btl = self,sm,tcp
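
A rough sketch of how the effective "btl" default could be checked. It
assumes the usual "name = value" file format with "#" comments, and the
usual precedence of command line over the OMPI_MCA_btl environment
variable over the params file. The function names are hypothetical, and
the sample text stands in for the real file.

```python
def parse_mca_params(text):
    """Parse 'name = value' lines; '#' starts a comment."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


def effective_btl(file_text, environ, cli_value=None):
    """Pick btl by assumed precedence: CLI, then OMPI_MCA_btl, then file."""
    if cli_value is not None:
        return cli_value
    if "OMPI_MCA_btl" in environ:
        return environ["OMPI_MCA_btl"]
    return parse_mca_params(file_text).get("btl")


# Sample content standing in for the system-wide params file.
sample = "# system-wide defaults\nbtl = self,sm,tcp\n"
print(effective_btl(sample, {}))
print(effective_btl(sample, {}, cli_value="tcp,self"))
```

With no override, the sketch reports the file value, so a run without
"--mca btl" would include sm alongside tcp.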

-- 
Oliver Geisler
TERRASYS Geophysics