
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] HPMPI versus OpenMPI performance
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-06-05 04:46:17


If I correctly understand how you run your application, I think I know
where the problem is coming from. In a few words: you're using buffered
send over shared memory.

First, buffered send has only one main "benefit": it doubles the amount
of memory required for the communication. A side effect is that it
increases the number of memory copies. The original buffer has to be
copied into the attached buffer, then from the attached buffer the data
is moved into the shared memory region, and from there the receiver can
finally copy the data into its receive buffer. In total there are three
memory copies involved in this operation, which automatically limits
the achievable bandwidth to 1/3 of the memory bandwidth available on
the architecture. Additionally, if the amount of data involved in the
communication is large enough, the cache will be completely thrashed by
the end of the communication.
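
To make the copy path concrete, here is a minimal two-rank sketch of a
buffered send over shared memory (the message size and variable names
are illustrative, not taken from the application in question):

  /* Minimal sketch, assuming a two-rank MPI_COMM_WORLD. */
  #include <mpi.h>
  #include <stdlib.h>

  #define COUNT (1 << 20)        /* ~8 MB of doubles, illustrative */

  int main(int argc, char **argv)
  {
      int rank, bufsize;
      double *data = malloc(COUNT * sizeof(double));
      char *attached;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* The attached buffer must hold the message plus the bsend
         overhead -- this is the doubling of memory mentioned above. */
      bufsize = COUNT * sizeof(double) + MPI_BSEND_OVERHEAD;
      attached = malloc(bufsize);
      MPI_Buffer_attach(attached, bufsize);

      if (rank == 0) {
          /* Copy #1: data -> attached buffer (inside MPI_Bsend).
             Copies #2 and #3 (attached buffer -> shared memory ->
             receive buffer) happen later, inside the library. */
          MPI_Bsend(data, COUNT, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          MPI_Recv(data, COUNT, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                   MPI_STATUS_IGNORE);
      }

      MPI_Buffer_detach(&attached, &bufsize);
      free(attached);
      free(data);
      MPI_Finalize();
      return 0;
  }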

Second, using buffered send requires asynchronous progress. If your
code doesn't call any MPI communication functions, there is no
guarantee that the data transfer takes place until the
MPI_Buffer_detach function is called (or some other
communication-related MPI function).
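
Continuing the sketch above, the sender side of the progress problem
looks roughly like this (long_computation() is a hypothetical stand-in
for the application's compute phase):

  MPI_Bsend(data, COUNT, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);

  /* Hypothetical compute phase that makes no MPI calls: without an
     asynchronous progress thread, the message can sit in the attached
     buffer for this entire time. */
  long_computation();

  /* Only here does the library get a chance to drain the buffer;
     MPI_Buffer_detach blocks until the buffered data has been sent. */
  MPI_Buffer_detach(&attached, &bufsize);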

   george.

On Jun 4, 2008, at 1:55 PM, Jeff Squyres wrote:

> Thanks for all the detailed information!
>
> It is quite likely that our bsend performance has never been tuned; we
> simply implemented it, verified that it works, and then moved on -- we
> hadn't considered that real applications would actually use it. :-\
>
> That being said, a 60% difference is a bit odd. Have you tried
> running with "--mca mpi_leave_pinned 1"? If all your sends are
> MPI_BSEND, it *may* not make a difference, but it could make a
> difference on the receive side.
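>
> For example, a minimal invocation might look like this (the process
> count and executable name are placeholders, not from the original
> report):
>
>   mpirun --mca mpi_leave_pinned 1 -np 4 ./your_solver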
>
> What are the typical communication patterns for your application?
>
>
>
> On Jun 2, 2008, at 3:39 PM, Ayer, Timothy C. wrote:
>
>>
>>
>>> We are performing a comparison of HPMPI versus OpenMPI using
>>> Infiniband and
>>> seeing a performance hit in the vicinity of 60% (OpenMPI is slower)
>>> on
>>> controlled benchmarks. Since everything else is similar, we
>>> suspect a
>>> problem with the way we are using or have installed OpenMPI.
>>>
>>> Please find attached the following info as requested from
>>> http://www.open-mpi.org/community/help/
>>>
>>> Application: in house CFD solver using both point-point and
>>> collective
>>> operations. Also, for historical reasons it makes extensive use of
>>> BSEND.
>>> We recognize that BSENDs can be inefficient, but it is not
>>> practical to
>>> change them at this time. We are trying to understand why the
>>> performance
>>> is so significantly different from HPMPI. The application is mixed
>>> FORTRAN 90 and C built with Portland Group compilers.
>>>
>>> HPMPI Version info:
>>>
>>> mpirun: HP MPI 02.02.05.00 Linux x86-64
>>> major version 202 minor version 5
>>>
>>> OpenMPI Version info:
>>>
>>> mpirun (Open MPI) 1.2.4
>>> Report bugs to http://www.open-mpi.org/community/help/
>>>
>>>
>>>
>>> Configuration info :
>>>
>>> The benchmark was a 4-process job run on a single dual-socket,
>>> dual-core
>>> HP DL140G3 (Woodcrest 3.0) with 4 GB of memory. Each rank requires
>>> approximately 250MB of memory.
>>>
>>> 1) Output from ompi_info --all
>>>
>>> See attached file ompi_info_output.txt
>>> << File: ompi_info_output.txt >>
>>>
>>> Below is the output requested in the FAQ section:
>>>
>>> In order for us to help you, it is most helpful if you can run a
>>> few steps
>>> before sending an e-mail to both perform some basic troubleshooting
>>> and
>>> provide us with enough information about your environment to help
>>> you.
>>> Please include answers to the following questions in your e-mail:
>>>
>>>
>>> 1. Which OpenFabrics version are you running? Please specify where
>>> you
>>> got the software from (e.g., from the OpenFabrics community web
>>> site, from
>>> a vendor, or it was already included in your Linux distribution).
>>>
>>> We obtained the software from www.openfabrics.org
>>>
>>> Output from ofed_info command:
>>>
>>> OFED-1.1
>>>
>>> openib-1.1 (REV=9905)
>>> # User space
>>> https://openib.org/svn/gen2/branches/1.1/src/userspace
>>> Git:
>>> ref: refs/heads/ofed_1_1
>>> commit a083ec1174cb4b5a5052ef5de9a8175df82e864a
>>>
>>> # MPI
>>> mpi_osu-0.9.7-mlx2.2.0.tgz
>>> openmpi-1.1.1-1.src.rpm
>>> mpitests-2.0-0.src.rpm
>>>
>>>
>>>
>>> 2. What distro and version of Linux are you running? What is your
>>> kernel version?
>>>
>>> Linux xxxxxxxx 2.6.9-64.EL.IT133935.jbtest.1smp #1 SMP Fri Oct 19
>>> 11:28:12
>>> EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>>
>>> 3. Which subnet manager are you running? (e.g., OpenSM, a
>>> vendor-specific subnet manager, etc.)
>>>
>>> We believe this to be HP or Voltaire but we are not certain how to
>>> determine this.
>>>
>>>
>>> 4. What is the output of the ibv_devinfo command on a known "good"
>>> node
>>> and a known "bad" node? (NOTE: there must be at least one port
>>> listed as
>>> "PORT_ACTIVE" for Open MPI to work. If there is not at least one
>>> PORT_ACTIVE port, something is wrong with your OpenFabrics
>>> environment and
>>> Open MPI will not be able to run).
>>>
>>> hca_id: mthca0
>>> fw_ver: 1.2.0
>>> node_guid: 001a:4bff:ff0b:5f9c
>>> sys_image_guid: 001a:4bff:ff0b:5f9f
>>> vendor_id: 0x08f1
>>> vendor_part_id: 25204
>>> hw_ver: 0xA0
>>> board_id: VLT0030010001
>>> phys_port_cnt: 1
>>> port: 1
>>> state: PORT_ACTIVE (4)
>>> max_mtu: 2048 (4)
>>> active_mtu: 2048 (4)
>>> sm_lid: 1
>>> port_lid: 161
>>> port_lmc: 0x00
>>>
>>>
>>> 5. What is the output of the ifconfig command on a known "good" node
>>> and a known "bad" node? (mainly relevant for IPoIB installations)
>>> Note
>>> that some Linux distributions do not put ifconfig in the default
>>> path for
>>> normal users; look for it in /sbin/ifconfig or /usr/sbin/ifconfig.
>>>
>>> eth0 Link encap:Ethernet HWaddr 00:XX:XX:XX:XX:XX
>>> inet addr:X.Y.Z.Q Bcast:X.Y.Z.255 Mask:255.255.255.0
>>> inet6 addr: X::X:X:X:X/64 Scope:Link
>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>> RX packets:1021733054 errors:0 dropped:10717 overruns:0
>>> frame:0
>>> TX packets:1047320834 errors:0 dropped:0 overruns:0
>>> carrier:0
>>> collisions:0 txqueuelen:1000
>>> RX bytes:1035986839096 (964.8 GiB) TX bytes:1068055599116
>>> (994.7 GiB)
>>> Interrupt:169
>>>
>>> ib0 Link encap:UNSPEC HWaddr
>>> 80-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
>>> inet addr:A.B.C.D Bcast:A.B.C.255 Mask:255.255.255.0
>>> inet6 addr: X::X:X:X:X/64 Scope:Link
>>> UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
>>> RX packets:137021 errors:0 dropped:0 overruns:0 frame:0
>>> TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
>>> collisions:0 txqueuelen:128
>>> RX bytes:12570947 (11.9 MiB) TX bytes:1504 (1.4 KiB)
>>>
>>> lo Link encap:Local Loopback
>>> inet addr:127.0.0.1 Mask:255.0.0.0
>>> inet6 addr: ::1/128 Scope:Host
>>> UP LOOPBACK RUNNING MTU:16436 Metric:1
>>> RX packets:1498664 errors:0 dropped:0 overruns:0 frame:0
>>> TX packets:1498664 errors:0 dropped:0 overruns:0 carrier:0
>>> collisions:0 txqueuelen:0
>>> RX bytes:1190810468 (1.1 GiB) TX bytes:1190810468 (1.1 GiB)
>>>
>>>
>>> 6. If running under Bourne shells, what is the output of the "ulimit
>>> -l" command?
>>> If running under C shells, what is the output of the "limit | grep
>>> memorylocked" command?
>>> (NOTE: If the value is not "unlimited", see
>>> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>>> and
>>> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages-more.)
>>>
>>> memorylocked 3500000 kbytes
>>>
>>> Gather up this information and see
>>> http://www.open-mpi.org/community/help/ for how to submit a help
>>> request to the user's mailing list.
>>>
>>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users


