Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Bad performance when scattering big size of data?
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-10-04 13:27:40

Some of what you are seeing is the natural result of context switching. Some thoughts regarding the results:

1. You didn't bind your procs to cores when running with #procs < #cores, so your performance in those scenarios will also be less than the maximum.

2. Once the number of procs exceeds the number of cores, you guarantee a lot of context switching, so performance will definitely take a hit.

3. Sometime in the not-too-distant future, OMPI will (hopefully) become hyperthread aware. For now, we don't see hyperthreads as separate processing units. So as far as OMPI is concerned, you only have 512 computing units to work with, not 1024.

Bottom line: above 512 procs you are running oversubscribed, so OMPI deliberately dials back its performance (idle processes yield the CPU instead of aggressively polling for messages) to keep the machine from hemorrhaging as it context switches.

On Oct 4, 2010, at 11:06 AM, Doug Reeder wrote:

> In my experience hyperthreading can't really deliver two cores' worth of processing simultaneously for processes expecting sole use of a core. Since you really have 512 cores, I'm not surprised that you see a performance hit when requesting > 512 compute units. We should really get input from a hyperthreading expert, preferably from Intel.
> Doug Reeder
> On Oct 4, 2010, at 9:53 AM, Storm Zhang wrote:
>> We have 64 compute nodes, each with dual quad-core, hyperthreaded CPUs, so the ROCKS 5.3 system shows 1024 compute units. I'm trying to scatter an array from the master node to the compute nodes in C++, compiling with mpiCC and launching with mpirun.
>> Here is my test:
>> The array size is 18KB * number of processes, and it is scattered to the processes 5000 times in a row.
>> The average running time (seconds):
>> 100 processes: 170,
>> 400 processes: 690,
>> 500 processes: 855,
>> 600 processes: 2550,
>> 700 processes: 2720,
>> 800 processes: 2900.
>> There is a big jump in running time from 500 to 600 processes. I don't know what the problem is.
>> I tried both OMPI 1.3.2 and OMPI 1.4.2. All the tests run a little faster in 1.4.2, but the jump still exists.
>> Using the Bcast function or plain Send/Recv instead gives very close results.
>> Running it directly or through SGE gives the same results.
>> The code and ompi_info output are attached to this email. The command for running it directly is:
>> /opt/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --machinefile ../machines -np 600 scatttest
>> The ifconfig output for eth0 on the head node is:
>> eth0 Link encap:Ethernet HWaddr 00:26:B9:56:8B:44
>> inet addr: Bcast: Mask:
>> inet6 addr: fe80::226:b9ff:fe56:8b44/64 Scope:Link
>> RX packets:1096060373 errors:0 dropped:2512622 overruns:0 frame:0
>> TX packets:513387679 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:832328807459 (775.1 GiB) TX bytes:250824621959 (233.5 GiB)
>> Interrupt:106 Memory:d6000000-d6012800
>> Typical ifconfig output for a compute node is:
>> eth0 Link encap:Ethernet HWaddr 00:21:9B:9A:15:AC
>> inet addr: Bcast: Mask:
>> inet6 addr: fe80::221:9bff:fe9a:15ac/64 Scope:Link
>> RX packets:362716422 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:349967746 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:139699954685 (130.1 GiB) TX bytes:338207741480 (314.9 GiB)
>> Interrupt:82 Memory:d6000000-d6012800
>> Can anyone help me out with this? It bothers me a lot.
>> Thank you very much.
>> Linbao
>> <scatttest.cpp><ompi_info>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]