Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Bad performance when scattering big size of data?
From: Storm Zhang (stormzhg_at_[hidden])
Date: 2010-10-04 21:10:45


Here is what I meant: the 500-proc results show that even with only
272-304 real cores (fewer than 500), the program's running time is good, at
almost five times the 100-proc time. So that load is handled very well, and
I therefore guess that OpenMPI or Rocks OS does make use of hyperthreading
to do the job. But with 600 procs, the running time is more than double that
of 500 procs, and I don't know why. This is my problem.

BTW, how do I use -bind-to-core? I added it to mpirun's options, but it
always gives me the error "the executable 'bind-to-core' can't be found".
Isn't it used like this:
mpirun --mca btl_tcp_if_include eth0 -np 600 -bind-to-core scatttest

Thank you very much.

Linbao

On Mon, Oct 4, 2010 at 4:42 PM, Ralph Castain <rhc_at_[hidden]> wrote:

>
> On Oct 4, 2010, at 1:48 PM, Storm Zhang wrote:
>
> Thanks a lot, Ralph. As I said, I also tried SGE (which also shows 1024
> available for parallel tasks). It assigns only 34-38 compute nodes, which
> have only 272-304 real cores, to the 500-proc run. The running time is
> consistent with that of 100 procs and does not fluctuate much as the number
> of machines changes.
>
>
> Afraid I don't understand your statement. If you have 500 procs running on
> < 500 cores, then the performance relative to a high-performance job (#procs
> <= #cores) will be worse. We deliberately dial down the performance when
> oversubscribed to ensure that procs "play nice" in situations where the node
> is oversubscribed.
>
> So I guess it is not related to hyperthreading. Correct me if I'm wrong.
>
>
> Has nothing to do with hyperthreading - OMPI has no knowledge of
> hyperthreads at this time.
>
>
> BTW, how do I bind the proc to the core? I tried --bind-to-core and
> -bind-to-core but neither works. Is it for OpenMP, not for OpenMPI?
>
>
> Those should work. You might try --report-bindings to see what OMPI thought
> it did.
>
>
> Linbao
>
>
> On Mon, Oct 4, 2010 at 12:27 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> Some of what you are seeing is the natural result of context
>> switching....some thoughts regarding the results:
>>
>> 1. You didn't bind your procs to cores when running with #procs < #cores,
>> so your performance in those scenarios will also be less than max.
>>
>> 2. Once the number of procs exceeds the number of cores, you guarantee a
>> lot of context switching, so performance will definitely take a hit.
>>
>> 3. Sometime in the not-too-distant-future, OMPI will (hopefully) become
>> hyperthread aware. For now, we don't see them as separate processing units.
>> So as far as OMPI is concerned, you only have 512 computing units to work
>> with, not 1024.
>>
>> Bottom line is that you are running oversubscribed, so OMPI turns down
>> your performance so that the machine doesn't hemorrhage as it context
>> switches.
>>
>>
>> On Oct 4, 2010, at 11:06 AM, Doug Reeder wrote:
>>
>> In my experience hyperthreading can't really deliver two cores worth of
>> processing simultaneously for processes expecting sole use of a core. Since
>> you really have 512 cores I'm not surprised that you see a performance hit
>> when requesting > 512 compute units. We should really get input from a
>> hyperthreading expert, preferably from Intel.
>>
>> Doug Reeder
>> On Oct 4, 2010, at 9:53 AM, Storm Zhang wrote:
>>
>> We have 64 compute nodes, each with dual quad-core, hyperthreaded CPUs,
>> so the ROCKS 5.3 system shows 1024 compute units. I'm trying to
>> scatter an array from the master node to the compute nodes, using mpiCC and
>> mpirun with C++.
>>
>> Here is my test:
>>
>> The array size is 18KB * number of compute nodes, and it is scattered to
>> the compute nodes 5000 times repeatedly.
>>
>> The average running time (seconds):
>>
>> 100 nodes: 170
>> 400 nodes: 690
>> 500 nodes: 855
>> 600 nodes: 2550
>> 700 nodes: 2720
>> 800 nodes: 2900
>>
>> There is a big jump in running time from 500 nodes to 600 nodes, and I
>> don't know what the problem is.
>> I tried both OMPI 1.3.2 and OMPI 1.4.2. Running time is a little faster
>> for all the tests in 1.4.2, but the jump still exists.
>> I tried using either the Bcast function or simply Send/Recv, which give
>> very close results.
>> I tried both running it directly and using SGE, and got the same results.
>>
>> The code and ompi_info are attached to this email. The command for
>> running it directly is:
>> /opt/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --machinefile
>> ../machines -np 600 scatttest
>>
>> The ifconfig of head node for eth0 is:
>> eth0 Link encap:Ethernet HWaddr 00:26:B9:56:8B:44
>> inet addr:192.168.1.1 Bcast:192.168.1.255 Mask:255.255.255.0
>> inet6 addr: fe80::226:b9ff:fe56:8b44/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:1096060373 errors:0 dropped:2512622 overruns:0 frame:0
>> TX packets:513387679 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:832328807459 (775.1 GiB) TX bytes:250824621959 (233.5 GiB)
>> Interrupt:106 Memory:d6000000-d6012800
>>
>> A typical ifconfig of a compute node is:
>> eth0 Link encap:Ethernet HWaddr 00:21:9B:9A:15:AC
>> inet addr:192.168.1.253 Bcast:192.168.1.255 Mask:255.255.255.0
>> inet6 addr: fe80::221:9bff:fe9a:15ac/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:362716422 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:349967746 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:139699954685 (130.1 GiB) TX bytes:338207741480 (314.9 GiB)
>> Interrupt:82 Memory:d6000000-d6012800
>>
>>
>> Can anyone help me out with this? It bothers me a lot.
>>
>> Thank you very much.
>>
>> Linbao
>> <scatttest.cpp><ompi_info>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>