Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Bad performance when scattering big size of data?
From: Storm Zhang (stormzhg_at_[hidden])
Date: 2010-10-04 21:10:45


Here is what I meant: the 500-proc run was in fact placed on only 272-304
(< 500) real cores, yet its running time is good, about five times the
100-proc time. So that case is handled very well, which is why I guessed that
Open MPI or the Rocks OS does make use of hyperthreading. But with 600 procs
the running time is more than double that of 500 procs, and I don't know why.
That is my problem.
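
For reference, here is a minimal sketch of the kind of repeated-scatter timing
loop under discussion; the actual test is the scatttest.cpp attached to the
original mail below, so the 18 KB per-rank chunk and the 5000 repeats come from
that mail, while everything else here is an illustrative assumption:

#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int chunk = 18 * 1024;   // 18 KB received per process
    const int repeats = 5000;
    std::vector<char> sendbuf;     // only the root holds the full 18 KB * nprocs array
    if (rank == 0) sendbuf.resize((size_t)chunk * nprocs, 1);
    std::vector<char> recvbuf(chunk);

    double t0 = MPI_Wtime();
    for (int i = 0; i < repeats; ++i) {
        // root scatters one 18 KB chunk to every process each iteration
        MPI_Scatter(rank == 0 ? &sendbuf[0] : NULL, chunk, MPI_CHAR,
                    &recvbuf[0], chunk, MPI_CHAR, 0, MPI_COMM_WORLD);
    }
    double t1 = MPI_Wtime();
    if (rank == 0) std::printf("total time for %d scatters: %f s\n", repeats, t1 - t0);

    MPI_Finalize();
    return 0;
}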

BTW, how do I use -bind-to-core? I added it as an mpirun option, but it always
gives me the error "the executable 'bind-to-core' can't be found". Isn't it
like:
mpirun --mca btl_tcp_if_include eth0 -np 600 -bind-to-core scatttest
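
For comparison, one possible invocation that both binds and reports what it
did, assuming the option spellings of the OMPI 1.3/1.4 series
(--report-bindings, mentioned below, only prints the bindings mpirun applied,
so it should be harmless to add while testing):

mpirun --mca btl_tcp_if_include eth0 --machinefile ../machines -np 600 --bind-to-core --report-bindings scatttest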

Thank you very much.

Linbao

On Mon, Oct 4, 2010 at 4:42 PM, Ralph Castain <rhc_at_[hidden]> wrote:

>
> On Oct 4, 2010, at 1:48 PM, Storm Zhang wrote:
>
> Thanks a lot, Ralph. As I said, I also tried using SGE (which also shows 1024
> slots available for parallel tasks); it assigned only 34-38 compute nodes,
> i.e. only 272-304 real cores, for the 500-proc run. The running time is
> consistent with the 100-proc case, without much fluctuation as the number of
> machines changes.
>
>
> Afraid I don't understand your statement. If you have 500 procs running on
> < 500 cores, then the performance relative to a high-performance job (#procs
> <= #cores) will be worse. We deliberately dial down the performance when
> oversubscribed to ensure that procs "play nice" in situations where the node
> is oversubscribed.
>
> So I guess it is not related to hyperthreading. Correct me if I'm wrong.
>
>
> Has nothing to do with hyperthreading - OMPI has no knowledge of
> hyperthreads at this time.
>
>
> BTW, how do I bind procs to cores? I tried --bind-to-core and -bind-to-core,
> but neither works. Is that option for OpenMP rather than Open MPI?
>
>
> Those should work. You might try --report-bindings to see what OMPI thought
> it did.
>
>
> Linbao
>
>
> On Mon, Oct 4, 2010 at 12:27 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> Some of what you are seeing is the natural result of context switching.
>> Some thoughts regarding the results:
>>
>> 1. You didn't bind your procs to cores when running with #procs < #cores,
>> so your performance in those scenarios will also be less than the maximum.
>>
>> 2. Once the number of procs exceeds the number of cores, you guarantee a
>> lot of context switching, so performance will definitely take a hit.
>>
>> 3. Sometime in the not-too-distant future, OMPI will (hopefully) become
>> hyperthread-aware. For now, we don't see hyperthreads as separate processing
>> units, so as far as OMPI is concerned you only have 512 computing units to
>> work with, not 1024.
>>
>> Bottom line is that you are running oversubscribed, so OMPI turns down
>> your performance so that the machine doesn't hemorrhage as it context
>> switches.
>>
>>
>> On Oct 4, 2010, at 11:06 AM, Doug Reeder wrote:
>>
>> In my experience, hyperthreading can't really deliver two cores' worth of
>> processing simultaneously for processes expecting sole use of a core. Since
>> you really have 512 cores, I'm not surprised that you see a performance hit
>> when requesting > 512 compute units. We should really get input from a
>> hyperthreading expert, preferably from Intel.
>>
>> Doug Reeder
>> On Oct 4, 2010, at 9:53 AM, Storm Zhang wrote:
>>
>> We have 64 compute nodes with dual quad-core, hyperthreaded CPUs, so the
>> ROCKS 5.3 system shows 1024 compute units. I'm trying to scatter an array
>> from the master node to the compute nodes in C++, compiled with mpiCC and
>> launched with mpirun.
>>
>> Here is my test:
>>
>> The array size is 18 KB * (number of processes), and it is scattered to the
>> compute processes 5000 times.
>>
>> The average running time (seconds):
>>
>> 100 procs: 170
>> 400 procs: 690
>> 500 procs: 855
>> 600 procs: 2550
>> 700 procs: 2720
>> 800 procs: 2900
>>
>> There is a big jump in running time from 500 procs to 600 procs, and I don't
>> know what the problem is.
>> I tried both OMPI 1.3.2 and OMPI 1.4.2; the running time is a little faster
>> for all the tests in 1.4.2, but the jump still exists.
>> I tried using either the Bcast function or simple Send/Recv calls, which give
>> very close results.
>> I tried both running it directly and running it under SGE, and got the same
>> results.
>>
>> The code and the ompi_info output are attached to this email. The direct
>> running command is:
>> /opt/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --machinefile
>> ../machines -np 600 scatttest
>>
>> The ifconfig of the head node for eth0 is:
>> eth0  Link encap:Ethernet  HWaddr 00:26:B9:56:8B:44
>>       inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
>>       inet6 addr: fe80::226:b9ff:fe56:8b44/64 Scope:Link
>>       UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>       RX packets:1096060373 errors:0 dropped:2512622 overruns:0 frame:0
>>       TX packets:513387679 errors:0 dropped:0 overruns:0 carrier:0
>>       collisions:0 txqueuelen:1000
>>       RX bytes:832328807459 (775.1 GiB)  TX bytes:250824621959 (233.5 GiB)
>>       Interrupt:106 Memory:d6000000-d6012800
>>
>> A typical ifconfig of a compute node is:
>> eth0  Link encap:Ethernet  HWaddr 00:21:9B:9A:15:AC
>>       inet addr:192.168.1.253  Bcast:192.168.1.255  Mask:255.255.255.0
>>       inet6 addr: fe80::221:9bff:fe9a:15ac/64 Scope:Link
>>       UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>       RX packets:362716422 errors:0 dropped:0 overruns:0 frame:0
>>       TX packets:349967746 errors:0 dropped:0 overruns:0 carrier:0
>>       collisions:0 txqueuelen:1000
>>       RX bytes:139699954685 (130.1 GiB)  TX bytes:338207741480 (314.9 GiB)
>>       Interrupt:82 Memory:d6000000-d6012800
>>
>>
>> Can anyone help me out with this? It bothers me a lot.
>>
>> Thank you very much.
>>
>> Linbao
>> <scatttest.cpp><ompi_info>