
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Hyper-thread architecture effect on MPI jobs
From: Saygin Arkan (saygenius_at_[hidden])
Date: 2010-08-12 04:39:27


Hi Gus,

1 - First of all, turning off hyper-threading is not an option. Besides, it
gives pretty good results if I can find a way to arrange the cores.

2 - Actually, Eugene had already suggested arranging the slots (in one of her
messages in this thread). I did that and posted the results: the cores are
still handed out randomly, nothing changed. I haven't tried the -loadbalance
option yet, but -byslot or -bynode alone is not going to help. (What I mean by
arranging the slots is sketched below, after point 3.)

3 - Could you give me a bit more detail on how processor affinity works, or
what it actually does?
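
To be concrete about what I mean by arranging the slots in point 2, the
hostfile I have in mind looks roughly like this (just a sketch: the file name,
program name, and slot counts are my assumptions for keeping 3 ranks per node,
not a literal copy of what I ran):

# hostfile "myhosts" (sketch): cap every node at 3 slots, so the 18 ranks
# spread out 3 per node, including on the hyper-threaded i7 nodes
os221 slots=3
os222 slots=3
os223 slots=3
os224 slots=3
os228 slots=3
os229 slots=3

launched with something like:

mpirun -np 18 --hostfile myhosts -bynode ./myprog

Even with the slots capped like this, the two i7 nodes still seem to place
their 3 ranks on arbitrary logical CPUs out of the 8, which is why I suspect
two ranks sometimes end up on sibling hyper-threads of one physical core.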

Thanks a lot for your suggestions.
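
In the meantime, to see where the ranks actually land on os228 and os229, I'm
planning to print the logical CPU from inside the program, roughly like this
(just a sketch: sched_getcpu() is Linux/glibc-specific, and which logical CPU
numbers are hyper-thread siblings is an assumption I would verify against
/sys/devices/system/cpu/cpuN/topology/thread_siblings_list):

#define _GNU_SOURCE              /* needed for sched_getcpu() with glibc */
#include <sched.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, namelen, cpu;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &namelen);

    /* logical CPU this rank is running on right now; on the i7 nodes,
       two logical CPUs share one physical core */
    cpu = sched_getcpu();

    printf("rank %d on %s, logical cpu %d\n", rank, host, cpu);

    MPI_Finalize();
    return 0;
}

If two ranks on the same i7 node report sibling logical CPUs, that would
confirm they are sharing one physical core.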

Saygin

On Wed, Aug 11, 2010 at 6:18 PM, Gus Correa <gus_at_[hidden]> wrote:

> Hi Saygin
>
> You could:
>
> 1) turn off hyperthreading (on BIOS), or
>
> 2) use the mpirun options (you didn't send your mpirun command)
> to distribute the processes across the nodes, cores, etc.
> "man mpirun" is a good resource, see the explanations about
> the -byslot, -bynode, -loadbalance options.
>
> 3) In addition, you can use the mca parameters to set processor affinity
> in the mpirun command line "mpirun -mca mpi_paffinity_alone 1 ..."
> I don't know how this will play in a hyperthreaded machine,
> but it works fine in our dual processor quad-core computers
> (not hyperthreaded).
>
> Depending on your code, hyperthreading may not help performance anyway.
>
> I hope this helps,
> Gus Correa
>
> Saygin Arkan wrote:
>
>> Hello,
>>
>> I'm running MPI jobs on a non-homogeneous cluster. Four of my machines
>> (os221, os222, os223, os224) have the following properties:
>>
>> vendor_id : GenuineIntel
>> cpu family : 6
>> model : 23
>> model name : Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz
>> stepping : 7
>> cache size : 3072 KB
>> physical id : 0
>> siblings : 4
>> core id : 3
>> cpu cores : 4
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 10
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
>> constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx smx est
>> tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
>> bogomips : 4999.40
>> clflush size : 64
>> cache_alignment : 64
>> address sizes : 36 bits physical, 48 bits virtual
>>
>> and the two problematic, hyper-threaded machines, os228 and os229, are as
>> follows:
>>
>> vendor_id : GenuineIntel
>> cpu family : 6
>> model : 26
>> model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
>> stepping : 5
>> cache size : 8192 KB
>> physical id : 0
>> siblings : 8
>> core id : 3
>> cpu cores : 4
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 11
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
>> rdtscp lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx
>> est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm ida
>> bogomips : 5396.88
>> clflush size : 64
>> cache_alignment : 64
>> address sizes : 36 bits physical, 48 bits virtual
>>
>>
>> The problem is: those 2 machines appear to have 8 cores (virtually; the
>> actual core count is 4).
>> When I submit an MPI job, I measure the comparison times across the
>> cluster, and I got strange results.
>>
>> I'm running the job on 6 nodes, 3 cores per node, and sometimes (in about
>> 1/3 of the tests) os228 or os229 returns strange results: 2 cores are slow
>> (slower than the first 4 nodes) but the 3rd core is extremely fast.
>>
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - RANK(0) Printing
>> Times...
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(1)
>> :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os222 RANK(2)
>> :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os224 RANK(3)
>> :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os228 RANK(4)
>> :37 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os229 RANK(5)
>> :34 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os223 RANK(6)
>> :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(7)
>> :39 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os222 RANK(8)
>> :37 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os224 RANK(9)
>> :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os228 RANK(10)
>> :*48 sec*
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os229 RANK(11)
>> :35 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os223 RANK(12)
>> :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(13)
>> :37 sec
>> 2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os222 RANK(14)
>> :37 sec
>> 2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os224 RANK(15)
>> :38 sec
>> 2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os228 RANK(16)
>> :*43 sec*
>> 2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os229 RANK(17)
>> :35 sec
>> TOTAL CORRELATION TIME: 48 sec
>>
>>
>> or another test:
>>
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - RANK(0) Printing
>> Times...
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os221 RANK(1)
>> :170 sec
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os222 RANK(2)
>> :161 sec
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os224 RANK(3)
>> :158 sec
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os228 RANK(4)
>> :142 sec
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os229 RANK(5)
>> :*256 sec*
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os223 RANK(6)
>> :156 sec
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os221 RANK(7)
>> :162 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os222 RANK(8)
>> :159 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os224 RANK(9)
>> :168 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os228 RANK(10)
>> :141 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os229 RANK(11)
>> :136 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os223 RANK(12)
>> :173 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os221 RANK(13)
>> :164 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os222 RANK(14)
>> :171 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os224 RANK(15)
>> :156 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os228 RANK(16)
>> :136 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os229 RANK(17)
>> :*250 sec*
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - TOTAL CORRELATION
>> TIME: 256 sec
>>
>>
>> Do you have any idea why this is happening?
>> I assume it hands 2 jobs to 2 "cores" on os229 that are actually the same
>> physical core.
>> If so, how can I fix it? The longest time determines the overall time, and
>> a 100 sec delay is too much for a 250 sec comparison time;
>> it could have finished in around 160 sec.
>>
>>
>>
>> --
>> Saygin
>>
>>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
Saygin