Open MPI User's Mailing List Archives

Subject: [OMPI users] Hyper-thread architecture effect on MPI jobs
From: Saygin Arkan (saygenius_at_[hidden])
Date: 2010-08-11 10:55:11


Hello,

I'm running MPI jobs on a non-homogeneous cluster. Four of my machines
(os221, os222, os223 and os224) have the following properties:

vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz
stepping : 7
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx smx est
tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips : 4999.40
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual

and the two problematic, hyper-threaded machines (os228 and os229) are as
follows:

vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
stepping : 5
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx
est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm ida
bogomips : 5396.88
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual

The problem is that those 2 machines appear to have 8 cores each (virtually;
the actual number of cores is 4).
When I submit an MPI job, I measure the comparison time of each rank in the
cluster, and I get strange results.
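
The "siblings" and "cpu cores" fields in the listings above are enough to tell the two machine types apart. Here is a minimal sketch in Python, assuming the /proc/cpuinfo layout shown above; the function name threads_per_core and the two shortened sample strings are only illustrative:

```python
def threads_per_core(cpuinfo_text):
    """Return siblings / cpu cores for the first processor entry found."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # keep only the first occurrence of each field
            fields.setdefault(key.strip(), value.strip())
    return int(fields["siblings"]) // int(fields["cpu cores"])

# Shortened samples holding only the two fields of interest
q9300 = "siblings : 4\ncpu cores : 4\n"
i7_920 = "siblings : 8\ncpu cores : 4\n"

print(threads_per_core(q9300))   # 1: no hyper-threading
print(threads_per_core(i7_920))  # 2: two logical CPUs per physical core
```

A result above 1 means the operating system reports more logical CPUs than there are physical cores, which is the case on os228 and os229.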

I'm running the job on 6 nodes, 3 cores per node. Sometimes (in roughly
1/3 of the tests) os228 or os229 returns strange results: 2 cores are slow
(slower than the first 4 nodes) but the 3rd core is extremely fast.

2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - RANK(0) Printing Times...
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(1) :38 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os222 RANK(2) :38 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os224 RANK(3) :38 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os228 RANK(4) :37 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os229 RANK(5) :34 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os223 RANK(6) :38 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(7) :39 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os222 RANK(8) :37 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os224 RANK(9) :38 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os228 RANK(10) :*48 sec*
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os229 RANK(11) :35 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os223 RANK(12) :38 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(13) :37 sec
2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os222 RANK(14) :37 sec
2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os224 RANK(15) :38 sec
2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os228 RANK(16) :*43 sec*
2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os229 RANK(17) :35 sec
TOTAL CORRELATION TIME: 48 sec

or another test:

2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - RANK(0) Printing Times...
2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os221 RANK(1) :170 sec
2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os222 RANK(2) :161 sec
2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os224 RANK(3) :158 sec
2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os228 RANK(4) :142 sec
2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os229 RANK(5) :*256 sec*
2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os223 RANK(6) :156 sec
2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os221 RANK(7) :162 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os222 RANK(8) :159 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os224 RANK(9) :168 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os228 RANK(10) :141 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os229 RANK(11) :136 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os223 RANK(12) :173 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os221 RANK(13) :164 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os222 RANK(14) :171 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os224 RANK(15) :156 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os228 RANK(16) :136 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os229 RANK(17) :*250 sec*
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - TOTAL CORRELATION TIME: 256 sec
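
To make the per-host pattern easier to see, the timings can be grouped by host name. This is a rough sketch in Python, assuming log entries shaped like the ones above; the regular expression, the function name times_by_host and the shortened sample (os228 and os229 entries from the second test only) are illustrative and not part of my application:

```python
import re
from collections import defaultdict

# Matches e.g. "os229 RANK(5) :*256 sec*"; the asterisks are only emphasis,
# and \s also tolerates entries that the mailer wrapped across two lines.
ENTRY = re.compile(r"(os\d+)\s+RANK\((\d+)\)\s*:\*?(\d+)\s*sec")

def times_by_host(log_text):
    """Map each host name to the list of per-rank times (seconds) in log order."""
    hosts = defaultdict(list)
    for host, _rank, secs in ENTRY.findall(log_text):
        hosts[host].append(int(secs))
    return dict(hosts)

sample = """
os228 RANK(4) :142 sec
os229 RANK(5) :*256 sec*
os228 RANK(10) :141 sec
os229 RANK(11) :136 sec
os228 RANK(16) :136 sec
os229 RANK(17) :*250 sec*
"""

per_host = times_by_host(sample)
for host, secs in sorted(per_host.items()):
    print(host, secs, "max:", max(secs))
```

In this sample os229 shows two slow ranks and one fast one, while all three ranks on os228 finish in about the same time.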

Do you have any idea why this is happening?
I assume that 2 of the processes are placed on 2 cores of os229 that are
actually one physical core.
If so, how can I fix it? The longest time determines the total time. A
delay of about 100 sec is too much for a 250 sec comparison; it could have
finished in around 160 sec.

-- 
Saygin