Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores : very poor performance
From: Gilbert Grosdidier (Gilbert.Grosdidier_at_[hidden])
Date: 2010-12-20 15:45:47


Yes, there is definitely only 1 process per core with both MPI
implementations.

  Thanks, G.

On 20/12/2010 20:39, George Bosilca wrote:
> Are your processes placed the same way with the two MPI implementations? Per-node vs. per-core?
>
> george.
>
> On Dec 20, 2010, at 11:14 , Gilbert Grosdidier wrote:
>
>> Hello,
>>
>> I am now at a loss running OpenMPI (version 1.4.3)
>> on an SGI Altix cluster with 2048 or 4096 cores, over InfiniBand.
>>
>> After fixing several rather obvious failures with help from Ralph, Jeff and John,
>> I am now down to the core of the problem:
>> - there are no more obvious failures with error messages
>> - compared with running the application under SGI-MPT, the CPU performance I get
>> is very low, and it decreases as the number of cores increases (see below)
>> - this performance is highly reproducible
>> - I tried a very large number of -mca parameters, to no avail
>>
>> Taking the MPT CPU performance as a reference,
>> it is about 900 (in arbitrary units), whatever the
>> number of cores I use (up to 8192).
>>
>> But when running with OMPI, I get:
>> - 700 with 1024 cores (which is already rather low)
>> - 300 with 2048 cores
>> - 60 with 4096 cores.
>>
>> The computing loop, over which the above CPU performance is evaluated, includes
>> a stack of MPI exchanges [per core : 8 x (MPI_Isend + MPI_Irecv) + MPI_Waitall]
>>
>> The application is of the 'domain partition' type,
>> and the performance, together with the memory footprint,
>> is nearly identical on all cores. The memory footprint is about twice as high in
>> the OMPI case (1.5GB/core) as in the MPT case (0.7GB/core).
>>
>> What could be wrong here?
>>
>> I attached the 'ompi_info -all' output.
>> The config.log is attached as well.
>> I compiled OMPI with icc. I checked that NUMA and affinity are OK.
>>
>> I use the following command to run my OMPI app:
>> "mpiexec -mca btl_openib_rdma_pipeline_send_length 65536 \
>> -mca btl_openib_rdma_pipeline_frag_size 65536 \
>> -mca btl_openib_min_rdma_pipeline_size 65536 \
>> -mca btl_self_rdma_pipeline_send_length 262144 \
>> -mca btl_self_rdma_pipeline_frag_size 262144 \
>> -mca plm_rsh_num_concurrent 4096 -mca mpi_paffinity_alone 1 \
>> -mca mpi_leave_pinned 1 -mca btl_sm_max_send_size 128 \
>> -mca coll_tuned_pre_allocate_memory_comm_size_limit 128 \
>> -mca btl_openib_cq_size 128 -mca btl_ofud_rd_num 128 \
>> -mca mpool_rdma_rcache_size_limit 131072 -mca mpi_preconnect_mpi 0 \
>> -mca mpool_sm_min_size 131072 -mca mpi_abort_print_stack 1 \
>> -mca btl sm,openib,self -mca btl_openib_want_fork_support 0 \
>> -mca opal_set_max_sys_limits 1 -mca osc_pt2pt_no_locks 1 \
>> -mca osc_rdma_no_locks 1 \
>> $PBS_JOBDIR/phmc_tm_p2.$PBS_JOBID -v -f $Jinput".
>>
>> OpenIB info:
>>
>> 1) OFED-1.4.1, installed by SGI
>>
>> 2) Linux xxxxxx 2.6.16.60-0.42.10-smp #1 SMP Tue Apr 27 05:11:27 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
>> OS : SGI ProPack 6SP5 for Linux, Build 605r1.sles10-0909302200
>>
>> 3) Most probably running an SGI subnet manager
>>
>> 4)> ibv_devinfo (on a worker node)
>> hca_id: mlx4_0
>> fw_ver: 2.7.000
>> node_guid: 0030:48ff:ffcc:4c44
>> sys_image_guid: 0030:48ff:ffcc:4c47
>> vendor_id: 0x02c9
>> vendor_part_id: 26418
>> hw_ver: 0xA0
>> board_id: SM_2071000001000
>> phys_port_cnt: 2
>> port: 1
>> state: PORT_ACTIVE (4)
>> max_mtu: 2048 (4)
>> active_mtu: 2048 (4)
>> sm_lid: 1
>> port_lid: 6009
>> port_lmc: 0x00
>>
>> port: 2
>> state: PORT_ACTIVE (4)
>> max_mtu: 2048 (4)
>> active_mtu: 2048 (4)
>> sm_lid: 1
>> port_lid: 6010
>> port_lmc: 0x00
>>
>> 5)> ifconfig -a (on a worker node)
>> eth0 Link encap:Ethernet HWaddr 00:30:48:CE:73:30
>> inet adr:192.168.159.10 Bcast:192.168.159.255 Masque:255.255.255.0
>> adr inet6: fe80::230:48ff:fece:7330/64 Scope:Lien
>> UP BROADCAST NOTRAILERS RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:32337499 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:34733462 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 lg file transmission:1000
>> RX bytes:11486224753 (10954.1 Mb) TX bytes:16450996864 (15688.8 Mb)
>> Mémoire:fbc60000-fbc80000
>>
>> eth1 Link encap:Ethernet HWaddr 00:30:48:CE:73:31
>> BROADCAST MULTICAST MTU:1500 Metric:1
>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 lg file transmission:1000
>> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
>> Mémoire:fbce0000-fbd00000
>>
>> ib0 Link encap:UNSPEC HWaddr 80-00-00-48-FE-C0-00-00-00-00-00-00-00-00-00-00
>> inet adr:10.148.9.198 Bcast:10.148.255.255 Masque:255.255.0.0
>> adr inet6: fe80::230:48ff:ffcc:4c45/64 Scope:Lien
>> UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1
>> RX packets:115055101 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:5390843 errors:0 dropped:182 overruns:0 carrier:0
>> collisions:0 lg file transmission:256
>> RX bytes:49592870352 (47295.4 Mb) TX bytes:43566897620 (41548.6 Mb)
>>
>> ib1 Link encap:UNSPEC HWaddr 80-00-00-49-FE-C0-00-00-00-00-00-00-00-00-00-00
>> inet adr:10.149.9.198 Bcast:10.149.255.255 Masque:255.255.0.0
>> adr inet6: fe80::230:48ff:ffcc:4c46/64 Scope:Lien
>> UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1
>> RX packets:673448 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:187 errors:0 dropped:5 overruns:0 carrier:0
>> collisions:0 lg file transmission:256
>> RX bytes:37713088 (35.9 Mb) TX bytes:11228 (10.9 Kb)
>>
>> lo Link encap:Boucle locale
>> inet adr:127.0.0.1 Masque:255.0.0.0
>> adr inet6: ::1/128 Scope:Hôte
>> UP LOOPBACK RUNNING MTU:16436 Metric:1
>> RX packets:33504149 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:33504149 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 lg file transmission:0
>> RX bytes:5100850397 (4864.5 Mb) TX bytes:5100850397 (4864.5 Mb)
>>
>> sit0 Link encap:IPv6-dans-IPv4
>> NOARP MTU:1480 Metric:1
>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 lg file transmission:0
>> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
>>
>> 6)> limit (on a worker node)
>> cputime unlimited
>> filesize unlimited
>> datasize unlimited
>> stacksize 300000 kbytes
>> coredumpsize 0 kbytes
>> memoryuse unlimited
>> vmemoryuse unlimited
>> descriptors 16384
>> memorylocked unlimited
>> maxproc 303104
>>
>> If some info is still missing despite all my efforts, please ask.
>>
>> Thanks in advance for any hints, Best, G.
>>
>>
>> <config.log.gz> <ompi_info-all.001.gz>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
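
The figures quoted above can be put side by side with a little arithmetic. The sketch below only restates the numbers from the message (900 for MPT, 700/300/60 for OMPI, 1.5 vs 0.7 GB per core). The function and variable names are illustrative, and it assumes the MPT and OMPI figures are measured the same way and are therefore directly comparable.

```python
# Figures quoted in the message (arbitrary units chosen by the poster).
MPT_REFERENCE = 900  # reported as roughly constant up to 8192 cores
OMPI_FIGURES = {1024: 700, 2048: 300, 4096: 60}  # cores -> OMPI figure

# Per-core memory footprints quoted in the message, in GB.
OMPI_MEM_GB = 1.5
MPT_MEM_GB = 0.7


def relative_performance(ompi, reference):
    """Return the OMPI figure as a fraction of the MPT reference."""
    return ompi / reference


def summarize():
    """Return one (cores, fraction_of_mpt, slowdown_factor) row per run."""
    rows = []
    for cores in sorted(OMPI_FIGURES):
        frac = relative_performance(OMPI_FIGURES[cores], MPT_REFERENCE)
        rows.append((cores, frac, 1.0 / frac))
    return rows


if __name__ == "__main__":
    for cores, frac, slowdown in summarize():
        print(f"{cores:5d} cores: {frac:6.1%} of MPT, {slowdown:4.1f}x slower")
    print(f"memory ratio OMPI/MPT: {OMPI_MEM_GB / MPT_MEM_GB:.2f}")
```

Under that assumption, OMPI reaches about 78% of the MPT figure at 1024 cores, 33% at 2048 and 7% at 4096, so each doubling of the core count costs more than the previous one. The memory ratio comes out at about 2.1.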

-- 
  Regards,   Gilbert.
--
*---------------------------------------------------------------------*
   Gilbert Grosdidier             Gilbert.Grosdidier_at_[hidden]
   LAL / IN2P3 / CNRS                 Phone : +33 1 6446 8909
   Faculté des Sciences, Bat. 200     Fax   : +33 1 6446 8546
   B.P. 34, F-91898 Orsay Cedex (FRANCE)
*---------------------------------------------------------------------*