Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores : very poor performance
From: Gilbert Grosdidier (Gilbert.Grosdidier_at_[hidden])
Date: 2011-01-07 09:56:00


Yes, here it is:

> mpirun -np 8 --mca mpi_paffinity_alone 1 /opt/software/SGI/hwloc/1.1rc6r3028/bin/hwloc-bind --get
0x00000001
0x00000002
0x00000004
0x00000008
0x00000010
0x00000020
0x00000040
0x00000080
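
For anyone reading these masks against the lstopo output quoted below, here is a minimal Python sketch that decodes them. It assumes each output line is a single hexadecimal word whose bit N stands for OS index P#N, as Jeff describes further down. The helper names (mask_to_pus, describe) and the PU_TO_CORE table are illustrative only; the table covers just the four cores shown in the snipped lstopo excerpt, so any other P# is reported as unknown rather than guessed.

```python
# Masks copied from the hwloc-bind --get output above, one per output line.
# The order of the lines is not assumed to match MPI rank order.
MASKS = [
    "0x00000001", "0x00000002", "0x00000004", "0x00000008",
    "0x00000010", "0x00000020", "0x00000040", "0x00000080",
]

# OS index (P#) -> Core L#, taken only from the quoted lstopo excerpt.
# The excerpt was snipped, so other P# values are deliberately absent.
PU_TO_CORE = {0: 0, 8: 0, 1: 1, 9: 1, 2: 2, 10: 2, 3: 3, 11: 3}


def mask_to_pus(mask):
    """Return the P# indexes whose bits are set in a single hex mask."""
    value = int(mask, 16)  # int() accepts the leading "0x" when base is 16
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def describe(mask):
    """Return (P#, Core L# or None) pairs for one mask."""
    return [(pu, PU_TO_CORE.get(pu)) for pu in mask_to_pus(mask)]


if __name__ == "__main__":
    for line_no, mask in enumerate(MASKS, start=1):
        parts = []
        for pu, core in describe(mask):
            where = f"Core L#{core}" if core is not None else "core not in excerpt"
            parts.append(f"P#{pu} ({where})")
        print(f"line {line_no}: {mask} -> " + ", ".join(parts))
```

Read this way, each of the eight masks selects exactly one PU, P#0 through P#7. In the excerpt's numbering, P#0 to P#3 sit on four different cores (Core L#0 to L#3), and the hyperthread siblings P#8 to P#11 do not appear in any mask.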

  Gilbert.

On 7 Jan 2011, at 15:50, Jeff Squyres wrote:

> Can you run with np=8?
>
> On Jan 7, 2011, at 9:49 AM, Gilbert Grosdidier wrote:
>
>> Hi Jeff,
>>
>> Thanks for taking care of this.
>>
>> Here is what I got on a worker node:
>>
>>> mpirun --mca mpi_paffinity_alone 1 /opt/software/SGI/hwloc/1.1rc6r3028/bin/hwloc-bind --get
>> 0x00000001
>>
>> Is this what is expected, please? Or should I try yet another
>> command?
>>
>> Thanks, Regards, Gilbert.
>>
>>
>>
>> On 7 Jan 2011, at 15:35, Jeff Squyres wrote:
>>
>>> On Jan 6, 2011, at 11:23 PM, Gilbert Grosdidier wrote:
>>>
>>>>> lstopo
>>>> Machine (35GB)
>>>> NUMANode L#0 (P#0 18GB) + Socket L#0 + L3 L#0 (8192KB)
>>>> L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
>>>> PU L#0 (P#0)
>>>> PU L#1 (P#8)
>>>> L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
>>>> PU L#2 (P#1)
>>>> PU L#3 (P#9)
>>>> L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2
>>>> PU L#4 (P#2)
>>>> PU L#5 (P#10)
>>>> L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3
>>>> PU L#6 (P#3)
>>>> PU L#7 (P#11)
>>> [snip]
>>>
>>> Well, this might disprove my theory. :-\ The OS indexing is not
>>> contiguous on the hyperthreads, so I might be wrong about what
>>> happened here. Try this:
>>>
>>> mpirun --mca mpi_paffinity_alone 1 hwloc-bind --get
>>>
>>> You can even run that on just one node; let's see what you get.
>>> This will tell us what each process is *actually* bound to.
>>> hwloc-bind --get will report a bitmask of the P#'s from above. So if we
>>> see 001, 010, 011, ...etc, then my theory of OMPI binding 1 proc
>>> per hyperthread (vs. 1 proc per core) is incorrect.
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>
>> --
>> *---------------------------------------------------------------------*
>> Gilbert Grosdidier Gilbert.Grosdidier_at_[hidden]
>> LAL / IN2P3 / CNRS Phone : +33 1 6446 8909
>> Faculté des Sciences, Bat. 200 Fax : +33 1 6446 8546
>> B.P. 34, F-91898 Orsay Cedex (FRANCE)
>> *---------------------------------------------------------------------*
>>
>>
>>
>>
>>
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>

--
*---------------------------------------------------------------------*
   Gilbert Grosdidier                 Gilbert.Grosdidier_at_[hidden]
   LAL / IN2P3 / CNRS                 Phone : +33 1 6446 8909
   Faculté des Sciences, Bat. 200     Fax   : +33 1 6446 8546
   B.P. 34, F-91898 Orsay Cedex (FRANCE)
*---------------------------------------------------------------------*