
Subject: Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores : very poor performance
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-01-07 09:50:10


Can you run with np=8?
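For concreteness, a minimal sketch of such a run (the hwloc-bind path and MCA setting are taken from the command quoted below; only the -np value is new):

    mpirun -np 8 --mca mpi_paffinity_alone 1 \
        /opt/software/SGI/hwloc/1.1rc6r3028/bin/hwloc-bind --get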

On Jan 7, 2011, at 9:49 AM, Gilbert Grosdidier wrote:

> Hi Jeff,
>
> Thanks for taking care of this.
>
> Here is what I got on a worker node:
>
> > mpirun --mca mpi_paffinity_alone 1 /opt/software/SGI/hwloc/1.1rc6r3028/bin/hwloc-bind --get
> 0x00000001
>
> Is this what is expected, please? Or should I try yet another command?
>
> Thanks, Regards, Gilbert.
>
>
>
> On Jan 7, 2011, at 3:35 PM, Jeff Squyres wrote:
>
>> On Jan 6, 2011, at 11:23 PM, Gilbert Grosdidier wrote:
>>
>>>> lstopo
>>> Machine (35GB)
>>>   NUMANode L#0 (P#0 18GB) + Socket L#0 + L3 L#0 (8192KB)
>>>     L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
>>>       PU L#0 (P#0)
>>>       PU L#1 (P#8)
>>>     L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
>>>       PU L#2 (P#1)
>>>       PU L#3 (P#9)
>>>     L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2
>>>       PU L#4 (P#2)
>>>       PU L#5 (P#10)
>>>     L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3
>>>       PU L#6 (P#3)
>>>       PU L#7 (P#11)
>> [snip]
>>
>> Well, this might disprove my theory. :-\ The OS indexing is not contiguous on the hyperthreads, so I might be wrong about what happened here. Try this:
>>
>> mpirun --mca mpi_paffinity_alone 1 hwloc-bind --get
>>
>> You can even run that on just one node; let's see what you get. This will tell us what each process is *actually* bound to. hwloc-bind --get will report a bitmask of the P#'s from above. So if we see 001, 010, 011, etc., then my theory of OMPI binding 1 proc per hyperthread (vs. 1 proc per core) is incorrect.
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>
> --
> *---------------------------------------------------------------------*
> Gilbert Grosdidier Gilbert.Grosdidier_at_[hidden]
> LAL / IN2P3 / CNRS Phone : +33 1 6446 8909
> Faculté des Sciences, Bat. 200 Fax : +33 1 6446 8546
> B.P. 34, F-91898 Orsay Cedex (FRANCE)
> *---------------------------------------------------------------------*
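A minimal sketch of how the hex mask reported by hwloc-bind --get (e.g. the 0x00000001 above) maps back to the P# indices shown by lstopo; the 32-bit loop bound is only an assumption matching the width of the masks quoted above:

    # Decode a hwloc-bind --get hex mask into the OS PU numbers (P#) it covers.
    mask=0x00000001   # the mask reported above: only bit 0 is set, i.e. bound to P#0
    for pu in $(seq 0 31); do
        if (( (mask >> pu) & 1 )); then
            echo "bound to PU P#${pu}"
        fi
    done

Run once per rank (as in the mpirun line above), this shows which P# each process is actually pinned to, which is the check described in the quoted message.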

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/