Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] OMPI 1.6 affinity fixes: PLEASE TEST
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-05-30 15:23:29


Ok, so I'm viewing this as a hardware/BIOS/something-else failure; it doesn't indicate one way or the other whether the new OMPI 1.6 affinity code is working.

I would still very much like to see other people's testing results.
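
For anyone testing, a quick sanity check is to have mpirun print the bindings it computes. Something along these lines (the process count, binding option, and executable are just placeholders; adjust for your nodes):

   mpirun -np 4 --bind-to-core --report-bindings hostname

Whether the reported bindings cover the cores/hyperthreads you expect on each node is a useful data point either way.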

On May 30, 2012, at 2:02 PM, Brice Goglin wrote:

> Something is preventing all cores from appearing. The BIOS?
> My E5-2650 processors definitely have 8 cores (without counting hyperthreads) as advertised by Intel.
>
> Brice
>
>
>
> On 30/05/2012 19:58, Mike Dubman wrote:
>> no cgroups or cpusets.
>>
>> On Wed, May 30, 2012 at 4:59 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>> On May 30, 2012, at 9:47 AM, Mike Dubman wrote:
>>
>> > Ohh... you are right, false alarm :) Sorry, siblings != cores, so it is HT.
>>
>> OMPI 1.6 (soon to be 1.6.1) should handle HT properly, meaning that it will bind to all the HTs in a core and/or socket.
>>
>> Are you using Linux cgroups/cpusets to restrict the available cores? Because Brice is saying that the E5-2650 is supposed to have more cores.
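>>
>> If you are not sure, a couple of quick ways to check whether a cpuset is hiding cores (the cgroup path below is just the common default and may differ on your system):
>>
>>   grep Cpus_allowed_list /proc/self/status
>>   cat /sys/fs/cgroup/cpuset/cpuset.cpus
>>   hwloc-bind --get
>>
>> If those show fewer CPUs than the processor physically has, the restriction is coming from outside hwloc/Open MPI.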
>>
>>
>> > On Wed, May 30, 2012 at 4:36 PM, Brice Goglin <Brice.Goglin_at_[hidden]> wrote:
>> > Your /proc/cpuinfo output (filtered below) looks like only two sockets (physical ids 0 and 1), with one core each (cpu cores=1, core id=0), with hyperthreading (siblings=2). So lstopo looks good.
>> > The E5-2650 is supposed to have 8 cores. I assume you use Linux cgroups/cpusets to restrict the available cores. The misconfiguration may be there.
>> > Brice
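>> >
>> > (For reference, a quick way to summarize what /proc/cpuinfo claims, using the same fields quoted below; the grep patterns are only illustrative:
>> >
>> >   grep "physical id" /proc/cpuinfo | sort -u | wc -l    # number of sockets
>> >   grep -m1 "cpu cores" /proc/cpuinfo                    # cores per socket
>> >   grep -m1 "siblings" /proc/cpuinfo                     # hardware threads per socket
>> >
>> > On a real E5-2650 you would expect "cpu cores : 8" and "siblings : 16".)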
>> >
>> >
>> >
>> >
>> >> On 30/05/2012 15:14, Mike Dubman wrote:
>> >> Or lstopo lies (I'm not using the latest hwloc, just the one that comes with the distro).
>> >> The machine has two dual-core sockets, 4 physical cores in total:
>> >> processor : 0
>> >>
>> >> physical id : 0
>> >> siblings : 2
>> >> core id : 0
>> >> cpu cores : 1
>> >>
>> >> processor : 1
>> >>
>> >> physical id : 1
>> >> siblings : 2
>> >> core id : 0
>> >> cpu cores : 1
>> >>
>> >> processor : 2
>> >>
>> >> physical id : 0
>> >> siblings : 2
>> >> core id : 0
>> >> cpu cores : 1
>> >>
>> >> processor : 3
>> >>
>> >> physical id : 1
>> >> siblings : 2
>> >> core id : 0
>> >> cpu cores : 1
>> >>
>> >>
>> >>
>> >> On Wed, May 30, 2012 at 3:40 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>> >> Hmmm...well, from what I see, mpirun was actually giving you the right answer! I only see TWO cores on each node, yet you told it to bind FOUR processes on each node, each proc to be bound to a unique core.
>> >>
>> >> The error message was correct - there are not enough cores on those nodes to do what you requested.
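>> >>
>> >> (Put differently: with only 2 cores visible per node, core binding can place at most 2 processes per node, so asking for 4 bound processes per node, e.g. something like "mpirun -np 8 -npernode 4 --bind-to-core ./app", has to fail, while "-npernode 2" would succeed. That command line is only an illustration, not the one actually used here.)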
>> >>
>> >>
>> >> On May 30, 2012, at 6:19 AM, Mike Dubman wrote:
>> >>
>> >>> attached.
>> >>>
>> >>> On Wed, May 30, 2012 at 2:32 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>> >>> On May 30, 2012, at 7:20 AM, Jeff Squyres wrote:
>> >>>
>> >>> >> $ hwloc-ls --of console
>> >>> >> Machine (32GB)
>> >>> >> NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (20MB) + L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
>> >>> >> PU L#0 (P#0)
>> >>> >> PU L#1 (P#2)
>> >>> >> NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (20MB) + L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
>> >>> >> PU L#2 (P#1)
>> >>> >> PU L#3 (P#3)
>> >>> >
>> >>> > Is this hwloc output exactly the same on both nodes?
>> >>>
>> >>>
>> >>> More specifically, can you send the lstopo xml output from each of the 2 nodes you ran on?
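>> >>>
>> >>> (For example, something like this on each node; the filename is just a placeholder, and lstopo picks the XML format from the .xml extension:
>> >>>
>> >>>   lstopo node01.xml
>> >>>
>> >>> or, equivalently, "lstopo --of xml > node01.xml".)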
>> >>>
>> >>> --
>> >>> Jeff Squyres
>> >>> jsquyres_at_[hidden]
>> >>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>> >>>
>> >>>
>> >>> <lstopo-out.tbz>
>> >>
>> >>
>> >
>> >
>>
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/