Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] OMPI 1.6 affinity fixes: PLEASE TEST
From: Mike Dubman (mike.ompi_at_[hidden])
Date: 2012-05-30 05:05:43


Not good:

$ /labhome/alexm/workspace/openmpi-1.6.1a1hge06c2f2a0859/inst/bin/mpirun \
    --host h-qa-017,h-qa-017,h-qa-017,h-qa-017,h-qa-018,h-qa-018,h-qa-018,h-qa-018 \
    -np 8 --bind-to-core -bynode -display-map \
    /usr/mpi/gcc/mlnx-openmpi-1.6rc4/tests/osu_benchmarks-3.1.1/osu_alltoall

 ======================== JOB MAP ========================

 Data for node: h-qa-017        Num procs: 4
        Process OMPI jobid: [36855,1]   Process rank: 0
        Process OMPI jobid: [36855,1]   Process rank: 2
        Process OMPI jobid: [36855,1]   Process rank: 4
        Process OMPI jobid: [36855,1]   Process rank: 6

 Data for node: h-qa-018        Num procs: 4
        Process OMPI jobid: [36855,1]   Process rank: 1
        Process OMPI jobid: [36855,1]   Process rank: 3
        Process OMPI jobid: [36855,1]   Process rank: 5
        Process OMPI jobid: [36855,1]   Process rank: 7

 =============================================================

--------------------------------------------------------------------------
An invalid physical processor ID was returned when attempting to bind
an MPI process to a unique processor.

This usually means that you requested binding to more processors than
exist (e.g., trying to bind N MPI processes to M processors, where N >
M). Double check that you have enough unique processors for all the
MPI processes that you are launching on this host.
--------------------------------------------------------------------------

$ hwloc-ls --of console
Machine (32GB)
  NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (20MB) + L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
    PU L#0 (P#0)
    PU L#1 (P#2)
  NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (20MB) + L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
    PU L#2 (P#1)
    PU L#3 (P#3)
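
FWIW, the core/PU counts that hwloc-ls shows can be cross-checked
programmatically with a few lines of the hwloc C API. This is only a
sketch; the file name and compile line (e.g.
"gcc count_pus.c -o count_pus -lhwloc") are illustrative, not part of
any OMPI tooling:

    #include <stdio.h>
    #include <hwloc.h>

    int main(void)
    {
        hwloc_topology_t topo;

        /* Discover the local machine's topology */
        if (hwloc_topology_init(&topo) < 0 ||
            hwloc_topology_load(topo) < 0) {
            fprintf(stderr, "hwloc topology discovery failed\n");
            return 1;
        }

        /* Count full cores and hardware threads (PUs) */
        int ncores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
        int npus   = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);

        /* On the topology above this reports cores=2 pus=4 */
        printf("cores=%d pus=%d\n", ncores, npus);

        hwloc_topology_destroy(topo);
        return 0;
    }

With 2 unique cores but 4 PUs per node, running 4 processes per node
with --bind-to-core appears to match the "N > M" condition described
in the error text above.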

On Tue, May 29, 2012 at 11:00 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:

> Per ticket #3108, there were still some unfortunate bugs in the affinity
> code in 1.6. :-(
>
> These have now been fixed. ...but since this is the 2nd or 3rd time we
> have "fixed" the 1.5/1.6 series w.r.t. processor affinity, I'd really
> like people to test this stuff before it's committed and we ship 1.6.1.
> I've put tarballs containing the fixes here:
> put tarballs containing the fixes here:
>
> http://www.open-mpi.org/~jsquyres/unofficial/
>
> Can you please try mpirun options like --bind-to-core and --bind-to-socket
> and ensure that they still work for you? (even on machines with
> hyperthreading enabled, if you have access to such things)
>
> IBM: I'd particularly like to hear that we haven't made anything worse on
> POWER systems. Thanks.
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/