Not good:

/labhome/alexm/workspace/openmpi-1.6.1a1hge06c2f2a0859/inst/bin/mpirun --host h-qa-017,h-qa-017,h-qa-017,h-qa-017,h-qa-018,h-qa-018,h-qa-018,h-qa-018 -np 8 --bind-to-core -bynode -display-map /usr/mpi/gcc/mlnx-openmpi-1.6rc4/tests/osu_benchmarks-3.1.1/osu_alltoall

 

 ========================   JOB MAP   ========================

 Data for node: h-qa-017               Num procs: 4
                Process OMPI jobid: [36855,1] Process rank: 0
                Process OMPI jobid: [36855,1] Process rank: 2
                Process OMPI jobid: [36855,1] Process rank: 4
                Process OMPI jobid: [36855,1] Process rank: 6

 Data for node: h-qa-018               Num procs: 4
                Process OMPI jobid: [36855,1] Process rank: 1
                Process OMPI jobid: [36855,1] Process rank: 3
                Process OMPI jobid: [36855,1] Process rank: 5
                Process OMPI jobid: [36855,1] Process rank: 7

 =============================================================

--------------------------------------------------------------------------
An invalid physical processor ID was returned when attempting to bind
an MPI process to a unique processor.

This usually means that you requested binding to more processors than
exist (e.g., trying to bind N MPI processes to M processors, where N >
M).  Double check that you have enough unique processors for all the
MPI processes that you are launching on this host.

 

 

$ hwloc-ls --of console
Machine (32GB)
  NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (20MB) + L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
    PU L#0 (P#0)
    PU L#1 (P#2)
  NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (20MB) + L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
    PU L#2 (P#1)
    PU L#3 (P#3)
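
For reference, here is a rough sketch that counts the Core and PU objects in the hwloc-ls text above, to compare against the 4 ranks per node in the job map. It assumes the topology shown is representative of both hosts. The helper name count_objects and the regex-based parsing are my own illustration, not part of hwloc or Open MPI.

```python
import re

# hwloc-ls console output pasted from the message above.
HWLOC_OUTPUT = """\
Machine (32GB)
  NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (20MB) + L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
    PU L#0 (P#0)
    PU L#1 (P#2)
  NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (20MB) + L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
    PU L#2 (P#1)
    PU L#3 (P#3)
"""


def count_objects(hwloc_text):
    """Return (cores, pus) counted from `hwloc-ls --of console` text.

    Hypothetical helper: it only matches the "Core L#n" and "PU L#n"
    labels in the console format shown above.
    """
    cores = len(re.findall(r"\bCore L#\d+", hwloc_text))
    pus = len(re.findall(r"\bPU L#\d+", hwloc_text))
    return cores, pus


if __name__ == "__main__":
    cores, pus = count_objects(HWLOC_OUTPUT)
    procs_per_node = 4  # "Num procs: 4" in the job map
    print(f"cores={cores} pus={pus} procs_per_node={procs_per_node}")
    if procs_per_node > cores:
        print("more ranks per node than cores")
    elif procs_per_node > pus:
        print("more ranks per node than hardware threads")
    else:
        print("enough cores for one rank per core")
```

On this topology the count is 2 cores and 4 PUs per node, against 4 ranks per node with --bind-to-core.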



On Tue, May 29, 2012 at 11:00 PM, Jeff Squyres <jsquyres@cisco.com> wrote:
Per ticket #3108, there were still some unfortunate bugs in the affinity code in 1.6.  :-(

These have now been fixed.  ...but since this is the 2nd or 3rd time we have "fixed" the 1.5/1.6 series w.r.t. processor affinity, I'd really like people to test this stuff before it's committed and we ship 1.6.1.  I've put tarballs containing the fixes here:

   http://www.open-mpi.org/~jsquyres/unofficial/

Can you please try mpirun options like --bind-to-core and --bind-to-socket and ensure that they still work for you?  (even on machines with hyperthreading enabled, if you have access to such things)

IBM: I'd particularly like to hear that we haven't made anything worse on POWER systems.  Thanks.

--
Jeff Squyres
jsquyres@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/

