Open MPI User's Mailing List Archives

Subject: [OMPI users] bindings not reported and other problems in openmpi-1.7a1r27358
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2012-09-23 06:20:11


Hi,

Yesterday I installed openmpi-1.7a1r27358. It has an improved error
message compared to openmpi-1.6.2, but it doesn't show process bindings
and has some other problems as well.

"sunpc0" and "linpc0" are each equipped with two dual-core processors and
run Solaris 10 x86_64 and Linux x86_64, respectively. "tyr" is a
dual-processor machine running Solaris 10 SPARC.

tyr fd1026 105 mpiexec -np 2 -host sunpc0 -report-bindings \
  -map-by core -bind-to-core date
Sun Sep 23 11:46:36 CEST 2012
Sun Sep 23 11:46:36 CEST 2012

tyr fd1026 106 mpicc -showme
cc -I/usr/local/openmpi-1.7_64_cc/include -mt -m64
  -L/usr/local/openmpi-1.7_64_cc/lib64 -lmpi -lpicl -lm -lkstat -llgrp
  -lsocket -lnsl -lrt -lm

openmpi-1.6.2 shows process bindings.

tyr fd1026 103 mpiexec -np 2 -host sunpc0 -report-bindings \
  -bycore -bind-to-core date
Sun Sep 23 12:09:06 CEST 2012
[sunpc0:13197] MCW rank 0 bound to socket 0[core 0]: [B .][. .]
[sunpc0:13197] MCW rank 1 bound to socket 0[core 1]: [. B][. .]
Sun Sep 23 12:09:06 CEST 2012

tyr fd1026 104 mpicc -showme
cc -I/usr/local/openmpi-1.6.2_64_cc/include -mt -m64
  -L/usr/local/openmpi-1.6.2_64_cc/lib64 -lmpi -lm -lkstat -llgrp
  -lsocket -lnsl -lrt -lm

On my Linux machine I get a warning with openmpi-1.7a1r27358.

tyr fd1026 113 mpiexec -np 2 -host linpc0 -report-bindings \
  -map-by core -bind-to-core date
--------------------------------------------------------------------------
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

  Node: linpc0

This is a warning only; your job will continue, though performance may
be degraded.
--------------------------------------------------------------------------
Sun Sep 23 11:56:04 CEST 2012
Sun Sep 23 11:56:04 CEST 2012

Everything works fine with openmpi-1.6.2.

tyr fd1026 106 mpiexec -np 2 -host linpc0 -report-bindings \
  -bycore -bind-to-core date
[linpc0:15808] MCW rank 0 bound to socket 0[core 0]: [B .][. .]
[linpc0:15808] MCW rank 1 bound to socket 0[core 1]: [. B][. .]
Sun Sep 23 12:11:47 CEST 2012
Sun Sep 23 12:11:47 CEST 2012

On my Solaris SPARC machine I get the following errors.

tyr fd1026 121 mpiexec -np 2 -report-bindings -map-by core -bind-to-core date
[tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file
../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 847
[tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file
../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 1414
[tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file
../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 847
[tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file
../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 1414

tyr fd1026 122 mpiexec -np 2 -host tyr -report-bindings -map-by core -bind-to core date
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------

Once more, everything works fine with openmpi-1.6.2.

tyr fd1026 109 mpiexec -np 2 -report-bindings -bycore -bind-to-core date
[tyr.informatik.hs-fulda.de:23869] MCW rank 0 bound to socket 0[core 0]: [B][.]
[tyr.informatik.hs-fulda.de:23869] MCW rank 1 bound to socket 1[core 0]: [.][B]
Sun Sep 23 12:14:09 CEST 2012
Sun Sep 23 12:14:09 CEST 2012

tyr fd1026 110 mpiexec -np 2 -host tyr -report-bindings -bycore -bind-to-core date
[tyr.informatik.hs-fulda.de:23877] MCW rank 0 bound to socket 0[core 0]: [B][.]
[tyr.informatik.hs-fulda.de:23877] MCW rank 1 bound to socket 1[core 0]: [.][B]
Sun Sep 23 12:16:05 CEST 2012
Sun Sep 23 12:16:05 CEST 2012
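
As a reading aid for the openmpi-1.6.2 output quoted above, here is a minimal Python sketch that parses "MCW rank ... bound to ..." lines and lists which (socket, core) slots are marked "B". The pattern is inferred only from the lines shown in this message, so it is an assumption about the format and may not hold for other versions. The names BINDING_RE and parse_binding are hypothetical helpers, not part of Open MPI.

```python
import re

# Assumed line format, taken from the 1.6.2 output quoted in this message:
#   [host:pid] MCW rank N bound to socket S[core C]: [B .][. .]
BINDING_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\] MCW rank (?P<rank>\d+) "
    r"bound to socket (?P<socket>\d+)\[core (?P<core>\d+)\]: "
    r"(?P<mask>(?:\[[B. ]+\])+)"
)


def parse_binding(line):
    """Return host, rank and bound (socket, core) pairs, or None if no match."""
    m = BINDING_RE.search(line)
    if m is None:
        return None  # e.g. the plain "date" output lines
    # Each bracket group in the mask is one socket; each entry is one core.
    sockets = re.findall(r"\[([B. ]+)\]", m.group("mask"))
    bound = [
        (s, c)
        for s, cores in enumerate(sockets)
        for c, flag in enumerate(cores.split())
        if flag == "B"
    ]
    return {"host": m.group("host"), "rank": int(m.group("rank")), "bound": bound}


if __name__ == "__main__":
    sample = [
        "[sunpc0:13197] MCW rank 0 bound to socket 0[core 0]: [B .][. .]",
        "[sunpc0:13197] MCW rank 1 bound to socket 0[core 1]: [. B][. .]",
        "Sun Sep 23 12:09:06 CEST 2012",
    ]
    for line in sample:
        print(parse_binding(line))
```

Running such a parser over the output of both versions would make the difference explicit: the 1.6.2 runs yield one binding record per rank, while the 1.7a1r27358 runs shown above contain no matching lines at all.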

Kind regards

Siegmar