Hi,
yesterday I installed openmpi-1.7a1r27358. It has an improved error
message compared to openmpi-1.6.2, but it doesn't show process bindings
and has some other problems as well.
"sunpc0" and "linpc0" each have two dual-core processors and run
Solaris 10 x86_64 and Linux x86_64, respectively. "tyr" is a dual-processor
machine running Solaris 10 Sparc.
tyr fd1026 105 mpiexec -np 2 -host sunpc0 -report-bindings \
-map-by core -bind-to-core date
Sun Sep 23 11:46:36 CEST 2012
Sun Sep 23 11:46:36 CEST 2012
tyr fd1026 106 mpicc -showme
cc -I/usr/local/openmpi-1.7_64_cc/include -mt -m64
-L/usr/local/openmpi-1.7_64_cc/lib64 -lmpi -lpicl -lm -lkstat -llgrp
-lsocket -lnsl -lrt -lm
openmpi-1.6.2 shows process bindings.
tyr fd1026 103 mpiexec -np 2 -host sunpc0 -report-bindings \
-bycore -bind-to-core date
Sun Sep 23 12:09:06 CEST 2012
[sunpc0:13197] MCW rank 0 bound to socket 0[core 0]: [B .][. .]
[sunpc0:13197] MCW rank 1 bound to socket 0[core 1]: [. B][. .]
Sun Sep 23 12:09:06 CEST 2012
tyr fd1026 104 mpicc -showme
cc -I/usr/local/openmpi-1.6.2_64_cc/include -mt -m64
-L/usr/local/openmpi-1.6.2_64_cc/lib64 -lmpi -lm -lkstat -llgrp
-lsocket -lnsl -lrt -lm
On my Linux machine I get a warning.
tyr fd1026 113 mpiexec -np 2 -host linpc0 -report-bindings \
-map-by core -bind-to-core date
--------------------------------------------------------------------------
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.
Node: linpc0
This is a warning only; your job will continue, though performance may
be degraded.
--------------------------------------------------------------------------
Sun Sep 23 11:56:04 CEST 2012
Sun Sep 23 11:56:04 CEST 2012
Everything works fine with openmpi-1.6.2.
tyr fd1026 106 mpiexec -np 2 -host linpc0 -report-bindings \
-bycore -bind-to-core date
[linpc0:15808] MCW rank 0 bound to socket 0[core 0]: [B .][. .]
[linpc0:15808] MCW rank 1 bound to socket 0[core 1]: [. B][. .]
Sun Sep 23 12:11:47 CEST 2012
Sun Sep 23 12:11:47 CEST 2012
On my Solaris Sparc machine I get the following errors.
tyr fd1026 121 mpiexec -np 2 -report-bindings -map-by core -bind-to-core date
[tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file
../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 847
[tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file
../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 1414
[tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file
../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 847
[tyr.informatik.hs-fulda.de:23773] [[32457,0],0] ORTE_ERROR_LOG: Value out of bounds in file
../../../../openmpi-1.7a1r27358/orte/mca/odls/base/odls_base_default_fns.c at line 1414
tyr fd1026 122 mpiexec -np 2 -host tyr -report-bindings -map-by core -bind-to core date
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------
Once more, everything works fine with openmpi-1.6.2.
tyr fd1026 109 mpiexec -np 2 -report-bindings -bycore -bind-to-core date
[tyr.informatik.hs-fulda.de:23869] MCW rank 0 bound to socket 0[core 0]: [B][.]
[tyr.informatik.hs-fulda.de:23869] MCW rank 1 bound to socket 1[core 0]: [.][B]
Sun Sep 23 12:14:09 CEST 2012
Sun Sep 23 12:14:09 CEST 2012
tyr fd1026 110 mpiexec -np 2 -host tyr -report-bindings -bycore -bind-to-core date
[tyr.informatik.hs-fulda.de:23877] MCW rank 0 bound to socket 0[core 0]: [B][.]
[tyr.informatik.hs-fulda.de:23877] MCW rank 1 bound to socket 1[core 0]: [.][B]
Sun Sep 23 12:16:05 CEST 2012
Sun Sep 23 12:16:05 CEST 2012
Kind regards
Siegmar