Hi,
today I tried a different rankfile and once more ran into a problem. :-((
> > thank you very much for your patch. I have applied the patch to
> > openmpi-1.6.4rc4.
> >
> > Open MPI: 1.6.4rc4r28022
> > : [B .][. .] (slot list 0:0)
> > : [. B][. .] (slot list 0:1)
> > : [B B][. .] (slot list 0:0-1)
> > : [. .][B .] (slot list 1:0)
> > : [. .][. B] (slot list 1:1)
> > : [. .][B B] (slot list 1:0-1)
> > : [B B][B B] (slot list 0:0-1,1:0-1)
>
> That looks great. I'll file a CMR to get this patch into 1.6.
> Unless you indicate otherwise, I'll assume this issue is understood
> for 1.6.
Rankfile rf_6 is the same as last time. I added one more line
in rf_7 and swapped the order of the hosts in rf_8. Everything
still works with rf_6. With rf_7 I don't get any output for
rank 1, and with rf_8 I get an error. Both machines use the
same hardware.
sunpc1 rankfiles 106 cat rf_6
# mpiexec -report-bindings -rf rf_6 hostname
rank 0=sunpc1 slot=0:0-1,1:0-1
sunpc1 rankfiles 107 cat rf_7
# mpiexec -report-bindings -rf rf_7 hostname
rank 0=sunpc1 slot=0:0-1,1:0-1
rank 1=sunpc0 slot=0:0-1
sunpc1 rankfiles 108 cat rf_8
# mpiexec -report-bindings -rf rf_8 hostname
rank 0=sunpc0 slot=0:0-1,1:0-1
rank 1=sunpc1 slot=0:0-1
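
To make the difference between the rankfiles explicit, here is a minimal
sketch in Python that reads the "rank N=host slot=S" lines shown above
and lists the hosts in rank order. It is not part of Open MPI; the names
parse_rankfile and hosts_in_rank_order are hypothetical, and the pattern
only covers the simple syntax used in rf_6, rf_7 and rf_8.

```python
import re

# Hypothetical helper, not part of Open MPI. It only handles the
# simple "rank N=host slot=S" form used in rf_6, rf_7 and rf_8.
RANK_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)$")


def parse_rankfile(text):
    """Return a list of (rank, host, slot_list) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "#" comment lines seen in the files.
        if not line or line.startswith("#"):
            continue
        match = RANK_LINE.match(line)
        if match is None:
            raise ValueError("unrecognized rankfile line: " + line)
        entries.append((int(match.group(1)), match.group(2), match.group(3)))
    return entries


def hosts_in_rank_order(text):
    """Return the host names sorted by rank number."""
    return [host for _, host, _ in sorted(parse_rankfile(text))]


RF_7 = """\
# mpiexec -report-bindings -rf rf_7 hostname
rank 0=sunpc1 slot=0:0-1,1:0-1
rank 1=sunpc0 slot=0:0-1
"""

RF_8 = """\
# mpiexec -report-bindings -rf rf_8 hostname
rank 0=sunpc0 slot=0:0-1,1:0-1
rank 1=sunpc1 slot=0:0-1
"""

if __name__ == "__main__":
    print("rf_7:", hosts_in_rank_order(RF_7))
    print("rf_8:", hosts_in_rank_order(RF_8))
```

The two files name the same hosts and slot lists; only the host assigned
to rank 0 and rank 1 is swapped.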
sunpc1 rankfiles 109 mpiexec -report-bindings -rf rf_6 hostname
[sunpc1:09779] MCW rank 0 bound to socket 0[core 0-1]
socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
sunpc1 rankfiles 110 mpiexec -report-bindings -rf rf_7 hostname
[sunpc1:09782] MCW rank 0 bound to socket 0[core 0-1]
socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
sunpc1 rankfiles 111 mpiexec -report-bindings -rf rf_8 hostname
--------------------------------------------------------------------------
The rankfile that was used claimed that a host was either not
allocated or oversubscribed its slots. Please review your rank-slot
assignments and your host allocation to ensure a proper match. Also,
some systems may require using full hostnames, such as
"host1.example.com" (instead of just plain "host1").
Host: sunpc0
--------------------------------------------------------------------------
I get the following output if I use sunpc0 as the local host.
sunpc0 rankfiles 102 mpiexec -report-bindings -rf rf_6 hostname
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------
sunpc0 rankfiles 103 mpiexec -report-bindings -rf rf_7 hostname
--------------------------------------------------------------------------
The rankfile that was used claimed that a host was either not
allocated or oversubscribed its slots. Please review your rank-slot
assignments and your host allocation to ensure a proper match. Also,
some systems may require using full hostnames, such as
"host1.example.com" (instead of just plain "host1").
Host: sunpc1
--------------------------------------------------------------------------
sunpc0 rankfiles 104 mpiexec -report-bindings -rf rf_8 hostname
[sunpc0:19027] MCW rank 0 bound to socket 0[core 0-1]
socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
I get the following output if I use tyr as the local host.
tyr rankfiles 218 mpiexec -report-bindings -rf rf_6 hostname
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------
tyr rankfiles 219 mpiexec -report-bindings -rf rf_7 hostname
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------
tyr rankfiles 220 mpiexec -report-bindings -rf rf_8 hostname
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------
Do you have any idea why this happens? Thank you very much in
advance for any help.
Kind regards
Siegmar