Open MPI User's Mailing List Archives

Subject: [OMPI users] problem with rankfile in openmpi-1.6.2
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2012-10-01 02:17:54


Hi,

I installed openmpi-1.6.2 on our heterogeneous platform (Solaris 10
Sparc, Solaris 10 x86_64, and Linux x86_64).

tyr small_prog 125 mpiexec -report-bindings -np 4 -host sunpc0,sunpc1 \
  -bysocket -bind-to-core date
Mon Oct 1 07:53:15 CEST 2012
[sunpc0:02084] MCW rank 0 bound to socket 0[core 0]: [B .][. .]
[sunpc0:02084] MCW rank 2 bound to socket 1[core 0]: [. .][B .]
Mon Oct 1 07:53:15 CEST 2012
Mon Oct 1 07:53:15 CEST 2012
[sunpc1:21881] MCW rank 1 bound to socket 0[core 0]: [B .][. .]
Mon Oct 1 07:53:15 CEST 2012
[sunpc1:21881] MCW rank 3 bound to socket 1[core 0]: [. .][B .]
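Because the binding lines from the two hosts arrive interleaved with the output of "date", it can help to collect them per rank before comparing them with a rankfile. The following is only a minimal sketch: the helper name parse_bindings is hypothetical, and the regular expression assumes the exact line format printed above (other Open MPI versions may print bindings differently).

```python
import re

# Assumed line format, taken from the output above:
# [sunpc0:02084] MCW rank 0 bound to socket 0[core 0]: [B .][. .]
BINDING_RE = re.compile(
    r"\[(?P<host>[^:\]]+):\d+\] MCW rank (?P<rank>\d+) "
    r"bound to socket (?P<socket>\d+)\[core (?P<core>\d+)\]"
)


def parse_bindings(output):
    """Return {rank: (host, socket, core)} for every binding line found.

    Hypothetical helper; lines that do not match (for example the
    output of "date") are skipped.
    """
    bindings = {}
    for line in output.splitlines():
        match = BINDING_RE.search(line)
        if match:
            bindings[int(match["rank"])] = (
                match["host"],
                int(match["socket"]),
                int(match["core"]),
            )
    return bindings


report = """\
Mon Oct 1 07:53:15 CEST 2012
[sunpc0:02084] MCW rank 0 bound to socket 0[core 0]: [B .][. .]
[sunpc0:02084] MCW rank 2 bound to socket 1[core 0]: [. .][B .]
Mon Oct 1 07:53:15 CEST 2012
Mon Oct 1 07:53:15 CEST 2012
[sunpc1:21881] MCW rank 1 bound to socket 0[core 0]: [B .][. .]
Mon Oct 1 07:53:15 CEST 2012
[sunpc1:21881] MCW rank 3 bound to socket 1[core 0]: [. .][B .]
"""

# Print the bindings ordered by rank instead of by arrival time.
for rank, (host, socket, core) in sorted(parse_bindings(report).items()):
    print(f"rank {rank}: {host} socket {socket} core {core}")
```

Sorted by rank, the placement alternates between the two hosts and fills socket 0 on both before socket 1.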

Now I try to do the same thing with the following rankfile.

rank 0=sunpc0.informatik.hs-fulda.de slot=0:0
rank 1=sunpc1.informatik.hs-fulda.de slot=0:0
rank 2=sunpc0.informatik.hs-fulda.de slot=1:0
rank 3=sunpc1.informatik.hs-fulda.de slot=1:0
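For reference, the placement pattern in this rankfile (ranks alternate between the hosts, and each host fills core 0 of socket 0 before core 0 of socket 1) can be written down as a short sketch. The function name make_rankfile and its parameters are hypothetical; only the "rank N=host slot=socket:core" line format is taken from the file above.

```python
def make_rankfile(hosts, sockets_per_host, nprocs, core=0):
    """Build rankfile text that cycles over hosts first, then over sockets.

    Hypothetical helper for illustration only. Every rank is placed on
    the same core index of its socket (core 0 by default).
    """
    lines = []
    for rank in range(nprocs):
        host = hosts[rank % len(hosts)]
        socket = (rank // len(hosts)) % sockets_per_host
        lines.append(f"rank {rank}={host} slot={socket}:{core}")
    return "\n".join(lines) + "\n"


hosts = [
    "sunpc0.informatik.hs-fulda.de",
    "sunpc1.informatik.hs-fulda.de",
]
print(make_rankfile(hosts, sockets_per_host=2, nprocs=4), end="")
```

With two hosts, two sockets per host, and four processes, this sketch reproduces the four lines of the rankfile shown above.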

tyr small_prog 126 mpiexec -report-bindings -rf rf_date_1.openmpi date
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------

I can also run the following commands successfully, but I get the
same error message when I use an equivalent rankfile instead.

mpiexec -report-bindings -np 4 -host sunpc0,sunpc1 -bycore \
  -bind-to-socket date

mpiexec -report-bindings -np 10 -host linpc0,linpc1,sunpc0,sunpc1,tyr \
  -byslot -bind-to-core date

Do you have any ideas why it doesn't work with a rankfile?
Can I provide more information so that you can track down and
solve the problem?

I still have problems with our Sun M4000 server (it has two hardware
threads per core, so I should probably use "-bind-to hwthread").

tyr small_prog 133 mpiexec -report-bindings -np 2 -host rs0 -byslot \
  -bind-to-core date
--------------------------------------------------------------------------
An attempt to set processor affinity has failed - please check to
ensure that your system supports such functionality. If so, then
this is probably something that should be reported to the OMPI developers.
--------------------------------------------------------------------------
[rs0....:23147] MCW rank 0 bound to socket 0[core 0]: [B . . .][. . . .]
--------------------------------------------------------------------------
mpiexec was unable to start the specified application as it encountered an error:

Error name: Resource temporarily unavailable
Node: rs0

when attempting to start process rank 0.
--------------------------------------------------------------------------
2 total processes failed to start

I would be grateful if there were some kind of solution for this
machine as well in the (near) future. Thank you very much in
advance for any help.

Kind regards

Siegmar