Open MPI User's Mailing List Archives

Subject: [OMPI users] one more problem with process bindings on openmpi-1.6.2
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2012-10-03 09:19:38


Hi,

I noticed another problem with process bindings. The command
works if I use "-host", but it breaks if I use "-hostfile" with
the same machines.

tyr fd1026 178 mpiexec -report-bindings -host sunpc0,sunpc1 -np 4 \
  -cpus-per-proc 2 -bind-to-core hostname
sunpc1
[sunpc1:00086] MCW rank 1 bound to socket 0[core 0-1]: [B B][. .]
[sunpc1:00086] MCW rank 3 bound to socket 1[core 0-1]: [. .][B B]
sunpc0
[sunpc0:10929] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .]
sunpc0
[sunpc0:10929] MCW rank 2 bound to socket 1[core 0-1]: [. .][B B]
sunpc1

tyr fd1026 179 cat host_sunpc0_1
sunpc0 slots=4
sunpc1 slots=4

tyr fd1026 180 mpiexec -report-bindings -hostfile host_sunpc0_1 -np 4 \
  -cpus-per-proc 2 -bind-to-core hostname
--------------------------------------------------------------------------
An invalid physical processor ID was returned when attempting to bind
an MPI process to a unique processor.

This usually means that you requested binding to more processors than
exist (e.g., trying to bind N MPI processes to M processors, where N >
M). Double check that you have enough unique processors for all the
MPI processes that you are launching on this host.

You job will now abort.
--------------------------------------------------------------------------
sunpc0
[sunpc0:10964] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .]
sunpc0
[sunpc0:10964] MCW rank 1 bound to socket 1[core 0-1]: [. .][B B]
--------------------------------------------------------------------------
mpiexec was unable to start the specified application as it encountered an error
on node sunpc0. More information may be available above.
--------------------------------------------------------------------------
4 total processes failed to start
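
One possible reading of the two runs, sketched as plain arithmetic. This is
only a guess, not a statement about how mpiexec places ranks. It assumes each
node has 4 cores (2 sockets with 2 cores each, as read off the "[B B][. .]"
maps), and it assumes that with "slots=4" all four ranks were placed on
sunpc0, which the output suggests (ranks 0 and 1 both land on sunpc0) but does
not prove. The helper name and the dictionaries are hypothetical and are not
part of Open MPI.

```python
# Sketch only: compare the cores each node would need with the cores it has.
CORES_PER_NODE = 4   # assumption: 2 sockets x 2 cores, from the binding maps
CPUS_PER_PROC = 2    # from "-cpus-per-proc 2"


def overcommitted_nodes(ranks_per_node, cpus_per_proc, cores_per_node):
    """Return the nodes whose ranks would need more cores than exist."""
    return [
        node
        for node, ranks in ranks_per_node.items()
        if ranks * cpus_per_proc > cores_per_node
    ]


# "-host" run: ranks 0 and 2 on sunpc0, ranks 1 and 3 on sunpc1 (from the output).
host_run = {"sunpc0": 2, "sunpc1": 2}

# "-hostfile" run: ASSUMED placement, all four ranks on sunpc0 because of slots=4.
hostfile_run = {"sunpc0": 4, "sunpc1": 0}

print(overcommitted_nodes(host_run, CPUS_PER_PROC, CORES_PER_NODE))
print(overcommitted_nodes(hostfile_run, CPUS_PER_PROC, CORES_PER_NODE))
```

Under these assumptions the "-host" placement needs exactly 4 cores per node,
while the assumed "-hostfile" placement would need 8 cores on sunpc0, which
would be consistent with the "invalid physical processor ID" message.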

Perhaps this error is related to the other errors I reported. Thank
you very much in advance for any help.

Kind regards

Siegmar