Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] question to binding options in openmpi-1.6.2
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2012-10-03 09:29:35


Hi,

thank you very much for your help. Now the command with "-npersocket"
works. Unfortunately it is not a solution for the other problem, which
I reported a few minutes ago.

tyr fd1026 191 cat host_sunpc0_1
sunpc0 sockets=2 slots=4
sunpc1 sockets=2 slots=4

tyr fd1026 192 mpiexec -report-bindings -hostfile host_sunpc0_1 -np 4 -cpus-per-proc 2 -bind-to-core hostname
--------------------------------------------------------------------------
An invalid physical processor ID was returned when attempting to bind
an MPI process to a unique processor.

This usually means that you requested binding to more processors than
exist (e.g., trying to bind N MPI processes to M processors, where N >
M). Double check that you have enough unique processors for all the
MPI processes that you are launching on this host.

You job will now abort.
--------------------------------------------------------------------------
sunpc0
[sunpc0:11341] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .]
sunpc0
[sunpc0:11341] MCW rank 1 bound to socket 1[core 0-1]: [. .][B B]
--------------------------------------------------------------------------
mpiexec was unable to start the specified application as it encountered an error
on node sunpc0. More information may be available above.
--------------------------------------------------------------------------
4 total processes failed to start

Perhaps you can find a solution for that error as well. Thank you very much
in advance for your help.

Kind regards

Siegmar
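
The "N > M" condition named in the error message above can be checked by hand. The sketch below is only an illustration of that arithmetic in Python, not Open MPI code. The helper name cores_needed is hypothetical, and the placement of all four ranks on sunpc0 (because the hostfile gives it slots=4) is an assumption that the thread does not confirm. The core count per node is read from the "[B B][. .]" binding report.

```python
def cores_needed(procs_on_node: int, cpus_per_proc: int) -> int:
    """Cores one node must provide for the ranks placed on it (hypothetical helper)."""
    return procs_on_node * cpus_per_proc


# 2 sockets x 2 cores each, as shown by the "[B B][. .]" binding report.
cores_per_node = 2 * 2

# Assumption: all 4 ranks are mapped to sunpc0 because it advertises slots=4.
all_on_one_node = cores_needed(procs_on_node=4, cpus_per_proc=2)

# Alternative placement: 2 ranks on each of the two hosts.
split_over_two = cores_needed(procs_on_node=2, cpus_per_proc=2)

print(all_on_one_node <= cores_per_node)  # does the first placement fit?
print(split_over_two <= cores_per_node)   # does the second placement fit?
```

Under that assumption the first placement asks for 8 cores on a 4-core node, which would match a failure after ranks 0 and 1 have been bound.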

> Okay, I looked at this and the problem isn't in the code. The
> problem is that the 1.6 series doesn't have the more sophisticated
> discovery and mapping algorithms of the 1.7 series. In this case,
> the specific problem is that the 1.6 series doesn't automatically
> detect the number of sockets on a node - you have to tell it in
> your hostfile:
>
> foo.domain.org sockets=2 slots=4
>
> Otherwise, you'll get this poor error message as it tries to
> communicate that 0 sockets => zero processes.
>
>
> On Oct 2, 2012, at 2:44 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>
> > Option "-npersocket" doesn't work, even if I reduce "-npersocket"
> > to "1". Why doesn't it find any sockets, although the above commands
> > could find both sockets?
> >
> > mpiexec -report-bindings -host sunpc0 -np 2 -npersocket 1 hostname
> > --------------------------------------------------------------------------
> > Your job has requested a conflicting number of processes for the
> > application:
> >
> > App: hostname
> > number of procs: 2
> >
> > This is more processes than we can launch under the following
> > additional directives and conditions:
> >
> > number of sockets: 0
> > npersocket: 1
> >
> > Please revise the conflict and try again.
> > --------------------------------------------------------------------------
>
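
The limit reported in the quoted error ("number of sockets: 0", "npersocket: 1") follows from a simple product. The Python sketch below only illustrates that arithmetic and is not taken from the Open MPI sources; the function name max_procs_per_node is hypothetical.

```python
def max_procs_per_node(sockets: int, npersocket: int) -> int:
    """Upper bound on processes a node can host under -npersocket (hypothetical helper)."""
    return sockets * npersocket


requested = 2  # -np 2 from the quoted command

# No "sockets=" entry in the hostfile: the 1.6 series reports 0 sockets.
undetected = max_procs_per_node(sockets=0, npersocket=1)

# Hostfile line "sunpc0 sockets=2 slots=4": two sockets are known.
declared = max_procs_per_node(sockets=2, npersocket=1)

print(requested <= undetected)  # can the job fit with 0 known sockets?
print(requested <= declared)    # can it fit once sockets=2 is declared?
```

With zero known sockets the bound is zero processes, so any request is rejected. Declaring sockets=2 raises the bound to 2, which is enough for the two requested processes.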