Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] 1.6.2 affinity failures
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-12-19 21:08:10


I'm afraid these are both known problems in the 1.6.2 release. I believe we fixed npersocket in 1.6.3, though you might check to be sure. As for the large-scale issue, cpus-per-rank may well fail under those conditions: the algorithm in the 1.6 series hasn't seen much use, especially at scale.

In fact, cpus-per-rank has somewhat fallen by the wayside recently due to apparent lack of interest. I'm restoring it for the 1.7 series over the holiday (it currently doesn't work in 1.7 or on the trunk).

On Dec 19, 2012, at 4:34 PM, Brock Palen <brockp_at_[hidden]> wrote:

> Using Open MPI 1.6.2 with Intel 13.0, though the problem is not specific to the compiler.
>
> Using two 12-core, 2-socket nodes:
>
> mpirun -np 4 -npersocket 2 uptime
> --------------------------------------------------------------------------
> Your job has requested a conflicting number of processes for the
> application:
>
> App: uptime
> number of procs: 4
>
> This is more processes than we can launch under the following
> additional directives and conditions:
>
> number of sockets: 0
> npersocket: 2
>
>
> Any idea why this wouldn't work?
>
> Another problem: the following does what I expect on two 2-socket nodes with 8-core sockets (16 total cores/node):
>
> mpirun -np 8 -npernode 4 -bind-to-core -cpus-per-rank 4 hwloc-bind --get
> 0x0000000f
> 0x0000000f
> 0x000000f0
> 0x000000f0
> 0x00000f00
> 0x00000f00
> 0x0000f000
> 0x0000f000
>
> But it fails at large scale:
>
> mpirun -np 276 -npernode 4 -bind-to-core -cpus-per-rank 4 hwloc-bind --get
>
> --------------------------------------------------------------------------
> An invalid physical processor ID was returned when attempting to bind
> an MPI process to a unique processor.
>
> This usually means that you requested binding to more processors than
> exist (e.g., trying to bind N MPI processes to M processors, where N >
> M). Double check that you have enough unique processors for all the
> MPI processes that you are launching on this host.
> You job will now abort.
> --------------------------------------------------------------------------
>
>
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> brockp_at_[hidden]
> (734)936-1985
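For reading the hwloc-bind --get masks quoted above, here is a minimal Python sketch that decodes a single-word hex mask into the bit positions it covers, and works out what the two mpirun requests ask of each node. The function names are hypothetical helpers, not part of Open MPI or hwloc. The sketch assumes each set bit stands for one processing unit as hwloc numbers it, and it does not handle the comma-separated multi-word masks that hwloc prints on larger machines.

```python
import math

def mask_to_cpus(mask_hex):
    """Return the bit positions set in a single-word hex mask string."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

def cores_needed_per_node(npernode, cpus_per_rank):
    """Cores one node must supply if every rank gets its own cores."""
    return npernode * cpus_per_rank

def nodes_needed(np_total, npernode):
    """Smallest node count that can hold np_total ranks at npernode each."""
    return math.ceil(np_total / npernode)

if __name__ == "__main__":
    # The four distinct masks from the small run quoted above.
    for m in ["0x0000000f", "0x000000f0", "0x00000f00", "0x0000f000"]:
        print(m, mask_to_cpus(m))

    # -npernode 4 -cpus-per-rank 4 on 16-core nodes
    print("cores per node:", cores_needed_per_node(4, 4))
    # -np 276 -npernode 4
    print("nodes needed:", nodes_needed(276, 4))
```

Under these assumptions, the four masks cover positions 0-3, 4-7, 8-11 and 12-15, so the four ranks on each node split its 16 cores without overlap. The large run asks for the same 16 cores per node, spread over 69 nodes. If the allocation really had that many 16-core nodes, the request itself was satisfiable, which would fit the reply's point that the mapping algorithm, not the command line, is the likely culprit.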