Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] what is a "node"?
From: Gus Correa (gus_at_[hidden])
Date: 2012-08-30 14:32:28

Hi Zbigniew

This is in addition to the OpenMPI processor affinity capability that Jeff mentioned.

If your Curie cluster has a resource manager [Torque, SGE, etc.],
your job submission script to the resource manager/queue system
should specifically request a single node for the test that you have in mind.

For instance, on Torque/PBS, this would be done by adding this directive to
the top of the job script:

#PBS -l nodes=1:ppn=8
mpiexec -np 8 ...

meaning that you want the 8 processors [i.e. cores] to be in a single node.

On top of this, you need to add the appropriate process binding
keywords to the mpiexec command line, as Jeff suggested.
'man mpiexec' will tell you a lot about the OpenMPI process binding
capability, especially in the 1.6 and 1.4 series.
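
Putting the two pieces together, a Torque job script might look like the
sketch below. The binding flags are the ones from Jeff's example for the
1.6 series; other series may spell them differently, so check 'man mpiexec'.
The program name ./my_mpi_test is a placeholder, and the guard around the
launch is my own addition so the script does nothing harmful on a machine
without mpiexec.

```shell
#!/bin/sh
#PBS -l nodes=1:ppn=8

# Placeholder program name (an assumption); replace with your MPI executable.
PROGRAM=./my_mpi_test
NP=8

# Binding flags taken from Jeff's Open MPI 1.6 example:
#   --bind-to-core     pin each process to one core
#   -bycore            map ranks to consecutive cores
#   --report-bindings  print where each rank was bound
CMD="mpiexec -np $NP --bind-to-core -bycore --report-bindings $PROGRAM"

# Log the exact command line with the job output.
echo "Running: $CMD"

# Launch only if mpiexec and the program are actually present.
if command -v mpiexec >/dev/null 2>&1 && [ -x "$PROGRAM" ]; then
    $CMD
fi
```

With 8 cores per socket, mapping by core should fill one socket first, but
the --report-bindings output is what tells you where the ranks really landed.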

In the best of worlds, your resource manager can also assign a group of
cores exclusively to each of the jobs that may be sharing the node.
Say, job1 requests 4 cores, gets cores 0-3, and cannot use any other cores;
job2 requests 8 cores, gets cores 4-11, and cannot use any other
cores; and so on.

However, not all resource managers/queue systems are built this way
[particularly the older versions],
and some may let the various job processes drift across all cores in the node.
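
One way to see which situation you are in is to ask the kernel, from inside
a job, which cores the job is allowed to run on. This is only a sketch and
assumes Linux compute nodes, where /proc/self/status carries a
Cpus_allowed_list line; the "unknown" fallback is my own addition for other
systems.

```shell
#!/bin/sh
# Linux-only: report the cores this job's processes are allowed to run on.
ALLOWED=unknown
if [ -r /proc/self/status ]; then
    # The line looks like "Cpus_allowed_list:  0-31"; keep the second field.
    ALLOWED=$(grep Cpus_allowed_list /proc/self/status | awk '{print $2}')
fi
echo "cores this job may use: $ALLOWED"
```

If a job that requested 8 cores reports every core in the node, the resource
manager is not confining it, and the processes are free to drift.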

If the resource manager is old and doesn't have that hardware locality
capability, and if you don't want your performance test to risk being
polluted by other jobs running on the same node, which may share the same
cores with your job,
then you can request all 32 cores in the node for your job,
but use only 8 of them to run your MPI program.
It is wasteful, but may be the only way to go.
For instance, on Torque:

#PBS -l nodes=1:ppn=32
mpiexec -np 8 ...

Again, add the OpenMPI process binding keywords to the mpiexec command line,
to ensure the use of a fixed group of 8 cores.
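
To confirm that the 8 processes really sit on a single processor, you can
save the --report-bindings lines and count the distinct sockets in them.
The helper below is a sketch: count_sockets is a name I made up, and it
assumes the 1.6-style line format shown in Jeff's example quoted below, from
which the sample lines are copied.

```shell
#!/bin/sh
# Count distinct sockets named in saved --report-bindings output.
# Assumes lines of the form "... MCW rank N bound to socket S[core C]: ...".
count_sockets() {
    sed -n 's/.*bound to socket \([0-9][0-9]*\).*/\1/p' "$1" |
        sort -u | wc -l | tr -d ' '
}

# Sample input: two lines copied from Jeff's example output.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
[svbu-mpi056:18904] MCW rank 0 bound to socket 0[core 0]: [B . . .][. . . .]
[svbu-mpi056:18904] MCW rank 1 bound to socket 0[core 1]: [. B . .][. . . .]
EOF

NSOCKETS=$(count_sockets "$LOG")
rm -f "$LOG"
echo "distinct sockets in use: $NSOCKETS"
```

In a real job you would point count_sockets at the job's saved output
(capturing both stdout and stderr to be safe) and expect the count to be 1.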

With SGE and Slurm the syntax is different than the above,
but I would guess that there is an equivalent setup.
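
For Slurm, my best guess at the equivalent is sketched below. I have not
tested it on Curie: --exclusive plays the role of requesting the whole node,
the binding flags are again the 1.6-series ones from Jeff's example, and
./my_mpi_test is a placeholder program name. SGE parallel environment names
are site-specific, so I leave that one out.

```shell
#!/bin/sh
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --exclusive

# Placeholder program name (an assumption); replace with your MPI executable.
PROGRAM=./my_mpi_test

# Slurm exports SLURM_NTASKS inside a job; fall back to 8 outside one.
NP=${SLURM_NTASKS:-8}

CMD="mpiexec -np $NP --bind-to-core -bycore --report-bindings $PROGRAM"
echo "Running: $CMD"

# Launch only if mpiexec and the program are actually present.
if command -v mpiexec >/dev/null 2>&1 && [ -x "$PROGRAM" ]; then
    $CMD
fi
```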

I hope this helps,
Gus Correa

On 08/30/2012 08:07 AM, Jeff Squyres wrote:
> In the OMPI v1.6 series, you can use the processor affinity options. And you can use --report-bindings to show exactly where processes were bound. For example:
> -----
> % mpirun -np 4 --bind-to-core --report-bindings -bycore uptime
> [svbu-mpi056:18904] MCW rank 0 bound to socket 0[core 0]: [B . . .][. . . .]
> [svbu-mpi056:18904] MCW rank 1 bound to socket 0[core 1]: [. B . .][. . . .]
> [svbu-mpi056:18904] MCW rank 2 bound to socket 0[core 2]: [. . B .][. . . .]
> [svbu-mpi056:18904] MCW rank 3 bound to socket 0[core 3]: [. . . B][. . . .]
> 05:06:13 up 7 days, 6:57, 1 user, load average: 0.29, 0.10, 0.03
> 05:06:13 up 7 days, 6:57, 1 user, load average: 0.29, 0.10, 0.03
> 05:06:13 up 7 days, 6:57, 1 user, load average: 0.29, 0.10, 0.03
> 05:06:13 up 7 days, 6:57, 1 user, load average: 0.29, 0.10, 0.03
> %
> -----
> I bound each process to a single core, and mapped them on a round-robin basis by core. Hence, all 4 processes ended up on their own cores on a single processor socket.
> The --report-bindings output shows that this particular machine has 2 sockets, each with 4 cores.
> On Aug 30, 2012, at 5:37 AM, Zbigniew Koza wrote:
>> Hi,
>> consider this specification:
>> "Curie fat consists in 360 nodes which contains 4 eight cores CPU Nehalem-EX clocked at 2.27 GHz, let 32 cores / node and 11520 cores for the full fat configuration"
>> Suppose I would like to run some performance tests just on a single processor rather than 4 of them.
>> Is there a way to do this?
>> I'm afraid specifying that I need 1 cluster node with 8 MPI processes
>> will result in OS distributing these 8 processes among 4
>> processors forming the node, and this is not what I'm after.
>> Z Koza