Open MPI User's Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Open MPI User's mailing list

Subject: Re: [OMPI users] Help with multicore AMD machine performance
From: Pavel Mezentsev (pavel.mezentsev_at_[hidden])
Date: 2012-03-30 08:17:08


You can try launching your program through this wrapper script:
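#!/bin/bash
# Bind this process to the logical CPU whose number equals its node-local
# rank (set by mpirun) and allocate its memory on the local NUMA node.
s=${OMPI_COMM_WORLD_NODE_RANK:?must be started by mpirun}

# exec replaces the shell; "$@" forwards any arguments given to the script.
exec numactl --physcpubind="$s" --localalloc ./YOUR_PROG "$@"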

Instead of 'mpirun ... ./YOUR_PROG', run 'mpirun ... ./SCRIPT' (make the script executable first).
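The script above binds node-local rank s to logical CPU s. In the lscpu output quoted below, consecutive CPU numbers alternate between NUMA nodes (CPU 0 is on node0, CPU 1 on node2), so consecutive ranks land on different nodes. If your ranks talk mostly to their neighbours, packing them onto one NUMA node at a time may be worth a try. This is a minimal sketch, assuming the CPU lists from that lscpu output; CPU_ORDER and cpu_for_rank are hypothetical names, and whether it beats the plain mapping depends on your code's communication pattern:

```bash
#!/bin/bash
# Sketch: pack consecutive node-local ranks onto the same NUMA node.
# CPU_ORDER is copied from the "NUMA nodeN CPU(s)" lines of the lscpu output
# in this thread (node0, node1, node2, node3); adjust it for other machines.
CPU_ORDER=(0 2 4 6 8 10 12 14
           16 18 20 22 24 26 28 30
           1 3 5 7 9 11 13 15
           17 19 21 23 25 27 29 31)

# Map a node-local rank to a logical CPU: ranks 0-7 -> node0, 8-15 -> node1,
# 16-23 -> node2, 24-31 -> node3 (wrapping around if there are more ranks).
cpu_for_rank() {
    local rank=$1
    local n=${#CPU_ORDER[@]}
    echo "${CPU_ORDER[$(( rank % n ))]}"
}

# mpirun sets OMPI_COMM_WORLD_NODE_RANK; skip the launch when it is absent.
if [ -n "${OMPI_COMM_WORLD_NODE_RANK:-}" ]; then
    cpu=$(cpu_for_rank "$OMPI_COMM_WORLD_NODE_RANK")
    exec numactl --physcpubind="$cpu" --localalloc ./YOUR_PROG "$@"
fi
```

It is used exactly like the original script, in place of ./SCRIPT on the mpirun command line.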

I tried this with openmpi-1.5.4 and it helped.

Best regards, Pavel Mezentsev

P.S. openmpi-1.5.5 binds processes correctly, so you can try it as well.
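Ralph's reply quoted below suggests binding to NUMA nodes rather than single cores, which the 1.4 binding options are too limited to do. The same wrapper idea can approximate it: this sketch (untested on the machine in this thread) restricts each rank's CPUs and memory to one NUMA node, assigned round-robin. NUM_NUMA_NODES and numa_node_for_rank are hypothetical names; the default of 4 comes from the "NUMA node(s): 4" line of the lscpu output.

```bash
#!/bin/bash
# Sketch: bind each MPI rank to one whole NUMA node (its cores and its memory)
# rather than to a single core. NUM_NUMA_NODES is a hypothetical setting; the
# default of 4 matches "NUMA node(s): 4" in the lscpu output in this thread.
NUM_NUMA_NODES=${NUM_NUMA_NODES:-4}

# Round-robin placement: rank 0 -> node 0, rank 1 -> node 1, ..., rank 4 -> node 0.
numa_node_for_rank() {
    local rank=$1 nodes=$2
    echo $(( rank % nodes ))
}

# mpirun sets OMPI_COMM_WORLD_NODE_RANK; skip the launch when it is absent.
if [ -n "${OMPI_COMM_WORLD_NODE_RANK:-}" ]; then
    node=$(numa_node_for_rank "$OMPI_COMM_WORLD_NODE_RANK" "$NUM_NUMA_NODES")
    # --cpunodebind restricts scheduling to the node's CPUs;
    # --membind restricts allocations to the node's memory.
    exec numactl --cpunodebind="$node" --membind="$node" ./YOUR_PROG "$@"
fi
```

With 32 ranks this places 8 ranks on each node. Unlike the single-CPU binding in the original script, the kernel may still move a rank among the CPUs of its node, so it is worth comparing both.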

2012/3/30 Ralph Castain <rhc_at_[hidden]>

> I think you'd have much better luck using the developer's trunk as the
> binding there is much better - e.g., you can bind to NUMA instead of just
> cores. The 1.4 binding is pretty limited.
>
> http://www.open-mpi.org/nightly/trunk/
>
> On Mar 30, 2012, at 5:02 AM, Ricardo Fonseca wrote:
>
> > Hi guys
> >
> > I'm benchmarking our (well-tested) parallel code on an AMD-based
> > system featuring 2x AMD Opteron(TM) Processor 6276, with 16 cores each, for
> > a total of 32 cores. The system is running Scientific Linux 6.1 and OpenMPI
> > 1.4.5.
> >
> > When I run a single-core job the performance is as expected. However,
> > when I run with 32 processes the performance drops to about 60% (when
> > compared with other systems running the exact same problem, so this is not
> > a code scaling issue). I think this may have to do with core binding /
> > NUMA, but I haven't been able to get any improvement out of the bind-*
> > mpirun options.
> >
> > Any suggestions?
> >
> > Thanks in advance,
> > Ricardo
> >
> > P.S.: Here's the output of lscpu:
> >
> > Architecture: x86_64
> > CPU op-mode(s): 32-bit, 64-bit
> > Byte Order: Little Endian
> > CPU(s): 32
> > On-line CPU(s) list: 0-31
> > Thread(s) per core: 2
> > Core(s) per socket: 8
> > CPU socket(s): 2
> > NUMA node(s): 4
> > Vendor ID: AuthenticAMD
> > CPU family: 21
> > Model: 1
> > Stepping: 2
> > CPU MHz: 2300.045
> > BogoMIPS: 4599.38
> > Virtualization: AMD-V
> > L1d cache: 16K
> > L1i cache: 64K
> > L2 cache: 2048K
> > L3 cache: 6144K
> > NUMA node0 CPU(s): 0,2,4,6,8,10,12,14
> > NUMA node1 CPU(s): 16,18,20,22,24,26,28,30
> > NUMA node2 CPU(s): 1,3,5,7,9,11,13,15
> > NUMA node3 CPU(s): 17,19,21,23,25,27,29,31
> >
> > ---
> > Ricardo Fonseca
> >
> > Associate Professor
> > GoLP - Grupo de Lasers e Plasmas
> > Instituto de Plasmas e Fusão Nuclear
> > Instituto Superior Técnico
> > Av. Rovisco Pais
> > 1049-001 Lisboa
> > Portugal
> >
> > tel: +351 21 8419202
> > fax: +351 21 8464455
> > web: http://golp.ist.utl.pt/
> >
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>