
Hardware Locality Users' Mailing List Archives


Subject: Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups
From: Brock Palen (brockp_at_[hidden])
Date: 2012-11-06 09:24:38


Chris,

Assuming your cpusets are correct and you are not doing any hybrid threads+MPI, I found the problem is avoided if you enable -bind-to-core with Open MPI 1.6.x.

We just don't enable binding by default on our setup, and thus far no users have been bitten by this.
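To make the -bind-to-core idea concrete, here is a minimal Python sketch. It assumes Linux and assumes that each CPU in the job's cpuset is a distinct physical core (no hyperthreads). Each local rank gets its own core from the allowed set, and the process is then pinned there with os.sched_setaffinity. The round-robin policy and the names core_for_rank and bind_self are hypothetical illustrations, not Open MPI's actual mapping code.

```python
import os


def core_for_rank(local_rank: int, allowed: set[int]) -> int:
    """Pick one CPU per local rank from the CPUs the cpuset allows.

    Hypothetical round-robin policy for illustration only; this is not
    Open MPI's real mapping logic.
    """
    cores = sorted(allowed)
    if not cores:
        raise ValueError("empty CPU set")
    # Ranks wrap around only when there are more ranks than allowed CPUs.
    return cores[local_rank % len(cores)]


def bind_self(local_rank: int) -> int:
    """Pin the calling process to a single CPU (Linux only)."""
    allowed = os.sched_getaffinity(0)       # CPUs permitted by the cpuset
    core = core_for_rank(local_rank, allowed)
    os.sched_setaffinity(0, {core})         # restrict this process to that CPU
    return core


if __name__ == "__main__":
    # Example: a cpuset granting CPUs 4-7 to a 4-rank job on this node.
    print([core_for_rank(r, {4, 5, 6, 7}) for r in range(4)])  # [4, 5, 6, 7]
```

With this policy, two ranks share a core only when there are more ranks than allowed CPUs. That is plain oversubscription, which binding cannot fix.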

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
brockp_at_[hidden]
(734)936-1985

On Nov 5, 2012, at 9:00 PM, Christopher Samuel wrote:

>
> On 06/11/12 08:57, Brock Palen wrote:
>
>> Ok, more information (I had to build a newer hwloc): in my job today,
>> only 2 processes are running at half speed, and they are indeed
>> sharing the same core:
>
> We've seen the same occasionally using CentOS5/RHEL5 with jobs running
> under Torque with cpusets enabled.
>
> Never been able to explain it. The most recent case was someone
> using a home-compiled version of NAMD; the problem disappeared when
> they started using our provided builds.
>
> I was fixing up the affected running jobs by hand, assigning processes
> to individual cores on the nodes with cpusets. :-/
>
> cheers,
> Chris
> --
> Christopher Samuel Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/ http://twitter.com/vlsci
>
> _______________________________________________
> hwloc-users mailing list
> hwloc-users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
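The symptom described in this thread is two processes running at half speed on the same core. It can be checked roughly with a small Python sketch, assuming Linux's /proc/<pid>/stat format, where field 39 is the CPU the task last ran on. The helper names last_cpu, shared_cores, and scan are hypothetical. A single snapshot is only a heuristic, and hwloc's own tools such as hwloc-ps are the more reliable way to inspect bindings.

```python
from collections import defaultdict


def last_cpu(stat_line: str) -> int:
    """Return the CPU a task last ran on, from a /proc/<pid>/stat line.

    The command name (field 2) is in parentheses and may contain spaces,
    so split only after the last ')'. Field 3 (state) then sits at index 0,
    so field 39 ('processor') sits at index 36.
    """
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    return int(rest[36])


def shared_cores(pid_to_cpu: dict[int, int]) -> dict[int, list[int]]:
    """Group PIDs by CPU and keep only CPUs running more than one of them."""
    by_cpu = defaultdict(list)
    for pid, cpu in sorted(pid_to_cpu.items()):
        by_cpu[cpu].append(pid)
    return {cpu: pids for cpu, pids in by_cpu.items() if len(pids) > 1}


def scan(pids) -> dict[int, list[int]]:
    """Read each PID's last CPU on a live Linux node and report overlaps."""
    mapping = {}
    for pid in pids:
        with open(f"/proc/{pid}/stat") as f:
            mapping[pid] = last_cpu(f.read())
    return shared_cores(mapping)


if __name__ == "__main__":
    # Synthetic stat line: state 'R', 35 filler fields, then processor = 7.
    sample = "1234 (namd2 x) R " + " ".join(["0"] * 35) + " 7 0 0"
    print(last_cpu(sample))                        # 7
    print(shared_cores({101: 3, 102: 3, 103: 5}))  # {3: [101, 102]}
```

On a live node, you would call scan() with the job's PIDs (for example, those of the MPI ranks) and repeat it a few times. A process that merely migrated between samples can briefly look as if it shares a core.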