You might want to try the OMPI tarball that is about to become OMPI v1.6.1 -- we made a bunch of affinity-related fixes, and it should be much more predictable / stable in what it does in terms of process binding:
http://www.open-mpi.org/~jsquyres/unofficial/
(these affinity fixes are not yet in a nightly 1.6 tarball because we're testing them before they get committed to the OMPI v1.6 SVN branch)
On May 30, 2012, at 9:54 AM, Brice Goglin wrote:
> Hello Youri,
> When using openmpi 1.4.4 with --np 2 --bind-to-core --bycore it reports the following:
>> [hostname:03339] [[17125,0],0] odls:default:fork binding child [[17125,1],0] to cpus 0001
>> [hostname:03339] [[17125,0],0] odls:default:fork binding child [[17125,1],1] to cpus 0002
>
> Bitmasks 0001 and 0002 mean CPUs with physical indexes 0 and 1 in OMPI 1.4. That corresponds to the first core of each socket, which matches what hwloc-ps says. Try "hwloc-ps -c"; it should show the same bitmasks.
>
> I agree that these are not adjacent cores, but I don't know enough about OMPI's binding options to understand what it was supposed to do in your case.
>
> Brice
>
> _______________________________________________
> hwloc-users mailing list
> hwloc-users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
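A small sketch of Brice's reading of the log above: it pulls the rank and the "cpus" mask out of the two odls lines, treats the mask as hexadecimal with bit i meaning physical CPU index i (an assumption that matches his explanation of 0001 and 0002), and maps each CPU to a socket using a hypothetical interleaved numbering (P#0 on socket 0, P#1 on socket 1, and so on). The regex, the helper names and socket_of() are all made up for illustration; the real socket mapping is machine-specific, so on an actual box take it from lstopo or hwloc-ps rather than from socket_of().

```python
import re

# Matches the OMPI 1.4 odls debug line quoted above, e.g.
# "... odls:default:fork binding child [[17125,1],0] to cpus 0001".
# Tailored to those two lines; other OMPI versions may format this differently.
BINDING_RE = re.compile(
    r"binding child \[\[\d+,\d+\],(?P<rank>\d+)\] to cpus (?P<mask>[0-9a-fA-F]+)"
)


def mask_to_cpus(mask_hex: str) -> list[int]:
    """Return the physical CPU indexes whose bits are set in a hex mask (assumed hex)."""
    value = int(mask_hex, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def parse_bindings(log: str) -> dict[int, list[int]]:
    """Map each MPI rank (last field of [[jobid,vpid],rank]) to its bound CPUs."""
    return {
        int(m.group("rank")): mask_to_cpus(m.group("mask"))
        for m in BINDING_RE.finditer(log)
    }


def socket_of(pu: int, num_sockets: int = 2) -> int:
    """Hypothetical interleaved numbering: P#0 on socket 0, P#1 on socket 1, ..."""
    return pu % num_sockets


if __name__ == "__main__":
    log = """\
[hostname:03339] [[17125,0],0] odls:default:fork binding child [[17125,1],0] to cpus 0001
[hostname:03339] [[17125,0],0] odls:default:fork binding child [[17125,1],1] to cpus 0002
"""
    for rank, cpus in sorted(parse_bindings(log).items()):
        sockets = sorted({socket_of(c) for c in cpus})
        print(f"rank {rank}: physical CPUs {cpus}, socket(s) {sockets}")
```

Run as-is, it prints rank 0 on physical CPU 0 (socket 0) and rank 1 on physical CPU 1 (socket 1), i.e. one core per socket rather than two adjacent cores, under the interleaved-numbering assumption.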
--
Jeff Squyres
jsquyres_at_[hidden]