Thanks - yes, that helps. Can you add --display-map to your cmd
line? That will tell us what mpirun thinks it is doing.
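For example, something like (with ./your_app standing in for whatever
you are actually launching):

    mpirun --display-map -np 16 ./your_app

The map is printed at launch time and shows which node and slot each
rank was assigned, so we can compare it against what you expect.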
On Dec 4, 2008, at 2:07 PM, V. Ram wrote:
> Ralph H. Castain wrote:
>> I confess to confusion. OpenMPI will by default map your processes in
>> a round-robin fashion based on process slot. If you are in a resource
>> managed environment (e.g., TM or SLURM), then the slots correspond to
>> cores. If you are in an unmanaged environment, then your hostfile
>> needs to specify a single hostname, and the slots=x number should
>> reflect the total number of cores on your machine.
>> If you then set mpi_paffinity_alone=1, OMPI will bind each rank to
>> its associated core.
>> Is that not what you are trying to do?
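For reference, a minimal sketch of the unmanaged-environment setup
described above. The hostname "mynode" and the file name are
placeholders; slots=32 assumes an 8-socket, quad-core box like the
one described below:

    # myhostfile: a single host, slots = total cores in the machine
    mynode slots=32

Then launch with paffinity turned on:

    mpirun --hostfile myhostfile --mca mpi_paffinity_alone 1 \
        -np 16 ./your_app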
> I probably didn't explain myself well. In this case, the system is
> not running a resource manager like SLURM; it is just running
> Linux. If I run
> numactl --hardware, then I get:
> available: 8 nodes (0-7)
> node 0 cpus: 0 1 2 3
> node 1 cpus: 4 5 6 7
> node 2 cpus: 8 9 10 11
> node 3 cpus: 12 13 14 15
> node 4 cpus: 16 17 18 19
> node 5 cpus: 20 21 22 23
> node 6 cpus: 24 25 26 27
> node 7 cpus: 28 29 30 31
> where I've elided the memory-related output as well as the node
> distances. Just to reiterate, each node here is one AMD processor
> (socket) of the 8-way system; there is no IP networking going on.
> What I'd like is: if I start a job with mpirun -np 16
> <executablename>, these 16 MPI processes get allocated on contiguous
> "cpus" in numactl parlance, e.g. cpus 0-15, or 12-27, etc.
> As it stands, if I check the cpus allocated to the aforementioned
> -np 16 job, I see various cores active on multiple sockets, but I
> don't see whole sockets (all 4 cores) active at a time on this job.
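One way to make that check concrete on Linux, assuming you can grab
the PIDs of the 16 ranks, is

    taskset -cp <pid>

or, equivalently, grep Cpus_allowed_list /proc/<pid>/status. A rank
that has not been bound at all will report the full 0-31 range rather
than a single core.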
> Does this make more sense?
> V. Ram
> http://www.fastmail.fm - A no graphics, no pop-ups email service