Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Oversubscription of nodes with Torque and OpenMPI
From: Jason Gans (jgans_at_[hidden])
Date: 2013-11-22 12:56:28

On 11/22/13 10:47 AM, Reuti wrote:
> Hi,
> On 22.11.2013 at 17:32, Gans, Jason D wrote:
>> I would like to run an instance of my application on every *core* of a small cluster. I am using Torque 2.5.12 to run jobs on the cluster. The cluster in question is a heterogeneous collection of machines that are all past their prime. Specifically, the number of cores per node ranges from 2 to 8. Here is the Torque "nodes" file:
>> n0000 np=2
>> n0001 np=2
>> n0002 np=8
>> n0003 np=8
>> n0004 np=2
>> n0005 np=2
>> n0006 np=2
>> n0007 np=4
>> When I use openmpi-1.6.3, I can oversubscribe nodes but the tasks are allocated to nodes without regard to the number of cores on each node (specified by the "np=xx" in the nodes file). For example, when I run "mpirun -np 24 hostname", mpirun places three instances of "hostname" on each node, despite the fact that some nodes only have two processors and some have more.
> You submitted the job itself by requesting 24 cores for it too?
> -- Reuti
Since there are only 8 Torque nodes in the cluster, I submitted the job
by requesting 8 nodes, i.e. "qsub -I -l nodes=8".
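
To illustrate the arithmetic behind "three instances on each node": the sketch below is a simplified model of slot-filling placement, not Open MPI's actual mapper. It assumes that "-l nodes=8" produces an allocation with one slot per node (no "ppn" was requested), and that processes beyond the available slots wrap around to the first host. The function name place and its arguments are hypothetical.

```python
def place(slots, nprocs):
    """Simplified placement model: fill each host up to its slot count,
    then wrap around to the first host if processes remain."""
    if nprocs > 0 and sum(slots.values()) <= 0:
        raise ValueError("no slots available")
    counts = {host: 0 for host in slots}
    remaining = nprocs
    while remaining > 0:
        for host, n in slots.items():
            take = min(n, remaining)
            counts[host] += take
            remaining -= take
            if remaining == 0:
                break
    return counts


# Assumed allocation from "qsub -I -l nodes=8": one slot on each node.
torque_alloc = {f"n{i:04d}": 1 for i in range(8)}

# Slot counts from the machine file quoted in this thread.
machine_file = {
    "n0000": 2, "n0001": 2, "n0002": 8, "n0003": 8,
    "n0004": 2, "n0005": 2, "n0006": 2, "n0007": 4,
}

if __name__ == "__main__":
    print(place(torque_alloc, 24))
    print(place(machine_file, 24))
```

Under these assumptions the first call gives every node three processes, matching what "mpirun -np 24 hostname" showed, while the second call never exceeds any node's slot count because the machine file lists 30 slots in total.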
>> Is there a way to have OpenMPI "gracefully" oversubscribe nodes by allocating instances based on the "np=xx" information in the Torque nodes file? Is this a Torque problem?
>> p.s. I do get the desired behavior when I run *without* Torque and specify the following machine file to mpirun:
>> n0000 slots=2
>> n0001 slots=2
>> n0002 slots=8
>> n0003 slots=8
>> n0004 slots=2
>> n0005 slots=2
>> n0006 slots=2
>> n0007 slots=4
>> Regards,
>> Jason
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
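
The Torque nodes file and the mpirun machine file quoted above carry the same per-node counts in two notations ("np=" versus "slots="). A minimal sketch of converting one to the other follows. The function name torque_nodes_to_hostfile is hypothetical, and the parser assumes each entry is a host name followed by optional whitespace-separated fields, with a missing "np=" treated as one slot and "#" starting a comment.

```python
def torque_nodes_to_hostfile(nodes_text):
    """Turn lines like 'n0000 np=2' into 'n0000 slots=2'."""
    out = []
    for raw in nodes_text.splitlines():
        # Drop comments and surrounding whitespace; skip empty lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        host = fields[0]
        slots = 1  # assumed default when no np= field is present
        for field in fields[1:]:
            if field.startswith("np="):
                slots = int(field[len("np="):])
        out.append(f"{host} slots={slots}")
    return "\n".join(out)


if __name__ == "__main__":
    nodes_file = """\
n0000 np=2
n0001 np=2
n0002 np=8
n0003 np=8
n0004 np=2
n0005 np=2
n0006 np=2
n0007 np=4
"""
    print(torque_nodes_to_hostfile(nodes_file))
```

The output has the same form as the machine file that gave the desired behavior when mpirun was run without Torque.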