
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Oversubscription of nodes with Torque and OpenMPI
From: Jason Gans (jgans_at_[hidden])
Date: 2013-11-22 13:34:31

On 11/22/13 11:18 AM, Lloyd Brown wrote:
> As far as I understand, the mpirun will assign processes to hosts in the
> hostlist ($PBS_NODEFILE) sequentially, and if it runs out of hosts in
> the list, it starts over at the top of the file.
> Theoretically, you should be able to request specific hostnames, and the
> processor counts per hostname, in your torque submit request. I'm not
> sure if this is correct (we don't use Torque here anymore, and I'm going
> off memory), but it should be approximately correct:
>> qsub -l nodes=n0000:2+n0001:2+n0002:8+n0003:8+n0004:2+n0005:2+n0006:2+n0007:4 ...
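For reference, the wrap-around placement described above can be sketched as follows (a minimal illustration, not Open MPI's actual mapping code; host names are the ones from this thread):

```python
# Sketch of the round-robin placement described above: ranks are
# assigned to hosts in hostfile order, wrapping back to the top of
# the list when it is exhausted (per-node core counts are ignored).
from collections import Counter

def round_robin(hosts, nprocs):
    """Return {host: rank count} for a naive wrap-around mapping."""
    placement = Counter()
    for rank in range(nprocs):
        placement[hosts[rank % len(hosts)]] += 1
    return placement

# The 8-node cluster from this thread with 24 ranks: 3 per node,
# regardless of each node's np=xx core count.
hosts = ["n0000", "n0001", "n0002", "n0003",
         "n0004", "n0005", "n0006", "n0007"]
print(round_robin(hosts, 24))
```

This reproduces the symptom reported below: 24 ranks over 8 hosts gives three per node, even on the 2-core machines.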
Thanks! This is awkward, but it did the trick. To get the desired behavior
I first had to provide a "fake" nodes file to Torque, in which every node
was listed as having a large number of processors (e.g. np=8). Now I can
submit jobs using:

qsub -I -l nodes=n0000:ppn=2+n0001:ppn=2+n0002:ppn=8+...

and get the expected behavior (including the expected $PBS_NODEFILE,
where the name of each node appears "ppn" number of times).
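The expansion mentioned above can be sketched like this (an illustrative function, not Torque's actual code; the spec string mirrors the qsub request above):

```python
# Sketch of how a "-l nodes=host:ppn=N+host:ppn=M+..." request expands
# into a $PBS_NODEFILE: each hostname is repeated ppn times.
def expand_nodespec(spec):
    """Expand 'host:ppn=N+host:ppn=M+...' into a list of hostnames."""
    lines = []
    for part in spec.split("+"):
        host, _, ppn = part.partition(":ppn=")
        # A bare hostname with no ":ppn=" counts as one slot.
        lines.extend([host] * (int(ppn) if ppn else 1))
    return lines

spec = "n0000:ppn=2+n0001:ppn=2+n0002:ppn=8"
print("\n".join(expand_nodespec(spec)))
```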

Thanks to everyone who responded!


> Granted, that's awkward, but I'm not sure if there's another way in
> Torque to request different numbers of processors per node. You might
> ask on the Torque Users list. They might tell you to change the nodes
> file to reflect the number of actual processes you want on each node,
> rather than the number of physical processors on the hosts. Whether
> this works for you depends on whether you want this type of
> oversubscription to happen all the time, or on a per-job basis, etc.
> Lloyd Brown
> Systems Administrator
> Fulton Supercomputing Lab
> Brigham Young University
> On 11/22/2013 11:11 AM, Gans, Jason D wrote:
>> I have tried the 1.7 series (specifically 1.7.3) and I get the same
>> behavior.
>> When I run "mpirun -oversubscribe -np 24 hostname", three instances of
>> "hostname" are run on each node.
>> The contents of the $PBS_NODEFILE are:
>> n0007
>> n0006
>> n0005
>> n0004
>> n0003
>> n0002
>> n0001
>> n0000
>> but, since I compiled Open MPI with the "--with-tm" option, it appears
>> that Open MPI is not using the $PBS_NODEFILE. (I tested this by modifying
>> the Torque pbs_mom to write a $PBS_NODEFILE that contained "slot=xx"
>> information for each node; mpirun complained when I did this.)
>> Regards,
>> Jason
>> ------------------------------------------------------------------------
>> *From:* users [users-bounces_at_[hidden]] on behalf of Ralph Castain
>> [rhc_at_[hidden]]
>> *Sent:* Friday, November 22, 2013 11:04 AM
>> *To:* Open MPI Users
>> *Subject:* Re: [OMPI users] Oversubscription of nodes with Torque and
>> OpenMPI
>> Really shouldn't matter - this is clearly a bug in OMPI if it is doing
>> mapping as you describe. Out of curiosity, have you tried the 1.7
>> series? Does it behave the same?
>> I can take a look at the code later today and try to figure out what
>> happened.
>> On Nov 22, 2013, at 9:56 AM, Jason Gans <jgans_at_[hidden]
>> <mailto:jgans_at_[hidden]>> wrote:
>>> On 11/22/13 10:47 AM, Reuti wrote:
>>>> Hi,
>>>> Am 22.11.2013 um 17:32 schrieb Gans, Jason D:
>>>>> I would like to run an instance of my application on every *core* of
>>>>> a small cluster. I am using Torque 2.5.12 to run jobs on the
>>>>> cluster. The cluster in question is a heterogeneous collection of
>>>>> machines that are all past their prime. Specifically, the number of
>>>>> cores ranges from 2-8. Here is the Torque "nodes" file:
>>>>> n0000 np=2
>>>>> n0001 np=2
>>>>> n0002 np=8
>>>>> n0003 np=8
>>>>> n0004 np=2
>>>>> n0005 np=2
>>>>> n0006 np=2
>>>>> n0007 np=4
>>>>> When I use openmpi-1.6.3, I can oversubscribe nodes but the tasks
>>>>> are allocated to nodes without regard to the number of cores on each
>>>>> node (specified by the "np=xx" in the nodes file). For example, when
>>>>> I run "mpirun -np 24 hostname", mpirun places three instances of
>>>>> "hostname" on each node, despite the fact that some nodes only have
>>>>> two processors and some have more.
>>>> Did you also request 24 cores when you submitted the job itself?
>>>> -- Reuti
>>> Since there are only 8 Torque nodes in the cluster, I submitted the
>>> job by requesting 8 nodes, i.e. "qsub -I -l nodes=8".
>>>>> Is there a way to have OpenMPI "gracefully" oversubscribe nodes by
>>>>> allocating instances based on the "np=xx" information in the Torque
>>>>> nodes file? Is this a Torque problem?
>>>>> p.s. I do get the desired behavior when I run *without* Torque and
>>>>> specify the following machine file to mpirun:
>>>>> n0000 slots=2
>>>>> n0001 slots=2
>>>>> n0002 slots=8
>>>>> n0003 slots=8
>>>>> n0004 slots=2
>>>>> n0005 slots=2
>>>>> n0006 slots=2
>>>>> n0007 slots=4
>>>>> Regards,
>>>>> Jason
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden] <mailto:users_at_[hidden]>