Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] openmpi+torque: How run job in a subset of the allocation?
From: George Markomanolis (george_at_[hidden])
Date: 2013-11-28 05:07:36


Hi,

Here is what I do to execute 20 mpirun calls inside a single LSF job;
I assume your case is similar.

I use $LSB_HOSTS to extract the hosts allocated to the job. Since I
know how many cores I want per execution, I create one machine file
per mpirun call. For our application, each execution mostly has its
own nodes, but its last MPI processes sit on a shared node. For
example, two mpirun calls need 40 cores (20 cores each), so I use
three nodes with 16 cores per node: the first mpirun call gets the
first node plus cores 0-3 of the second node, and the second call gets
the third node plus cores 4-7 of the second node. I do this to avoid
wasting resources, because I actually need to run ~20 mpirun calls,
not just two, and because the last 4 MPI processes of each run do a
different task from the first 16.

So I create machine files (Open MPI rankfiles) like this:
rank 0=s15r1b45 slot=0
rank 1=s15r1b45 slot=1
rank 2=s15r1b45 slot=2
rank 3=s15r1b45 slot=3
....
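
(As a rough sketch only: assuming $LSB_HOSTS lists one hostname per
allocated core and 20 ranks per mpirun call, the machine files could
be generated with something like the script below; the rankfile.N
names and the CORES_PER_RUN value are just placeholders, not my actual
setup.)

#!/bin/bash
# Sketch: split the LSF allocation ($LSB_HOSTS holds one hostname per
# allocated core) into one rankfile per mpirun call.
CORES_PER_RUN=20              # assumed number of ranks per mpirun call
rm -f rankfile.*              # start from clean files
declare -A next_slot          # next free core index on each host
rank=0
file=0
for host in $LSB_HOSTS; do
    slot=${next_slot[$host]:-0}
    next_slot[$host]=$((slot + 1))
    echo "rank $rank=$host slot=$slot" >> rankfile.$file
    rank=$((rank + 1))
    if [ $rank -eq $CORES_PER_RUN ]; then
        rank=0
        file=$((file + 1))
    fi
done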

Then, from the root node of the job, launch the mpirun calls in the
background, for example:

mpirun .... &

and after all of them, run the wait command.

That way you start many mpirun calls in the background, and the wait
makes sure the job script does not exit (and the job get killed)
before all the executions have finished.

Just be careful that the machine files do not share any resources
(cores, in my case).
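
(Again just as a sketch: assuming the rankfile.N files from above, 20
ranks each, and a placeholder executable ./solver, the launch step
could look like this:)

for rf in rankfile.*; do
    # -rf/--rankfile tells mpirun to use the explicit rank-to-core map
    mpirun -np 20 --rankfile $rf ./solver &
done
wait    # do not let the job script exit before all runs have finished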

Best regards,
George Markomanolis

On 11/27/2013 10:02 PM, Ralph Castain wrote:
> I'm afraid the two solvers would be in the same comm_world if launched that way
>
> Sent from my iPhone
>
>> On Nov 27, 2013, at 11:58 AM, Gus Correa <gus_at_[hidden]> wrote:
>>
>> Hi Ola, Ralph
>>
>> I may be wrong, but I'd guess launching the two solvers
>> in MPMD/MIMD mode would work smoothly with the torque PBS_NODEFILE,
>> in a *single* Torque job.
>> If I understood Ola right, that is what he wants.
>>
>> Say, something like this (for one 32-core node):
>>
>> #PBS -l nodes=1:ppn=32
>> ...
>> mpiexec -np 8 ./solver1 : -np 24 ./solver2
>>
>> I am assuming the two executables never talk to each other, right?
>> They solve the same problem with different methods, in a completely
>> independent and "embarrassingly parallel" fashion, and could run
>> concurrently.
>>
>> Is that right?
>> Or did I misunderstand Ola's description, and they work in a staggered sequence to each other?
>> [first s1, then s2, then s1 again, then s2 once more...]
>> I am a bit confused by Ola's use of the words "loosely coupled" in his description, which might indicate cooperation to solve the same problem,
>> rather than independent work on two instances of the same problem.
>>
>> Ralph: Does the MPI model assume that MPMD/MIMD executables
>> have to necessarily communicate with each other,
>> or perhaps share a common MPI_COMM_WORLD?
>> [I guess not.]
>>
>> Anyway, just a guess,
>> Gus Correa
>>
>>> On 11/27/2013 10:23 AM, Ralph Castain wrote:
>>> Are you wanting to run the solvers on different nodes within the
>>> allocation? Or on different cores across all nodes?
>>>
>>> For different nodes, you can just use -host to specify which nodes you
>>> want that specific mpirun to use, or a hostfile should also be fine. The
>>> FAQ's comment was aimed at people who were giving us the PBS_NODEFILE as
>>> the hostfile - which could confuse older versions of OMPI into using the
>>> rsh launcher instead of Torque. Remember that we have the relative node
>>> syntax so you don't actually have to name the nodes - helps if you want
>>> to execute batch scripts and won't know the node names in advance.
>>>
>>> For different cores across all nodes, you would need to use some binding
>>> trickery that may not be in the 1.4 series, so you might need to update
>>> to the 1.6 series. You have two options: (a) have Torque bind your
>>> mpirun to specific cores (I believe it can do that), or (b) use
>>> --slot-list to specify which cores that particular mpirun is to use. You
>>> can then separate the two solvers but still run on all the nodes, if
>>> that is of concern.
>>>
>>> HTH
>>> Ralph
>>>
>>>
>>>
>>> On Wed, Nov 27, 2013 at 6:10 AM, <Ola.Widlund_at_[hidden]> wrote:
>>>
>>> Hi,
>>>
>>> We have an in-house application where we run two solvers in a
>>> loosely coupled manner: The first solver runs a timestep, then the
>>> second solver does work on the same timestep, etc. As the two
>>> solvers never execute at the same time, we would like to run the two
>>> solvers in the same allocation (launching mpirun once for each of
>>> them). RAM is not an issue, so there should not be any risk of
>>> excessive swapping degrading performance.
>>>
>>> We use openmpi-1.4.5 compiled with torque integration. The torque
>>> integration means we do not give a hostfile to mpirun, it will
>>> itself query torque for the allocation info.
>>>
>>> Question:
>>>
>>> Can we force one of the solvers to run in a *subset* of the full
>>> allocation? How do we do that? I read in the FAQ that providing a
>>> hostfile to mpirun in this case (when it's not needed due to torque
>>> integration) would cause a lot of problems...
>>>
>>> Thanks in advance,
>>>
>>> Ola