Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi+torque: How run job in a subset of the allocation?
From: Ralph Castain (rhc.openmpi_at_[hidden])
Date: 2013-11-27 16:02:59


I'm afraid the two solvers would be in the same comm_world if launched that way.

Sent from my iPhone

> On Nov 27, 2013, at 11:58 AM, Gus Correa <gus_at_[hidden]> wrote:
>
> Hi Ola, Ralph
>
> I may be wrong, but I'd guess launching the two solvers
> in MPMD/MIMD mode would work smoothly with the Torque PBS_NODEFILE,
> in a *single* Torque job.
> If I understood Ola right, that is what he wants.
>
> Say, something like this (for one 32-core node):
>
> #PBS -l nodes=1:ppn=32
> ...
> mpiexec -np 8 ./solver1 : -np 24 ./solver2
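>
> If the command line gets long, I believe the same MPMD launch can also
> be written as an Open MPI appfile, one program per line (the file name
> is just a placeholder, and I am quoting the syntax from memory):
>
> $ cat solvers.app
> -np 8 ./solver1
> -np 24 ./solver2
> $ mpiexec --app solvers.app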
>
> I am assuming the two executables never talk to each other, right?
> They solve the same problem with different methods, in a completely
> independent and "embarrassingly parallel" fashion, and could run
> concurrently.
>
> Is that right?
> Or did I misunderstand Ola's description, and they actually work in a staggered sequence, alternating with each other?
> [first s1, then s2, then s1 again, then s2 once more...]
> I am a bit confused by Ola's use of the words "loosely coupled" in his description, which might indicate cooperation to solve the same problem,
> rather than independent work on two instances of the same problem.
>
> Ralph: Does the MPI model assume that MPMD/MIMD executables
> necessarily have to communicate with each other,
> or perhaps share a common MPI_COMM_WORLD?
> [I guess not.]
>
> Anyway, just a guess,
> Gus Correa
>
>> On 11/27/2013 10:23 AM, Ralph Castain wrote:
>> Are you wanting to run the solvers on different nodes within the
>> allocation? Or on different cores across all nodes?
>>
>> For different nodes, you can just use -host to specify which nodes you
>> want that specific mpirun to use; a hostfile should also be fine. The
>> FAQ's comment was aimed at people who were giving us the PBS_NODEFILE as
>> the hostfile - which could confuse older versions of OMPI into using the
>> rsh launcher instead of Torque. Remember that we have the relative node
>> syntax, so you don't actually have to name the nodes - that helps if you
>> want to execute batch scripts and won't know the node names in advance.
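>>
>> For instance, with a two-node allocation, something along these lines
>> should pin each solver to its own node without naming the nodes (I am
>> quoting the relative syntax from memory, so please double-check it):
>>
>> mpirun -np 32 -host +n0 ./solver1
>> mpirun -np 32 -host +n1 ./solver2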
>>
>> For different cores across all nodes, you would need to use some binding
>> trickery that may not be in the 1.4 series, so you might need to update
>> to the 1.6 series. You have two options: (a) have Torque bind your
>> mpirun to specific cores (I believe it can do that), or (b) use
>> --slot-list to specify which cores that particular mpirun is to use. You
>> can then separate the two solvers but still run on all the nodes, if
>> that is of concern.
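>>
>> Roughly along these lines (the exact slot-list format may differ
>> between versions, so check the 1.6 mpirun man page):
>>
>> mpirun -np 8 --slot-list 0-7 ./solver1
>> mpirun -np 24 --slot-list 8-31 ./solver2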
>>
>> HTH
>> Ralph
>>
>>
>>
>> On Wed, Nov 27, 2013 at 6:10 AM, Ola.Widlund_at_[hidden] wrote:
>>
>> Hi,
>>
>> We have an in-house application where we run two solvers in a
>> loosely coupled manner: The first solver runs a timestep, then the
>> second solver does work on the same timestep, etc. As the two
>> solvers never execute at the same time, we would like to run the two
>> solvers in the same allocation (launching mpirun once for each of
>> them). RAM is not an issue, so there should not be any risk of
>> excessive swapping degrading performance.
>>
>> We use openmpi-1.4.5 compiled with Torque integration. The Torque
>> integration means we do not give a hostfile to mpirun; it queries
>> Torque itself for the allocation info.
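>>
>> In other words, the job script just calls mpirun directly, roughly
>> like this (the process count here is made up for illustration):
>>
>> #PBS -l nodes=2:ppn=32
>> mpirun -np 64 ./solver1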
>>
>> Question:
>>
>> Can we force one of the solvers to run in a *subset* of the full
>> allocation? How do we do that? I read in the FAQ that providing a
>> hostfile to mpirun in this case (when it's not needed due to Torque
>> integration) would cause a lot of problems...
>>
>> Thanks in advance,
>>
>> Ola
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users