Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi+torque: How run job in a subset of the allocation?
From: Gus Correa (gus_at_[hidden])
Date: 2013-11-27 13:58:14


Hi Ola, Ralph

I may be wrong, but I'd guess launching the two solvers
in MPMD/MIMD mode would work smoothly with the Torque PBS_NODEFILE,
in a *single* Torque job.
If I understood Ola right, that is what he wants.

Say, something like this (for one 32-core node):

#PBS -l nodes=1:ppn=32
...
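# MPMD launch: 8 ranks of solver1 alongside 24 ranks of solver2, concurrently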
mpiexec -np 8 ./solver1 : -np 24 ./solver2

I am assuming the two executables never talk to each other, right?
They solve the same problem with different methods, in a completely
independent and "embarrassingly parallel" fashion, and could run
concurrently.

Is that right?
Or did I misunderstand Ola's description, and do the two solvers
run in a staggered sequence?
[First s1, then s2, then s1 again, then s2 once more...]
I am a bit confused by Ola's use of the words "loosely coupled" in his
description, which might indicate cooperation to solve the same problem,
rather than independent work on two instances of the same problem.
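If it is the staggered case, the two solvers could still share one
Torque job, each launched with its own mpiexec, one after the other.
A minimal sketch (the loop bound and the per-step setup are
hypothetical; I'm assuming each solver picks up where the other left off):

#PBS -l nodes=1:ppn=32
cd $PBS_O_WORKDIR
# Alternate the solvers once per timestep; 100 steps is a placeholder.
for step in $(seq 1 100); do
    mpiexec -np 8 ./solver1     # solver1 advances the timestep
    mpiexec -np 24 ./solver2    # solver2 then works on the same timestep
done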

Ralph: Does the MPI model assume that MPMD/MIMD executables
necessarily have to communicate with each other,
or perhaps share a common MPI_COMM_WORLD?
[I guess not.]

Anyway, just a guess,
Gus Correa

On 11/27/2013 10:23 AM, Ralph Castain wrote:
> Are you wanting to run the solvers on different nodes within the
> allocation? Or on different cores across all nodes?
>
> For different nodes, you can just use -host to specify which nodes you
> want that specific mpirun to use, or a hostfile should also be fine. The
> FAQ's comment was aimed at people who were giving us the PBS_NODEFILE as
> the hostfile - which could confuse older versions of OMPI into using the
> rsh launcher instead of Torque. Remember that we have the relative node
> syntax so you don't actually have to name the nodes - helps if you want
> to execute batch scripts and won't know the node names in advance.
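>
> A rough sketch of both forms (the node names are placeholders, and
> +n1 is the relative node syntax, which indexes the allocation from 0,
> so it means the second allocated node):
>
> mpirun -np 8 -host node01 ./solver1
> mpirun -np 24 -host +n1 ./solver2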
>
> For different cores across all nodes, you would need to use some binding
> trickery that may not be in the 1.4 series, so you might need to update
> to the 1.6 series. You have two options: (a) have Torque bind your
> mpirun to specific cores (I believe it can do that), or (b) use
> --slot-list to specify which cores that particular mpirun is to use. You
> can then separate the two solvers but still run on all the nodes, if
> that is of concern.
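>
> Something along these lines, for example (only a sketch - check the
> exact --slot-list syntax for your version; the core ranges here are
> placeholders):
>
> mpirun -np 8 --slot-list 0-7 ./solver1
> mpirun -np 24 --slot-list 8-31 ./solver2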
>
> HTH
> Ralph
>
>
>
> On Wed, Nov 27, 2013 at 6:10 AM, <Ola.Widlund_at_[hidden]> wrote:
>
> Hi,
>
> We have an in-house application where we run two solvers in a
> loosely coupled manner: The first solver runs a timestep, then the
> second solver does work on the same timestep, etc. As the two
> solvers never execute at the same time, we would like to run the two
> solvers in the same allocation (launching mpirun once for each of
> them). RAM is not an issue, so there should not be any risk of
> excessive swapping degrading performance.
>
> We use openmpi-1.4.5 compiled with Torque integration. The Torque
> integration means we do not give a hostfile to mpirun; it queries
> Torque itself for the allocation info.
>
> Question:
>
> Can we force one of the solvers to run in a *subset* of the full
> allocation? How do we do that? I read in the FAQ that providing a
> hostfile to mpirun in this case (when it's not needed, due to the
> Torque integration) would cause a lot of problems...
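>
> [A sketch of the subset approach Ralph describes above: build a
> hostfile naming only part of the allocation, rather than passing
> $PBS_NODEFILE itself, e.g.
>
> uniq $PBS_NODEFILE | head -n 1 > subset_hosts
> mpirun -np 8 --hostfile subset_hosts ./solver1
>
> would restrict solver1 to the first allocated node.]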
>
> Thanks in advance,
>
> Ola
>