I take it this is using OMPI 1.2.x? If so, there really isn't a way to
do this in that series.
If they are using 1.3 (in some pre-release form), then there are two
options:
1. they could use the sequential mapper by specifying "-mca rmaps
seq". This mapper takes a hostfile and maps one process to each entry,
in rank order. So they could list each node only half as many times as
it has cores, and we would map only that many processes to it.
2. they could use the rank_file mapper that allows you to specify what
cores are to be used by what rank. I am less familiar with this option
and there isn't a lot of documentation on how to use it - but you may
have to provide a fairly comprehensive map file since your nodes are
not all the same.
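For option 1, the hostfile has to be built by hand or by a script. Here
is a rough sketch in Python, assuming the Torque node file
($PBS_NODEFILE) lists each host once per allocated core. The function
name seq_hostfile_lines and the threads_per_rank parameter are made up
for illustration; they are not part of OMPI.

```python
from collections import Counter


def seq_hostfile_lines(nodefile_lines, threads_per_rank=2):
    """Return one hostfile entry per MPI rank.

    nodefile_lines: lines of a Torque-style node file, where a host
    appears once for every core allocated on it (assumption).
    Each host is emitted (cores // threads_per_rank) times, so every
    rank has threads_per_rank cores to itself on that host.
    """
    counts = Counter()
    order = []  # remember first-seen order of hosts
    for line in nodefile_lines:
        host = line.strip()
        if not host:
            continue
        if host not in counts:
            order.append(host)
        counts[host] += 1

    entries = []
    for host in order:
        # Leftover cores (odd counts) are simply not used.
        entries.extend([host] * (counts[host] // threads_per_rank))
    return entries
```

Write the returned names one per line to a file and hand that file to
mpirun together with "-mca rmaps seq". I would expect the usual
-hostfile option to be how the mapper picks it up, but check that
against your 1.3 build.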
I have been asked by some other folks to provide a mapping option
"--stride x" that would cause the default round-robin mapper to step
across the specified number of slots. So a stride of 2 would
automatically cause byslot mapping to increment by 2 instead of the
current stride of 1. I doubt that will be in 1.3.0, but it will show
up in later releases.
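To make the stride idea concrete, here is a small Python simulation of
byslot mapping with a stride. It is only an illustration of the
behavior described above, not OMPI code: the option does not exist yet,
the function name byslot_with_stride is hypothetical, and skipping
leftover slots that cannot hold a full stride is my assumption.

```python
def byslot_with_stride(slots_per_node, nranks, stride=1):
    """Simulate byslot mapping that advances `stride` slots per rank.

    slots_per_node: list of (node_name, slot_count) pairs, in order.
    Returns a list of (node_name, first_slot) tuples, one per rank.
    Stops early if the nodes run out of slots (no oversubscription).
    """
    placements = []
    for node, nslots in slots_per_node:
        slot = 0
        # Place a rank only if a full stride of slots is still free
        # on this node (assumption about how leftovers are handled).
        while slot + stride <= nslots and len(placements) < nranks:
            placements.append((node, slot))
            slot += stride
    return placements
```

With nodes of 4 and 2 slots and three ranks, a stride of 1 packs all
three ranks onto the first node, while a stride of 2 puts two ranks on
the first node and one on the second, leaving a free slot next to each
rank for its second thread.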
On Oct 25, 2008, at 3:36 PM, Brock Palen wrote:
> We have a user with a code that uses threaded solvers inside each
> MPI rank. They would like to run two threads per process.
> The question is how to launch this? The default -byslot puts all
> the processes on the first sets of cpus not leaving any cpus for the
> second thread for each process. And half the cpus are wasted.
> The -bynode option would work in theory, if all our nodes had the
> same number of cores (they do not).
> So right now the user did:
> #PBS -l nodes=22:ppn=2
> export OMP_NUM_THREADS=2
> mpirun -np 22 app
> Which made me aware of the problem.
> How can I basically tell OMPI that a 'slot' is two cores on the
> same machine? This needs to work inside our Torque-based queueing
> system.
> Sorry if I was not clear about my goal.
> Brock Palen
> Center for Advanced Computing