I believe I answered much of this the other day - did it get lost in
As for using TM with a hostfile - this is an unfortunate bug in the
1.2 series. You can't - you'll have to move to 1.3 to do so. When you
do, note the changed handling of hostfiles as specified on the wiki:
> I take it this is using OMPI 1.2.x? If so, there really isn't a way
> to do this in that series.
> If they are using 1.3 (in some pre-release form), then there are two
> options:
> 1. they could use the sequential mapper by specifying "-mca rmaps
> seq". This mapper takes a hostfile and maps one process to each
> entry, in rank order. So they could specify a hostfile that maps to
> only half of the actual number of cores on each node.
> 2. they could use the rank_file mapper that allows you to specify
> what cores are to be used by what rank. I am less familiar with this
> option and there isn't a lot of documentation on how to use it - but
> you may have to provide a fairly comprehensive map file since your
> nodes are not all the same.
> I have been asked by some other folks to provide a mapping option
> "--stride x" that would cause the default round-robin mapper to step
> across the specified number of slots. So a stride of 2 would
> automatically cause byslot mapping to increment by 2 instead of the
> current stride of 1. I doubt that will be in 1.3.0, but it will show
> up in later releases.
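
For option 1 in the quoted reply above, the hostfile given to the sequential mapper could be generated from $PBS_NODEFILE. Below is a minimal sketch, assuming $PBS_NODEFILE lists each host once per assigned core. The function name half_slots and the file name seq_hosts.txt are made up for illustration, and the mpirun line in the comments is only a guess at how the launch might look under 1.3, not a tested command.

```shell
#!/bin/sh
# Sketch: build a hostfile for the sequential mapper with one entry per
# pair of cores that PBS assigned on the same node.
# Assumption: the input file lists each host once per assigned core.

# half_slots FILE: print each host floor(n/2) times, where n is the number
# of times that host appears in FILE. Hosts are kept in first-seen order.
half_slots() {
    awk '
        NF {
            if (!($1 in count)) order[++n] = $1   # remember first-seen order
            count[$1]++                           # cores assigned on this host
        }
        END {
            for (i = 1; i <= n; i++) {
                h = order[i]
                # one hostfile entry (one MPI process) per two cores
                for (j = 0; j < int(count[h] / 2); j++) print h
            }
        }
    ' "$1"
}

# Illustrative use inside a PBS job (hypothetical file name, untested):
#   half_slots "$PBS_NODEFILE" > seq_hosts.txt
#   export OMP_NUM_THREADS=2
#   mpirun -np $(wc -l < seq_hosts.txt) -hostfile seq_hosts.txt -mca rmaps seq app
```

A host with an odd number of assigned cores simply leaves its last core unused, so no process ever ends up with its second thread on a different node.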
On Oct 30, 2008, at 7:46 AM, Brock Palen wrote:
> Any thoughts on this?
> We are looking at writing a script that parses $PBS_NODEFILE to create
> a machinefile, and then using -machinefile.
> When we do that, though, we have to disable tm to avoid an error (-mca
> pls ^tm), which is far from preferable.
> Any ideas on how to tell mpirun to launch on only half the cpus given
> to it by PBS, such that each cpu it uses has another cpu next to it on
> the same node?
> Brock Palen
> Center for Advanced Computing
> On Oct 25, 2008, at 5:36 PM, Brock Palen wrote:
>> We have a user with a code that uses threaded solvers inside each
>> MPI rank. They would like to run two threads per process.
>> The question is how to launch this? The default -byslot puts all
>> the processes on the first sets of cpus, leaving no cpus for the
>> second thread of each process, so half the cpus are wasted.
>> The -bynode option would work in theory if all our nodes had the
>> same number of cores (they do not).
>> So right now the user did:
>> #PBS -l nodes=22:ppn=2
>> export OMP_NUM_THREADS=2
>> mpirun -np 22 app
>> Which made me aware of the problem.
>> How can I basically tell OMPI that a 'slot' is two cores on the
>> same machine? This needs to work inside our Torque-based
>> queueing system.
>> Sorry if I was not clear about my goal.
>> Brock Palen
>> Center for Advanced Computing
>> users mailing list