On 15.11.2010 at 13:13, Chris Jewell wrote:
> Okay, so I tried what you suggested. You essentially get the requested number of bound cores on each execution node, so if I use
> $ qsub -pe openmpi 8 -binding linear:2 <myscript.com>
> then I get 2 bound cores per node, irrespective of the number of slots (and hence parallel processes) allocated by GE on that node. This is true whichever setting I use for the allocation_rule.
But it should work fine with "allocation_rule 2" then.
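A sketch of that suggestion, assuming the PE is the "openmpi" one from the qsub line above. With a fixed allocation_rule of 2, each host receives exactly two slots, so "-binding linear:2" binds two cores per host, i.e. one core per MPI process. "qconf -mp openmpi" opens the PE definition in an editor; only the relevant fields are shown ("..." marks omitted ones). The control_slaves and job_is_first_task values are the settings usually recommended for a tightly integrated Open MPI and are included for context only.

```
$ qconf -mp openmpi
...
allocation_rule    2
control_slaves     TRUE
job_is_first_task  FALSE
...
$ qsub -pe openmpi 8 -binding linear:2 myscript.com
```

With 8 requested slots this should yield 4 hosts with 2 ranks and 2 bound cores each. Note that the requested slot count then has to be a multiple of 2.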
> My aim with this was to deal with badly behaved multithreaded algorithms
Yep, this sometimes overloads a machine. When I know that I am going to compile a parallel Open MPI application, I use non-threaded versions of ATLAS, MKL or other libraries.
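Where rebuilding against non-threaded libraries is not an option, a common alternative is to cap the libraries' thread counts from the job script. This is a different technique from the one described above. A minimal sketch, assuming the program uses an OpenMP runtime, Intel MKL or OpenBLAS, each of which documents the variable shown. Not every library reads such a variable, which is one reason to prefer non-threaded builds.

```sh
# Hypothetical job-script fragment (an alternative to the non-threaded
# builds mentioned above): ask threaded libraries to use one thread per
# MPI rank. Each variable only has an effect if the library honours it.
export OMP_NUM_THREADS=1        # OpenMP runtimes
export MKL_NUM_THREADS=1        # Intel MKL
export OPENBLAS_NUM_THREADS=1   # OpenBLAS
```

With Open MPI, such variables can be forwarded to the ranks on other hosts with mpirun's -x option, e.g. "mpirun -x OMP_NUM_THREADS -x MKL_NUM_THREADS ./myprog".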
> which end up spreading across more cores on an execution node than the number of GE-allocated slots (thereby interfering with other GE-scheduled tasks running on the same exec node). By binding a process to one or more cores, one can "box in" processes and keep any sub-processes and threads they spawn from spreading beyond those cores. Unfortunately, the above solution applies the same core binding on every execution node.
> From exploring the software (both OpenMPI and GE) further, I have two comments:
> 1) The core-binding feature in GE appears to apply the requested core-binding topology to every execution node involved in a parallel job, rather than treating the requested topology as *per parallel process*. So, if I request 'qsub -pe mpi 8 -binding linear:1 <myscript.com>' intending each of the 8 parallel processes to be bound to its own core, I actually get all of the job's processes on a given exec node bound to the same single core. Oops!
> 2) OpenMPI has its own core-binding feature (-mca mpi_paffinity_alone 1), which works well to bind each parallel process to one processor. Unfortunately, its binding framework (hwloc) is different from the one GE uses (PLPA), and as the two bind independently of each other, GE-bound tasks (e.g. serial and SMP jobs) and OpenMPI-bound processes (i.e. my MPI jobs) end up on overlapping cores. Again, oops ;-)
> If, indeed, it is not currently possible to implement this type of core binding with tightly integrated OpenMPI/GE, then a solution might lie in a custom script run as the parallel environment's 'start_proc_args'. This script would have to find out which slots are allocated where on the cluster and write an OpenMPI rankfile.
Exactly this should work.
If you use "pe" as the binding_instance (e.g. "-binding pe linear:1"), GE writes the selected cores into the $PE_HOSTFILE rather than binding the processes itself; reformatting that information into a "rankfile" should then give the desired allocation. Maybe you can share the script with this list once you have it working.
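A minimal sketch of such a conversion in POSIX shell with awk. It assumes, without confirmation from this thread, that with "-binding pe linear:N" each $PE_HOSTFILE line reads "host slots queue socket,core[:socket,core...]" with at least one socket,core pair per granted slot. It emits Open MPI rankfile lines of the form "rank N=host slot=socket:core". The function name pe_hostfile_to_rankfile is hypothetical; check a real $PE_HOSTFILE from your installation before relying on the column layout.

```sh
# Hypothetical helper: turn a Grid Engine $PE_HOSTFILE written with
# "-binding pe linear:N" into an Open MPI rankfile on stdout.
# ASSUMPTION: column 4 holds socket,core pairs separated by ":",
# e.g. "node01 2 all.q@node01 0,0:0,1".
pe_hostfile_to_rankfile() {
    awk '
    {
        host = $1
        nslots = $2
        # Split column 4 into its socket,core pairs.
        npairs = split($4, pairs, ":")
        for (i = 1; i <= nslots; i++) {
            # Refuse to reuse a core if a host lists fewer cores than slots.
            if (i > npairs) {
                printf("error: %s has %d slots but only %d bound cores\n",
                       host, nslots, npairs) > "/dev/stderr"
                exit 1
            }
            # pairs[i] looks like "0,1": socket 0, core 1.
            split(pairs[i], sc, ",")
            # Ranks are numbered consecutively across all hosts.
            printf "rank %d=%s slot=%s:%s\n", rank++, host, sc[1], sc[2]
        }
    }' "$1"
}
```

One might call it from the PE's start_proc_args (or directly in the job script) as pe_hostfile_to_rankfile "$PE_HOSTFILE" > "$TMPDIR/rankfile" and then start the program with "mpirun -rf $TMPDIR/rankfile ./myprog". The helper fails with an error, rather than silently doubling up, when a host lists fewer bound cores than slots.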