I would like to run several mpirun jobs simultaneously within a single allocation. For example, my allocation is 2 nodes with 8 cores each (16 cores total). I want to run 2 five-rank jobs and 3 two-rank jobs at the same time (16 ranks total) without oversubscribing any core. I am currently using '--mca mpi_paffinity_alone 0' and that appears to work, but it looks like recent versions (1.4+) of Open MPI have better controls for processor affinity. Is there a better choice of flags for my situation?
The bigger picture is that I am running 400-600 small unit tests in a single Torque allocation. My testing framework knows the total number of available cores and the cores required per test, so the number of cores in use at any one time never exceeds my allocation. However, if I use any option other than '--mca mpi_paffinity_alone 0', mpirun places multiple jobs on the same cores and leaves many cores idle. Is there a good description of how mpirun assigns jobs to cores, particularly when multiple mpirun jobs are running in the same allocation?
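
To make the setup concrete, the core bookkeeping described above can be sketched roughly as follows. This is a simplified, hypothetical illustration, not the actual testing framework: the names `Test`, `build_command`, `pick_wave` and the executables `./test_a` through `./test_e` are placeholders, and the commands are only built and printed, never executed. The only mpirun options used are `-np` and the '--mca mpi_paffinity_alone 0' setting mentioned above.

```python
from dataclasses import dataclass

TOTAL_CORES = 16  # 2 nodes x 8 cores, as in the example above


@dataclass
class Test:
    name: str   # hypothetical test executable
    ranks: int  # MPI ranks requested, one core per rank


def build_command(test):
    """Build the mpirun argument list for one test (not executed here)."""
    return ["mpirun", "-np", str(test.ranks),
            "--mca", "mpi_paffinity_alone", "0", test.name]


def pick_wave(pending, free_cores):
    """Greedily pick tests, in order, that still fit in the free core budget."""
    wave = []
    for test in pending:
        if test.ranks <= free_cores:
            wave.append(test)
            free_cores -= test.ranks
    return wave, free_cores


# The example from the question: 2 five-rank jobs and 3 two-rank jobs.
pending = [Test("./test_a", 5), Test("./test_b", 5),
           Test("./test_c", 2), Test("./test_d", 2), Test("./test_e", 2)]

wave, free_after = pick_wave(pending, TOTAL_CORES)
commands = [build_command(t) for t in wave]

for cmd in commands:
    print(" ".join(cmd))
print("cores left unassigned:", free_after)
```

Note that this only limits how many ranks are launched at once. It does not decide which physical cores each mpirun binds its ranks to, which is the part the question is about.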