I am trying to map MPI processes to sockets in a somewhat compact pattern, and I am wondering about the best way to do it.


Say there are 2 sockets (0 and 1), each socket has 4 cores (0,1,2,3), and I have 4 MPI processes, each of which will use 2 OpenMP threads.


I’ve re-ordered my parallel work such that pairs of ranks (0,1 and 2,3) communicate more with each other than with other ranks.  Thus I think the best mapping would be:



rank           socket         first core
0              0              0
1              0              2
2              1              0
3              1              2
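
To spell out the arithmetic behind the mapping I am after (rank, socket, first core), here is a minimal Python sketch. The constant and function names are my own and purely illustrative; this only describes the layout I want and does not call Open MPI in any way.

```python
# Desired compact layout: consecutive ranks fill one socket before
# moving on to the next. All names here are illustrative only.
SOCKETS = 2
CORES_PER_SOCKET = 4
THREADS_PER_RANK = 2  # OpenMP threads per MPI rank
RANKS_PER_SOCKET = CORES_PER_SOCKET // THREADS_PER_RANK


def compact_mapping(rank):
    """Return (socket, first_core) for the given MPI rank."""
    socket = rank // RANKS_PER_SOCKET
    first_core = (rank % RANKS_PER_SOCKET) * THREADS_PER_RANK
    return socket, first_core


for rank in range(SOCKETS * RANKS_PER_SOCKET):
    socket, first_core = compact_mapping(rank)
    print(rank, socket, first_core)
```

Each rank would then own two adjacent cores on its socket, starting at the first core listed.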


My understanding is that --bysocket --bind-to-socket will give me ranks 0 and 2 on socket 0 and ranks 1 and 3 on socket 1, which is not what I want.
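
As a sketch of why I expect that result: if ranks are dealt out to sockets round-robin, the socket is simply the rank modulo the number of sockets. This models my understanding of the by-socket placement only, it is not Open MPI's actual implementation, and the function name is made up.

```python
# Round-robin placement across sockets, as I understand "by socket" mapping.
# This is an assumption about the behaviour, not Open MPI code.
SOCKETS = 2


def round_robin_socket(rank):
    """Return the socket a rank lands on under round-robin placement."""
    return rank % SOCKETS


for rank in range(4):
    print(rank, round_robin_socket(rank))
```

Under this model the communicating pairs (0,1) and (2,3) are each split across the two sockets.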


It looks like --cpus-per-proc might be what I want, i.e. I would give it the value 2.  But it is unclear to me whether I would also need to give --bysocket, and the FAQ suggests this combination is untested.


Maybe a rankfile is what I need?
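
If a rankfile is the way to go, my guess at its contents can be generated with the short Python sketch below. It assumes the "rank N=host slot=socket:cores" line format as I read it in the mpirun documentation, and "node0" is a placeholder hostname, not a real machine. I have not verified that this produces the binding I want.

```python
# Generate candidate rankfile lines for the compact layout.
# Assumptions: the "rank N=host slot=socket:first-last" syntax, and a
# single node with the hypothetical hostname "node0".
HOST = "node0"
CORES_PER_SOCKET = 4
THREADS_PER_RANK = 2
RANKS_PER_SOCKET = CORES_PER_SOCKET // THREADS_PER_RANK


def rankfile_lines(num_ranks):
    """Return one rankfile line per rank, two adjacent cores each."""
    lines = []
    for rank in range(num_ranks):
        socket = rank // RANKS_PER_SOCKET
        first = (rank % RANKS_PER_SOCKET) * THREADS_PER_RANK
        last = first + THREADS_PER_RANK - 1
        lines.append(f"rank {rank}={HOST} slot={socket}:{first}-{last}")
    return lines


print("\n".join(rankfile_lines(4)))
```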


I would appreciate some advice on the easiest way to get this mapping.