I am trying to map MPI processes to sockets in a somewhat compact pattern, and I am wondering what the best way to do it is.
Say there are 2 sockets (0 and 1), each socket has 4 cores (0, 1, 2, 3), and I have 4 MPI processes, each of which will use 2 OpenMP threads.
I've re-ordered my parallel work such that pairs of ranks (0,1 and 2,3) communicate more with each other than with other ranks. Thus I think the best mapping would be:
RANK  SOCKET  CORE
0     0       0
1     0       2
2     1       0
3     1       2
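To make the pattern concrete, here is a minimal Python sketch of the rule behind the table above. The function name `compact_mapping` and its parameters are my own, purely for illustration, and are not part of Open MPI. It assumes ranks fill one socket before moving to the next and that each rank occupies `threads_per_rank` consecutive cores, with the CORE column showing the first of them.

```python
def compact_mapping(n_ranks, cores_per_socket, threads_per_rank):
    """Return (rank, socket, first_core) tuples, filling each socket in turn.

    Hypothetical helper for illustration only; it just encodes the
    desired placement and does not talk to MPI at all.
    """
    # How many ranks fit on one socket if each rank needs several cores.
    ranks_per_socket = cores_per_socket // threads_per_rank
    mapping = []
    for rank in range(n_ranks):
        socket = rank // ranks_per_socket
        first_core = (rank % ranks_per_socket) * threads_per_rank
        mapping.append((rank, socket, first_core))
    return mapping


if __name__ == "__main__":
    # 4 ranks, 4 cores per socket, 2 OpenMP threads per rank.
    print("RANK  SOCKET  CORE")
    for rank, socket, core in compact_mapping(4, 4, 2):
        print(f"{rank:<4}  {socket:<6}  {core}")
```

With 4 ranks, 4 cores per socket, and 2 threads per rank, this reproduces the four rows listed above.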
My understanding is that --bysocket --bind-to-socket will give me ranks 0 and 2 on socket 0 and ranks 1 and 3 on socket 1, which is not what I want.
It looks like --cpus-per-proc might be what I want, i.e., it seems I could give it the value 2. However, it is unclear to me whether I would also need to give --bysocket, and the FAQ suggests this combination is untested.
Maybe a rankfile is what I need?
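If a rankfile is the way to go, I assume it would look something like what the sketch below prints. This is untested and based only on my reading of the `rank N=host slot=socket:cores` form in the mpirun man page. The hostname `node0` is a placeholder, and `rankfile_lines` is a hypothetical helper of my own, not an Open MPI tool.

```python
def rankfile_lines(host, n_ranks, cores_per_socket, threads_per_rank):
    """Build rankfile-style lines placing ranks compactly, socket by socket.

    Assumption: each line has the form 'rank N=host slot=socket:first-last',
    giving every rank a contiguous range of cores for its OpenMP threads.
    """
    ranks_per_socket = cores_per_socket // threads_per_rank
    lines = []
    for rank in range(n_ranks):
        socket = rank // ranks_per_socket
        first = (rank % ranks_per_socket) * threads_per_rank
        last = first + threads_per_rank - 1
        lines.append(f"rank {rank}={host} slot={socket}:{first}-{last}")
    return lines


if __name__ == "__main__":
    # 'node0' is a placeholder hostname, not a real machine.
    print("\n".join(rankfile_lines("node0", 4, 4, 2)))
```

Whether binding each rank to a two-core range like this is the right way to leave room for the second OpenMP thread is part of what I am unsure about.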
I would appreciate some advice on the easiest way to get this mapping.