Open MPI User's Mailing List Archives

Subject: [OMPI users] Optimal mapping/binding when threads are used?
From: Saliya Ekanayake (esaliya_at_[hidden])
Date: 2014-04-10 20:59:24


I am evaluating the performance of a clustering program written in Java
with MPI+threads and would like to get some insight into solving a peculiar
case. I've attached a performance graph to explain this.

In essence the tests were carried out as TxPxN, where T is threads per
process, P is processes per node, and N is number of nodes. I noticed an
inefficiency with the Tx1xN cases in general (the tall bars in the graph).

To elaborate a bit further,
1. each node has 2 sockets with 4 cores each (totaling 8 cores)
2. used OpenMPI 1.7.5rc5 (later tested with 1.8 and observed the same)
3. with options
     A.) --map-by node:PE=4 and --bind-to core
     B.) --map-by node:PE=8 and --bind-to core
     C.) --map-by socket and --bind-to none
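
For reference, the three launch configurations can be sketched roughly as
below. This is only a minimal Python sketch, not the actual launch script:
the application name MyClusteringApp, the use of -np N for one process per
node, and the added --report-bindings flag (which asks mpirun to print where
each process was bound) are assumptions for illustration. Option B is written
with the same "--bind-to core" spelling as option A.

```python
import re
import shlex

# The three mapping/binding option sets from the list above.
OPTIONS = {
    "A": ["--map-by", "node:PE=4", "--bind-to", "core"],
    "B": ["--map-by", "node:PE=8", "--bind-to", "core"],
    "C": ["--map-by", "socket", "--bind-to", "none"],
}


def build_command(option, nodes, app=("java", "MyClusteringApp")):
    """Build the mpirun argv for the Tx1xN case (one process per node).

    The default app is a hypothetical placeholder, not the real program.
    """
    return ["mpirun", "-np", str(nodes), "--report-bindings",
            *OPTIONS[option], *app]


def cores_per_process(option):
    """Cores each process is bound to, or None when binding is disabled."""
    flags = OPTIONS[option]
    if flags[flags.index("--bind-to") + 1] == "none":
        return None  # unbound: the OS may schedule threads on any core
    match = re.search(r"PE=(\d+)", flags[flags.index("--map-by") + 1])
    return int(match.group(1)) if match else 1


for name in OPTIONS:
    print(name, cores_per_process(name), shlex.join(build_command(name, 8)))
```

With 8 cores per node, option A leaves each process's threads confined to 4
cores, option B gives them all 8, and option C leaves them unbound.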

The timings came out as A < B < C, so I used the results from option A for the
Tx1xN cases in the graph.

Could you please give some suggestions that may help to speed up these Tx1xN
cases? Also, I expected B to perform better than A, since its threads could
use all 8 cores, but that wasn't the case.

Thank you,

[image: Inline image 1]

Saliya Ekanayake esaliya_at_[hidden]
Cell 812-391-4914 Home 812-961-6383