As I said, the degree of impact depends on the messaging pattern. If rank A typically sends/recvs with rank A+1, then you won't see much difference. However, if rank A typically sends/recvs with rank N-A, where N = number of ranks in the job, then you'll see a very large difference.
You might try simply changing the mapping pattern - e.g., add -bynode to your cmd line. This would make it run faster if it followed the latter example.
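To make the point about mapping concrete, here is a rough sketch (plain Python, not MPI code) that counts how many communicating rank pairs end up on different nodes under the two mappings. It assumes the default by-slot mapping fills one node before moving to the next and that -bynode places ranks round-robin across nodes; the helper names (node_of, cross_node_pairs, neighbour, mirror) are made up for illustration, and partners that fall outside the rank range or equal the rank itself are simply skipped.

```python
def node_of(rank, n_nodes, slots_per_node, by_node):
    """Return the node index a rank lands on under the given mapping."""
    if by_node:
        return rank % n_nodes        # assumed -bynode: round-robin across nodes
    return rank // slots_per_node    # assumed by-slot: fill one node, then the next


def cross_node_pairs(n_ranks, n_nodes, partner, by_node):
    """Count distinct communicating pairs whose two ranks sit on different nodes."""
    slots = n_ranks // n_nodes
    pairs = set()
    for a in range(n_ranks):
        b = partner(a, n_ranks)
        # Skip self-pairs and partners outside the valid rank range.
        if b == a or not (0 <= b < n_ranks):
            continue
        pairs.add((min(a, b), max(a, b)))
    return sum(
        node_of(a, n_nodes, slots, by_node) != node_of(b, n_nodes, slots, by_node)
        for a, b in pairs
    )


def neighbour(a, n):
    return a + 1   # rank A talks to rank A+1


def mirror(a, n):
    return n - a   # rank A talks to rank N-A


if __name__ == "__main__":
    N, NODES = 16, 2   # 16 ranks, 8 per node, as in the runs discussed in this thread
    for name, pattern in (("A <-> A+1", neighbour), ("A <-> N-A", mirror)):
        byslot = cross_node_pairs(N, NODES, pattern, by_node=False)
        bynode = cross_node_pairs(N, NODES, pattern, by_node=True)
        print(f"{name}: by-slot cross-node pairs = {byslot}, by-node = {bynode}")
```

Under these assumptions, with 16 ranks on two nodes the A+1 pattern has 1 cross-node pair by slot and 15 by node, while the N-A pattern has 7 by slot and 0 by node. Whether your application actually follows either pattern is something only a profile of its communication can tell you.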
Yes MM... But here a single node has 16 cores, not 64 cores.
The 1st two jobs were with OMPI-1.4.5.
16 cores of single node - 3692.403
16 cores on two nodes (8 cores per node) - 12338.809
The next two jobs were with OMPI-1.6.5.
16 cores of single node - 3547.879
16 cores on two nodes (8 cores per node) - 5527.320
As others said, the single-node job runs faster because of shared-memory communication, but I was expecting only a slight difference between 1 and 2 nodes; instead, the two-node run is taking about 60% more time here.
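For reference, the slowdown from one node to two can be worked out directly from the four timings quoted in this message. This is just arithmetic on those numbers (the units are whatever the application reports); the slowdown helper is a name made up for this sketch.

```python
# Timings quoted in this message: (single node, two nodes with 8 cores each).
timings = {
    "OMPI-1.4.5": (3692.403, 12338.809),
    "OMPI-1.6.5": (3547.879, 5527.320),
}


def slowdown(one_node, two_nodes):
    """Ratio of the two-node run time to the single-node run time."""
    return two_nodes / one_node


if __name__ == "__main__":
    for version, (one, two) in timings.items():
        ratio = slowdown(one, two)
        print(f"{version}: {ratio:.2f}x ({100 * (ratio - 1):.0f}% more time on two nodes)")
```

This works out to roughly 3.34x with 1.4.5 and 1.56x with 1.6.5, so the gap is much smaller with the newer release but still far from slight.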