Thanks for the reply.
A few additional questions:
1. Does Open MPI have the optimisations needed to ensure that, when send/recv
is called between two ranks on the same node, a shared-memory transport is
used?
2. If a programmer wants to implement such logic (optimisations for local
send/recv), which part of the Open MPI stack should they modify?