Hi,
Thanks for the reply.
A few additional questions:
1. Does OpenMPI have the optimisations needed to ensure that, when send/recv is called between 2 ranks on the same node, a shared-memory transport is used instead of the network?
2. If a programmer wanted to implement such logic (optimisations for local send/recv), which part of the OpenMPI stack would they need to modify?
 
Regards,
-Chev