As I said, the degree of impact depends on the messaging pattern. If rank A typically sends/recvs with rank A+1, then you won't see much difference. However, if rank A typically sends/recvs with rank N-A, where N = #ranks in the job, then you'll see a very large difference.

You might try simply changing the mapping pattern - e.g., add -bynode to your command line. This would make the job run faster if it follows the latter pattern.
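To see why the mapping matters for the "rank A talks to rank N-A" pattern, here is a small sketch (not Open MPI code, just arithmetic under assumed parameters: 16 ranks on 2 nodes with 8 slots each) counting how many ranks end up with an on-node partner under byslot vs. bynode placement:

```python
# Hypothetical illustration: how byslot vs. bynode mapping places
# partner ranks for the pattern "rank A exchanges with rank N-A".
# Assumed layout: 16 ranks, 2 nodes, 8 slots per node.
NRANKS, NNODES, SLOTS = 16, 2, 8

def node_byslot(rank):
    # byslot (the default): fill all slots on node 0, then node 1, ...
    return rank // SLOTS

def node_bynode(rank):
    # bynode: deal ranks out round-robin across the nodes
    return rank % NNODES

def local_pairs(node_of):
    # count ranks whose partner, rank (N - A) mod N, lands on the same node
    return sum(node_of(a) == node_of((NRANKS - a) % NRANKS)
               for a in range(NRANKS))

print("byslot local pairs:", local_pairs(node_byslot))   # -> 2
print("bynode local pairs:", local_pairs(node_bynode))   # -> 16
```

With byslot mapping almost every exchange in this pattern crosses the interconnect, while bynode mapping keeps every pair on the same node (A and N-A have the same parity when N is even) - hence the suggestion to try -bynode. Note the opposite holds for the neighbor pattern A <-> A+1, where byslot keeps most pairs local.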


On Nov 2, 2013, at 12:40 AM, San B <forum.san@gmail.com> wrote:

Yes MM...  But here a single node has 16 cores, not 64 cores.
The 1st two jobs were with OMPI-1.4.5.
      16 cores of single node - 3692.403
      16 cores on two nodes (8 cores per node) - 12338.809

The next two jobs were with OMPI-1.6.5.
      16 cores of single node - 3547.879
      16 cores on two nodes (8 cores per node) - 5527.320 

      As others said, the single-node job runs faster due to shared-memory communication, but I was expecting only a slight difference between 1 and 2 nodes - instead the two-node run is taking about 60% more time here.



On Thu, Oct 31, 2013 at 8:19 PM, Ralph Castain <rhc@open-mpi.org> wrote:
Yes, though the degree of impact obviously depends on the messaging pattern of the app. 

On Oct 31, 2013, at 2:50 AM, MM <finjulhich@gmail.com> wrote:

Of course, by this you mean with the same total number of processes, e.g. 64 processes on 1 node using shared memory vs. 64 processes spread over 2 nodes (32 each)?


On 29 October 2013 14:37, Ralph Castain <rhc@open-mpi.org> wrote:
As someone previously noted, apps will always run slower across multiple nodes than entirely on a single node, due to the difference between shared-memory and IB communication. Nothing you can do about that one.
_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users