Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Strange TCP latency results on Amazon EC2
From: Rayson Ho (raysonlogin_at_[hidden])
Date: 2012-01-13 13:40:45

On Tue, Jan 10, 2012 at 10:02 AM, Roberto Rey <eros.83_at_[hidden]> wrote:
> I'm running some tests on EC2 cluster instances with 10 Gigabit Ethernet
> hardware and I'm getting strange latency results with Netpipe and OpenMPI.

- There are 3 types of instances that can use 10 GbE. Are you using
"cc1.4xlarge", "cc2.8xlarge", or "cg1.4xlarge"?

- Did you set up a placement group?

- Also, which AMI are you using?

> I'm using the BTL TCP in OpenMPI, so I can't understand why OpenMPI
> outperforms raw TCP performance for small messages (40us of difference).
> Can OpenMPI outperform Netpipe over TCP? Why? Is OpenMPI doing any
> optimization in BTL TCP?

It is indeed interesting!

If we run strace with timestamps (e.g. strace -tt) on both NPmpi and
NPtcp and compare the traces, we can get a better idea of what's
happening.
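
To make the comparison less manual, here is a rough Python sketch that
summarizes an "strace -tt" trace per system call. It assumes the plain
"HH:MM:SS.uuuuuu syscall(args) = ret" line format (optionally with a
pid prefix, as produced with -f); the function name and the sample
lines are made up for illustration and are not real NetPIPE output.
The "gap" is the time from the start of one call to the start of the
next traced call, so it includes user-space time as well.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed line shape: "13:40:45.000100 write(4, "x", 1) = 1",
# optionally prefixed by "1234 " or "[pid 1234] " when strace -f is used.
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\d{2}:\d{2}:\d{2}\.\d{6})\s+(\w+)\("
)


def summarize_strace(lines):
    """Return {syscall: (count, total_gap_us)}.

    total_gap_us sums, for each call, the time until the next traced
    call starts. The last call in the trace contributes no gap.
    Traces that cross midnight are not handled in this sketch.
    """
    stats = defaultdict(lambda: [0, 0])
    prev = None  # (timestamp, syscall name) of the previous matched line
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip signals, "<... resumed>" lines, exit status, etc.
        ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        name = m.group(2)
        if prev is not None:
            prev_ts, prev_name = prev
            stats[prev_name][1] += (ts - prev_ts) // timedelta(microseconds=1)
        stats[name][0] += 1
        prev = (ts, name)
    return {name: tuple(values) for name, values in stats.items()}


if __name__ == "__main__":
    import sys

    # Usage: python summarize_strace.py trace_file
    with open(sys.argv[1]) as fh:
        for name, (count, gap_us) in sorted(summarize_strace(fh).items()):
            print(f"{name:12s} calls={count:8d} gap_us={gap_us}")
```

Running this on one trace from NPmpi and one from NPtcp and comparing
the per-call counts (e.g. poll/select vs. read/write) should show
fairly quickly whether one of them is spinning more than the other.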

It is possible that one is doing more busy polling than the other,
and/or triggering Xen to handle things a bit differently. We should
also check the socket options and the system call latency to see
whether the network is really accountable for the extra 40us.
> The results for OpenMPI aren't so good but we must take into account the
> network virtualization overhead under Xen

If you are running Cluster Compute Instances, then you are using HVM.
If things are set up properly (HVM & placement group), then you can
even get a TOP500 computer on EC2... Amazon uses similar setups for
their TOP500 submission:


Open Grid Scheduler / Grid Engine

Scalable Grid Engine Support Program