Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-10-06 12:22:59


Sorry for the delay in replying -- you sent this right before many of
us left for Europe for a conference and subsequent OMPI engineering
meetings. I'm just now getting to much of the list mail that has
piled up since then...

What you describe is darn weird. :-(

I know this is probably the answer you were expecting, but: is there
any chance you can try upgrading to a more recent version of OMPI?
Also, this may be a dumb question, but just to be sure: did you run
ompi_info and check that you have an openib BTL component installed?
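
For example, something along these lines (the exact output format
varies a bit between versions, but an openib line should show up if the
component was built):

     ompi_info | grep btl

If openib doesn't appear in that output, your build has no InfiniBand
support and OMPI will quietly fall back to whatever other BTLs are
available (e.g., tcp).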

FWIW, we do not yet have a "positive ACK" way of telling you which
networks you're actually using (I have an open ticket about it for
v1.3...). OMPI will, however, give you a negative ACK if you're *not*
using a high-speed network that it was configured for. Specifically,
if you have an openib BTL installed and it is not used because it
can't find any active HCA ports, the openib BTL will complain.

You can also force the use of specific networks with the "btl" MCA
parameter, such as:

     mpirun --mca btl openib,self ...

Then, if openib can't be used, the run will likely barf because it
won't be able to establish MPI communications.
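
Another crude check (just a sketch of the approach, not an exact
recipe) is to compare that forced-openib run against one that excludes
openib entirely via the "^" exclusion syntax:

     mpirun --mca btl ^openib ...

If the ^openib run gives the same ~20 MB/sec as the forced openib run,
that's at least a hint that the slow path isn't specific to the openib
BTL.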

On Sep 21, 2007, at 1:20 AM, Troy Telford wrote:

> I'm running Intel's IMB benchmark over an InfiniBand cluster, though
> other benchmarks that Open MPI has done fine in the past are also
> performing poorly.
>
> The cluster has DDR IB, and the fabric isn't seeing the kind of symbol
> errors that indicate a bad fabric; (non-MPI) bandwidth tests over the
> IB fabric are in the expected range.
>
> When the number of processes in IMB becomes greater than one node can
> handle, the bandwidth reported by IMB's 'Sendrecv' and 'Exchange' tests
> drops from 1.9 GB/sec (4 processes, or one process per core on the
> first node) to 20 MB/sec over 8 processes (and two nodes).
>
> In other words, when we move from using shared memory and 'self' to an
> actual network interface, IMB reports _really_ lousy performance, about
> 30x lower than I've recorded for SDR IB. (For the same test on a
> different cluster using SDR IB & Open MPI, I've clocked ~650 MB/sec,
> quite a bit higher than 20 MB/sec.)
>
> On this cluster, however, IMB's reported bandwidth remains the same
> from 2-36 nodes over DDR InfiniBand: ~20 MB/sec.
>
> We've used the OFED 1.1.1 and 1.2 driver releases so far.
>
> The command line is pretty simple:
> mpirun -np 128 -machinefile <foo> -mca btl openib,sm,self ./IMB-MPI1
>
> As far as I'm aware, our command line excludes TCP/IP (and hence
> ethernet) from being used; yet we're seeing speeds that are far below
> the abilities of InfiniBand.
>
> I've used Open MPI quite a bit, since before the 1.0 days; I've been
> dealing with IB for even longer. (And the guy I'm writing on behalf of
> has used Open MPI on large IB systems as well.)
>
> Even when we specify that only the 'openib' module be used, we are
> seeing 20 MB/sec.
>
> Oddly enough, the management ethernet is 10/100, and 20 MB/sec seems
> 'in the same ballpark' as what IMB would report over 10/100 ethernet.
>
> We aren't receiving any error messages from Open MPI (as you normally
> would when part of the fabric is down).
>
> So we're left a bit stumped: we're getting speeds you would expect from
> 100 Mbit ethernet, but we're specifying the IB interface and not
> receiving any errors from Open MPI. There isn't an unusual number of
> symbol errors on the IB fabric (i.e., errors are low, not increasing,
> etc.), and the SM is up and operational.
>
> One more tidbit that is probably insignificant, but I'll mention it
> anyway: we are running IBM's GPFS via IPoIB, so there is a little bit
> of IB traffic from GPFS, which is also a configuration we've used with
> no problems in the past.
>
> Any ideas on what I can do to verify that Open MPI is in fact using the
> IB fabric?
> --
> Troy Telford
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems