Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-11-21 08:00:05


Although George fixed the MX-abort error, let me clarify the rationale
here...

You are correct that at run-time, OMPI tries to load and run every
component that it finds. So if you have BTL components built for all
interconnects, OMPI will query each of them at run-time and try to use
them.
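
For example, if you want to limit which BTLs are even considered, you can
restrict the selection explicitly on the mpirun command line. This is only a
sketch: the component names depend on what was built on your system, and
./my_mpi_app is just a placeholder for your own executable:

   # Only consider the MX and self (loopback) BTLs
   shell$ mpirun -np 4 --mca btl mx,self ./my_mpi_app

   # Or force TCP only, taking the Myrinet components out of the picture
   shell$ mpirun -np 4 --mca btl tcp,self ./my_mpi_app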

But right now, we do not have a way to show exactly which interconnects
and networks are actually being used. Although this is a planned feature,
for 1.0 we compromised and decided that if any of the low-latency/high-speed
network components determines that it cannot be used, it will print a
warning message. This should cover 95+% of misconfiguration cases (e.g.,
the user meant to be using IB, but something went wrong and OMPI failed
over to TCP).
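
In the meantime, ompi_info will at least show which BTL components are
installed in a given Open MPI installation (though not which ones a
particular run actually ends up using). A quick check looks something like
this:

   # List the BTL components that this installation knows about
   shell$ ompi_info | grep btl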

These warnings will likely be removed (or, more specifically, only
displayed if requested) once we include the feature to display which
BTL components/networks are being used at run-time.
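
Also, as Troy mentions below, the same kind of selection can be made
persistent in a per-user MCA parameter file instead of on every mpirun
command line. A minimal sketch (again, the component names depend on what
you have built):

   # $HOME/.openmpi/mca-params.conf
   # Restrict the BTLs that are considered at run time
   btl = mx,self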

On Nov 17, 2005, at 1:00 PM, Troy Telford wrote:

> I wouldn't be surprised if this is simply an issue of configuration:
>
> In my test cluster, I've got Myrinet, InfiniBand, and Gigabit Ethernet
> support.
>
> My understanding is that when you use 'mpirun' without specifying any MCA
> parameters (including systemwide and/or user configurations in ~/.openmpi),
> OpenMPI will simply attempt to use whichever modules it can.
>
> This mostly works; however, I have found a bug in its mechanism. (By no
> means a showstopper, but mildly annoying.)
>
> I have both the MX and GM BTL components installed; only one set of
> drivers can be loaded for the Myrinet hardware at a given time. If I have
> the 'MX' drivers installed, mpirun will flash a message to stderr about
> the GM component not being able to find hardware
> ***
> --------------------------------------------------------------------------
> [0,1,0]: Myrinet/GM on host n61 was unable to find any NICs.
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> ***
> -- but OpenMPI simply (for lack of a better phrase) 'fails over' and
> uses MX. And everything is happy.
>
> However, if I have the 'GM' drivers installed, I receive a message that
> the MX component couldn't find Myrinet hardware, and OpenMPI aborts.
> ***
> MX:n62:mx_init:querying driver for version info:error 1:Failure querying
> MX driver(wrong driver?)
> last syscall error=2:No such file or directory
> MX:Aborting
> ***
>
> And if /neither/ MX nor GM is loaded (leaving me with Gigabit Ethernet),
> I receive both error messages (and it exits).
>
> Removing the MX components (I package it all up in RPMs; makes it easier
> to manage) will then allow OpenMPI to 'fail over' to TCP (producing the
> same warning as when the GM component 'fails over' to MX).
>
> The openib and mvapi components seem to behave properly, failing over
> to a usable interface and continuing execution.
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/