Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] IBCM error
From: Ralph H. Castain (rhc_at_[hidden])
Date: 2008-07-14 15:04:13


I've been quietly following this discussion, but now feel a need to jump
in here. I really must disagree with the idea of building either IBCM or
RDMACM support by default. Neither of these has been proven to reliably
work, or to be advantageous. Our own experiences in testing them have been
slightly negative at best. When they did work, they were slower, didn't
scale well, and were unreliable.

I'm not trying to rain on anyone's parade. These are worthwhile in the
long term. However, they clearly need further work to be "ready for prime
time".

Accordingly, I would recommend that they -only- be built if specifically
requested. Remember, most of our users just build blindly. It makes no
sense to have them build support for what can only be classed as an
experimental capability at this time.

Also, note that the OFED install is less-than-reliable wrt IBCM and
RDMACM. We have spent considerable time chasing down installation problems
that allowed the system to build, but then caused it to crash-and-burn if
we attempted to use it. We have gained experience at knowing when/where to
look now, but that doesn't lessen the reputation impact OMPI is getting as
a "buggy, cantankerous beast" according to our sys admins.

Not a reputation we should be encouraging.

Turning this off by default allows those more adventurous souls to explore
this capability, while letting our production-oriented customers install
and run in peace.
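
To illustrate the kind of pre-flight check a site could run before opting
in to IBCM, here is a minimal sketch. It only looks for the /dev/ucm*
device nodes mentioned in the quoted thread below. The function names are
hypothetical and not part of OMPI or OFED, and finding a device node does
not prove that IBCM actually works on that node.

```python
import glob


def find_ucm_devices(pattern="/dev/ucm*"):
    """Return the sorted list of paths matching the ucm device pattern.

    The default pattern is taken from the thread; the function name is
    hypothetical and purely illustrative.
    """
    return sorted(glob.glob(pattern))


def ibcm_device_nodes_present(pattern="/dev/ucm*"):
    """Return True if at least one matching device node exists.

    This is a necessary-looking condition only: it says nothing about
    whether the IBCM userspace library itself behaves correctly.
    """
    return len(find_ucm_devices(pattern)) > 0
```

Running ibcm_device_nodes_present() on each compute node before enabling
the feature would at least catch the case where the build succeeded but
the device nodes were never created.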

Ralph

> On Jul 14, 2008, at 9:21 AM, Pavel Shamis (Pasha) wrote:
>
>>> Should we not even build support for it?
>> I think the IBCM CPC build should be enabled by default. IBCM is
>> supplied with OFED, so there should not be any problem during install.
>
> Ok. But remember that there are at least some OS's where /dev/ucm* do
> *not* get created by default for some unknown reason (even though IBCM
> is installed).
>
>>> PRO: don't even allow the possibility of running with it, because
>>> we know that there are issues with the ibcm userspace library
>>> (i.e., reduce problem reports from users)
>>>
>>> PRO: users don't have to have libibcm installed on compute nodes
>>> (we've actually gotten some complaints about this)
>> We got complaints only for the case where OMPI was built on a
>> platform with IBCM and then run on a platform without IBCM. Also, we
>> did not have an option to disable IBCM during compilation, so there
>> was actually no way to install OMPI on the compute node. We added the
>> option and the problem was resolved.
>> In most cases the OFED install is the same on all nodes, so it
>> should not be a problem to build IBCM support by default.
>
>
> Ok, sounds good.
>
> --
> Jeff Squyres
> Cisco Systems