Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Common initialization code for IB.
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-01-03 09:27:14


On Jan 3, 2008, at 9:03 AM, Gleb Natapov wrote:

> In Paris we've talked about putting HCA discovery and initialization
> code outside of openib BTL so other components that want to use IB
> will be able to share common code, data and registration cache. Other
> components I am thinking about are ofud and multicast collectives. I
> started to look at this and I have a couple of problems with this
> approach. Currently openib BTL has if_include/if_exclude parameters to
> control which HCAs should be used. Should we make those parameters
> global and initialize only HCAs that are not excluded by those
> filters, or should we initialize all HCAs and each component will
> have its own include/exclude filters?

Good question. I think the optimal solution would be to have one set
of globals (common_of_if_include or somesuch?) with optional
per-component overrides. E.g., tell all of OMPI to if_include mthca0,
but then tell just the multicast collectives to if_include ipath1 (for
whatever reason). This would allow fine-grained selection of which
communication types use which devices.

To minimize the repetition of code, this could be effected by having a
function in the common/of area that does all the work for the
include/exclude behavior. You can simply call it with any of the MCA
param values, such as common_of_if_in/exclude, btl_openib_if_in/exclude,
coll_of_if_in/exclude, ... and it can return a list of ports to use.
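
To make that a bit more concrete, here is a rough sketch of what such a
shared helper might look like. Everything in it is hypothetical: the
function name common_of_select_devices, its signature, and the rule
that a component's values replace the globals whenever the component
sets either one are my assumptions, not existing OMPI code. It also
assumes include and exclude may not both be set at once, and it filters
whole device names only, so per-port selection is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `name` appears as a whole token in the comma-separated `list`. */
static bool name_in_list(const char *name, const char *list)
{
    size_t name_len = strlen(name);
    const char *p = list;

    while (*p != '\0') {
        size_t tok_len = strcspn(p, ",");
        if (tok_len == name_len && strncmp(p, name, name_len) == 0) {
            return true;
        }
        p += tok_len;
        if (*p == ',') {
            p++;  /* skip the separator */
        }
    }
    return false;
}

/* A parameter counts as "set" when it is non-NULL and non-empty. */
static bool is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/*
 * Hypothetical common helper (not an existing OMPI function).
 * Writes the indices of the selected devices into `selected`, which must
 * have room for `num_devices` entries, and returns how many were selected.
 * Returns -1 if include and exclude are both set (assumed to be an error).
 */
int common_of_select_devices(const char *const *devices, size_t num_devices,
                             const char *global_include,
                             const char *global_exclude,
                             const char *comp_include,
                             const char *comp_exclude,
                             size_t *selected)
{
    assert(devices != NULL && selected != NULL);

    /* Assumption: a component override replaces the globals entirely. */
    const char *include = global_include;
    const char *exclude = global_exclude;
    if (is_set(comp_include) || is_set(comp_exclude)) {
        include = comp_include;
        exclude = comp_exclude;
    }

    if (is_set(include) && is_set(exclude)) {
        return -1;
    }

    int count = 0;
    for (size_t i = 0; i < num_devices; i++) {
        bool keep = true;
        if (is_set(include)) {
            keep = name_in_list(devices[i], include);
        } else if (is_set(exclude)) {
            keep = !name_in_list(devices[i], exclude);
        }
        if (keep) {
            selected[count++] = i;
        }
    }
    return count;
}
```

With the example from above, the openib BTL would pass only the global
value ("mthca0") and get mthca0 back, while the multicast collectives
would pass "ipath1" as the component include and get ipath1 instead.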

> Another problem is how multicast collective knows that all processes
> in a communicator are reachable via the same network. Do we have a
> mechanism in ompi to check this?

Good question.

Perhaps the common_of stuff could hang some data off the ompi_proc_t
that can be read by any of-like component (btl openib, coll of
multicast, etc.)...? This could contain a subnet ID, or perhaps a
reachable flag, or somesuch.
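
As a rough illustration of the idea, the check a multicast collective
would run over a communicator's processes could look something like the
sketch below. The struct common_of_proc_info_t and the function
common_of_all_on_same_subnet are made-up names, and how the data would
actually be attached to ompi_proc_t is deliberately left out. It also
assumes one subnet ID per peer, which ignores peers with ports on
several subnets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-peer data that common code could publish. */
typedef struct {
    bool     has_ib;     /* peer reported a usable IB port */
    uint64_t subnet_id;  /* 64-bit subnet ID of that port */
} common_of_proc_info_t;

/*
 * Returns true only if every process has IB and all of them report the
 * same subnet ID. An empty list is treated as "not reachable".
 */
bool common_of_all_on_same_subnet(const common_of_proc_info_t *procs,
                                  size_t num_procs)
{
    assert(procs != NULL || num_procs == 0);

    if (num_procs == 0) {
        return false;
    }
    for (size_t i = 0; i < num_procs; i++) {
        if (!procs[i].has_ib || procs[i].subnet_id != procs[0].subnet_id) {
            return false;
        }
    }
    return true;
}
```

A component could run this once when it is queried for a communicator
and simply decline to be selected if the answer is false.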

-- 
Jeff Squyres
Cisco Systems