On Jan 3, 2008, at 9:03 AM, Gleb Natapov wrote:
> In Paris we've talked about putting HCA discovery and initialization
> code outside of openib BTL so other components that want to use IB
> will be able to share common code, data and registration cache. Other
> components I am thinking about are ofud and multicast collectives. I
> started to look at this and I have a couple of problems with this
> approach. Currently openib BTL has if_include/if_exclude parameters to
> control which HCAs should be used. Should we make those parameters
> global and initialize only HCAs that are not excluded by those
> filters, or should we initialize all HCAs and each component will have
> its own include/exclude filters?
Good question. I think the optimal solution would be to have one set
of globals (common_of_if_include or somesuch?) with optional per-
component overrides. E.g., tell all of OMPI to if_include mthca0, but
then tell just the multicast collectives to if_include ipath1 (for
whatever reason). This would allow fine-grained selection of which
communication types use which devices.
To minimize the repetition of code, this could be effected by having a
function in the common/of area that does all the work for the
include/exclude behavior. You can simply call it with any of the MCA
param values, such as: common_of_if_in/exclude,
btl_openib_if_in/exclude, coll_of_if_in/exclude, ... and it can return
a list of ports to use.
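
To make that concrete, here is a rough sketch of what such a shared
helper might look like. Everything in it is an assumption for
illustration: the function name common_of_filter_devices, the
comma-separated string format, and the rule that a component's own
include/exclude values replace the global ones when either is set.
Include winning over exclude when both are given is also just a choice
made for this sketch. It matches whole names only, so an entry could be
a device name or a device:port string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `name` appears as a whole entry in the
 * comma-separated `list` (no whitespace trimming is done). */
static bool name_in_list(const char *list, const char *name)
{
    size_t len = strlen(name);
    const char *p = list;

    while (p != NULL && *p != '\0') {
        const char *comma = strchr(p, ',');
        size_t tok_len = comma ? (size_t)(comma - p) : strlen(p);

        if (tok_len == len && strncmp(p, name, len) == 0) {
            return true;
        }
        p = comma ? comma + 1 : NULL;
    }
    return false;
}

/* Hypothetical shared helper (not an existing Open MPI function).
 * A NULL string means "parameter not set". If the component set either
 * of its own parameters, only those are used; otherwise the global
 * ones apply. `selected` must have room for `ndevices` entries.
 * Returns the number of names written to `selected`. */
size_t common_of_filter_devices(const char *comp_include,
                                const char *comp_exclude,
                                const char *global_include,
                                const char *global_exclude,
                                const char *const *devices,
                                size_t ndevices,
                                const char **selected)
{
    const char *inc = comp_include;
    const char *exc = comp_exclude;
    size_t n = 0;

    /* Per-component override: fall back to globals only if unset. */
    if (inc == NULL && exc == NULL) {
        inc = global_include;
        exc = global_exclude;
    }

    for (size_t i = 0; i < ndevices; ++i) {
        if (inc != NULL) {
            /* Include list given: keep only names listed in it. */
            if (!name_in_list(inc, devices[i])) {
                continue;
            }
        } else if (exc != NULL && name_in_list(exc, devices[i])) {
            /* Exclude list given: drop names listed in it. */
            continue;
        }
        selected[n++] = devices[i];
    }
    return n;
}
```

With devices mthca0, mthca1 and ipath1 and a global include of
"mthca0", a component that sets nothing gets mthca0, while a component
whose own include is "ipath1" gets ipath1 instead.
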
> Another problem is how multicast collective knows that all processes
> in a communicator are reachable via the same network, do we have a
> mechanism in ompi to check this?
Good question.
Perhaps the common_of stuff could hang some data off the ompi_proc_t
that can be read by any of-like component (btl openib, coll of
multicast, etc.)...? This could contain a subnet ID, or perhaps a
reachable flag, or somesuch.
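
As a rough illustration of the idea, the per-proc data and the check a
multicast collective might run could look something like the sketch
below. The struct common_of_proc_info and the function
common_of_all_on_same_subnet are made-up names, and the sketch uses a
plain array rather than the real ompi_proc_t, whose layout is not
assumed here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process data that common code could attach to each
 * proc; this is a stand-in, not an existing Open MPI structure. */
struct common_of_proc_info {
    bool     reachable;  /* peer can be reached over an OF device */
    uint64_t subnet_id;  /* subnet the peer's port belongs to */
};

/* Return true only if every process is reachable and all of them
 * report the same subnet ID. An empty set returns false. */
bool common_of_all_on_same_subnet(const struct common_of_proc_info *procs,
                                  size_t nprocs)
{
    if (nprocs == 0) {
        return false;
    }
    for (size_t i = 0; i < nprocs; ++i) {
        if (!procs[i].reachable ||
            procs[i].subnet_id != procs[0].subnet_id) {
            return false;
        }
    }
    return true;
}
```

A collective component could run a check like this over the procs of a
communicator at selection time and decline to be used if it fails.
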
--
Jeff Squyres
Cisco Systems