
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Open-MX vs OMPI 1.3 using MX internal symbols
From: George Bosilca (bosilca_at_[hidden])
Date: 2009-01-26 09:09:55


There are several reasons these calls are there; please read on.

On Jan 26, 2009, at 02:19, Brice Goglin wrote:

> Hello,
>
> I am testing OpenMPI 1.3 over Open-MX. OpenMPI 1.2 works well but 1.3
> does not load. This is caused by OMPI MX components now using some MX
> internal symbols (mx_open_board, mx__get_mapper_state and
> mx__regcache_clean). This looks like an ugly hack to me :) Why don't you
> talk to Myricom about adding a proper interface in MX?

mx__regcache_clean is something that was added inside Open MPI by the
Myricom people themselves, so I guess they don't consider it that ugly.

mx_open_board is there so we can detect as quickly as possible whether
Myricom hardware is actually present on the machine, or whether there
are just libraries lying around. There is no other way to do so except
initializing the device, and then we are stuck with the current
configuration (as we cannot modify the MX behavior at runtime).

mx__get_mapper_state is there to detect multiple links and compute the
routes. There are two reasons for this:
- clusters with multiple MX interfaces: we want a one-to-one mapping
between the cards, rather than relying on the mapper to do the right
thing.
- clusters of clusters: we have to be able to figure out that even if
two computers both have MX, they will not necessarily be able to
communicate over it if they belong to two distinct clusters.
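Both points can be illustrated as matching cards by the fabric they are attached to. This is a minimal sketch, assuming each card's fabric (mapper) identifier has already been obtained; in Open MPI that information comes from mx__get_mapper_state, whose interface is not shown here. The names nic_info_t, pair_nics and MAX_NICS are hypothetical, not Open MPI or MX code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NICS 16  /* arbitrary bound chosen for this sketch */

/* Hypothetical per-NIC record: identifier of the fabric (mapper) the NIC
 * is attached to. Not an MX or Open MPI type. */
typedef struct {
    uint32_t fabric_id;
} nic_info_t;

/* Pair each local NIC with at most one unused remote NIC on the same
 * fabric, giving a one-to-one card mapping. pairs[i] receives the remote
 * index used by local NIC i, or -1 if none is available.
 * Returns the number of pairs; 0 means there is no MX route to that peer
 * at all (e.g. it belongs to a distinct cluster). */
int pair_nics(const nic_info_t *local, int nlocal,
              const nic_info_t *remote, int nremote, int *pairs)
{
    int used[MAX_NICS] = {0};
    int npairs = 0;

    assert(nremote <= MAX_NICS);
    for (int i = 0; i < nlocal; i++) {
        pairs[i] = -1;
        for (int j = 0; j < nremote; j++) {
            if (!used[j] && local[i].fabric_id == remote[j].fabric_id) {
                used[j] = 1;  /* each remote card is paired only once */
                pairs[i] = j;
                npairs++;
                break;
            }
        }
    }
    return npairs;
}
```

With two local cards on fabrics A and B and a remote node listing them in the opposite order, the sketch pairs A with A and B with B; a remote node whose only card sits on a third fabric yields zero pairs.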

> Building OMPI directly on Open-MX will disable the mapper_state stuff
> because of missing MX internal headers. But, Open-MX is ABI compatible
> with MX.

Unfortunately we access more than just the simple interface proposed in
myriexpress.h. However, Open MPI can be built without these
dependencies if the corresponding defines are not set. I guess this will
work in most common cases (but not, for example, on grids).

> So building on MX and running on Open-MX requires the addition
> of these symbols in Open-MX anyway. Before I do so, I'd like to know
> why you actually need these symbols. Are mx_open_board and
> mx__get_mapper_state used to get a "fabric identifier" in the context
> of multi-clusters/grids?

Yes, you have half the answer.

> If so, assuming it will ever matter for Open-MX,
> is it ok to just have mx__get_mapper_state report the MAC address of
> the mapper node and nothing else in the mapper_state structure?

Yes, the only thing we need is a unique identifier per cluster. We
use the last 6 digits of the mapper MAC address.
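Deriving such an identifier could look like the sketch below. It assumes "the last 6 digits" means the last 6 hex digits of the MAC, i.e. its low 3 bytes; cluster_id_from_mac and same_cluster are hypothetical helpers, not Open MPI code.

```c
#include <stdint.h>

/* Hypothetical helper: build a per-cluster identifier from the mapper's
 * 6-byte MAC address, keeping only the last 6 hex digits (low 3 bytes).
 * The "last 6 hex digits" reading is an assumption of this sketch. */
uint32_t cluster_id_from_mac(const uint8_t mac[6])
{
    return ((uint32_t)mac[3] << 16) | ((uint32_t)mac[4] << 8) | (uint32_t)mac[5];
}

/* Nodes whose mappers report different identifiers belong to distinct
 * clusters and should not be treated as reachable over MX. */
int same_cluster(const uint8_t mapper_a[6], const uint8_t mapper_b[6])
{
    return cluster_id_from_mac(mapper_a) == cluster_id_from_mac(mapper_b);
}
```

For a mapper MAC of 00:11:22:ab:cd:ef the identifier is 0xabcdef. Keeping only 3 bytes means two mappers whose MACs differ only in the upper half would look identical; presumably that is acceptable because the upper half is the vendor prefix.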

> Then, I guess mx__regcache_clean is called when the OMPI free hook
> wants to clear the MX regcache, right?

As we don't really have access to the MX memory registration (which is
good), we sometimes need to force the cleanup. This is why we're using
this function.
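The reason a free hook must force the cleanup can be shown with a toy registration cache: once a buffer is freed and its memory possibly reused, any cached registration overlapping that range is stale and has to be dropped. This is a hypothetical structure for illustration only; it is not MX's internal regcache, and regcache_invalidate is not the signature of mx__regcache_clean.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registration-cache entry: a registered range
 * [base, base + len). This is NOT MX's internal regcache layout. */
typedef struct {
    uintptr_t base;
    size_t    len;
    int       valid;
} reg_entry_t;

/* Return nonzero if [a, a + alen) and [b, b + blen) overlap. */
static int ranges_overlap(uintptr_t a, size_t alen, uintptr_t b, size_t blen)
{
    return a < b + blen && b < a + alen;
}

/* Intended to be called from a free hook: drop every cached registration
 * that overlaps the freed range, so a later transfer from recycled memory
 * does not reuse a stale registration. Returns the number dropped. */
int regcache_invalidate(reg_entry_t *cache, size_t n, uintptr_t addr, size_t len)
{
    int dropped = 0;
    for (size_t i = 0; i < n; i++) {
        if (cache[i].valid &&
            ranges_overlap(cache[i].base, cache[i].len, addr, len)) {
            cache[i].valid = 0;  /* a real cache would also deregister here */
            dropped++;
        }
    }
    return dropped;
}
```

Freeing 0x100 bytes at 0x1800 drops a cached registration covering 0x1000 to 0x2000 but leaves one covering 0x3000 to 0x3800 untouched.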

> Also, is there any plan to use any other MX internal symbols in the
> future releases?

It depends on the bugs we run into. So far so good, but there is no
way to guarantee we will not need additional symbols.

> By the way, is there a way to get more details from OMPI when it fails
> to load a component because of missing symbols like this?
> LD_DEBUG=verbose isn't very convenient :)

The mca_component_show_load_errors MCA parameter is what you need there.
Set it to a high value, depending on the level of verbosity you want.

   george.

>
>
> thanks,
> Brice Goglin
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel