Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] totalview and message queue, empty windows
From: Terry Dontje (Terry.Dontje_at_[hidden])
Date: 2010-02-02 10:17:28


Hi DevL, what compiler and options are you using to build OMPI? I am
seeing something similar (Warning messages and the Message Queue window
having bizarre values) when building with the Pathscale compiler but I
don't see this with SunStudio, gcc, Intel or PGI.

However, I do see pending receives, though there is no specific
information on the actual communicators (name, size, rank). It looks
like some of the type symbols are not being kept in the .so.

--td
>
> On 28 Jan 2010, at 21:04, DevL wrote:
>
> > Hi,
> > it looks like there is an issue with TotalView and Open MPI.
> >
> > The message queue is just empty and the output shows:
> > WARNING: Field mtc_ndims_or_nnodes of type mca_topo_base_comm_1_0_0_t not found!
> > WARNING: Field mtc_dims_or_index of type mca_topo_base_comm_1_0_0_t not found!
> > WARNING: Field mtc_periods_or_edges of type mca_topo_base_comm_1_0_0_t not found!
> > WARNING: Field mtc_reorder of type mca_topo_base_comm_1_0_0_t not found!
> > WARNING: Field mtc_ndims_or_nnodes of type mca_topo_base_comm_1_0_0_t not found!
> > WARNING: Field mtc_dims_or_index of type mca_topo_base_comm_1_0_0_t not found!
> > WARNING: Field mtc_periods_or_edges of type mca_topo_base_comm_1_0_0_t not found!
> > WARNING: Field mtc_reorder of type mca_topo_base_comm_1_0_0_t not found!
> > [
> > (Open MPI) 1.4a1r21427
> > and
> > totalview.8.7.0-7/linux-x86-64
> >
> > is this a known issue?
>
> I've not seen it before but I do know of problems with the
> mca_topo_base_comm_1_0_0_t type and the debugger plugin (which
> TotalView is calling).
>
> > and if so, how to overcome it?
>
> I'm afraid I don't know.
>
> The Debugger plugin looks for the type (it's a struct) and then looks
> for some offsets within the struct. I've seen it fail to find the
> struct completely whereas this error appears to claim it can't find
> the entries within the struct. Perhaps the difference is that I found
> the problem using padb and you are using TotalView.
>
> You could try the attached patch, which allows the code to continue if
> the type isn't found. If you are seeing a different symptom of the
> same error, then it might work for you.
>
>
> As to the cause, I've no idea. I've only seen it once or twice in the
> last six months, and not on installations I've installed myself. I've
> never been able to find out the underlying cause or why some machines
> report this error and some don't.
>
> Ashley,
>
> --
> Ashley Pittman, Bath, UK.
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
>
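
For readers following the thread, the lookup described in the quoted message (find the struct type, then find the offsets of fields inside it) can be illustrated with a small, self-contained sketch. This is not the Open MPI debugger plugin or the attached patch. The struct layout, the `KNOWN_TYPES` table, the function name `find_field_offsets`, and the wording of the "Type ... not found" warning are all hypothetical; only the field names and the "Field ... not found" message come from the output quoted above. The sketch assumes the tolerant behaviour the patch is said to provide: report what is missing and carry on.

```python
import ctypes


class TopoComm(ctypes.Structure):
    """Hypothetical stand-in for mca_topo_base_comm_1_0_0_t.

    The field names come from the warnings in the thread; the field
    types and their order are illustrative assumptions only.
    """
    _fields_ = [
        ("mtc_ndims_or_nnodes", ctypes.c_int),
        ("mtc_dims_or_index", ctypes.POINTER(ctypes.c_int)),
        ("mtc_periods_or_edges", ctypes.POINTER(ctypes.c_int)),
        ("mtc_reorder", ctypes.c_bool),
    ]


# Hypothetical stand-in for the type information a debugger can see.
KNOWN_TYPES = {"mca_topo_base_comm_1_0_0_t": TopoComm}


def find_field_offsets(type_name, field_names, known_types=KNOWN_TYPES):
    """Return (offsets, warnings) for the requested fields of a struct.

    A missing type or a missing field produces a warning instead of an
    exception, so the caller can continue with partial information.
    """
    warnings = []
    struct_type = known_types.get(type_name)
    if struct_type is None:
        # Case 1: the struct itself cannot be found.
        warnings.append(f"WARNING: Type {type_name} not found!")
        return {}, warnings

    declared = {name for name, *_ in struct_type._fields_}
    offsets = {}
    for name in field_names:
        if name in declared:
            # ctypes field descriptors expose the byte offset in the struct.
            offsets[name] = getattr(struct_type, name).offset
        else:
            # Case 2: the struct is found but one of its fields is not.
            warnings.append(
                f"WARNING: Field {name} of type {type_name} not found!"
            )
    return offsets, warnings
```

Calling `find_field_offsets("mca_topo_base_comm_1_0_0_t", ["mtc_reorder", "no_such_field"])` returns the offset for the first name and a single warning for the second. The two failure cases in the sketch correspond to the two symptoms contrasted in the thread: the struct not being found at all, and the struct being found without its fields.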