
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] OMPI 1.3 - PERUSE peruse_comm_spec_t peer Negative Value
From: Kiril Dichev (dichev_at_[hidden])
Date: 2009-03-25 13:29:21


Hi,

at least for the test program I used, the negative values of the
peer attribute disappeared after George's changes in revision 20844.

One remark: after installation, I had to remove the '#include
"ompi_config.h"' line from the "include/peruse.h" header to get PERUSE
applications to compile. Otherwise, the compiler reported that
ompi_config.h could not be found.

Regards,
Kiril

On Mon, 2009-03-23 at 16:34 -0400, George Bosilca wrote:
> You are absolutely right, the peer should never be set to -1 in any of
> the PERUSE callbacks. I checked the code this morning and figured out
> what the problem was: we report the peer and the tag attached to a
> request before setting the right values (some code moved around). I
> submitted a patch and created a "move request" to get this correction
> into one of our stable releases as soon as possible. The move request
> can be followed in our Trac system at
> https://svn.open-mpi.org/trac/ompi/ticket/1845. If you want to play
> with this change, please update your Open MPI installation to a nightly
> build or a fresh SVN checkout with at least revision 20844 (a nightly
> including this change will be posted on our website tomorrow morning).
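A minimal sketch of the ordering problem George describes, under a simplified model: `fake_request`, `notify_remove_from_unex_q`, `match_buggy`, `match_fixed`, and the `PEER_UNSET` placeholder are all hypothetical stand-ins, not the real ob1 code in pml_ob1_recvreq.c and not the actual patch in revision 20844. If the event hook reads the request before its peer and tag are copied from the matched message, it sees whatever placeholder was there; filling the fields in first closes that window.

```c
#include <assert.h>

/* Hypothetical "not yet filled in" value, chosen to mirror the -1 seen in this thread. */
#define PEER_UNSET (-1)

/* Hypothetical stand-in for a receive request (not the real ob1 type). */
typedef struct {
    int peer;
    int tag;
} fake_request;

/* Hypothetical event hook: remembers the peer it was shown, as a PERUSE callback would. */
static int last_reported_peer = PEER_UNSET;

static void notify_remove_from_unex_q(const fake_request *req)
{
    last_reported_peer = req->peer;
}

/* Buggy order: the event fires before peer/tag are copied from the matched message. */
static int match_buggy(fake_request *req, int msg_peer, int msg_tag)
{
    notify_remove_from_unex_q(req);
    req->peer = msg_peer;
    req->tag = msg_tag;
    return last_reported_peer;
}

/* Fixed order: fill in the request first, then fire the event. */
static int match_fixed(fake_request *req, int msg_peer, int msg_tag)
{
    req->peer = msg_peer;
    req->tag = msg_tag;
    assert(req->peer != PEER_UNSET);  /* the hook must never see the placeholder */
    notify_remove_from_unex_q(req);
    return last_reported_peer;
}
```

With the buggy order the hook reports -1 even though the request ends up with peer 0; with the fixed order it reports 0.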
>
> Thanks,
> george.
>
> On Mar 23, 2009, at 13:23, Samuel K. Gutierrez wrote:
>
> > Hi Kiril,
> >
> > Appreciate the quick response.
> >
> >> Hi Samuel,
> >>
> >> On Sat, 21 Mar 2009 18:18:54 -0600 (MDT)
> >> "Samuel K. Gutierrez" <samuel_at_[hidden]> wrote:
> >>> Hi All,
> >>>
> >>> I'm writing a simple profiling library which utilizes PERUSE. My callback
> >>
> >> So am I :)
> >>
> >>> function counts communication events (see example code below). I
> >>> noticed that in OMPI v1.3 spec->peer is sometimes a negative value
> >>> (OMPI v1.2.6 did not exhibit this behavior). I added some boundary
> >>> checks, but could this be a bug? I hope I'm not missing something...
> >>
> >> It took me quite some time to reproduce the error - I also
> >
> > Sorry about that - I should have provided more information.
> >
> >> got a peer value of "-1" in the PERUSE peruse_comm_spec_t
> >> struct. I only managed to reproduce this when a process
> >> communicates with itself, which is an unusual scenario. In any
> >> case, in all the tests I did, the error happened only when:
> >>
> >> - a process communicates with itself
> >> - the MPI receive call is made
> >> - the PERUSE event "PERUSE_COMM_MSG_REMOVE_FROM_UNEX_Q" is
> >>   triggered
> >
> > That's interesting... Nice work!
> >
> >>
> >>
> >> The file ompi/mca/pml/ob1/pml_ob1_recvreq.c seems to be
> >> where the above event is fired with the wrong value of the
> >> peer attribute.
> >>
> >> I will let you know if I find something.
> >
> > I will also take a look.
> >
> >>
> >>
> >> Best regards,
> >> Kiril
> >>
> >>>
> >>> The peruse test provided in the OMPI v1.3 source
> >>> exhibits similar behavior:
> >>> mpirun -np 2 ./mpi_peruse | grep peer:-1
> >>>
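> >>> /* rank and rrCounts are presumably globals set up elsewhere */
> >>> int callback(peruse_event_h event_h, MPI_Aint unique_id,
> >>>              peruse_comm_spec_t *spec, void *param) {
> >>>     if (spec->peer == rank) {
> >>>         return MPI_SUCCESS;   /* skip self-communication */
> >>>     }
> >>>     rrCounts[spec->peer]++;   /* out of bounds if peer is -1 */
> >>>     return MPI_SUCCESS;
> >>> }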
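Until an installation includes the fix, a counting callback like the one above can guard its array index. This is a minimal sketch of one such boundary check (Samuel's actual checks are not shown in the thread). It assumes a hypothetical `fake_comm_spec` struct in place of `peruse_comm_spec_t` so it compiles without Open MPI's peruse.h, a hypothetical `MAX_PEERS` bound, and hypothetical helpers `counters_init` and `count_event`; it returns 0 where the real callback returns MPI_SUCCESS.

```c
#include <assert.h>

/* Hypothetical stand-in for peruse_comm_spec_t; only the peer field matters here. */
typedef struct {
    int peer;
} fake_comm_spec;

#define MAX_PEERS 64            /* hypothetical upper bound on communicator size */

static int my_rank;             /* this process's rank */
static int comm_size;           /* number of ranks in the communicator */
static long rr_counts[MAX_PEERS];
static long rejected;           /* events whose peer was out of range */

static void counters_init(int rank, int size)
{
    assert(size > 0 && size <= MAX_PEERS);
    assert(rank >= 0 && rank < size);
    my_rank = rank;
    comm_size = size;
    for (int i = 0; i < size; i++)
        rr_counts[i] = 0;
    rejected = 0;
}

/* Same logic as the callback above, plus a range check before indexing. */
static int count_event(const fake_comm_spec *spec)
{
    if (spec->peer < 0 || spec->peer >= comm_size) {
        rejected++;             /* e.g. the -1 reported by OMPI 1.3 before r20844 */
        return 0;               /* 0 stands in for MPI_SUCCESS */
    }
    if (spec->peer == my_rank)
        return 0;               /* ignore self-communication */
    rr_counts[spec->peer]++;
    return 0;
}
```

Counting rejected events separately, instead of silently dropping them, makes it easy to tell whether an installation still exhibits the problem.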
> >>>
> >>>
> >>> Any insight is greatly appreciated.
> >>>
> >>> Thanks,
> >>>
> >>> Samuel K. Gutierrez
> >>> _______________________________________________
> >>> devel mailing list
> >>> devel_at_[hidden]
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>
> >>
> >
> > Appreciate the help,
> >
> > Samuel K. Gutierrez