Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] OMPI 1.3 - PERUSE peruse_comm_spec_t peer Negative Value
From: George Bosilca (bosilca_at_[hidden])
Date: 2009-03-23 16:34:28


You are absolutely right, the peer should never be set to -1 in any of
the PERUSE callbacks. I checked the code this morning and figured out
what the problem was. We report the peer and the tag attached to a
request before setting the right values (some code was moved around). I
submitted a patch and created a "move request" to get this correction
into one of our stable releases as soon as possible. The move request
can be followed in our Trac system at the following link:
https://svn.open-mpi.org/trac/ompi/ticket/1845
If you want to try this change, please update your Open MPI
installation to a nightly build or a fresh SVN checkout at
revision 20844 or later (a nightly including this change will be
posted on our website tomorrow morning).

   Thanks,
     george.
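
The ordering problem described above can be sketched in a few lines of
C. This is only an illustration, not the Open MPI source: the types
sketch_request_t and sketch_header_t, the functions match_buggy,
match_fixed and record_peer, and the -1 placeholder value are all
hypothetical, chosen to show why a callback fired too early sees a
stale peer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from Open MPI. */
typedef struct { int peer; int tag; } sketch_request_t;  /* receive request  */
typedef struct { int src;  int tag; } sketch_header_t;   /* matched message  */
typedef void (*sketch_event_cb)(const sketch_request_t *req, void *param);

/* Buggy order: the event fires while the request still holds its
 * placeholder values, so the callback observes them. */
void match_buggy(sketch_request_t *req, const sketch_header_t *hdr,
                 sketch_event_cb cb, void *param)
{
    assert(req != NULL && hdr != NULL && cb != NULL);
    cb(req, param);          /* reported too early */
    req->peer = hdr->src;
    req->tag  = hdr->tag;
}

/* Fixed order: fill in peer and tag first, then report the event. */
void match_fixed(sketch_request_t *req, const sketch_header_t *hdr,
                 sketch_event_cb cb, void *param)
{
    assert(req != NULL && hdr != NULL && cb != NULL);
    req->peer = hdr->src;
    req->tag  = hdr->tag;
    cb(req, param);
}

/* Example callback: stores the peer it was shown into *param. */
void record_peer(const sketch_request_t *req, void *param)
{
    *(int *)param = req->peer;
}
```

With a request initialised to { -1, -1 } and a header from rank 3,
record_peer sees -1 through match_buggy and 3 through match_fixed, even
though both leave the request itself with the correct peer afterwards.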

On Mar 23, 2009, at 13:23 , Samuel K. Gutierrez wrote:

> Hi Kiril,
>
> Appreciate the quick response.
>
>> Hi Samuel,
>>
>> On Sat, 21 Mar 2009 18:18:54 -0600 (MDT)
>> "Samuel K. Gutierrez" <samuel_at_[hidden]> wrote:
>>> Hi All,
>>>
>>> I'm writing a simple profiling library which utilizes PERUSE. My callback
>>
>> So am I :)
>>
>>> function counts communication events (see example code below). I noticed
>>> that in OMPI v1.3 spec->peer is sometimes a negative value (OMPI v1.2.6
>>> did not exhibit this behavior). I added some boundary checks, but it
>>> seems as if this is a bug? I hope I'm not missing something...
>>
>> It took me quite some time to reproduce the error - I also
>
> Sorry about that - I should have provided more information.
>
>> got peer value "-1" for the Peruse peruse_comm_spec_t
>> struct. I only managed to reproduce this with
>> communication of a process with itself, which is an
>> unusual scenario. Anyway, for all the tests I did, the
>> error happened only when:
>>
>> - a process communicates with itself
>> - the MPI receive call is made
>> - the Peruse event "PERUSE_COMM_MSG_REMOVE_FROM_UNEX_Q" is triggered
>
> That's interesting... Nice work!
>
>>
>>
>> The file ompi/mca/pml/ob1/pml_ob1_recvreq.c seems to be
>> the place where the above event is called with a wrong
>> value of the peer attribute.
>>
>> I will let you know if I find something.
>
> I will also take a look.
>
>>
>>
>> Best regards,
>> Kiril
>>
>>>
>>> The peruse test provided in the OMPI v1.3 source
>>> exhibits similar behavior:
>>> mpirun -np 2 ./mpi_peruse | grep peer:-1
>>>
>>> int callback(peruse_event_h event_h, MPI_Aint unique_id,
>>>              peruse_comm_spec_t *spec, void *param) {
>>>     if (spec->peer == rank) {
>>>         return MPI_SUCCESS;
>>>     }
>>>     rrCounts[spec->peer]++;
>>>     return MPI_SUCCESS;
>>> }
>>>
>>>
>>> Any insight is greatly appreciated.
>>>
>>> Thanks,
>>>
>>> Samuel K. Gutierrez
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>
> Appreciate the help,
>
> Samuel K. Gutierrez
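
For a tool that has to run against a release without the fix, the
"boundary checks" mentioned in the original report could look roughly
like the sketch below: the peer is range-checked before it is used as
an array index, and out-of-range values are tallied instead of
corrupting memory. This is a minimal sketch under stated assumptions.
comm_spec_sketch_t only stands in for the peer field of
peruse_comm_spec_t so that the example compiles without the PERUSE
headers; profile_state_t and count_event are hypothetical names, not
part of PERUSE or Open MPI.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the single field of peruse_comm_spec_t used here
 * (assumption: a real tool would include <peruse.h> instead). */
typedef struct {
    int peer;                  /* rank of the communication partner */
} comm_spec_sketch_t;

/* Hypothetical per-tool state. */
typedef struct {
    int rank;                  /* rank of this process                 */
    int comm_size;             /* number of ranks in the communicator  */
    unsigned long *counts;     /* one event counter per rank           */
    unsigned long bad_peer;    /* events whose peer was out of range   */
} profile_state_t;

/* Counts one event. Returns 0 if the event was counted or skipped as
 * self-communication, -1 if the peer was outside [0, comm_size). */
int count_event(const comm_spec_sketch_t *spec, profile_state_t *st)
{
    assert(spec != NULL && st != NULL && st->counts != NULL);

    if (spec->peer < 0 || spec->peer >= st->comm_size) {
        st->bad_peer++;        /* remember it, but never index with it */
        return -1;
    }
    if (spec->peer == st->rank) {
        return 0;              /* ignore self-communication, as in the report */
    }
    st->counts[spec->peer]++;
    return 0;
}
```

In a real PERUSE callback the state would typically arrive through the
void *param argument, and the callback would return MPI_SUCCESS in
every case so that a bad peer value does not abort the application.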