
Open MPI Development Mailing List Archives


From: Galen Mark Shipman (gshipman_at_[hidden])
Date: 2005-10-19 10:23:36


> On Wed, Oct 19, 2005 at 09:05:41AM -0600, Galen Mark Shipman wrote:
>> We changed things a bit in the mpool: deregister now removes the
>> registration from the cache and then calls release. If the reference
>> count is <= 0, the memory is deregistered immediately; otherwise it is
>> deregistered later by another release call. The BTL module increments
>> the reference count on the registration, so it should not be
>> deregistered until the registration's reference count is decremented
>> in btl_free.
>>
>> Are you seeing an actual vapi error with the attached code? If so it is
>> probably a reference count issue that we need to deal with.
> I see an error with the openib btl. I haven't checked mvapi. I see that
> openib deregister and mvapi deregister are implemented differently. Does
> openib lag behind mvapi?
>
Yes, openib is currently a bit behind. I should be able to fix this up in
the next day or two. Sorry about that, and thanks for the catch.

Galen
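
For readers following the thread, here is a minimal sketch of the deferred deregistration scheme described above. It is not the Open MPI code: the `sketch_*` names and the struct are hypothetical, and it assumes the cache holds the first reference while each user (such as a BTL) takes one more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mpool registration; not the Open MPI struct. */
typedef struct {
    void  *base;
    size_t len;
    int    ref_count;  /* assumption: one reference per holder (cache, BTL, ...) */
    bool   in_cache;   /* still findable through the registration cache */
    bool   pinned;     /* memory currently registered with the adapter */
} sketch_reg_t;

/* Register: pin the memory; the cache holds the first reference. */
static void sketch_register(sketch_reg_t *reg, void *base, size_t len)
{
    reg->base = base;
    reg->len = len;
    reg->ref_count = 1;
    reg->in_cache = true;
    reg->pinned = true;
}

/* A user such as a BTL takes its own reference. */
static void sketch_retain(sketch_reg_t *reg)
{
    reg->ref_count++;
}

/* Drop one reference; unpin only when nobody holds the registration. */
static void sketch_release(sketch_reg_t *reg)
{
    if (--reg->ref_count <= 0) {
        reg->pinned = false;  /* real code would deregister with the adapter here */
    }
}

/* Deregister: remove from the cache, then release the cache's reference. */
static void sketch_deregister(sketch_reg_t *reg)
{
    reg->in_cache = false;
    sketch_release(reg);
}
```

With this layout, calling `sketch_deregister` while a user still holds a reference only removes the entry from the cache; the memory stays pinned until that user's own `sketch_release`, which plays the role of the decrement in btl_free.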

>
>>
>> Thanks,
>>
>> Galen
>>
>> > Hello Galen,
>> >
>> > It seems this issue is still present and can be easily triggered
>> > (see attached program). Do you have plans to fix it?
>> >
>> > On Wed, Sep 21, 2005 at 12:06:18PM -0600, Galen M. Shipman wrote:
>> >> Gleb,
>> >>
>> >> >
>> >> > Gleb Natapov wrote:
>> >> >
>> >> >> Hello Galen,
>> >> >>
>> >> >> Finally I've got some time to look through the new code.
>> >> >> I have a couple of notes. In pml_ob1_rdma.c you try to merge
>> >> >> registrations in a number of places. The code looks like this:
>> >> >>     btl_mpool->mpool_deregister(btl_mpool, reg);
>> >> >>     btl_mpool->mpool_register(btl_mpool,
>> >> >>                               new_base,
>> >> >>                               new_len,
>> >> >>                               MCA_MPOOL_FLAGS_CACHE,
>> >> >>                               &reg);
>> >> >> How do you know reg is not in use? You can't deregister it if
>> >> >> somebody is using the registration!
>> >> >>
>> >> >
>> >> > Good catch... this should check the reference count and
>> >> > only deregister when the reference count actually goes to zero...
>> >> >
>> >>
>> >> Yes, this was a good catch. This was causing all sorts of fun for us!
>> >> Thanks,
>> >>
>> >
>> > --
>> > Gleb.
>> >
>
> --
> Gleb.
>
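
The merge problem raised at the start of the thread (deregistering `reg` and re-registering a larger range without knowing whether `reg` is in use) can be illustrated with a small guard. This is only a sketch under stated assumptions: `demo_reg_t` and `demo_try_merge` are hypothetical names, `ref_count` here counts active users only, and the pin/unpin steps are stand-ins for the real `mpool_deregister` / `mpool_register` calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical registration record; ref_count counts active users only. */
typedef struct {
    char  *base;
    size_t len;
    int    ref_count;
    bool   pinned;
} demo_reg_t;

/*
 * Try to replace reg with a registration covering [new_base, new_base + new_len).
 * Returns true if the registration was replaced, false if it was still in use
 * and therefore left untouched.
 */
static bool demo_try_merge(demo_reg_t *reg, char *new_base, size_t new_len)
{
    if (reg->ref_count > 0) {
        return false;          /* somebody is using it: do not deregister */
    }
    reg->pinned = false;       /* stand-in for deregistering the old range */
    reg->base = new_base;
    reg->len = new_len;
    reg->pinned = true;        /* stand-in for registering the merged range */
    return true;
}
```

A caller that gets `false` back would need some other strategy, for example registering the new range separately or retrying once the users have released the old registration; which one fits is a design decision the thread does not settle.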