Well, looks like we found the problem.

The memory free callback was incorrect. We were only looking up the base address of the freed buffer in the registration tree. Here is why this didn't work.
The snippet below probably wouldn't compile, but it works as an explanation:

buf = malloc(40*1024);  /* malloc 10 pages (4 KB each) */

/* send the second half of the buffer to the peer */
/* note that leave_pinned will register and cache only what it sees in the send */
MPI_Send(buf+5*4*1024, 5*4*1024, ........ );

/* free the buffer; MPI will try to find the registration in the tree
   based on the address, buf, but won't find it, so the registration
   remains */
free(buf);

So since the registration is left in the tree, a future malloc may obtain a virtual address that lies within the base and bound of that stale registration. When this memory is later freed, we try to deregister the entire registration, part of which might be in use by another buffer; it could even be in the middle of an RDMA operation.
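To make the difference concrete, here is a rough sketch of the two lookups. This is not the actual Open MPI code: the flat array stands in for the tree, and reg_t, reg_add, find_by_base, and on_free are hypothetical names made up for illustration. It also assumes the free hook is told the length of the region being released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registration record; [base, bound] is inclusive. */
typedef struct {
    uintptr_t base;
    uintptr_t bound;
    int in_use;
} reg_t;

#define MAX_REGS 16
static reg_t regs[MAX_REGS];   /* flat array standing in for the tree */

/* Cache a registration covering len bytes starting at addr. */
int reg_add(uintptr_t addr, size_t len)
{
    assert(len > 0);
    for (int i = 0; i < MAX_REGS; i++) {
        if (!regs[i].in_use) {
            regs[i].base = addr;
            regs[i].bound = addr + len - 1;
            regs[i].in_use = 1;
            return i;
        }
    }
    return -1;   /* cache full */
}

/* Old behaviour: only matches when the freed address is exactly the base. */
int find_by_base(uintptr_t addr)
{
    for (int i = 0; i < MAX_REGS; i++) {
        if (regs[i].in_use && regs[i].base == addr) {
            return i;
        }
    }
    return -1;
}

/* Sketch of a fix: drop every registration that overlaps the freed
   range [addr, addr + len - 1]. Returns how many were dropped. */
int on_free(uintptr_t addr, size_t len)
{
    uintptr_t last = addr + len - 1;
    int dropped = 0;
    for (int i = 0; i < MAX_REGS; i++) {
        if (regs[i].in_use && regs[i].base <= last && addr <= regs[i].bound) {
            regs[i].in_use = 0;
            dropped++;
        }
    }
    return dropped;
}
```

With the 10-page example above, reg_add(buf + 5*4096, 5*4096) caches the second half of the buffer. find_by_base(buf) then misses it, while on_free(buf, 10*4096) finds and drops it because the two ranges overlap.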

Anyway, I have modified the code and we are now passing a smaller linpack run with leave_pinned and the mem hooks enabled without using any mallopt trickiness. 



On Sep 25, 2005, at 10:58 AM, Galen M. Shipman wrote:

Well, after adding a bunch of debugging output, I have found the following.

With both leave_pinned and use_mem_hook enabled on a linpack run we get the assertion error on the memory callback in linpack. That is to say, there is a free occurring in the middle of a registration. 
At the point of assert we have NOT resized any registrations. 
The existing registrations in the tree are: 

Existing registrations:
Trying to free:

When we get the assert, we are trying to free 247917216, which is in the middle of the registration. Note that we have NOT resized any registrations, so I am confident there is not an issue with either the tree or the resize, at least as far as linpack is concerned.
Here is the callstack: 

Note that the free occurs in the ATLAS libraries. I will look into rebuilding linpack with another BLAS library to see what happens. Any other suggestions?



