Hi everybody!
We are working on a new, experimental interconnection network (the EXTOLL network) and I am currently working on an MTL
component for that hardware. Actually, it works quite well :-)
Recently I included the RDMA mpool component for memory registration caching into my code. Again, this works
quite nicely and provides bandwidth improvements for large messages.
I observed one problem though:
If I turn on mpi_leave_pinned (so that the registration cache is actually used), I see occasional memory corruption
issues, for example when I call MPI_Allreduce many times.
Debugging with valgrind did not lead to any clues, since OMPI refuses to run in that case. If I turn off
mpi_leave_pinned, everything seems to be fine.
I tested on version 1.3.3 and 1.3.4rc1.
Do you have any suggestions on how to investigate this situation?
Thanks,
Mondrian Nuessle
--
Dr. Mondrian Nuessle
Phone: +49 621 181 2717          University of Heidelberg
Fax:   +49 621 181 2713          Computer Architecture Group
mailto:nuessle_at_[hidden]       http://ra.ziti.uni-heidelberg.de