I haven't followed OFED development for a long time, so I don't know if there is a buggy OFED in RHEL 5.4.
If you're doing development with the internals of Open MPI (or if it'll be necessary to dive into the internals to debug a custom device/driver), you might want to move this discussion to the devel list rather than the users list.
Open MPI does have a few open tickets about what happens when registered memory is exhausted. We just recently committed some improvements to this (although the problem is not fully solved) on the v1.4 and v1.5 branches. Open MPI v1.4.3 is pretty old, actually. Could you try upgrading to Open MPI v1.4.5, or the latest v1.5.5rc?
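In the meantime, one generic thing that may be worth ruling out (an assumption on my part, not something confirmed for your setup): registered memory is pinned, so it counts against the per-process locked-memory limit, and a low limit can make registration fail sooner. A minimal sketch in Python that reports that limit on a node; the helper name `describe_memlock_limit` is made up for illustration:

```python
import resource


def describe_memlock_limit():
    """Return the soft and hard RLIMIT_MEMLOCK values as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def fmt(value):
        # RLIM_INFINITY means no limit is enforced; otherwise the value is in bytes.
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value // 1024} KiB"

    return fmt(soft), fmt(hard)


if __name__ == "__main__":
    soft, hard = describe_memlock_limit()
    print(f"max locked memory: soft={soft}, hard={hard}")
```

Run it the same way the MPI processes are launched (for example through the same remote shell or resource manager), since limits seen in an interactive login can differ from those a launched process inherits.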
On Feb 27, 2012, at 2:10 AM, Venkateswara Rao Dokku wrote:
> We are facing a problem while running the IMB [Intel MPI Benchmark] tests on CentOS 6.0.
> All the tests [PingPong, Exchange, etc.] stall after some time with no errors.
> Ours is a customized OFED stack [our own driver-specific library and kernel drivers for the h/w]; we use the IMB tests to test it.
> We have already tested the same stack on RHEL5.4 and it was fine.
> The test sends a few packets, and acknowledgements for all of those packets are received, but no more Send Work Queue entries are added for the driver to process.
> The test does not return at all; it just stalls after sending a few packets.
> This is observed only on CentOS 6/RHEL 6.
> Versions of packages installed:
> OpenMPI - 1.4.3
> LibIbVerbs - 1.1.4
> LibIbUmad - 1.3.6
> IMB - 3.2.2
> Please confirm whether these versions are compatible with RHEL 6. If not, please suggest the appropriate packages.
> Please respond ASAP. Any help will be appreciated.
> Thanks & Regards,
> D.Venkateswara Rao,
> Software Engineer,One Convergence Devices Pvt Ltd.,
> Jubille Hills,Hyderabad.