On Jul 12, 2012, at 12:04 PM, Paul Kapinos wrote:
> (cross-post to 'users' and 'devel' mailing lists)
Sorry for the delay in replying here; I got slammed with some deadlines this week...
The short version is that the issue has been confirmed. One root cause is that Mellanox significantly decreased the default amount of registered memory allowed on a node (!); it would be great if Mellanox could comment on that, since we've now seen several users impacted by this. You can raise the limit by changing some ConnectX HCA kernel module parameters.
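To make the module-parameter tweak a bit more concrete, here is a rough sketch of the arithmetic. It assumes the parameters in question are the mlx4_core options log_num_mtt and log_mtts_per_seg (not named above), and that the registerable-memory limit is roughly 2^log_num_mtt * 2^log_mtts_per_seg * page size, as the Open MPI FAQ describes it. The function names are made up for illustration; check the values against your own driver before changing anything.

```python
# Sketch only: assumes the mlx4_core limit is
#   (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size
# Function names here are hypothetical helpers, not part of any tool.

def max_registerable_bytes(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Approximate upper bound on registerable memory, in bytes."""
    return (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size


def log_num_mtt_for(target_bytes, log_mtts_per_seg, page_size=4096):
    """Smallest log_num_mtt whose limit covers target_bytes."""
    log_num_mtt = 0
    while max_registerable_bytes(log_num_mtt, log_mtts_per_seg, page_size) < target_bytes:
        log_num_mtt += 1
    return log_num_mtt


if __name__ == "__main__":
    GiB = 1 << 30
    # Example: a 64 GiB node, aiming to allow registering 2x physical memory.
    needed = log_num_mtt_for(2 * 64 * GiB, log_mtts_per_seg=3)
    print("log_num_mtt needed:", needed)
    print("limit in GiB:", max_registerable_bytes(needed, 3) // GiB)
```

If that assumption holds, the resulting values would typically be set as mlx4_core options in a modprobe configuration file, followed by a driver reload.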
We're working on ways to make Open MPI work correctly, even in the face of tiny amounts of registered memory. I'm not sure we have it right yet, but there are some things I can ask you to try. I'll reply in more detail on devel.