On Jun 15, 2012, at 11:26 AM, Daniels, Marcus G wrote:
>> Were there any clues in /var/log/messages or dmesg?
> Thanks. I found a suggestion from Nathan Hjelm to add "options mlx4_core log_mtts_per_seg=X" (where X is 5 in my case).
> Offline suggestions (which also included that) were to also add "--mca mpi_leave_pinned 0" to the mpirun line and to double-check my locked memory limits.
Setting leave_pinned to 0 will likely decrease your overall registered memory usage, but only over time. If you're not making it through MPI_INIT, then setting leave_pinned to 0 won't help.
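On the locked memory limits: here's a minimal sketch of checking the limit from inside a process, using Python's standard `resource` module (Linux). It reports the same limit that `ulimit -l` shows, except in bytes rather than KiB. The caveat is that the limit that matters is the one your MPI processes inherit from whatever launches them (ssh, a resource manager daemon, etc.), which can differ from what you see in an interactive shell. So it's worth running something like this *via mpirun* on the compute nodes rather than only on your login node.

```python
import resource


def memlock_limit_description() -> str:
    """Describe this process's soft and hard RLIMIT_MEMLOCK (locked memory) limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def fmt(value: int) -> str:
        # RLIM_INFINITY corresponds to "unlimited" in `ulimit -l` output.
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value} bytes ({value // 1024} KiB)"

    return f"soft={fmt(soft)}, hard={fmt(hard)}"


if __name__ == "__main__":
    print(memlock_limit_description())
```

If the soft limit comes back as something small (e.g., a few dozen KiB) rather than "unlimited", that alone can cause registration failures regardless of the MTT settings.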
> The only thing I find works reliably is to use "-npernode 32" instead of "-npernode 48". Unfortunately, my system has 48-processor nodes.
Well, that's a bummer. You've somehow got some restrictions on how much memory you can register. You probably want to check with your IB vendor for further advice here.
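For reference, my understanding of the Open MPI FAQ is that the mlx4 driver's registration ceiling works out to 2^log_num_mtt × 2^log_mtts_per_seg × PAGE_SIZE, and that you generally want that to be at least twice the node's physical memory. Below is a rough sketch of that arithmetic. The log_num_mtt value of 20, the 4 KiB page size, and the even per-rank split are all assumptions for illustration (the real split depends on what each rank actually registers); check your own module parameters and page size.

```python
PAGE_SIZE = 4096  # assumption: typical x86_64 page size


def max_reg_mem(log_num_mtt: int, log_mtts_per_seg: int, page_size: int = PAGE_SIZE) -> int:
    """Registration ceiling in bytes: 2^log_num_mtt * 2^log_mtts_per_seg * page_size."""
    return (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size


def meets_recommendation(max_reg_bytes: int, physical_mem_bytes: int) -> bool:
    """Rule of thumb: the ceiling should be at least twice physical RAM."""
    return max_reg_bytes >= 2 * physical_mem_bytes


def per_process_share(total_bytes: int, procs_per_node: int) -> int:
    """Naive even split of the node's registration budget across local ranks (a simplification)."""
    return total_bytes // procs_per_node


if __name__ == "__main__":
    # Hypothetical values: log_num_mtt=20 is an assumption; log_mtts_per_seg=5 is from the thread.
    total = max_reg_mem(log_num_mtt=20, log_mtts_per_seg=5)
    print(f"ceiling: {total / 2**30:.1f} GiB")
    for ranks in (32, 48):
        print(f"{ranks} ranks/node: ~{per_process_share(total, ranks) / 2**30:.2f} GiB each")
```

The point is just that the same per-node ceiling gets divided across 50% more ranks at -npernode 48, so a budget that's "just enough" at 32 can fall over at 48.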
One other thing you might want to try is to change Open MPI's receive queues to all be SRQ (shared receive queues) as opposed to PP (per-peer queues). See this FAQ item:
FWIW, in my regression testing, I run this set of RQs as one of my tests:
--mca btl_openib_receive_queues S,128,256:S,2048,256:S,12288,256:S,65536,256
You may want to tweak these values to fit your applications, etc.
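To get a feel for what those values mean, here's a rough sketch that parses a receive_queues string and estimates the receive-buffer bytes it posts. It only looks at the first three fields of each colon-separated entry (type, buffer size, number of buffers) and ignores any optional trailing fields; it also only handles P and S entries. The cost model is a simplification I'm assuming for illustration: a P queue is counted once per connected peer, while an S queue is shared, which is exactly why all-SRQ tends to scale better as you add ranks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueSpec:
    kind: str         # "P" (per-peer) or "S" (shared receive queue)
    buffer_size: int  # bytes per receive buffer
    num_buffers: int  # number of buffers posted to this queue


def parse_receive_queues(spec: str) -> List[QueueSpec]:
    """Parse e.g. "S,128,256:S,2048,256"; optional trailing fields are ignored."""
    queues = []
    for entry in spec.split(":"):
        fields = entry.split(",")
        if len(fields) < 3:
            raise ValueError(f"expected at least type,size,count in {entry!r}")
        kind = fields[0]
        if kind not in ("P", "S"):
            raise ValueError(f"this sketch only handles P and S queues, got {kind!r}")
        queues.append(QueueSpec(kind, int(fields[1]), int(fields[2])))
    return queues


def posted_bytes(queues: List[QueueSpec], num_peers: int) -> int:
    """Rough estimate: P queues are multiplied by peer count, S queues are counted once."""
    total = 0
    for q in queues:
        per_queue = q.buffer_size * q.num_buffers
        total += per_queue * num_peers if q.kind == "P" else per_queue
    return total


if __name__ == "__main__":
    srq_only = parse_receive_queues("S,128,256:S,2048,256:S,12288,256:S,65536,256")
    print(posted_bytes(srq_only, num_peers=47))  # 20480000 bytes, independent of num_peers
```

Swapping any of those S entries for P makes the estimate grow linearly with the number of peers, which is the behavior you're trying to avoid when registered memory is tight.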