On Jun 15, 2012, at 8:02 AM, Jeff Squyres wrote:
> Were there any clues in /var/log/messages or dmesg?
Thanks. I found a suggestion from Nathan Hjelm to add "options mlx4_core log_mtts_per_seg=X" (where X is 5 in my case).
Offline suggestions (which also included that) were to add "--mca mpi_leave_pinned 0" to the mpirun line and to double-check my locked memory limits.
The only thing I find works reliably is to use "-npernode 32" instead of "-npernode 48". Unfortunately, my system has 48-processor nodes.
I've got lots of headroom on real memory.
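For anyone wanting to repeat the locked memory check, here is a minimal sketch, assuming Linux and a Python 3 interpreter on the node. It only reads RLIMIT_MEMLOCK for the process it runs in; the helper names (describe, memlock_limits) are made up for illustration and are not part of Open MPI.

```python
import resource


def describe(limit_bytes):
    """Render an rlimit value in KiB, or 'unlimited' for RLIM_INFINITY."""
    if limit_bytes == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit_bytes // 1024} KiB"


def memlock_limits():
    """Return (soft, hard) RLIMIT_MEMLOCK for the current process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"locked memory: soft={describe(soft)} hard={describe(hard)}")
```

The limits seen by processes started through mpirun or a batch system may differ from those of an interactive login shell, so it is probably worth launching the script the same way the MPI job is launched rather than only running it by hand.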