I am running into similar issues with both Mellanox and IBM HCAs.

On a node installed with RHEL6.2 and MLNX_OFED-1.5.3-3.0.0, locked memory takes a significant hit when cq_size defaults to the device's max_cqe.  For comparison, here is the memory utilization of a simple MPI process with the new cq_size default, and with cq_size restricted to 1500:

cq_size = max_cqe:
VmPeak:   348736 kB
VmSize:   348352 kB
VmLck:    292096 kB
VmHWM:    304896 kB
VmRSS:    304896 kB
VmData:   333504 kB

cq_size = 1500:
VmPeak:    86720 kB
VmSize:    86336 kB
VmLck:     30080 kB
VmHWM:     42880 kB
VmRSS:     42880 kB
VmData:    71488 kB
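
A minimal sketch for comparing the two snapshots above, assuming they are in the Vm* field format of /proc/<pid>/status. The helper names parse_vm_fields and vm_delta are illustrative only and are not part of Open MPI or OFED:

```python
# Sketch only: parse_vm_fields and vm_delta are hypothetical helper names.
# The two snapshots are copied verbatim from the figures in this message.

CQ_MAX_CQE = """\
VmPeak:   348736 kB
VmSize:   348352 kB
VmLck:    292096 kB
VmHWM:    304896 kB
VmRSS:    304896 kB
VmData:   333504 kB
"""

CQ_1500 = """\
VmPeak:    86720 kB
VmSize:    86336 kB
VmLck:     30080 kB
VmHWM:     42880 kB
VmRSS:     42880 kB
VmData:    71488 kB
"""


def parse_vm_fields(text):
    """Return {field name: value in kB} for every 'Vm*' line in text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip anything that is not a "VmXxx:  <number> kB" line.
        if not sep or not name.startswith("Vm") or not parts:
            continue
        fields[name] = int(parts[0])
    return fields


def vm_delta(big, small):
    """Per-field difference (big - small) for fields present in both."""
    return {name: big[name] - small[name] for name in big if name in small}


if __name__ == "__main__":
    delta = vm_delta(parse_vm_fields(CQ_MAX_CQE), parse_vm_fields(CQ_1500))
    for name, extra_kb in delta.items():
        print(f"{name}: +{extra_kb} kB")
```

With these figures every field differs by the same 262016 kB (roughly 256 MB), so the whole increase shows up as locked memory.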


For our Power systems using the IBM eHCA, the default value exhausts memory and we can't even run.

--Brad


On Fri, Jul 6, 2012 at 5:21 AM, TERRY DONTJE <terry.dontje@oracle.com> wrote:


On 7/5/2012 5:47 PM, Shamis, Pavel wrote:
I mentioned on the call that for Mellanox devices (+ OFA verbs) this resource is really cheap. Do you run a Mellanox HCA + OFA verbs?
(I'll reply because I know Terry is offline for the rest of the day)

Yes, he does.
I asked because SUN used to have its own verbs driver.
I noticed this on a Solaris machine. I am not sure I have the same setup for Linux, but I'll look and see if I can reproduce the issue there.

--td


      
The heart of the question: is it incorrect to assume that we'll consume (num CQE * CQE size) registered memory for each QP opened?
QP or CQ?  I think you don't want to assume anything there. Verbs (user and kernel) do their own magic there.
I think Mellanox should address this question.

Regards,
Pasha
_______________________________________________
devel mailing list
devel@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje@oracle.com



