This question was answered by Yevgeny Kliteynik from Mellanox on the developers list. The amount of registerable memory should be about twice the size of the physical memory because of the way physical memory is being registered with InfiniBand HCAs, not because of possible overcommitment. You can read the full description here:
Hristo Iliev, Ph.D. -- High Performance Computing
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)
From: firstname.lastname@example.org [mailto:email@example.com] On Behalf Of Alan Wild
Sent: Thursday, October 18, 2012 5:47 AM
Subject: [OMPI users] openmpi-1.6.2 and registerable memory
I recently installed 1.6.2 on our cluster only to be introduced to the new warning messages concerning registerable memory and physical memory. OpenMPI is indicating:
Registerable memory: 32768 MiB
Total memory: 48434 MiB
Which is clearly less than the "3/4 total memory" that produces the warning. However, our systems 1) have swap completely disabled and 2) we've set the Linux kernel's vm behavior to disable overcommits (i.e. /proc/sys/vm/overcommit_memory == 2). So I'm not sure the guidance of setting registerable memory to twice physical memory makes sense. Worse still, I don't believe I can increase log_num_mtt (or log_mtts_per_seg), as any increase in these values would cause registerable memory to double (and exceed total memory).
OR... am I misunderstanding the situation? (Maybe it would be okay to have more registerable memory if the drivers will properly handle the failed malloc once they try to allocate memory beyond the physical memory.)
So, in light of our vm and swap settings, would it still be appropriate to increase log_num_mtt? If not, can we at least get a setting to suppress the warning message, or can the 3/4 threshold be lowered slightly (perhaps to 67% of total memory)?
Changing the vm or swap behavior is probably out of the question on our systems. Our system stability improved dramatically when we went to these settings (over the Linux default) as our systems would never OOM properly.
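For reference, the arithmetic behind the figures quoted above can be sketched as follows. This assumes the commonly documented relation for mlx4 HCAs, registerable memory = 2^log_num_mtt * 2^log_mtts_per_seg * page size, and a 4 KiB page size. The split log_num_mtt=20, log_mtts_per_seg=3 is only an assumption that happens to reproduce the reported 32768 MiB; the actual values on this cluster are not given in the message.

```python
MIB = 1024 * 1024


def registerable_bytes(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Registerable memory under the assumed mlx4 relation:
    2**log_num_mtt * 2**log_mtts_per_seg * page_size."""
    return (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size


total_mib = 48434                                 # "Total memory" from the warning
current_mib = registerable_bytes(20, 3) // MIB    # assumed parameter values
doubled_mib = registerable_bytes(21, 3) // MIB    # log_num_mtt raised by one

ratio = current_mib / total_mib                   # fraction of physical memory
warns = ratio < 0.75                              # the 3/4 threshold from the warning

print(f"current: {current_mib} MiB ({ratio:.1%} of total), warning: {warns}")
print(f"after +1 on log_num_mtt: {doubled_mib} MiB, "
      f"exceeds total: {doubled_mib > total_mib}")
```

Under these assumptions, each increment of either parameter doubles the registerable size, so the only step above 32768 MiB is 65536 MiB, which is greater than the 48434 MiB of physical memory.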