I recently installed Open MPI 1.6.2 on our cluster and was introduced to the new warning messages concerning registerable memory and physical memory. Open MPI is indicating:
Registerable memory: 32768 MiB
Total memory: 48434 MiB
Registerable memory is clearly less than the "3/4 of total memory" threshold that produces the warning. However, on our systems 1) swap is completely disabled and 2) the Linux kernel's vm behavior is set to disable overcommit (i.e. /proc/sys/vm/overcommit_memory == 2). So I'm not sure the guidance of setting registerable memory to twice physical memory makes sense. Worse still, I don't believe I can increase log_num_mtt (or log_mtts_per_seg), as any increase in either value would cause registerable memory to double (and exceed total memory).
OR... am I misunderstanding the situation? (Maybe it would be okay to have more registerable memory than physical memory, provided the drivers properly handle the failed malloc once they try to allocate memory beyond the physical memory.)
So, in light of our vm and swap settings, would it still be appropriate to increase log_num_mtt? If not, can we at least get a setting to suppress the warning message, or can the 3/4 threshold be lowered slightly (perhaps to 67% of total memory)?
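To make the arithmetic concrete, here is a rough Python sketch. It assumes the formula given in the Open MPI FAQ (registerable memory = 2^log_num_mtt * 2^log_mtts_per_seg * page size), a 4 KiB page size, and that the warning fires when registerable memory falls below the threshold fraction of total memory. The values log_num_mtt=20 and log_mtts_per_seg=3 are illustrative only, chosen because they reproduce the 32768 MiB reported above; the actual module parameters on these nodes may differ. The function names are hypothetical.

```python
MIB = 1024 * 1024

def registerable_mib(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Registerable memory in MiB, per the assumed FAQ formula."""
    return (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size // MIB

def would_warn(registerable, total, threshold=0.75):
    """Assumed warning condition: registerable < threshold * total."""
    return registerable < threshold * total

total = 48434                       # MiB, as reported by Open MPI
current = registerable_mib(20, 3)   # illustrative values (assumption)
doubled = registerable_mib(21, 3)   # log_num_mtt increased by one

print(current, would_warn(current, total))        # current setting, 3/4 threshold
print(doubled, doubled > total)                   # one step up exceeds total memory
print(would_warn(current, total, threshold=0.67)) # proposed 67% threshold
```

Under these assumptions the current setting sits below the 3/4 threshold, the next step up lands well above total memory, and a 67% threshold would be just low enough to stop the warning without changing the driver parameters.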
Changing the vm or swap behavior is probably out of the question on our systems. Stability improved dramatically when we moved to these settings, because under the Linux defaults our systems would never OOM properly.