I recently installed 1.6.2 on our cluster only to be introduced to the new
warning messages concerning registerable memory and physical memory.
Open MPI reports:
Registerable memory: 32768 MiB
Total memory: 48434 MiB
That is clearly less than the "3/4 of total memory" threshold that triggers
the warning. However, our systems 1) have swap completely disabled and 2)
have the Linux kernel's VM behavior set to disable overcommit (i.e.
/proc/sys/vm/overcommit_memory == 2). So I'm not sure the guidance of
setting registerable memory to twice physical memory makes sense. Worse
still, I don't believe I can increase log_num_mtt (or log_mtts_per_seg),
as any increase in either value would cause registerable memory to
double (and exceed total memory).
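
To make the doubling concrete, here is a rough sketch of the arithmetic. It
assumes the formula given in the Open MPI FAQ (registerable memory =
2^log_num_mtt * 2^log_mtts_per_seg * page size) and a 4 KiB page size. The
values log_num_mtt = 20 and log_mtts_per_seg = 3 are assumptions, chosen only
because they reproduce the reported 32768 MiB; the helper name is
hypothetical.

```python
# Sketch only: formula as described in the Open MPI FAQ; parameter values
# below are assumed, not read from a real system.
PAGE_SIZE = 4096  # bytes; assumed page size


def registerable_mib(log_num_mtt, log_mtts_per_seg, page_size=PAGE_SIZE):
    """Return the registerable memory in MiB for the given module parameters."""
    total_bytes = (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size
    return total_bytes // (1024 * 1024)


total_mib = 48434                    # "Total memory" from the warning
current = registerable_mib(20, 3)    # assumed current settings
bumped = registerable_mib(21, 3)     # log_num_mtt increased by one

print(f"current: {current} MiB ({current / total_mib:.1%} of total)")
print(f"bumped:  {bumped} MiB (exceeds total: {bumped > total_mib})")
```

Under those assumptions the current setting sits just under 68% of total
memory, and the next step up is 65536 MiB, well past the 48434 MiB that is
physically present.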
Or am I misunderstanding the situation? (Maybe it would be okay to have
more registerable memory if the drivers properly handle the failed
malloc once they try to allocate memory beyond the physical memory.)
So, in light of our VM and swap settings, would it still be appropriate to
increase log_num_mtt? If not, can we at least get a setting to suppress
the warning message, or could the 3/4 threshold be lowered slightly
(perhaps to 67% of total memory)?
Changing the VM or swap behavior is probably out of the question on our
systems. Our system stability improved dramatically when we went to these
settings, since under the Linux defaults our systems would never OOM properly.