This issue has been observed on OMPI 1.6 and 1.6.1 with the openib btl but
not on 1.4.5 (the tcp btl is always fine). The application is VASP, and so
far only one specific dataset triggers it in our testing. The OS is SL 6.2
with kernel 2.6.32-220.23.1.el6.x86_64. The symptom is that under a certain
type of load on OMPI 1.6.x, the khugepaged kernel thread pegs a core at
100% CPU, and it looks to me like OMPI is waiting for memory to become
available and thus appears to be hung. Reducing the number of processes per
node sometimes eases the problem a bit, but not always. So I did some
further testing by playing around with the kernel's transparent hugepage
(THP) support.
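For reference, the current THP state can be checked through the same
RHEL-specific sysfs files used in the cases below (the active value is
shown in brackets in the output):

  cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
  cat /sys/kernel/mm/redhat_transparent_hugepage/defrag
  cat /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag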
1. Disable transparent hugepage support completely (echo never
> /sys/kernel/mm/redhat_transparent_hugepage/enabled). This allows the
program to progress normally (as with 1.4.5). Total run time for one
iteration is 3036.03 s.
2. Disable the VM defrag effort (echo never
> /sys/kernel/mm/redhat_transparent_hugepage/defrag). This allows the
program to run as well, but the performance is horrible. The same
iteration takes 4967.40 s.
3. Disable defrag in khugepaged only (echo no
> /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag). This also
allows the program to run; the performance is worse than #1 but better
than #2. The same iteration takes 3348.10 s.
4. Disable both the VM defrag and the khugepaged defrag (#2 + #3; the
combined commands are sketched below). Performance is similar to #3.
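In case it is useful, here is the combined sequence for case #4 (assuming
the RHEL 6 sysfs layout above; run as root), which leaves THP itself
enabled but stops both defrag paths:

  echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
  echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag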
So my questions: this looks to me like it is tied to the memory management
in the openib btl; does 1.6.x use huge pages? If so, is there a better way
to resolve or work around this within OMPI itself, without disabling
transparent hugepage support? We'd like to keep hugepage support if
possible. Also, is this related to the registered-memory imbalance issue
that Jeff mentioned recently? From the symptoms I can tell we definitely
hit that issue with this dataset, but I wouldn't expect it to hang in
khugepaged; or is this just a corner case?
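If it helps narrow this down, I can report the registered-memory-related
settings from our build; they can be inspected with ompi_info, e.g.:

  ompi_info --param mpi all | grep leave_pinned
  ompi_info --param btl openib | grep free_list

As an experiment I could also rerun with "mpirun --mca mpi_leave_pinned 0"
to see whether the registration cache is involved (mpi_leave_pinned is
just the knob I suspect, not a confirmed culprit).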
Thanks, and any advice is appreciated.