Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OMPI 1.6.x Hang on khugepaged 100% CPU time
From: Yong Qin (yong.qin_at_[hidden])
Date: 2012-09-04 12:21:27

On Tue, Sep 4, 2012 at 5:42 AM, Yevgeny Kliteynik
<kliteyn_at_[hidden]> wrote:
> On 8/30/2012 10:28 PM, Yong Qin wrote:
>> On Thu, Aug 30, 2012 at 5:12 AM, Jeff Squyres<jsquyres_at_[hidden]> wrote:
>>> On Aug 29, 2012, at 2:25 PM, Yong Qin wrote:
>>>> This issue has been observed on OMPI 1.6 and 1.6.1 with openib btl but
>>>> not on 1.4.5 (tcp btl is always fine). The application is VASP and
>>>> only one specific dataset is identified during the testing, and the OS
>>>> is SL 6.2 with kernel 2.6.32-220.23.1.el6.x86_64. The issue is that
>>>> when a certain type of load is put on OMPI 1.6.x, the khugepaged thread
>>>> always runs at 100% CPU, and it looks to me like OMPI is waiting for
>>>> some memory to become available and thus appears to be hung.
>>>> Reducing the per node processes would sometimes ease the problem a bit
>>>> but not always. So I did some further testing by playing around with
>>>> the kernel transparent hugepage support.
>>>> 1. Disable transparent hugepage support completely (echo never >
>>>> /sys/kernel/mm/redhat_transparent_hugepage/enabled). This would allow
>>>> the program to progress as normal (as in 1.4.5). Total run time for an
>>>> iteration is 3036.03 s.
>>> I'll admit that we have not tested using transparent hugepages. I wonder if there's some kind of bad interaction going on here...
>> The transparent hugepage is "transparent", which means it is
>> automatically applied to all applications unless it is explicitly told
>> otherwise. I highly suspect that it is not working properly in this
>> case.
> Like Jeff said - I don't think we've ever tested OMPI with transparent
> huge pages.

Thanks. But have you tested OMPI under RHEL 6 or its variants (CentOS
6, SL 6)? THP is on by default in RHEL 6, so whether you want it or
not, it's there.

>>> What exactly does changing this setting do?
>> Here ( is a pretty good documentation
>> on what these settings would do to the behaviour of the THP. I don't
>> think I can explain it better than the article so I will leave it to
>> you to digest. :)
>>>> 2. Disable VM defrag effort (echo never >
>>>> /sys/kernel/mm/redhat_transparent_hugepage/defrag). This allows the
>>>> program to run as well, but the performance is horrible. The same
>>>> iteration takes 4967.40 s.
>>>> 3. Disable defrag in khugepaged (echo no >
>>>> /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag). This
>>>> allows the program to run, and the performance is worse than #1 but
>>>> better than #2. The same iteration takes 3348.10 s.
>>>> 4. Disable both VM defrag and khugepaged defrag (#2 + #3). Similar
>>>> performance as #3.
>>>> So my question is: it looks to me like this has to do with the memory
>>>> management in the openib btl. Are we using huge pages in 1.6.x? If
>>>> that is true, is there a better way to resolve or workaround it within
>>>> OMPI itself without disabling transparent hugepage support? We'd like
>>>> to keep the hugepage support if possible.
>>> Mellanox -- can you comment on this?
> Actually, I don't think that THP was really tested with OFED.
> I can think of lots of ways things can go wrong there.
> This might be a good question to address to Linux-RDMA mailing list.

This is quite useful information. I guess we will just turn off THP
support for now.

> -- YK
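
For anyone reproducing the tests above, it may help to record the THP state
on each node before a run. The following is only a rough sketch, not
something taken from OMPI or OFED: it reads the three sysfs knobs discussed
in this thread and prints the active value of each. It assumes the knobs
live under either the RHEL 6 directory named above or the upstream kernel
directory /sys/kernel/mm/transparent_hugepage, and that the active choice
is shown in square brackets (e.g. "[always] never"). The helper names
active_value and read_thp_settings are made up for illustration.

```python
import re
from pathlib import Path

# Candidate sysfs directories (assumption: one of these exists on the node).
# The first is the RHEL 6 name used in this thread, the second the upstream name.
THP_DIRS = (
    Path("/sys/kernel/mm/redhat_transparent_hugepage"),
    Path("/sys/kernel/mm/transparent_hugepage"),
)

# The three settings toggled in tests #1 to #4 above.
KNOBS = ("enabled", "defrag", "khugepaged/defrag")


def active_value(text):
    """Return the bracketed choice from e.g. '[always] never'.

    If the file holds no brackets (some kernels expose a plain 0/1),
    return the stripped text unchanged.
    """
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else text.strip()


def read_thp_settings(dirs=THP_DIRS):
    """Read the knobs from the first THP directory that exists.

    Returns a dict mapping knob name to its active value, or an empty
    dict when no THP directory is present.
    """
    for base in dirs:
        if base.is_dir():
            settings = {}
            for knob in KNOBS:
                knob_file = base / knob
                if knob_file.is_file():
                    settings[knob] = active_value(knob_file.read_text())
            return settings
    return {}


if __name__ == "__main__":
    for knob, value in read_thp_settings().items():
        print(f"{knob}: {value}")
```

The script only reads the settings; changing them still needs root and the
echo commands quoted above.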