
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] OMPI 1.6.x Hang on khugepaged 100% CPU time
From: Yong Qin (yong.qin_at_[hidden])
Date: 2012-08-30 15:28:56


On Thu, Aug 30, 2012 at 5:12 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> On Aug 29, 2012, at 2:25 PM, Yong Qin wrote:
>
>> This issue has been observed on OMPI 1.6 and 1.6.1 with openib btl but
>> not on 1.4.5 (tcp btl is always fine). The application is VASP and
>> only one specific dataset is identified during the testing, and the OS
>> is SL 6.2 with kernel 2.6.32-220.23.1.el6.x86_64. The issue is that
>> when a certain type of load is put on OMPI 1.6.x, khugepaged thread
>> always runs with 100% CPU load, and it looks to me like OMPI is
>> waiting for some memory to become available and thus appears to be
>> hung. Reducing the number of processes per node would sometimes ease
>> the problem a bit but not always. So I did some further testing by
>> playing around with
>> the kernel transparent hugepage support.
>>
>> 1. Disable transparent hugepage support completely
>> (echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled).
>> This would allow the program to progress as normal (as in 1.4.5).
>> Total run time for an iteration is 3036.03 s.
>
> I'll admit that we have not tested using transparent hugepages. I wonder if there's some kind of bad interaction going on here...

Transparent hugepage support is "transparent" in the sense that the
kernel applies it automatically to all applications unless it is
explicitly told otherwise. I strongly suspect that it is not working
properly in this case.

>
> What exactly does changing this setting do?

Here (http://lwn.net/Articles/423592/) is a pretty good write-up on
what these settings do to the behaviour of THP. I don't think I can
explain it better than the article, so I will leave it to you to
digest. :)
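
To see what a node is currently set to before changing anything, here
is a minimal sketch. It assumes the RHEL/SL 6 sysfs path used in this
thread and that each file marks its active choice in square brackets
(e.g. "[always] never"); the helper name thp_active is just an
illustrative name, not part of any tool.

```shell
#!/bin/sh
# Print the active value of a THP sysfs knob.
# Assumption: the file lists all choices and brackets the active one,
# e.g. "[always] never". Prints nothing if no bracketed value is found.
thp_active() {
    # $1: path to a sysfs file (or any file with the same format)
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# RHEL/SL 6 location, as used in the commands quoted in this thread.
THP_DIR=/sys/kernel/mm/redhat_transparent_hugepage

for knob in enabled defrag khugepaged/defrag; do
    if [ -r "$THP_DIR/$knob" ]; then
        printf '%s: %s\n' "$knob" "$(thp_active "$THP_DIR/$knob")"
    fi
done
```

Knobs that are not readable on a given kernel are simply skipped, so
the loop is harmless on nodes that lack this directory.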

>
>> 2. Disable VM defrag effort
>> (echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag).
>> This allows the program to run as well, but the performance is
>> horrible. The same iteration takes 4967.40 s.
>>
>> 3. Disable defrag in khugepaged
>> (echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag).
>> This allows the program to run, and the performance is worse than #1
>> but better than #2. The same iteration takes 3348.10 s.
>>
>> 4. Disable both VM defrag and khugepaged defrag (#2 + #3). Similar
>> performance as #3.
>>
>> So my question is: it looks to me like this has to do with the memory
>> management in the openib btl, so are we using huge pages in 1.6.x? If
>> that is true, is there a better way to resolve or work around it within
>> OMPI itself without disabling transparent hugepage support? We'd like
>> to keep the hugepage support if possible.
>
> Mellanox -- can you comment on this?

THP is useful for large-memory applications, of which we have a lot
here, so having it work would definitely benefit us. But if there is
no workaround on the OMPI side, having the application run is clearly
more important than losing a few percent of performance, so I guess we
will have to turn it off.
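
For reference, the relative cost of the two defrag-only options can be
worked out from the iteration times quoted in this thread, taking the
run with THP disabled completely (3036.03 s) as the baseline. This is
only arithmetic on those three numbers; slowdown_pct is a hypothetical
helper name.

```shell
#!/bin/sh
# Relative slowdown of a run versus a baseline, in percent.
slowdown_pct() {
    # $1: baseline seconds, $2: measured seconds
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.1f\n", (t / base - 1) * 100 }'
}

# Iteration times reported in this thread (seconds).
base=3036.03                   # option 1: THP disabled completely
slowdown_pct "$base" 4967.40   # option 2: VM defrag disabled      -> 63.6
slowdown_pct "$base" 3348.10   # option 3: khugepaged defrag only  -> 10.3
```

So option 2 is roughly 64% slower than option 1, and option 3 roughly
10% slower.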

>
>> Also, is this related to the
>> registered memory imbalance issue that Jeff was mentioning recently
>> (http://blogs.cisco.com/performance/registered-memory-imbalances/)
>> because we definitely have this issue with this dataset from the
>> symptoms that I can tell, but I wouldn't expect it to hang on
>> khugepaged, or is this just a corner case?
>
> It *could* be... but I really have no idea (haven't thought about huge page support w.r.t. registered memory exhaustion / imbalance). Mellanox?
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

Thanks,

Yong Qin