Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OMPI 1.6.x Hang on khugepaged 100% CPU time
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-08-30 08:12:16


On Aug 29, 2012, at 2:25 PM, Yong Qin wrote:

> This issue has been observed on OMPI 1.6 and 1.6.1 with the openib btl
> but not on 1.4.5 (the tcp btl is always fine). The application is VASP,
> and only one specific dataset has triggered it during testing. The OS
> is SL 6.2 with kernel 2.6.32-220.23.1.el6.x86_64. When a certain type
> of load is put on OMPI 1.6.x, the khugepaged thread always runs at
> 100% CPU, and it looks to me like OMPI is waiting for memory to become
> available and therefore appears to be hung. Reducing the number of
> processes per node sometimes eases the problem a bit, but not always.
> So I did some further testing by playing around with the kernel's
> transparent hugepage support.
>
> 1. Disable transparent hugepage support completely
> (echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled).
> This allows the program to progress as normal (as in 1.4.5). Total
> run time for an iteration is 3036.03 s.

I'll admit that we have not tested using transparent hugepages. I wonder if there's some kind of bad interaction going on here...

What exactly does changing this setting do?
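The transparent hugepage knobs toggled in this thread are plain sysfs files, and for multi-valued knobs the kernel marks the currently active choice in brackets (for example `always madvise [never]`). A minimal Python sketch for recording the active values before each run follows. It assumes the Scientific Linux 6 path `/sys/kernel/mm/redhat_transparent_hugepage` used in the report (mainline kernels use `/sys/kernel/mm/transparent_hugepage` instead), and the helper names `active_choice` and `read_thp_settings` are hypothetical. Because the format of `khugepaged/defrag` may differ between kernel versions, the parser falls back to the raw file contents when no bracketed choice is present.

```python
import re
from pathlib import Path

# Assumption: RHEL/SL 6 backport path; mainline kernels use
# /sys/kernel/mm/transparent_hugepage instead.
THP_ROOT = Path("/sys/kernel/mm/redhat_transparent_hugepage")

# The three knobs toggled in the experiments reported in this thread.
THP_KNOBS = ("enabled", "defrag", "khugepaged/defrag")


def active_choice(text: str) -> str:
    """Return the bracketed (active) choice from a sysfs THP file.

    Multi-choice files look like 'always madvise [never]'. If no brackets
    are present (e.g. a plain '0' or '1'), return the stripped contents.
    """
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else text.strip()


def read_thp_settings(root: Path = THP_ROOT) -> dict:
    """Map each knob to its active value, or None if it cannot be read."""
    settings = {}
    for knob in THP_KNOBS:
        try:
            settings[knob] = active_choice((root / knob).read_text())
        except OSError:
            # Missing file (different kernel) or permission problem.
            settings[knob] = None
    return settings


if __name__ == "__main__":
    for knob, value in read_thp_settings().items():
        print(f"{knob}: {value}")
```

Logging this output alongside each timing run makes it unambiguous which combination of settings produced which result.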

> 2. Disable VM defrag effort
> (echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag).
> This allows the program to run as well, but the performance is
> horrible. The same iteration takes 4967.40 s.
>
> 3. Disable defrag in khugepaged
> (echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag).
> This allows the program to run, and the performance is worse than #1
> but better than #2. The same iteration takes 3348.10 s.
>
> 4. Disable both VM defrag and khugepaged defrag (#2 + #3). Performance
> is similar to #3.
>
> So my question is this: it looks to me like this has to do with the
> memory management in the openib btl. Are we using huge pages in 1.6.x?
> If so, is there a better way to resolve or work around it within OMPI
> itself without disabling transparent hugepage support? We'd like to
> keep hugepage support if possible.
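To put the quoted timings side by side, the short Python sketch below normalizes each run to the THP-disabled baseline of 3036.03 s (case #1). Only the three measured times from the report are used; case #4 was described only as similar to #3, so no number is assumed for it. By this arithmetic, disabling VM defrag (#2) is roughly 1.64x the baseline and disabling khugepaged defrag (#3) roughly 1.10x.

```python
# Per-iteration wall-clock times (seconds) quoted in the report above.
BASELINE = 3036.03  # case 1: transparent hugepages disabled completely
RUNS = {
    "2. VM defrag disabled": 4967.40,
    "3. khugepaged defrag disabled": 3348.10,
}


def slowdown(time_s: float, baseline_s: float = BASELINE) -> float:
    """Ratio of a run's time to the baseline; values above 1.0 are slower."""
    return time_s / baseline_s


if __name__ == "__main__":
    for label, t in RUNS.items():
        ratio = slowdown(t)
        print(f"{label}: {ratio:.3f}x baseline ({(ratio - 1) * 100:+.1f}%)")
```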

Mellanox -- can you comment on this?

> Also, is this related to the registered memory imbalance issue that
> Jeff mentioned recently
> (http://blogs.cisco.com/performance/registered-memory-imbalances/)?
> From the symptoms I can see, we definitely have that issue with this
> dataset, but I wouldn't expect it to hang on khugepaged. Or is this
> just a corner case?

It *could* be... but I really have no idea (haven't thought about huge page support w.r.t. registered memory exhaustion / imbalance). Mellanox?

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/