On Aug 24, 2012, at 10:28 AM, Brock Palen wrote:
> I grabbed the new OMPI 1.6.1 and ran my test that would cause a hang with 1.6.0 with low registered memory. From reading the release notes, rather than a hang I would expect:
> * lower performance/fall back to send/receive.
> * a notice that it failed to allocate registered memory
> In my case I still get a hang, is this expected?
It can still happen, yes. The short version is that some cases involving lazy creation of QPs can't easily be fixed in the 1.6 series. Do you see errors about OMPI failing to create CQs or QPs?
> This is running with default registered memory limits and I do appreciate the message that I only have 4GB of registered memory of my 48. We will also be fixing our load to raise this value, which should make this issue moot.
Did you get a warning about being able to register too little memory?
> Honestly I think what I would want is for MPI to blow up saying "can't allocate registered memory, fatal, contact your admin", rather than fall back to send/receive and just be slower.
Right now we should be just warning if we can't register 3/4 of your physical memory (we can't really test for anything more than that). But it doesn't abort.
We could add a tunable that makes it abort in this case, if you think that would be useful.
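
To make the behavior concrete, here is a rough sketch of the check as described above. It is illustrative only, not Open MPI's actual code: the function name, the exception, and the abort_on_shortfall flag (standing in for the proposed tunable) are all hypothetical.

```python
GIB = 1024 ** 3


class RegisteredMemoryError(RuntimeError):
    """Raised only when the hypothetical abort option is enabled."""


def check_registered_memory(physical_bytes, registerable_bytes,
                            abort_on_shortfall=False):
    """Warn (or optionally abort) if less than 3/4 of physical memory
    can be registered.

    Returns None when the limit is sufficient, otherwise a warning
    string. With abort_on_shortfall=True (a hypothetical tunable), a
    shortfall raises RegisteredMemoryError instead of returning.
    """
    # Integer comparison of registerable >= 3/4 * physical, avoiding floats.
    if registerable_bytes * 4 >= physical_bytes * 3:
        return None

    message = (
        "can register only %.1f GiB of %.1f GiB physical memory "
        "(less than 3/4); contact your admin"
        % (registerable_bytes / GIB, physical_bytes / GIB)
    )
    if abort_on_shortfall:
        raise RegisteredMemoryError(message)
    return message


# The case from this thread: 4 GB registerable on a 48 GB node.
warning = check_registered_memory(48 * GIB, 4 * GIB)
if warning is not None:
    print("WARNING:", warning)
```

With the default setting this only prints a warning for the 4 GB / 48 GB case; passing abort_on_shortfall=True would turn the same condition into a fatal error, which is the behavior requested above.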