Fixed in r19001. Please re-test; it fixes the problem for me (i.e.,
no need to manually specify sched_yield=0).
BTW, this never came up before because:
- the ODLS used to use paffinity, but that was before PLPA supported the
topology stuff, and it therefore always returned the number of processors
- when we updated PLPA, the ODLS wasn't using paffinity anymore
- we only recently switched the ODLS back to using paffinity, and that's
when this problem surfaced
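
For anyone following along, here is a rough sketch of the decision logic
discussed in this thread. It is not the actual ODLS or paffinity code, and
it says nothing about what r19001 changed. Every name in it (the sketch_*
functions and the SKETCH_* codes) is hypothetical and made up for
illustration. It only shows why a "not supported" answer from the
processor query lands us in the "oversubscribed" state, which in turn
turns on yielding when idle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes; NOT the real Open MPI constants. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_NOT_SUPPORTED = -1 };

/*
 * Stand-in for a paffinity-style processor query.
 * kernel_supported models whether the underlying library can answer
 * on this kernel; actual_procs is the count it would report.
 */
int sketch_query_processors(bool kernel_supported, int actual_procs,
                            int *num_procs)
{
    assert(num_procs != NULL);
    if (!kernel_supported) {
        /* Older kernel: the caller learns nothing about the node. */
        return SKETCH_ERR_NOT_SUPPORTED;
    }
    *num_procs = actual_procs;
    return SKETCH_SUCCESS;
}

/*
 * Launcher-side decision. If the query failed, we cannot know anything,
 * so we conservatively assume the node is oversubscribed.
 */
bool sketch_is_oversubscribed(int rc, int num_procs, int num_local_procs)
{
    if (rc != SKETCH_SUCCESS) {
        return true;
    }
    return num_local_procs > num_procs;
}

/* Assumed default for the yield setting: yield (1) only when oversubscribed. */
int sketch_default_yield(bool oversubscribed)
{
    return oversubscribed ? 1 : 0;
}
```

With this model, 2 local processes on a 4-processor node are not
oversubscribed when the query works, but the same job is treated as
oversubscribed when the query returns "not supported", which matches the
behavior described in the quoted text below.
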
On Jul 23, 2008, at 11:32 AM, Jeff Squyres wrote:
> On Jul 23, 2008, at 10:37 AM, Terry Dontje wrote:
>> This seems to work for me too. What is interesting is my
>> experiments have shown that if you run on RH5.1 you don't need to
>> set mpi_yield_when_idle to 0.
> Yes, this makes sense -- on RHEL5.1, it's a much newer Linux kernel
> and PLPA works as expected there. So ODLS uses the values that PLPA
> passes back and all is good.
> On older Linux kernels, we're effectively returning "not supported"
> from paffinity, and therefore ODLS (rightly) assumes that it can't
> know anything and puts us into the "oversubscribed" state.
> I'm working on a fix.
> Jeff Squyres
> Cisco Systems