Jeff Squyres wrote:
> On Oct 31, 2007, at 1:18 AM, Murat Knecht wrote:
>> Yes I am, (master and child 1 running on the same machine).
>> But knowing the oversubscribing issue, I am using
>> mpi_yield_when_idle which should fix precisely this problem, right?
> It won't *fix* the problem -- you're still oversubscribing the nodes,
> so things will run slowly. But it should help, in that the processes
> will yield regularly.
Yes. By "fix" I meant "solving the blocking problem by letting other
processes get some CPU time".
> What version of OMPI are you using?
I am using 1.2.4.
>> I did give both machines multiple slots, so OpenMPI
>> "knows" that the possibility for more oversubscription may arise.
> I'm not sure what you mean by this -- you should not "lie" to OMPI
> and tell it that it has more slots than it physically does. But keep
> in mind that, as I described in my first mail, OMPI does not
> currently re-compute the number of processes on a host as you spawn
> (which can lead to the oversubscription problem). If you're
> explicitly setting yield_when_idle, that *may* help, but we may or
> may not be explicitly propagating that value to spawned
> processes... I'll have to check.
In the hostfile I specified for each host the number of physically
available cores. Together with the "yield" setting I hoped the
oversubscription would be recognised even if the "oversubscribing"
processes are dynamically started.
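
A setup of this kind could be sketched as follows. The host names, slot
counts, hostfile name, and executable name are placeholders for
illustration only, not the actual configuration:

```sh
# Hostfile "myhosts" (hypothetical): one line per host, with slots set to
# the number of physically available cores on that host.
#   node1 slots=2
#   node2 slots=2

# Start the master and pass the yield setting as an MCA parameter.
# Whether this value reaches dynamically spawned processes is the open
# question in this thread.
mpirun --hostfile myhosts --mca mpi_yield_when_idle 1 -np 1 ./master
```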
I re-checked the high/low parameter, and it does seem all right. It would
be rather odd for this to be the cause anyway, as the problem seems to
depend on the host and the order.