On Oct 31, 2007, at 1:18 AM, Murat Knecht wrote:
Yes, I am (master and child 1 running on the same machine).
But knowing the oversubscribing issue, I am using
mpi_yield_when_idle which should fix precisely this problem, right?
It won't *fix* the problem -- you're still oversubscribing the nodes,
so things will run slowly. But it should help, in that the processes
will yield regularly.
Yes. I meant "solving the blocking problem by letting others get some
CPU time" by "fix".
I did give both machines multiple slots, so OpenMPI
"knows" that the possibility for more oversubscription may arise.
I'm not sure what you mean by this -- you should not "lie" to OMPI
and tell it that it has more slots than it physically does. But keep
in mind that, as I described in my first mail, OMPI does not
currently re-compute the number of processes on a host as you spawn
(which can lead to the oversubscription problem). If you're
explicitly setting yield_when_idle, that *may* help, but we may or
may not be explicitly propagating that value to spawned
processes... I'll have to check.
In the hostfile I specified for each host the number of physically
available cores. Together with the "yield" setting I hoped the
oversubscription would be recognised even if the "oversubscribing"
processes are dynamically started.
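
As a rough illustration of the setup described above, here is a minimal
sketch. The file name, host names, slot counts, and binary name are all
placeholders, not the actual machines or programs from this thread. It
only shows the general shape of a hostfile listing the physical core
counts, plus a launch line that sets mpi_yield_when_idle explicitly.

```
# myhosts (hypothetical file name; hosts and slot counts are placeholders)
# One line per host; "slots" is the number of physically available cores.
node01 slots=2
node02 slots=2

# Launch the master with yielding enabled (./master is a placeholder):
#   mpirun --hostfile myhosts --mca mpi_yield_when_idle 1 -np 1 ./master
```

Whether the yield setting also reaches processes started later through
a dynamic spawn is the open question in this thread, so this sketch
should not be read as a confirmed fix.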