Jeff Squyres wrote:
On Oct 31, 2007, at 1:18 AM, Murat Knecht wrote:

Yes, I am (master and child 1 are running on the same machine).
But knowing about the oversubscription issue, I am using
mpi_yield_when_idle, which should fix precisely this problem, right?

It won't *fix* the problem -- you're still oversubscribing the nodes,  
so things will run slowly.  But it should help, in that the processes  
will yield regularly.
Yes. By "fix" I meant "solving the blocking problem by letting other processes get some CPU time".

What version of OMPI are you using?
I am using 1.2.4.

I did give both machines multiple slots, so Open MPI
"knows" that further oversubscription may arise.

I'm not sure what you mean by this -- you should not "lie" to OMPI  
and tell it that it has more slots than it physically does.  But keep  
in mind that, as I described in my first mail, OMPI does not  
currently re-compute the number of processes on a host as you spawn  
(which can lead to the oversubscription problem).  If you're  
explicitly setting yield_when_idle, that *may* help, but we may or  
may not be explicitly propagating that value to spawned  
processes...  I'll have to check.
In the hostfile, I specified the number of physically available cores for each host. Together with the "yield" setting, I hoped the oversubscription would be recognised even when the "oversubscribing" processes are started dynamically.
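For reference, a setup of this kind could look roughly like the sketch below. The hostfile name, host names, slot counts, and the executable ./master are placeholders and not taken from this thread; only the mpi_yield_when_idle parameter is.

```shell
# Hostfile "myhosts" (hypothetical name): one line per host, with
# slots set to the number of physically available cores.
# The values are placeholders:
#   node1 slots=2
#   node2 slots=2

# Start the master with yielding requested via the MCA parameter.
# Whether this value also reaches processes spawned later is the
# open question in this thread.
mpirun --hostfile myhosts --mca mpi_yield_when_idle 1 -np 1 ./master
```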
I re-checked the high/low parameter, and it does seem to be all right. It would be rather odd for this to be the cause anyway, as the problem seems to depend on the host and the order.