Jeff Squyres schrieb:
On Oct 31, 2007, at 1:18 AM, Murat Knecht wrote:

  
Yes I am, (master and child 1 running on the same machine).
But knowing the oversubscribing issue, I am using  
mpi_yield_when_idle which should fix precisely this problem, right?
    

It won't *fix* the problem -- you're still oversubscribing the nodes,  
so things will run slowly.  But it should help, in that the processes  
will yield regularly.
  
Yes. I meant "solving the blocking problem by letting others get some CPU time" by "fix".

What version of OMPI are you using?
  
I am using 1.2.4

I did give both machines multiple slots, so OpenMPI
"knows" that the possibility for more oversubscription may arise.
    

I'm not sure what you mean by this -- you should not "lie" to OMPI  
and tell it that it has more slots than it physically does.  But keep  
in mind that, as I described in my first mail, OMPI does not  
currently re-compute the number of processes on a host as you spawn  
(which can lead to the oversubscription problem).  If you're  
explicitly setting yield_when_idle, that *may* help, but we may or  
may not be explicitly propoagating that value to spawned  
processes...  I'll have to check.
  
In the hostfile I specified for each host the number of physically available cores. Together with the "yield" setting I hoped the oversubscription would be recognised even if the "oversubscribing" processes are dynamically started.
I re-checked the high/low parameter, but it does seem alright. Would be kind of awkward for this to be the reason, as the problem seems to depend on the host and the order.

Thanks,
Murat