Yes I am, (master and child 1 running on the same machine).
But knowing the oversubscribing issue, I am using mpi_yield_when_idle which should fix precisely this problem, right?
Or is the option ignored,when initially there is no second process? I did give both machines multiple slots, so OpenMPI
"knows" that the possibility for more oversubscription may arise.
Confused,
Murat


Jeff Squyres schrieb:
Are you perchance oversubscribing your nodes?

Open MPI does not currently handle well when you initially  
undersubscribe your nodes but then, due to spawning, oversubscribe  
your nodes.  In this case, OMPI will be aggressively polling in all  
processes, not realizing that the node is now oversubscribed and it  
should be yielding the processor so that other processes can run.

On Oct 30, 2007, at 10:57 AM, Murat Knecht wrote:

  
Hi,

does someone know whether there is a special requirement on the  
order of
spawning processes and the consequent merge of the intercommunicators?
I have two hosts, let's name them local and remote, and a parent  
process
on local that goes on spawning one process on each one of the two  
nodes.
After each spawn the parent process and all existing childs  
participate
in merging the created Intercommunicator into an Intracommunicator  
that
connects - in the end - alls three processes.

The weird thing is though, when I spawn them in the order local,  
remote
at the second, the last spawn all three processes block when
encountering MPI_Merge. Though, when I switch the order around to
spawning first the process on remote and then on local, everything  
works
out: The two processes are spawned and the Intracommunicators created
from the Merge. Everything goes well, too, if I decide to spawn both
processes on either one of the machines. (The existing children are
informed via a message that they shall participate in the Spawn and
Merge since these are collective operations.)

Is there some implicit developer-level knowledge that explains why the
order defines the outcome? Logically, there ought to be no difference.
Btw, I work with two Linux nodes and an ordinary Ethernet-TCP  
connection
between them.

Thanks,
Murat
_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users