Yes I am (master and child 1 are running on the same machine).
But knowing about the oversubscription issue, I am using
mpi_yield_when_idle, which should fix precisely this problem, right?
Or is the option ignored when there is initially no second process? I
did give both machines multiple slots, so Open MPI
"knows" that the possibility of further oversubscription may arise.
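In case it matters, this is how the parameter can be set (a sketch; the executable name and process count are placeholders, not my actual command line):

```shell
# Set the MCA parameter on the mpirun command line at launch time ...
mpirun --mca mpi_yield_when_idle 1 -np 1 ./parent

# ... or via the environment before launching:
export OMPI_MCA_mpi_yield_when_idle=1
mpirun -np 1 ./parent
```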
Jeff Squyres wrote:
> Are you perchance oversubscribing your nodes?
> Open MPI does not currently handle the case well where you initially
> undersubscribe your nodes but then, due to spawning, oversubscribe
> your nodes. In this case, OMPI will be aggressively polling in all
> processes, not realizing that the node is now oversubscribed and it
> should be yielding the processor so that other processes can run.
> On Oct 30, 2007, at 10:57 AM, Murat Knecht wrote:
>> Does someone know whether there is a special requirement on the
>> order of spawning processes and the subsequent merge of the
>> intercommunicators? I have two hosts, let's name them local and
>> remote, and a parent process on local that goes on spawning one
>> process on each of the two hosts. After each spawn, the parent
>> process and all existing children participate in merging the
>> created intercommunicator into an intracommunicator that connects
>> - in the end - all three processes.
>> The weird thing is, though: when I spawn them in the order local,
>> remote, all three processes block at the second, i.e. the last,
>> spawn when encountering MPI_Intercomm_merge. Yet when I switch the
>> order around, spawning first the process on remote and then the one
>> on local, everything works out: the two processes are spawned and
>> the intracommunicators are created from the merge. Everything goes
>> well, too, if I decide to spawn both processes on either one of the
>> machines. (The existing children are informed via a message that
>> they shall participate in the Spawn and Merge, since these are
>> collective operations.)
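>> For clarity, the pattern looks roughly like this (a minimal sketch,
>> not my actual program; the hostnames, the "./child" binary, and the
>> error handling are placeholders):

```c
/* Sketch of the spawn-and-merge pattern described above: the parent
 * spawns one child per host, and after each spawn all processes so
 * far collectively merge the new intercommunicator. Hostnames and
 * the child executable are placeholder assumptions. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm everyone = MPI_COMM_WORLD;          /* grows with each merge */
    const char *hosts[] = { "local", "remote" }; /* placeholder hostnames */

    for (int i = 0; i < 2; i++) {
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "host", hosts[i]);

        /* Collective over 'everyone': the parent and all previously
         * spawned children must call this together. */
        MPI_Comm inter;
        MPI_Comm_spawn("./child", MPI_ARGV_NULL, 1, info, 0,
                       everyone, &inter, MPI_ERRCODES_IGNORE);

        /* Merge the intercommunicator into a new intracommunicator
         * that connects all processes spawned so far. */
        MPI_Comm merged;
        MPI_Intercomm_merge(inter, /* high = */ 0, &merged);

        MPI_Comm_free(&inter);
        if (everyone != MPI_COMM_WORLD)
            MPI_Comm_free(&everyone);
        MPI_Info_free(&info);
        everyone = merged;
    }

    MPI_Finalize();
    return 0;
}
```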
>> Is there some implicit developer-level knowledge that explains why
>> the order determines the outcome? Logically, there ought to be no
>> difference. Btw, I work with two Linux nodes and an ordinary
>> Ethernet/TCP connection between them.