Open MPI User's Mailing List Archives

From: Murat Knecht (murat.knecht_at_[hidden])
Date: 2007-10-31 04:18:21

Yes, I am (the master and child 1 are running on the same machine).
But knowing about the oversubscription issue, I am using mpi_yield_when_idle,
which should fix precisely this problem, right?
Or is the option ignored when there is initially no second process? I
did give both machines multiple slots, so Open MPI
"knows" that further oversubscription may arise.

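For readers unfamiliar with the option, the difference between aggressive polling and yielding while idle can be sketched as below. This is only an illustration of the idea, not Open MPI's progress engine: `wait_for_flag`, `yield_when_idle`, and `max_spins` are hypothetical names invented for this example, and the only system call used is POSIX `sched_yield()`.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not Open MPI code): wait until *flag becomes true
 * or max_spins iterations have passed, whichever comes first.
 * Returns the number of iterations spent waiting.
 */
long wait_for_flag(atomic_bool *flag, bool yield_when_idle, long max_spins)
{
    long spins = 0;

    assert(flag != NULL);
    while (!atomic_load(flag) && spins < max_spins) {
        if (yield_when_idle) {
            /* Give up the rest of the time slice so that another runnable
             * process on the same core can make progress. */
            sched_yield();
        }
        /* Without the yield, this loop keeps the core busy the whole time. */
        ++spins;
    }
    return spins;
}
```

On an oversubscribed node the non-yielding variant competes for the core with the very process it is waiting on. In Open MPI the behaviour is controlled by an MCA parameter, typically set on the command line as `mpirun --mca mpi_yield_when_idle 1 ...`.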
Jeff Squyres wrote:
> Are you perchance oversubscribing your nodes?
> Open MPI does not currently handle it well when you initially
> undersubscribe your nodes but then, due to spawning, oversubscribe
> your nodes. In this case, OMPI will be aggressively polling in all
> processes, not realizing that the node is now oversubscribed and it
> should be yielding the processor so that other processes can run.
> On Oct 30, 2007, at 10:57 AM, Murat Knecht wrote:
>> Hi,
>> does anyone know whether there is a special requirement on the order of
>> spawning processes and the subsequent merge of the intercommunicators?
>> I have two hosts, let's call them local and remote, and a parent process
>> on local that spawns one process on each of the two nodes.
>> After each spawn, the parent process and all existing children participate
>> in merging the newly created intercommunicator into an intracommunicator
>> that, in the end, connects all three processes.
>> The weird thing is that when I spawn them in the order local, remote,
>> all three processes block at the second (and last) spawn when they
>> reach MPI_Intercomm_merge. When I switch the order around and spawn
>> the process on remote first and then the one on local, everything
>> works: the two processes are spawned and the intracommunicators are
>> created by the merge. Everything goes well, too, if I spawn both
>> processes on the same machine, whichever of the two it is. (The
>> existing children are informed via a message that they must participate
>> in the spawn and merge, since these are collective operations.)
>> Is there some implicit developer-level knowledge that explains why the
>> order determines the outcome? Logically, there ought to be no difference.
>> By the way, I work with two Linux nodes and an ordinary Ethernet TCP
>> connection between them.
>> Thanks,
>> Murat
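
The spawn-then-merge pattern described in the quoted message could look roughly like the following. This is a sketch under stated assumptions, not the poster's actual code: the host names `local` and `remote` are placeholders, the round number is handed to each child through its argument vector instead of a message, the program is assumed to be started as a single process, and the executable is assumed to be reachable under the same path on both hosts. It needs an MPI implementation with dynamic process support and an MPI compiler wrapper such as `mpicc`.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Placeholder host names (assumption); use names your launcher knows. */
static const char *HOSTS[] = { "local", "remote" };
enum { NUM_SPAWNS = 2 };

int main(int argc, char **argv)
{
    MPI_Comm parent, inter, merged, intra;
    int first_round = 0;  /* first spawn round this process takes part in */
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* Original parent: start from a communicator of its own. */
        MPI_Comm_dup(MPI_COMM_WORLD, &intra);
    } else {
        /* Spawned child: argv[1] is the round that created it. */
        if (argc < 2) {
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        first_round = atoi(argv[1]) + 1;
        /* Counterpart of the merge on the spawning side;
         * high = 1 places the new child after the existing ranks. */
        MPI_Intercomm_merge(parent, 1, &intra);
    }

    for (int round = first_round; round < NUM_SPAWNS; ++round) {
        char round_arg[16];
        char *child_argv[] = { round_arg, NULL };
        MPI_Info info;

        snprintf(round_arg, sizeof round_arg, "%d", round);
        MPI_Info_create(&info);
        MPI_Info_set(info, "host", HOSTS[round]);  /* reserved spawn key */

        /* Collective over every process created so far; the spawn
         * arguments are taken from rank 0 of intra. */
        MPI_Comm_spawn(argv[0], child_argv, 1, info, 0, intra,
                       &inter, MPI_ERRCODES_IGNORE);
        MPI_Info_free(&info);

        /* Existing processes pass high = 0 and keep their ranks. */
        MPI_Intercomm_merge(inter, 0, &merged);
        MPI_Comm_free(&inter);
        MPI_Comm_free(&intra);
        intra = merged;
    }

    MPI_Comm_rank(intra, &rank);
    MPI_Comm_size(intra, &size);
    printf("rank %d of %d in the merged intracommunicator\n", rank, size);

    MPI_Comm_free(&intra);
    MPI_Finalize();
    return 0;
}
```

Swapping the two entries of `HOSTS` gives the two spawn orders compared in the thread.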