On Wed, Jul 29, 2009 at 3:42 PM, Ralph Castain
<rhc@open-mpi.org> wrote:
It sounds like perhaps IOF messages aren't getting relayed along the daemons. Note that the daemon on each node does have to be able to send TCP messages to all other nodes, not just mpirun.
Couple of things you can do to check:
1. -mca routed direct - this will send all messages direct instead of across the daemons
2. --leave-session-attached - will allow you to see any errors reported by the daemons, including those from attempting to relay messages
Ralph
Ralph, thanks for the quick response.
With
-mca routed direct
it works correctly.
With this:
mpirun -H 10.1.2.126,10.1.2.122,10.1.2.123,10.1.2.125 --leave-session-attached -np 4 /home/doriad/MPITest/hello-mpi
I still get neither output nor errors from the daemons.
Is there a downside to using '-mca routed direct'? Or should I fix whatever is causing this daemon issue? Do you have any other tests I can try to see what's wrong?
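Since the daemon on each node has to be able to send TCP messages to all other nodes, one rough check is to test TCP reachability from every node to every other node. The following is only a minimal sketch under stated assumptions: the helper names `can_connect` and `check_peers` are hypothetical, the host list is taken from the mpirun command above, and the port to probe is an assumption (for example 22, if sshd is running on the nodes). A successful connection on one port shows basic reachability only; it does not prove that the ports the daemons actually use are open through a firewall.

```python
import socket

# Hosts from the mpirun command in this thread.
HOSTS = ["10.1.2.126", "10.1.2.122", "10.1.2.123", "10.1.2.125"]


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def check_peers(hosts, port, timeout=2.0):
    """Map each host to whether a TCP connection to it succeeded."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Calling `check_peers(HOSTS, port=22)` on each of the four nodes in turn (the port being an assumption, as noted) would show whether any node fails to reach another; a host that is reachable from the mpirun node but not from the other nodes would be consistent with messages not being relayed between daemons.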
Thanks,
David