I found from the ifconfig output that each node has two interfaces, eth0 and eth1. I ran mpiexec with the parameter --mca btl_tcp_if_include eth0 (or eth1) to see whether there were any issues between nodes. Here are the results:
- node1,node2 works with eth1, not with eth0.
- node1,node3 works with eth1, not with eth0.
- node2,node3 does not work with eth1, but works with eth0.
- node1,node2,node3 works with eth1 (!), not with eth0.
These tests work even with the firewalls enabled.
Actually, the order of the nodes matters: `mpiexec --mca btl_tcp_if_include eth0 --host node1,node2 ./ring_c` does not work, but `mpiexec --mca btl_tcp_if_include eth0 --host node2,node1 ./ring_c` works. The same thing happens if I change the order when launching the 3 processes (putting node2 in the first position). I find that a little disturbing, but I guess the network configuration is to blame.
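For anyone who wants to repeat this kind of test systematically, here is a minimal sketch that prints one mpiexec command per interface and per ordered pair of hosts, so that both node1,node2 and node2,node1 get covered. The interface names, host names and ./ring_c are the ones from my tests; the function names build_cmd and list_cmds are just made up for this example. It only prints the commands and does not run anything.

```shell
#!/bin/sh
# Dry-run sketch: print the mpiexec commands to try, one per line.
# Interface and host names come from the tests described above;
# adjust them for your own cluster.

IFACES="eth0 eth1"
HOSTS="node1 node2 node3"

# Build the command line for one interface ($1) and one
# comma-separated host list ($2).
build_cmd() {
    printf 'mpiexec --mca btl_tcp_if_include %s --host %s ./ring_c\n' "$1" "$2"
}

# Print a command for every interface and every ordered pair of
# distinct hosts, since the order of the hosts turned out to matter.
list_cmds() {
    for iface in $IFACES; do
        for a in $HOSTS; do
            for b in $HOSTS; do
                [ "$a" = "$b" ] && continue
                build_cmd "$iface" "$a,$b"
            done
        done
    done
}

list_cmds
```

The printed lines can then be reviewed and run one at a time, noting which combinations hang or fail.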
Thanks a lot, Jeff Squyres; your hints helped me find the source of the problem. As must often happen, the problem came not from Open MPI but from the network configuration.
I'll ask my sysadmin to help me configure the interfaces so that it works without setting an MCA parameter.
Thank you once again.
> What's the output from ifconfig on all nodes?