Hi,
I need to spawn multiple slave processes on a cluster from a single master process.
I am using Open MPI 1.1.2.
The cluster is an HP cluster with two Ethernet NICs on each machine.
My problem was that the master froze when calling MPI_Comm_spawn_multiple, and the slaves froze in MPI_Init. This happened when I tried to spawn on multiple hosts (it worked fine on a single host).
After investigating, I discovered that when I disabled eth1 on the hosts, everything worked.
Fortunately, I get the same result when I pass the "--mca btl_tcp_if_include eth0" parameter.
What is strange is that the problem persists if I use any of the following:
"--mca btl_tcp_if_include eth1"
"--mca btl_tcp_if_exclude eth1"
"--mca btl_tcp_if_exclude eth0"
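Since the hang depends on which NIC carries the traffic, it may help to compare how eth0 and eth1 are configured on every host. The following is only a diagnostic sketch, assuming Linux/glibc and getifaddrs(3); the helper names print_ipv4_interfaces and same_ipv4_subnet are hypothetical and are not part of Open MPI.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: return 1 if two IPv4 addresses (host byte order)
 * fall in the same subnet under the given netmask, 0 otherwise. */
int same_ipv4_subnet(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & mask) == (b & mask);
}

/* Hypothetical helper: print name, IPv4 address and netmask of every
 * IPv4 interface on this host (e.g. lo, eth0, eth1).
 * Returns the number of interfaces printed, or -1 if getifaddrs() fails. */
int print_ipv4_interfaces(void)
{
    struct ifaddrs *list, *ifa;
    int count = 0;

    if (getifaddrs(&list) == -1) {
        perror("getifaddrs");
        return -1;
    }
    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        char addr[INET_ADDRSTRLEN];
        char mask[INET_ADDRSTRLEN];

        /* Skip interfaces without an address and non-IPv4 entries. */
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        inet_ntop(AF_INET, &((struct sockaddr_in *)ifa->ifa_addr)->sin_addr,
                  addr, sizeof addr);
        if (ifa->ifa_netmask != NULL)
            inet_ntop(AF_INET,
                      &((struct sockaddr_in *)ifa->ifa_netmask)->sin_addr,
                      mask, sizeof mask);
        else
            snprintf(mask, sizeof mask, "?");
        printf("%-8s %-15s netmask %s\n", ifa->ifa_name, addr, mask);
        count++;
    }
    freeifaddrs(list);
    return count;
}
```

Running print_ipv4_interfaces() on each node and then checking pairs of eth1 addresses with same_ipv4_subnet() would show whether the eth1 interfaces of different hosts share a subnet at all.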
Is it impossible to use two Ethernet NICs at the same time for MPI applications?
Will I always have to use eth0, and not eth1, for MPI communication?
Thanks,
Laurent.