I'm having trouble running an MPI program on a Linux (CentOS 5.7) cluster.
The cluster has 16 nodes with 12 CPU cores each.
Each node has two connections to a switch, eth0 and eth2.
The nodes' IP addresses are set as follows:
eth0 : 192.168.1.1 - 192.168.1.16
eth2 : 192.168.1.101 - 192.168.1.116
I would like to use eth2 for MPI communication, so I tried to run the program as:
mpiexec --mca btl_tcp_if_include eth2 --mca btl_tcp_if_exclude lo,eth0 -hostfile hostfile -n 192 ./my_program
The 'hostfile' file contains lines such as:

and the /etc/hosts file contains lines such as:
But the program simply hangs/stalls at MPI_Bcast(...) or MPI_Barrier(...).
MPI_Init(), MPI_Comm_rank(), and MPI_Comm_size() all return correct results.
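For reference, a minimal test case along these lines (my own sketch, not the actual my_program) is enough to show the symptom: initialization and the rank/size queries succeed on every rank, but the first inter-node collective is where things stall, since that is the first point at which ranks on different nodes must actually open connections to each other.

```c
/* Minimal reproducer sketch (hypothetical, not the original my_program).
 * Requires an MPI environment: compile with mpicc, launch with mpiexec. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* these report correctly */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* even when the hang occurs */
    printf("rank %d of %d is up\n", rank, size);
    fflush(stdout);

    if (rank == 0)
        value = 42;
    /* First collective: this is where the program hangs in my setup */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0)
        printf("collectives completed, value = %d\n", value);

    MPI_Finalize();
    return 0;
}
```

Launched with the same mpiexec line as above, every rank prints its "is up" line, but the completion message from rank 0 never appears.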
If I bring eth2 down on all nodes (ifconfig eth2 down) and run with another hostfile that lists node001 - node016, so that only eth0 is used, the program runs just fine.
Any help would be appreciated.
Thanks in advance.
-- K. H. Pae