I have been trying to compile a molecular dynamics program
with the Open MPI 1.2.5 included in OFED 1.3. I am running Fedora Core 6;
the output of uname -r is 2.6.18-1.2798.fc6. I’ve traced the
problems I’ve been having back to Open MPI because I’m unable to run
the test programs such as glob on more than one node. I currently have 2
nodes connected to an InfiniBand switch with opensm running on node1. The
nodes can ping each other and I am able to ssh between them without a password.
My openmpi-default-hostfile includes the following:
node1 slots=2 max-slots=4
node2 slots=4 max-slots=4
When I run “mpirun -np 4 --debug-daemons ./glob”
I get:
Daemon [0,0,1] checking in as pid 21341 on host node1
And the program appears to hang. Once I press CTRL+C a
couple of times, I get the contents of error.txt.
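For reference, here is a minimal sketch of where I would expect the four processes to land given the hostfile above. It assumes by-slot mapping (each node's slots are filled before moving to the next node), which is my understanding of the default; the function name place_by_slot is made up for illustration and this is not Open MPI's actual mapper.

```python
# Sketch only: models by-slot placement, not Open MPI's real mapping code.
def place_by_slot(hostfile, nprocs):
    """Assign ranks to hosts, filling each host's slots before moving on."""
    placement = []
    for host, slots in hostfile:
        for _ in range(slots):
            if len(placement) == nprocs:
                return placement
            placement.append(host)
    if len(placement) < nprocs:
        # Oversubscription (max-slots) is deliberately not modeled here.
        raise ValueError("not enough slots for the requested process count")
    return placement

# Slot counts copied from the hostfile above.
hostfile = [("node1", 2), ("node2", 4)]
for rank, host in enumerate(place_by_slot(hostfile, 4)):
    print(f"rank {rank} -> {host}")
```

If that assumption holds, two of the four processes belong on node2, so I would have expected a daemon on node2 to check in as well, but only the node1 line is printed.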
Per the instructions in the FAQ, I’ve included the
output of “ibv_devinfo”, “ifconfig”, and “ulimit -l”
in the infiniband_info.txt file. The output of “ompi_info --all” is
in the ompi_info.txt file.
I’ve been tearing my hair out over this, so any help
would be greatly appreciated.
James Rudd
JLC-Biomedical/Biotechnology Research Institute
North Carolina Central University
700 George Street
Durham, NC 27707
Phone: (919) 530-7015
Email: jrudd@nccu.edu
http://ariel.acc.nccu.edu/Academics/BBRI/personnel/rudd.htm