Everything is working properly now. I needed to reinstall Linux on
one of my nodes after a botched attempt at a network install - mpirun
... hostname worked, but my application hung and gave a connect() error.
At this point I decided to give up and try MPICH instead. The MPICH
sanity checks gave a more verbose error message about the failing
node, so I reinstalled the OS on that node, reconfigured my
environment variables for Open MPI, and everything is now working.
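
For anyone who finds this thread later, here is a rough sketch of the
hostfile and the hostname check mentioned above. The host names node1
and node2 and the RUN_MPI_CHECK variable are placeholders I made up for
illustration; substitute the aliases from your own /etc/hosts. Keep in
mind that this check passed for me even while one node was broken, so a
clean result only shows that remote launch works, not that the
application itself can communicate between nodes.

```shell
#!/bin/sh
# Hypothetical hostfile: node1/node2 stand in for the /etc/hosts aliases.
hostfile=${HOSTFILE:-myhostfile}

cat > "$hostfile" <<'EOF'
node1 slots=2 max-slots=2
node2 slots=2 max-slots=2
EOF

# Add up the slots= values (max-slots= is skipped) to get a process count.
total_slots=$(awk '{
    for (i = 2; i <= NF; i++)
        if ($i ~ /^slots=/) { sub(/^slots=/, "", $i); n += $i }
} END { print n + 0 }' "$hostfile")

echo "hostfile $hostfile provides $total_slots slots"

# Only try a launch when explicitly asked (RUN_MPI_CHECK is an assumed
# switch for this sketch) and when mpirun is actually on the PATH.
if [ "${RUN_MPI_CHECK:-0}" = 1 ] && command -v mpirun >/dev/null 2>&1; then
    mpirun --hostfile "$hostfile" -np "$total_slots" hostname
fi
```

If each host name is printed once per slot, the launch side is fine and
the next thing to test is the real application across both nodes.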
Thanks for the help and support so far,
On 2/7/07, Mark Kosmowski <mark.kosmowski_at_[hidden]> wrote:
> Dear Open-MPI list:
> I'm trying to run two (soon to be three) dual opteron machines as a
> cluster (network of workstations - they each have a disk and OS). I
> can ssh between machines with no password. My open-mpi code compiled
> fine and works great as an SMP program (using both processors on one
> machine). However, I am not able to run my open-mpi program parallel
> between the two computers.
> For SMP work I use:
> mpirun -np 2 myprogram inputfile >outputfile
> For cluster work I have tried:
> mpirun --hostfile myhostfile -np 4 myprogram inputfile >outputfile
> which does not write to the output file.
> I have also tried:
> mpirun --hostfile myhostfile -np 4 `myprogram inputfile >outputfile`
> which just ran serially on the initial machine.
> The open-mpi executable and libraries are on the head node NFS shared
> to the slave node. Both computers can run open-mpi [the open-mpi
> application] as an SMP program with no problems. When I am trying to
> run the open-mpi program with both computers, I am using a directory
> that is an NFS share to the other computer.
> I am running OpenSUSE 10.2 on both machines. I compiled with gcc 4.1 /
> ifort 9.1.
> I am using a gigabit network.
> My hostfile specifies slots=2 max-slots=2 for each computer. The
> computers are identified in the hostfile using the /etc/hosts alias.
> The only config.log that I found was in the directory I used to make
> open-mpi; since everything works as SMP, I am not including that file
> with this initial message.
> What should I be trying to do next to remedy this issue?
> Any help would be appreciated.
> Mark Kosmowski
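
A note on the backquoted form in the quoted message: as far as I can
tell, the shell runs whatever is inside the backquotes by itself,
locally, before mpirun is started, and because the output goes to a
file, nothing is left to substitute into the mpirun command line. That
would be consistent with the serial run on the initial machine. The
small sketch below shows the effect with a stand-in function named
inner (hypothetical, no MPI involved) in place of myprogram.

```shell
#!/bin/sh
# Stand-in for "myprogram inputfile": prints one line, no MPI involved.
inner() { echo "inner ran in the local shell"; }

out=$(mktemp)

# Same shape as: mpirun ... `myprogram inputfile >outputfile`
# "set --" takes the place of mpirun so the resulting words can be counted.
set -- `inner >"$out"`
arg_count=$#
file_text=$(cat "$out")
rm -f "$out"

echo "words passed to the launcher: $arg_count"
echo "text that landed in the file: $file_text"
```

The word count is 0 while the file holds the program's output, so the
program has already run once, on its own, before the launcher sees
anything.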