Open MPI User's Mailing List Archives

From: Mark Kosmowski (mark.kosmowski_at_[hidden])
Date: 2007-02-07 15:43:58


Dear Open-MPI list:

I'm trying to run two (soon to be three) dual-Opteron machines as a
cluster (a network of workstations; each has its own disk and OS). I
can ssh between the machines without a password. My Open MPI code
compiled fine and works great as an SMP program (using both processors
on one machine). However, I am not able to run my Open MPI program in
parallel across the two computers.

For SMP work I use:

mpirun -np 2 myprogram inputfile >outputfile

For cluster work I have tried:

mpirun --hostfile myhostfile -np 4 myprogram inputfile >outputfile

which does not write to the output file.

I have also tried:

mpirun --hostfile myhostfile -np 4 `myprogram inputfile >outputfile`

which just ran serially on the initial machine.

The Open MPI executable and libraries are on the head node and are
NFS-shared to the slave node. Both computers can run the Open MPI
application as an SMP program with no problems. When I try to run the
program across both computers, I work from a directory that is
NFS-shared to the other computer.

I am running OpenSUSE 10.2 on both machines. I compiled with gcc 4.1 /
ifort 9.1.

I am using a gigabit network.

My hostfile specifies slots=2 max-slots=2 for each computer. The
computers are identified in the hostfile using the /etc/hosts alias.
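
For illustration, a hostfile matching that description might look
roughly like the sketch below. The names node1 and node2 and the file
name myhostfile.example are placeholders, not the actual /etc/hosts
aliases; the awk line only tallies the slots= entries to confirm that
they cover the -np 4 request.

```shell
# Hypothetical node aliases; the real ones come from /etc/hosts.
cat > myhostfile.example <<'EOF'
node1 slots=2 max-slots=2
node2 slots=2 max-slots=2
EOF

# Sum the slots= values (max-slots= is skipped by the ^slots= anchor)
# to check that the file offers enough slots for -np 4.
total=$(awk '{
  for (i = 2; i <= NF; i++)
    if ($i ~ /^slots=/) { split($i, kv, "="); s += kv[2] }
} END { print s }' myhostfile.example)
echo "total slots: $total"
```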

The only config.log that I found was in the directory I used to build
Open MPI; since everything works as SMP, I am not including that file
with this initial message.

What should I be trying to do next to remedy this issue?

Any help would be appreciated.

Thanks,

Mark Kosmowski