On 7/17/07, Bill Johnstone <beejstone3_at_[hidden]> wrote:
> Hello all.
> I could really use help trying to figure out why mpirun is hanging as
> detailed in my previous message yesterday, 16 July. Since there's been
> no response, please allow me to give a short summary.
> -Open MPI 1.2.3 on GNU/Linux, 2.6.21 kernel, gcc 4.1.2, bash 3.2.15 is
> default shell
> -Open MPI installed to /usr/local, which is in non-interactive session
> -Systems are AMD64, using ethernet as interconnect, on private IP
> mpirun hangs whenever I invoke any process running on a remote node.
> It runs a job fine if I invoke it so that it only runs on the local
> node. Ctrl+C never successfully cancels an mpirun job -- I have to use
> kill -9.
> I'm asking for help trying to figure what steps have been taken by
> mpirun, and how I can figure out where things are getting stuck /
> crashing. What could be happening on the remote nodes? What debugging
> steps can I take?
> Without MPI running, the cluster is of no use, so I would really
> appreciate some help here.
1 - Check that no firewall is blocking traffic between the nodes.
2 - Check that every node has Open MPI installed, has the very same
executable you are trying to run at the same path, and has the
correct permissions set on it.
3 - Check that all nodes use the same network interface, e.g. eth0.
That's all I can think of for quick checks right now. I hope it's
one of these.
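
For check 1, here is a minimal Python sketch of a reachability test you could run from the head node. It is only a rough probe, not a full firewall audit: the function name `can_connect`, the helper `report`, and the choice of port 22 (ssh) are my own assumptions for illustration. Open MPI also opens TCP connections on other ports between nodes, so a passing result here does not by itself rule out a firewall problem.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def report(nodes, port=22):
    """Build one status line per node (node names are supplied by the caller)."""
    lines = []
    for node in nodes:
        status = "reachable" if can_connect(node, port) else "NOT reachable"
        lines.append(f"{node}: TCP port {port} {status}")
    return lines
```

Calling `report(["nodeA", "nodeB"])` with your own host names (the ones shown are placeholders) returns one line per node. If a node is not reachable even on the ssh port, mpirun has no way to start anything there, which would be consistent with the hang you describe.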