Open MPI User's Mailing List Archives

From: G.O. (gurhan.ozen_at_[hidden])
Date: 2007-07-17 13:23:02

On 7/17/07, Bill Johnstone <beejstone3_at_[hidden]> wrote:
> Hello all.
> I could really use help trying to figure out why mpirun is hanging as
> detailed in my previous message yesterday, 16 July. Since there's been
> no response, please allow me to give a short summary.
> -Open MPI 1.2.3 on GNU/Linux, 2.6.21 kernel, gcc 4.1.2, bash 3.2.15 is
> default shell
> -Open MPI installed to /usr/local, which is in non-interactive session
> path
> -Systems are AMD64, using ethernet as interconnect, on private IP
> network
> mpirun hangs whenever I invoke any process running on a remote node.
> It runs a job fine if I invoke it so that it only runs on the local
> node. Ctrl+C never successfully cancels an mpirun job -- I have to use
> kill -9.
> I'm asking for help trying to figure out what steps have been taken by
> mpirun, and how I can figure out where things are getting stuck /
> crashing. What could be happening on the remote nodes? What debugging
> steps can I take?
> Without MPI running, the cluster is of no use, so I would really
> appreciate some help here.

    1- Check that no firewall is blocking traffic between the nodes.
    2- Check that every node has Open MPI installed, and that the exact
same executable you are trying to run exists at the same path on every
node, with the correct permissions.
    3- Check that all nodes use the same network interface name, e.g. eth0.

   That's all I can think of for quick checks right now. Hope it's
one of these.
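
For the firewall check, a rough sketch of a TCP reachability probe is below. It is only a partial test: it assumes the nodes run an SSH daemon on port 22 (not stated in the original message), and as far as I know Open MPI's TCP traffic uses dynamically assigned ports by default, so a firewall could still pass this probe and block MPI itself. The function names `can_connect` and `check_nodes` are made up for this example.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_nodes(hosts, port=22, timeout=3.0):
    """Map each host name to whether the given TCP port is reachable.

    Port 22 (SSH) is an assumption; use whatever port you know is listening.
    """
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Calling `check_nodes(["node01", "node02"])` from each machine in turn (the node names are placeholders for your hostfile entries) gives a quick picture of which pairs of nodes cannot reach each other at all.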
