You can start by adding --debug-daemons and --debug to your mpirun
command line. This will generate a lot of output related to the
operations done internally by the launcher. If you send this output
to the list we might be able to help you a little bit more.
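If it helps, here is a minimal sketch of one way to collect that output when mpirun hangs and ignores Ctrl+C. It is only an illustration, not something from the Open MPI tree: the host names node1 and node2, the log file name, the 60 second timeout, and both helper functions are assumptions of mine. The only parts taken from the advice above are the two debug flags. It launches hostname rather than an MPI program so that only the launch step is exercised.

```python
import subprocess


def build_debug_command(hosts, program="hostname"):
    """Build an mpirun argument list with the two debug flags mentioned above.

    `hosts` is a list of node names (hypothetical here); one process is
    requested per host. `program` defaults to the non-MPI `hostname` command.
    """
    return [
        "mpirun",
        "--debug-daemons",
        "--debug",
        "-np", str(len(hosts)),
        "--host", ",".join(hosts),
        program,
    ]


def run_and_capture(cmd, log_path="mpirun-debug.log", timeout_s=60):
    """Run `cmd`, writing stdout and stderr to `log_path`.

    Returns True if the command exited on its own, or False if it was still
    running after `timeout_s` seconds and had to be killed.
    """
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            proc.wait(timeout=timeout_s)
            return True
        except subprocess.TimeoutExpired:
            proc.kill()  # sends SIGKILL on Linux, like kill -9
            proc.wait()  # reap the killed process
            return False
```

Calling run_and_capture(build_debug_command(["node1", "node2"])) should leave whatever the launcher printed before the hang in mpirun-debug.log, which you can then send to the list. Note that the timeout only kills the local mpirun process; anything left running on the remote nodes would have to be cleaned up separately.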
On Jul 17, 2007, at 1:12 PM, Bill Johnstone wrote:
> Hello all.
> I could really use help trying to figure out why mpirun is hanging as
> detailed in my previous message yesterday, 16 July. Since there's
> no response, please allow me to give a short summary.
> -Open MPI 1.2.3 on GNU/Linux, 2.6.21 kernel, gcc 4.1.2, bash 3.2.15 is
> default shell
> -Open MPI installed to /usr/local, which is in the non-interactive session PATH
> -Systems are AMD64, using ethernet as interconnect, on private IP
> mpirun hangs whenever I invoke any process running on a remote node.
> It runs a job fine if I invoke it so that it only runs on the local
> node. Ctrl+C never successfully cancels an mpirun job -- I have to
> kill -9.
> I'm asking for help trying to figure out what steps have been taken by
> mpirun, and how I can figure out where things are getting stuck /
> crashing. What could be happening on the remote nodes? What
> steps can I take?
> Without MPI running, the cluster is of no use, so I would really
> appreciate some help here.