On Monday 29 October 2007 18:27, Jorge Parra wrote:
> When running Open MPI, my system freezes when initializing MPI (function
> MPI_Init). This happens only when I try to run the process on multiple
> nodes in my cluster. Running multiple instances of the test code
> locally (i.e. ./mpirun -np 2 greetings) is successful.
Would it be possible to repeat the tests with the latest Open MPI release, 1.2.4?
That said, nothing in Open MPI should make your system freeze.
Could you check the logs on the nodes and, if possible, capture the dmesg
output just before the MPI_Init call?
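
A minimal sketch of how such a snapshot could be taken, assuming a POSIX shell on each node. The helper name snapshot_kernel_log and the directory /tmp/mpi-debug are made up for illustration; they are not part of Open MPI.

```shell
#!/bin/sh
# Hypothetical helper (not an Open MPI tool): save the current kernel log
# to DIR/dmesg-<nodename>-<LABEL>.log and print the path of that file.
snapshot_kernel_log() {
    dir=$1
    label=$2
    mkdir -p "$dir" || return 1
    out="$dir/dmesg-$(uname -n)-$label.log"
    # dmesg may be restricted on some systems; note the failure in the file
    # instead of aborting, so the run itself can still go ahead.
    if ! dmesg > "$out" 2>&1; then
        echo "dmesg failed or is not permitted on this host" >> "$out"
    fi
    # Flush to disk, since the machine may freeze right afterwards.
    sync
    echo "$out"
}

# Intended use on each node, right before starting the failing run:
#   snapshot_kernel_log /tmp/mpi-debug before
#   ./mpirun --hostfile /root/hostfile -np 2 greetings
```

After a reboot, the saved file shows what the kernel had logged up to the point just before the freeze.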
> - rsh runs well, and is configured for full access (i.e.
> "rsh 192.168.1.103 date" is successful, and so are "rsh AFRLMPPBM2 date" and
> "rsh AFRLMPPBM2.MPPdomain.com"). Security is not an issue in this system.
> - uname -n and hostname return a valid hostname
> - The testing code (attached to this email) is run (and fails) as:
> ./mpirun --hostfile /root/hostfile -np 2 greetings. The hostfile has the
> names of the local node (first entry: AFRLMPPBM1) and the remote node
> (second entry: AFRLMPPBM2). This file is also attached to this email.
> - The environment variables seem to be properly set (see env.log attached
> file). Local mpi programs (i.e. ./mpirun -np 2 greetings) run well.
> - .profile has the path information for both the executables and the
> - orted runs on the remote node; however, it does not print anything to the
> console. The only output on the remote node is:
> pam_rhosts_auth: user root has a `+' user entry
> pam_rhosts_auth: allowed to root_at_[hidden] as root
> PAM_unix: (rsh) session opened for user root by (uid=0)
> in.rshd: root_at_[hidden] as root: cmd='( ! [ -e
> ./.profile ]
> || . ./.profile; orted --bootproxy 1 --name 0.0.1 --num_procs 3
You're running as root? Why is that?
> Then the remote process returns to the command prompt. However, orted stays
> in the background. The local process is frozen, and just prints: "Calling init",
> which is just before MPI_Init (see greetings.c).
> I believe the COMM WORLD cannot be correctly initialized. However I can't
> see which part of my configuration is wrong.
> Any help is greatly appreciated.
With best regards,
Dipl.-Inf. Rainer Keller http://www.hlrs.de/people/keller
HLRS Tel: ++49 (0)711-685 6 5858
Nobelstrasse 19 Fax: ++49 (0)711-685 6 5832
70550 Stuttgart email: keller_at_[hidden]
"Emails save time, not printing them saves trees!"