Open MPI User's Mailing List Archives

From: Rainer Keller (keller_at_[hidden])
Date: 2007-10-30 04:26:52

Hello Jorge,
On Monday 29 October 2007 18:27, Jorge Parra wrote:
> When running openMPI my system freezes when initializing MPI (function
> MPI_Init). This happens only when I try to run the process on multiple
> nodes in my cluster. Running multiple instances of the testing code
> locally (i.e. ./mpirun -np 2 greetings) is successful.
Would it be possible to repeat the tests with the latest release, Open MPI 1.2.4?

That said, nothing in Open MPI should make your system freeze.
Could you check the logs on the nodes, and possibly capture the dmesg output just
before the MPI_Init call?
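
One possible way to take that dmesg snapshot on every node is sketched below.
It is only an illustration under a few assumptions: the helper names
(list_hosts, snapshot_dmesg) and the output file naming are made up here, the
hostfile is assumed to have one host name at the start of each line, and rsh is
used simply because it is what the cluster already relies on.

```shell
#!/bin/sh
# Sketch only: function names and output file names are hypothetical.

# Print one host name per line from a hostfile, skipping blank lines and
# '#' comment lines and ignoring anything after the host name (e.g. "slots=2").
list_hosts() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# Save the dmesg output of every host into <outdir>/dmesg.<host>.<tag>.
# Arguments: hostfile, tag (e.g. "before"), optional output directory.
snapshot_dmesg() {
    hostfile=$1
    tag=$2
    outdir=${3:-.}
    for host in $(list_hosts "$hostfile"); do
        # -n redirects rsh's stdin from /dev/null so the loop is not disturbed.
        rsh -n "$host" dmesg > "$outdir/dmesg.$host.$tag" 2>&1 ||
            echo "warning: could not collect dmesg from $host" >&2
    done
}
```

With these functions loaded into the shell, running
"snapshot_dmesg /root/hostfile before" right ahead of the mpirun line would
leave one baseline file per node to compare against whatever the nodes log
afterwards.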

> - rsh runs well, and is configured to full access. (i.e. rsh
> " date" is successful, so they are "rsh AFRLMPPBM2 date" or
> "rsh"). Security is not an issue in this system.
> - uname -n and hostname return a valid hostname
> - The testing code (attached to this email) is run (and fails) as:
> ./mpirun --hostfile /root/hostfile -np 2 greetings . The hostfile has the
> names of the local node (first entry: AFRLMPPBM1) and the remote node
> (second entry: AFRLMPPBM2). This file is also attached to this email.
> - The environment variables seem to be properly set (see env.log attached
> file). Local mpi programs (i.e. ./mpirun -np 2 greetings) run well.
> - .profile has the path information for both the executables and the
> libraries
> - orted runs on the remote node; however, it does not print anything to the
> console. The only output on the remote node is:
> pam_rhosts_auth[235]: user root has a `+' user entry
> pam_rhosts_auth[235]: allowed to root_at_[hidden] as root
> PAM_unix[235]: (rsh) session opened for user root by (uid=0)
> in.rshd[236]: root_at_[hidden] as root: cmd='( ! [ -e
> ./.profile ]
> || . ./.profile; orted --bootproxy 1 --name 0.0.1 --num_procs 3
You're running as root? Why is that?

> Then the remote process returns to the command prompt. However, orted is in the
> background. The local process is frozen, and just prints: "Calling init",
> which is just before MPI_Init (see greetings.c).
> I believe the COMM WORLD cannot be correctly initialized. However I can't
> see which part of my configuration is wrong.
> Any help is greatly appreciated.

With best regards,

Dipl.-Inf. Rainer Keller
 HLRS                Tel: ++49 (0)711-685 6 5858
 Nobelstrasse 19     Fax: ++49 (0)711-685 6 5832
 70550 Stuttgart     email: keller_at_[hidden]
 Germany             AIM/Skype: rusraink
"Emails save time, not printing them saves trees!"