Open MPI User's Mailing List Archives

From: Jorge Parra (jeparra_at_[hidden])
Date: 2007-11-02 13:20:36


Hi, just trying to join the threads...

If anyone else has this same problem, you can also check this thread:
[OMPI users] Error initializing openMPI

I am also attaching to this email the requested information (file
ompi-output.tar.gz).

Thanks Jeff, Thanks Rainer.

On Tue, 30 Oct 2007, Jeff Squyres wrote:

> On Oct 30, 2007, at 9:42 AM, Jorge Parra wrote:
>
>> Thank you for your reply. Linux does not freeze. The one that freezes is
>> OpenMPI. Sorry for my inaccurate choice of words that led to confusion.
>> Therefore dmesg does not show anything abnormal (I attached to this email
>> a full dmesg log, captured when openmpi freezes).
>>
>> When openmpi freezes I can see, from another terminal, that the node on
>> which openmpi was originally run (the local one) has two processes: orted
>> and mpirun. The remote node has one: orted. This seems to be normal.
>> However, there is no openmpi activity on either node. There is only
>> an initial "calling init" printout on the local node (I included it in
>> the greetings.c program for testing purposes).
>>
>> Unfortunately, I have not been able to compile openmpi 1.2.4 or any of
>> the 1.2 trunk versions. Trunks 1.0 and 1.1 compiled well on my system. I
>> already opened a case for this, but I received a message that the person
>> it was assigned to is on paternal leave. So I think I need to wait a bit
>> for help on that :). So I am stuck with version 1.1.5.
>
> Are you referring to this thread:
>
> http://www.open-mpi.org/community/lists/users/2007/10/4218.php
>
> There's currently only one person on paternal leave, and although he
> is the powerpc guy :-), he's not really the build system guy (I'm
> kinda *guessing* that either OMPI or libltdl is choosing to build or
> link the wrong object -- but that's a SWAG without seeing any
> additional information).
>
> I sent you a reply on 24 Oct asking for a bit more information:
>
> http://www.open-mpi.org/community/lists/users/2007/10/4310.php
>
>> I am running openmpi as root because my system has some special
>> conditions. This is an attempt to make an embedded Massively Parallel
>> Processor (MPP), so the nodes are running embedded versions of Linux,
>> where normally there is just one user (root). Since this is an isolated
>> system, I did not think this could be a problem (I don't care about
>> security issues either).
>>
>> Again, thank you for all your help,
>>
>> Jorge
>>
>>
>>
>> On Tue, 30 Oct 2007, Rainer Keller wrote:
>>
>>> Hello Jorge,
>>> On Monday 29 October 2007 18:27, Jorge Parra wrote:
>>>> When running openMPI my system freezes when initializing MPI (function
>>>> MPI_Init). This happens only when I try to run the process on multiple
>>>> nodes in my cluster. Running multiple instances of the testing code
>>>> locally (i.e. ./mpirun -np 2 greetings) is successful.
>>> Would it be possible to repeat the tests with the latest Open MPI 1.2.4
>>> version?
>>>
>>> That said, nothing in Open MPI should make your system freeze.
>>> Could you check the logs on the nodes and possibly have a dmesg created
>>> just before the MPI_Init...
>>>
>>>> - rsh runs well, and is configured for full access (i.e. "rsh
>>>> 192.168.1.103 date" is successful, as are "rsh AFRLMPPBM2 date" and
>>>> "rsh AFRLMPPBM2.MPPdomain.com"). Security is not an issue in this
>>>> system.
>>>>
>>>> - uname -n and hostname return a valid hostname
>>>>
>>>> - The testing code (attached to this email) is run (and fails) as:
>>>> ./mpirun --hostfile /root/hostfile -np 2 greetings . The hostfile has
>>>> the names of the local node (first entry: AFRLMPPBM1) and the remote
>>>> node (second entry: AFRLMPPBM2). This file is also attached to this
>>>> email.
>>>>
>>>> - The environment variables seem to be properly set (see the attached
>>>> env.log file). Local mpi programs (i.e. ./mpirun -np 2 greetings) run
>>>> well.
>>>>
>>>> - .profile has the path information for both the executables and the
>>>> libraries
>>>>
>>>> - orted runs on the remote node, however it does not print anything to
>>>> the console. The only output on the remote node is:
>>>>
>>>> pam_rhosts_auth[235]: user root has a `+' user entry
>>>> pam_rhosts_auth[235]: allowed to root_at_[hidden] as root
>>>> PAM_unix[235]: (rsh) session opened for user root by (uid=0)
>>>> in.rshd[236]: root_at_[hidden] as root: cmd='( ! [ -e ./.profile ]
>>>> || . ./.profile; orted --bootproxy 1 --name 0.0.1 --num_procs 3
>>> You're running as root? Why is that?
>>>
>>>> Then the remote process returns to the command prompt. However orted is
>>>> in the background. The local process is frozen, and just prints:
>>>> "Calling init", which is just before MPI_Init (see greetings.c).
>>>>
>>>> I believe the COMM WORLD cannot be correctly initialized. However I
>>>> can't see which part of my configuration is wrong.
>>>>
>>>> Any help is greatly appreciated.
>>>
>>> With best regards,
>>> Rainer
>>>
>
>
>
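
For anyone retracing the checks quoted above (that rsh can run a command on every node named in the hostfile), here is a minimal sketch that automates them. The function names and the REMOTE_SHELL variable are assumptions made up for illustration and are not part of Open MPI. The script only confirms that a trivial command runs on each host; it does not diagnose the MPI_Init hang itself.

```shell
#!/bin/sh
# Pre-flight sketch: list the hosts in a hostfile and try a trivial remote
# command on each one. hostfile_hosts, check_hosts and REMOTE_SHELL are
# hypothetical names chosen for this example.

# Print the first field of every non-blank, non-comment hostfile line.
# Anything after the host name on a line is ignored.
hostfile_hosts() {
    sed -e 's/#.*//' "$1" | awk 'NF { print $1 }'
}

# Run "date" on each host through the remote shell (rsh unless REMOTE_SHELL
# is set). Prints one line per host and returns non-zero if any host fails.
check_hosts() {
    remote_shell=${REMOTE_SHELL:-rsh}
    failed=0
    for host in $(hostfile_hosts "$1"); do
        if "$remote_shell" "$host" date >/dev/null 2>&1; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    return "$failed"
}

# When a hostfile path is given on the command line, check it.
if [ "$#" -gt 0 ]; then
    check_hosts "$1"
fi
```

Saved under a name of your choosing, it could be run as "sh check_hosts.sh /root/hostfile" (the hostfile path is the one from the thread). A FAILED line points at remote shell access or name resolution for that host, which is worth ruling out before looking at orted.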