Open MPI User's Mailing List Archives

From: Jorge Parra (jeparra_at_[hidden])
Date: 2007-10-31 11:29:21


Hi Jeff,

Sorry I did not see your post. Attached to this email are the outputs
requested by the help page. It is a compressed tar file containing the
output of ./configure and the output of "make all". Please let me know if
more information is needed.

Thank you for your help,

Jorge

On Tue, 30 Oct 2007, Jeff Squyres wrote:

> On Oct 30, 2007, at 9:42 AM, Jorge Parra wrote:
>
>> Thank you for your reply. Linux does not freeze. The one that freezes
>> is OpenMPI. Sorry for my inaccurate choice of words, which led to
>> confusion. Therefore dmesg does not show anything abnormal (I attached
>> to this email a full dmesg log, captured when openmpi freezes).
>>
>> When openmpi freezes I can, from another terminal, see that the node
>> on which openmpi is originally run (the local one) has two processes:
>> orted and mpirun. The remote node has one: orted. This seems to be
>> normal. However, on both nodes there is no openmpi activity. There is
>> only an initial "calling init" printout on the local node (I included
>> it in the greetings.c program for testing purposes).
>>
>> Unfortunately, I have not been able to compile openmpi 1.2.4 or any of
>> the 1.2 trunk versions. Trunks 1.0 and 1.1 compiled well on my system.
>> I already opened a case for this, but I received a message that the
>> person it was assigned to is on paternity leave. So I think I need to
>> wait a bit for help on that :). So I am stuck with version 1.1.5.
>
> Are you referring to this thread:
>
> http://www.open-mpi.org/community/lists/users/2007/10/4218.php
>
> There's currently only one person on paternity leave, and although he
> is the powerpc guy :-), he's not really the build system guy (I'm
> kinda *guessing* that either OMPI or libltdl is choosing to build or
> link the wrong object -- but that's a SWAG without seeing any
> additional information).
>
> I sent you a reply on 24 Oct asking for a bit more information:
>
> http://www.open-mpi.org/community/lists/users/2007/10/4310.php
>
>> I am running openmpi as root because my system has some special
>> conditions. This is an attempt to make an embedded Massively Parallel
>> Processor (MPP), so the nodes are running embedded versions of Linux,
>> where normally there is just one user (root). Since this is an
>> isolated system, I did not think this could be a problem (I don't
>> care about security issues either).
>>
>> Again, thank you for all your help,
>>
>> Jorge
>>
>>
>>
>> On Tue, 30 Oct 2007, Rainer Keller wrote:
>>
>>> Hello Jorge,
>>> On Monday 29 October 2007 18:27, Jorge Parra wrote:
>>>> When running openMPI my system freezes when initializing MPI
>>>> (function MPI_Init). This happens only when I try to run the process
>>>> on multiple nodes in my cluster. Running multiple instances of the
>>>> testing code locally (i.e. ./mpirun -np 2 greetings) is successful.
>>> Would it be possible to repeat the tests with the latest Open
>>> MPI-1.2.4 version?
>>>
>>> That said, nothing in Open MPI should make your system freeze.
>>> Could you check the logs on the nodes and possibly have a dmesg
>>> created just before the MPI_Init...
>>>
>>>> - rsh runs well, and is configured for full access (i.e.
>>>> "rsh 192.168.1.103 date" is successful, and so are
>>>> "rsh AFRLMPPBM2 date" and "rsh AFRLMPPBM2.MPPdomain.com").
>>>> Security is not an issue in this system.
>>>>
>>>> - uname -n and hostname return a valid hostname
>>>>
>>>> - The testing code (attached to this email) is run (and fails) as:
>>>> ./mpirun --hostfile /root/hostfile -np 2 greetings
>>>> The hostfile has the names of the local node (first entry:
>>>> AFRLMPPBM1) and the remote node (second entry: AFRLMPPBM2). This
>>>> file is also attached to this email.
>>>>
>>>> - The environment variables seem to be properly set (see the attached
>>>> env.log file). Local MPI programs (i.e. ./mpirun -np 2 greetings)
>>>> run well.
>>>>
>>>> - .profile has the path information for both the executables and
>>>> the libraries
>>>>
>>>> - orted runs on the remote node; however, it does not print anything
>>>> to the console. The only output on the remote node is:
>>>>
>>>> pam_rhosts_auth[235]: user root has a `+' user entry
>>>> pam_rhosts_auth[235]: allowed to root_at_[hidden] as root
>>>> PAM_unix[235]: (rsh) session opened for user root by (uid=0)
>>>> in.rshd[236]: root_at_[hidden] as root: cmd='( ! [ -e ./.profile ] || . ./.profile; orted --bootproxy 1 --name 0.0.1 --num_procs 3
>>> You're running as root? Why is that?
>>>
>>>> Then the remote process returns to the command prompt. However,
>>>> orted is in the background. The local process is frozen, and just
>>>> prints "Calling init", which is just before MPI_Init (see
>>>> greetings.c).
>>>>
>>>> I believe the COMM WORLD cannot be correctly initialized. However,
>>>> I can't see which part of my configuration is wrong.
>>>>
>>>> Any help is greatly appreciated.
>>>
>>> With best regards,
>>> Rainer
>>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
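
The pre-flight checks described in the quoted report (a two-entry hostfile, then "rsh <host> date" against each node) could be scripted roughly as follows. This is only a sketch in Python and is not taken from the thread: the function names parse_hostfile and rsh_check_commands are hypothetical, and the parser assumes a simple layout of one host name per line, with optional "#" comments and trailing tokens such as "slots=2" ignored. It only builds the command lines and does not execute them.

```python
def parse_hostfile(text):
    """Return the host names found in hostfile text, in file order.

    Assumption: each non-empty line starts with a host name; anything
    after the first whitespace (e.g. "slots=2") and anything after '#'
    is ignored.
    """
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if line:
            hosts.append(line.split()[0])  # keep only the host name
    return hosts


def rsh_check_commands(hosts, remote_cmd="date"):
    """Build one ['rsh', host, cmd] argument list per host (not executed)."""
    return [["rsh", host, remote_cmd] for host in hosts]


if __name__ == "__main__":
    # Host names are the ones mentioned in the thread; the comment line
    # in the sample is illustrative only.
    sample = "# local node first\nAFRLMPPBM1\nAFRLMPPBM2\n"
    for cmd in rsh_check_commands(parse_hostfile(sample)):
        print(" ".join(cmd))
```

Each printed line can be run by hand from the local node. If one of them hangs or asks for a password, that would suggest looking at the rsh setup for that host before looking at Open MPI itself.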