
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] getting opal_init:startup:internal-failure
From: E.O. (ooyama.eiichi_at_[hidden])
Date: 2013-04-29 08:12:30


I tried configuring/building Open MPI on the remote host, but I was not able
to: the remote host (host2) doesn't have any development tools, such as gcc,
make, etc.

Since I am able to run an MPI hello_c binary on the remote host, I believe
the host has all the libraries needed for MPI. I am also able to run an MPI
hello_c binary on host1 from host2:

[root_at_host2 tmp]# mpirun -host localhost /tmp/hello.out
Hello World from processor host2, rank 0 out of 1 processors
[root_at_host2 tmp]# mpirun -host host2 /tmp/hello.out
Hello World from processor host2, rank 0 out of 1 processors
[root_at_host2 tmp]# mpirun -host host1 /tmp/hello.out
Hello World from processor host1, rank 0 out of 1 processors
[root_at_host2 tmp]#

However, I still can't run the hello_c binary on host2 from host1:

[root_at_host1 tmp]# mpirun -host host2 /tmp/hello.out
--------------------------------------------------------------------------
Sorry! You were supposed to get help about:
    opal_init:startup:internal-failure
But I couldn't open the help file:
    //share/openmpi/help-opal-runtime.txt: No such file or directory.
Sorry!
--------------------------------------------------------------------------
[host2:02499] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file
runtime/orte_init.c at line 79
[host2:02499] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file
orted/orted_main.c at line 358
--------------------------------------------------------------------------
A daemon (pid 17710) died unexpectedly with status 255 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
[root_at_host1 tmp]#
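[Editor's note: the doubled slash in "//share/openmpi/help-opal-runtime.txt"
suggests the remote daemon starts with an empty installation prefix. mpirun
launches orted on host2 through a non-interactive ssh shell, which does not
necessarily read the same rc files as an interactive session, so exports made
at an interactive prompt can be missing. A sketch of how to check, assuming
bash on both hosts and the /myname install paths from this thread:]

```shell
# A non-interactive ssh shell may not run the same rc files as an
# interactive login shell, so variables exported interactively on
# host2 can be absent when mpirun starts the remote daemon.
# Simulate a stripped environment locally (env -i clears all
# inherited variables):
env -i bash --noprofile --norc -c 'echo "OPAL_PREFIX=[$OPAL_PREFIX]"'
# prints: OPAL_PREFIX=[]

# Check what the daemon would actually see on host2:
#   ssh host2 'echo $OPAL_PREFIX $LD_LIBRARY_PATH'
# If empty, export OPAL_PREFIX=/myname and
# LD_LIBRARY_PATH=/myname/lib in a file the non-interactive shell
# reads (e.g. ~/.bashrc for bash), or use mpirun's --prefix option.
```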

If I pass -prefix=/myname, I get different output:

[root_at_host1 tmp]# mpirun -prefix=/myname -host host2 /tmp/hello.out
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not access
or execute an executable:

Executable: -prefix=/myname
Node: host1

while attempting to start process rank 0.
--------------------------------------------------------------------------
[root_at_host1 tmp]#
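[Editor's note: the "Executable: -prefix=/myname" line shows that mpirun
parsed "-prefix=/myname" as the program to launch, not as an option. Passing
the prefix as a separate argument, as Ralph suggested earlier in the thread,
avoids this; a sketch using the hosts and paths from this thread:]

```shell
# Pass the prefix as a separate argument, not joined with "=":
mpirun --prefix /myname -host host2 /tmp/hello.out
# (the single-dash form with a space, "-prefix /myname", matches the
#  command suggested earlier in the thread)
```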

Do you still want me to try building OMPI on the remote host?

eiichi

On Sun, Apr 28, 2013 at 12:24 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> If you configure/build OMPI on the remote node using the same configure
> options you used on host1, does the problem go away?
>
>
> On Apr 28, 2013, at 8:58 AM, E.O. <ooyama.eiichi_at_[hidden]> wrote:
>
> Thank you Ralph!
> I ran it with "-prefix" option but I got this...
>
> [root_at_host1 tmp]# mpirun -prefix /myname -np 4 -host host2 ./hello.out
> --------------------------------------------------------------------------
> mpirun was unable to launch the specified application as it could not
> access
> or execute an executable:
>
> Executable: -prefix=/myname
> Node: host1
>
> while attempting to start process rank 0.
> --------------------------------------------------------------------------
> [root_at_host1 tmp]#
>
> I also updated PATH on the remote host (host2) to include /myname.
> But it didn't seem to change anything...
>
> eiichi
>
>
>
>
> On Sun, Apr 28, 2013 at 11:48 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> The problem is likely that your path variables aren't being set properly
>> on the remote machine when mpirun launches the remote daemon. You might
>> check to see that your default shell rc file is also setting those values
>> correctly. Alternatively, modify your mpirun cmd line a bit by adding
>>
>> mpirun -prefix /myname ...
>>
>> so it will set the remote prefix and see if that helps. If it does, you
>> can add --enable-orterun-prefix-by-default to your configure line so mpirun
>> always adds it.
>>
>>
>> On Apr 28, 2013, at 7:56 AM, "E.O." <ooyama.eiichi_at_[hidden]> wrote:
>>
>> > Hello
>> >
>> > I have five Linux machines (one is Red Hat and the others are BusyBox).
>> > I downloaded openmpi-1.6.4.tar.gz into my main redhat machine and
>> configure'ed/compiled it successfully.
>> > ./configure --prefix=/myname
>> > I installed it to the /myname directory successfully. I am able to run a
>> simple hello.c on my Red Hat machine.
>> >
>> > [root_at_host1 /tmp] # mpirun -np 4 ./hello.out
>> > I am parent
>> > I am a child
>> > I am a child
>> > I am a child
>> > [root_at_host1 /tmp] #
>> >
>> > Then I sent the entire /myname directory to another machine (host2).
>> > [root_at_host1 /] # tar zcf - myname | ssh host2 "(cd /; tar zxf -)"
>> >
>> > and ran mpirun for the host (host2).
>> >
>> > [root_at_host1 tmp]# mpirun -np 4 -host host2 ./hello.out
>> >
>> --------------------------------------------------------------------------
>> > Sorry! You were supposed to get help about:
>> > opal_init:startup:internal-failure
>> > But I couldn't open the help file:
>> > //share/openmpi/help-opal-runtime.txt: No such file or directory.
>> Sorry!
>> >
>> --------------------------------------------------------------------------
>> > [host2:26294] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file
>> runtime/orte_init.c at line 79
>> > [host2:26294] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file
>> orted/orted_main.c at line 358
>> >
>> --------------------------------------------------------------------------
>> > A daemon (pid 23691) died unexpectedly with status 255 while attempting
>> > to launch so we are aborting.
>> >
>> > There may be more information reported by the environment (see above).
>> >
>> > This may be because the daemon was unable to find all the needed shared
>> > libraries on the remote node. You may set your LD_LIBRARY_PATH to have
>> the
>> > location of the shared libraries on the remote nodes and this will
>> > automatically be forwarded to the remote nodes.
>> >
>> --------------------------------------------------------------------------
>> >
>> --------------------------------------------------------------------------
>> > mpirun noticed that the job aborted, but has no info as to the process
>> > that caused that situation.
>> >
>> --------------------------------------------------------------------------
>> > [root_at_host1 tmp]#
>> >
>> > I set those environment variables
>> >
>> > [root_at_host1 tmp]# echo $LD_LIBRARY_PATH
>> > /myname/lib/
>> > [root_at_host1 tmp]# echo $OPAL_PREFIX
>> > /myname/
>> > [root_at_host1 tmp]#
>> >
>> > [root_at_host2 /] # ls -la /myname/lib/libmpi.so.1
>> > lrwxrwxrwx 1 root root 15 Apr 28 10:21
>> /myname/lib/libmpi.so.1 -> libmpi.so.1.0.7
>> > [root_at_host2 /] #
>> >
>> > If I run the ./hello.out binary inside host2, it works fine:
>> >
>> > [root_at_host1 tmp]# ssh host2
>> > [root_at_host2 /] # /tmp/hello.out
>> > I am parent
>> > [root_at_host2 /] #
>> >
>> > Can someone help me figure out why I cannot run hello.out on host2 from
>> host1? Am I missing any env variables?
>> >
>> > Thank you,
>> >
>> > Eiichi
>> >
>> >
>> > _______________________________________________
>> > users mailing list
>> > users_at_[hidden]
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users