
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] getting opal_init:startup:internal-failure
From: E.O. (ooyama.eiichi_at_[hidden])
Date: 2013-04-29 11:45:12


Thank you!
I agree that using NFS to share the home directory is the right approach for now.
I wanted to use the --preload-binary option.
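As a sketch (reusing the /myname prefix and hostnames from this thread), the --preload-binary flag would be added to the mpirun command line so the executable is copied to the remote node at launch time instead of by hand:

```shell
# Hypothetical invocation: ask mpirun to push /tmp/hello.out to host2
# before launching it; prefix and paths are the ones used in this thread.
mpirun --preload-binary --prefix /myname -host host2 /tmp/hello.out
```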
eiichi

On Mon, Apr 29, 2013 at 10:15 AM, Jeff Squyres (jsquyres) <
jsquyres_at_[hidden]> wrote:

> FWIW, to avoid using the --prefix option, you can set your PATH /
> LD_LIBRARY_PATH to point to the Open MPI installation on all nodes.
>
> Many organizations opt to have NFS-shared home directories, so that when
> you modify your "main" shell startup file (e.g., .bashrc) to point PATH and
> LD_LIBRARY_PATH to your Open MPI installation, it effectively modifies it
> for all nodes in the cluster.
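For example, a minimal .bashrc addition (a sketch, assuming the /myname install prefix used elsewhere in this thread) might look like:

```shell
# Add the Open MPI installation under /myname to the search paths.
# Adjust the prefix to match your own installation location.
export PATH=/myname/bin:$PATH
export LD_LIBRARY_PATH=/myname/lib:$LD_LIBRARY_PATH
```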
>
>
>
> On Apr 29, 2013, at 8:56 AM, E.O. <ooyama.eiichi_at_[hidden]> wrote:
>
> > It works!!!
> > With two dashes and no equal sign, it worked fine!!
> >
> > [root_at_host1 tmp]# mpirun --prefix /myname --host host2 /tmp/hello.out
> > Hello World from processor host2, rank 0 out of 1 processors
> > [root_at_host1 tmp]#
> >
> > It looks like one dash "-prefix" also works if I don't put an equal
> sign..
> >
> > Thank you very much!!
> >
> > Eiichi
> >
> >
> >
> > On Mon, Apr 29, 2013 at 8:29 AM, Ralph Castain <rhc_at_[hidden]> wrote:
> > Hmm....okay. No, let's not bother to install a bunch of stuff you don't
> otherwise need.
> >
> > I probably mis-typed the "prefix" option - it has two dashes in front of
> it and no equal sign:
> >
> > mpirun --prefix ./myname ...
> >
> > I suspect you only put one dash, and the equal sign was a definite
> problem, which is why it gave you an error.
> >
> >
> > On Apr 29, 2013, at 5:12 AM, E.O. <ooyama.eiichi_at_[hidden]> wrote:
> >
> >> I tried configuring/building Open MPI on the remote host but I was not
> able to...
> >> The remote host (host2) doesn't have any development tools, such as
> gcc, make, etc...
> >>
> >> Since I am able to run an MPI hello_c binary on the remote host, I
> believe the host has all the necessary libraries needed for MPI. I am also
> able to run an MPI hello_c binary on host1 from host2.
> >>
> >> [root_at_host2 tmp]# mpirun -host localhost /tmp/hello.out
> >> Hello World from processor host2, rank 0 out of 1 processors
> >> [root_at_host2 tmp]# mpirun -host host2 /tmp/hello.out
> >> Hello World from processor host2, rank 0 out of 1 processors
> >> [root_at_host2 tmp]# mpirun -host host1 /tmp/hello.out
> >> Hello World from processor host1, rank 0 out of 1 processors
> >> [root_at_host2 tmp]#
> >>
> >> However, I still can't run the hello_c binary on host2 from host1:
> >>
> >> [root_at_host1 tmp]# mpirun -host host2 /tmp/hello.out
> >>
> --------------------------------------------------------------------------
> >> Sorry! You were supposed to get help about:
> >> opal_init:startup:internal-failure
> >> But I couldn't open the help file:
> >> //share/openmpi/help-opal-runtime.txt: No such file or directory.
> Sorry!
> >>
> --------------------------------------------------------------------------
> >> [host2:02499] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 79
> >> [host2:02499] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file
> orted/orted_main.c at line 358
> >>
> --------------------------------------------------------------------------
> >> A daemon (pid 17710) died unexpectedly with status 255 while attempting
> >> to launch so we are aborting.
> >>
> >> There may be more information reported by the environment (see above).
> >>
> >> This may be because the daemon was unable to find all the needed shared
> >> libraries on the remote node. You may set your LD_LIBRARY_PATH to have
> the
> >> location of the shared libraries on the remote nodes and this will
> >> automatically be forwarded to the remote nodes.
> >>
> --------------------------------------------------------------------------
> >>
> --------------------------------------------------------------------------
> >> mpirun noticed that the job aborted, but has no info as to the process
> >> that caused that situation.
> >>
> --------------------------------------------------------------------------
> >> [root_at_host1 tmp]#
> >>
> >>
> >> If I set -prefix=/myname, it returns a different output
> >>
> >> [root_at_host1 tmp]# mpirun -prefix=/myname -host host2 /tmp/hello.out
> >>
> --------------------------------------------------------------------------
> >> mpirun was unable to launch the specified application as it could not
> access
> >> or execute an executable:
> >>
> >> Executable: -prefix=/myname
> >> Node: host1
> >>
> >> while attempting to start process rank 0.
> >>
> --------------------------------------------------------------------------
> >> [root_at_host1 tmp]#
> >>
> >> Do you still want me to try building OMPI on the remote host?
> >>
> >> eiichi
> >>
> >>
> >>
> >> On Sun, Apr 28, 2013 at 12:24 PM, Ralph Castain <rhc_at_[hidden]>
> wrote:
> >> If you configure/build OMPI on the remote node using the same configure
> options you used on host1, does the problem go away?
> >>
> >>
> >> On Apr 28, 2013, at 8:58 AM, E.O. <ooyama.eiichi_at_[hidden]> wrote:
> >>
> >>> Thank you Ralph!
> >>> I ran it with "-prefix" option but I got this...
> >>>
> >>> [root_at_host1 tmp]# mpirun -prefix /myname -np 4 -host host2 ./hello.out
> >>>
> --------------------------------------------------------------------------
> >>> mpirun was unable to launch the specified application as it could not
> access
> >>> or execute an executable:
> >>>
> >>> Executable: -prefix=/myname
> >>> Node: host1
> >>>
> >>> while attempting to start process rank 0.
> >>>
> --------------------------------------------------------------------------
> >>> [root_at_host1 tmp]#
> >>>
> >>> I also updated PATH on the remote host (host2) to include /myname.
> >>> But it didn't seem to change anything...
> >>>
> >>> eiichi
> >>>
> >>>
> >>>
> >>>
> >>> On Sun, Apr 28, 2013 at 11:48 AM, Ralph Castain <rhc_at_[hidden]>
> wrote:
> >>> The problem is likely that your path variables aren't being set
> properly on the remote machine when mpirun launches the remote daemon. You
> might check to see that your default shell rc file is also setting those
> values correctly. Alternatively, modify your mpirun cmd line a bit by adding
> >>>
> >>> mpirun -prefix /myname ...
> >>>
> >>> so it will set the remote prefix and see if that helps. If it does,
> you can add --enable-orterun-prefix-by-default to your configure line so
> mpirun always adds it.
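A sketch of that rebuild (reusing the /myname prefix from this thread):

```shell
# Reconfigure so mpirun forwards the prefix to remote daemons by default;
# --prefix matches the install location used elsewhere in this thread.
./configure --prefix=/myname --enable-orterun-prefix-by-default
make all install
```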
> >>>
> >>>
> >>> On Apr 28, 2013, at 7:56 AM, "E.O." <ooyama.eiichi_at_[hidden]> wrote:
> >>>
> >>> > Hello
> >>> >
> >>> > I have five Linux machines (one is Red Hat and the others are BusyBox).
> >>> > I downloaded openmpi-1.6.4.tar.gz into my main redhat machine and
> configure'ed/compiled it successfully.
> >>> > ./configure --prefix=/myname
> >>> > I installed it to the /myname directory successfully. I am able to run a
> simple hello.c on my Red Hat machine.
> >>> >
> >>> > [root_at_host1 /tmp] # mpirun -np 4 ./hello.out
> >>> > I am parent
> >>> > I am a child
> >>> > I am a child
> >>> > I am a child
> >>> > [root_at_host1 /tmp] #
> >>> >
> >>> > Then, I sent the entire /myname directory to another machine (host2).
> >>> > [root_at_host1 /] # tar zcf - myname | ssh host2 "(cd /; tar zxf -)"
> >>> >
> >>> > and ran mpirun for the host (host2).
> >>> >
> >>> > [root_at_host1 tmp]# mpirun -np 4 -host host2 ./hello.out
> >>> >
> --------------------------------------------------------------------------
> >>> > Sorry! You were supposed to get help about:
> >>> > opal_init:startup:internal-failure
> >>> > But I couldn't open the help file:
> >>> > //share/openmpi/help-opal-runtime.txt: No such file or
> directory. Sorry!
> >>> >
> --------------------------------------------------------------------------
> >>> > [host2:26294] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file
> runtime/orte_init.c at line 79
> >>> > [host2:26294] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file
> orted/orted_main.c at line 358
> >>> >
> --------------------------------------------------------------------------
> >>> > A daemon (pid 23691) died unexpectedly with status 255 while
> attempting
> >>> > to launch so we are aborting.
> >>> >
> >>> > There may be more information reported by the environment (see
> above).
> >>> >
> >>> > This may be because the daemon was unable to find all the needed
> shared
> >>> > libraries on the remote node. You may set your LD_LIBRARY_PATH to
> have the
> >>> > location of the shared libraries on the remote nodes and this will
> >>> > automatically be forwarded to the remote nodes.
> >>> >
> --------------------------------------------------------------------------
> >>> >
> --------------------------------------------------------------------------
> >>> > mpirun noticed that the job aborted, but has no info as to the
> process
> >>> > that caused that situation.
> >>> >
> --------------------------------------------------------------------------
> >>> > [root_at_host1 tmp]#
> >>> >
> >>> > I set those environment variables
> >>> >
> >>> > [root_at_host1 tmp]# echo $LD_LIBRARY_PATH
> >>> > /myname/lib/
> >>> > [root_at_host1 tmp]# echo $OPAL_PREFIX
> >>> > /myname/
> >>> > [root_at_host1 tmp]#
> >>> >
> >>> > [root_at_host2 /] # ls -la /myname/lib/libmpi.so.1
> >>> > lrwxrwxrwx 1 root root 15 Apr 28 10:21
> /myname/lib/libmpi.so.1 -> libmpi.so.1.0.7
> >>> > [root_at_host2 /] #
> >>> >
> >>> > If I run the ./hello.out binary directly on host2, it works fine:
> >>> >
> >>> > [root_at_host1 tmp]# ssh host2
> >>> > [root_at_host2 /] # /tmp/hello.out
> >>> > I am parent
> >>> > [root_at_host2 /] #
> >>> >
> >>> > Can someone help me figure out why I cannot run hello.out on host2
> from host1?
> >>> > Am I missing any env variables ?
> >>> >
> >>> > Thank you,
> >>> >
> >>> > Eiichi
> >>> >
> >>> >
> >>> > _______________________________________________
> >>> > users mailing list
> >>> > users_at_[hidden]
> >>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>
> >>>
> >>
> >>
> >
> >
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>