Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mpi problems,
From: Nehemiah Dacres (dacresni_at_[hidden])
Date: 2011-04-04 10:56:09


That's an excellent suggestion.
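
For anyone on sh or bash rather than csh, here is a rough sketch of the same per-host check. It prints only the hosts where the test fails, which should also show which nodes are returning the error. The orted path and the hostfile name "list" come from the thread; the function names are made up for illustration, and it assumes passwordless ssh to every host in the file.

```shell
#!/bin/sh
# Sketch only: report which hosts in an Open MPI hostfile lack an executable orted.
# hosts_from_file and check_orted are hypothetical helper names.

ORTED=/opt/SUNWhpc/HPC8.2.1c/sun/bin/orted

# Print one hostname per line: strip comments and blank lines, keep only the
# first field (so "node1 slots=2" works), and drop duplicate entries.
hosts_from_file() {
    sed -e 's/#.*//' "$1" | awk 'NF { print $1 }' | sort -u
}

# Run "test -x" on every host and print only the hosts where it fails.
# A failure means orted is missing or not executable there, or that ssh
# itself could not connect.
check_orted() {
    for host in $(hosts_from_file "$1"); do
        # -n detaches ssh from stdin; BatchMode makes it fail instead of
        # prompting for a password.
        if ! ssh -n -o BatchMode=yes "$host" test -x "$ORTED"; then
            echo "$host: orted not usable (or host unreachable)"
        fi
    done
}

# Usage: check_orted list
```

If it prints nothing, every host answered and has an executable orted at that path.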

On Mon, Apr 4, 2011 at 9:45 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:

> As Ralph indicated, he'll add the hostname to the error message (but that
> might be tricky; that error message is coming from rsh/ssh...).
>
> In the meantime, you might try (csh style):
>
> foreach host (`cat list`)
>   echo $host
>   ssh $host ls -l /opt/SUNWhpc/HPC8.2.1c/sun/bin/orted
> end
>
>
>
> On Apr 4, 2011, at 10:24 AM, Nehemiah Dacres wrote:
>
> > I have installed it via a symlink on all of the nodes. I can run 'tentakel
> > which mpirun' and it finds it. I'll check the library paths, but isn't there
> > a way to find out which nodes are returning the error?
> >
> >
> > On Thu, Mar 31, 2011 at 7:30 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> > The error message seems to imply that you don't have OMPI installed on
> > all your nodes (because it didn't find /opt/SUNWhpc/HPC8.2.1c/sun/bin/orted
> > on a remote node).
> >
> >
> > On Mar 30, 2011, at 4:24 PM, Nehemiah Dacres wrote:
> >
> > > I am trying to figure out why my jobs aren't getting distributed and
> > > need some help. I have an install of Sun cluster tools on Rockscluster 5.2
> > > (essentially centos4u2). This user's account has its home dir shared via
> > > NFS. I am getting some strange errors. Here's an example run:
> > >
> > >
> > > [jian_at_therock ~]$ /opt/SUNWhpc/HPC8.2.1c/sun/bin/mpirun -np 3 -hostfile list ./job2.sh
> > > bash: /opt/SUNWhpc/HPC8.2.1c/sun/bin/orted: No such file or directory
> > > --------------------------------------------------------------------------
> > > A daemon (pid 20362) died unexpectedly with status 127 while attempting
> > > to launch so we are aborting.
> > >
> > > There may be more information reported by the environment (see above).
> > >
> > > This may be because the daemon was unable to find all the needed shared
> > > libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> > > location of the shared libraries on the remote nodes and this will
> > > automatically be forwarded to the remote nodes.
> > > --------------------------------------------------------------------------
> > > --------------------------------------------------------------------------
> > > mpirun noticed that the job aborted, but has no info as to the process
> > > that caused that situation.
> > > --------------------------------------------------------------------------
> > > mpirun: clean termination accomplished
> > >
> > > [jian_at_therock ~]$ /opt/SUNWhpc/HPC8.2.1c/sun/
> > > bin/ examples/ instrument/ man/
> > > etc/ include/ lib/ share/
> > > [jian_at_therock ~]$ /opt/SUNWhpc/HPC8.2.1c/sun/bin/orte
> > > orte-clean orted orte-iof orte-ps orterun
> > > [jian_at_therock ~]$ /opt/SUNWhpc/HPC8.2.1c/sun/bin/orted
> > > [therock.slu.loc:20365] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 125
> > > --------------------------------------------------------------------------
> > > It looks like orte_init failed for some reason; your parallel process is
> > > likely to abort. There are many reasons that a parallel process can
> > > fail during orte_init; some of which are due to configuration or
> > > environment problems. This failure appears to be an internal failure;
> > > here's some additional information (which may only be relevant to an
> > > Open MPI developer):
> > >
> > > orte_ess_base_select failed
> > > --> Returned value Not found (-13) instead of ORTE_SUCCESS
> > > --------------------------------------------------------------------------
> > > [therock.slu.loc:20365] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file orted/orted_main.c at line 325
> > > [jian_at_therock ~]$
> > >
> > >
> > > --
> > > Nehemiah I. Dacres
> > > System Administrator
> > > Advanced Technology Group Saint Louis University
> > >
> > > _______________________________________________
> > > users mailing list
> > > users_at_[hidden]
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >
> > --
> > Nehemiah I. Dacres
> > System Administrator
> > Advanced Technology Group Saint Louis University
> >
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
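
One more thing that may be worth checking, since the install is reached through a symlink: bash exits with status 127 when the command it was asked to run cannot be found, which matches both the "No such file or directory" line and the daemon's status 127 in the mpirun output. A symlink whose target is absent on a node would be one way to get that. Below is a minimal sketch that tells the cases apart; describe_path is a hypothetical helper, not part of Open MPI or the Sun tools.

```shell
#!/bin/sh
# Sketch only: say why a path can or cannot be executed.
# describe_path is a hypothetical helper name.
describe_path() {
    if [ -L "$1" ] && [ ! -e "$1" ]; then
        # -L is true for the link itself, -e follows it to the target.
        echo "dangling symlink"
    elif [ ! -e "$1" ]; then
        echo "missing"
    elif [ -f "$1" ] && [ -x "$1" ]; then
        echo "ok"
    else
        echo "not an executable file"
    fi
}

# Usage, on a compute node:
#   describe_path /opt/SUNWhpc/HPC8.2.1c/sun/bin/orted
```

Running it on one of the compute nodes against the orted path, and against whichever directory is the symlink, should show whether the link resolves there.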

-- 
Nehemiah I. Dacres
System Administrator
Advanced Technology Group Saint Louis University