
Open MPI User's Mailing List Archives



Subject: Re: [OMPI users] mpi problems,
From: Nehemiah Dacres (dacresni_at_[hidden])
Date: 2011-04-04 10:24:02


I have installed it via a symlink on all of the nodes; I can run 'tentakel
which mpirun' and it finds it. I'll check the library paths, but isn't there
a way to find out which nodes are returning the error?
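One way to answer that question is sketched below. The exit status 127 in the error is what bash returns when it cannot find a command, which matches the "No such file or directory" line, so the script checks each host for an executable orted at the exact absolute path mpirun tried to launch. Assumptions: the hostfile (here 'list') names one host per line, optionally followed by options such as slots=N; passwordless ssh works to every node; and the function names are hypothetical. Because "test -x" follows symlinks, a symlink whose target is missing on a node is reported as a failure too.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which hosts in an Open MPI hostfile cannot
# run orted from the absolute path that mpirun tries to launch.
# Assumes passwordless (BatchMode) ssh to every node.

ORTED=${ORTED:-/opt/SUNWhpc/HPC8.2.1c/sun/bin/orted}

# Print host names from a hostfile: strip "#" comments and blank lines and
# keep only the first field, so "node1 slots=2" yields "node1".
hosts_from_hostfile() {
    sed -e 's/#.*//' "$1" | awk 'NF { print $1 }'
}

# Check each host. "test -x" follows symlinks, so a dangling symlink fails.
# ssh exits 255 on connection errors, which is also reported as a failure.
# Returns 1 if any host failed, 0 otherwise.
check_orted() {
    local hostfile=$1 host failed=0
    for host in $(hosts_from_hostfile "$hostfile"); do
        # -n: do not let ssh read from this script's stdin.
        if ssh -n -o BatchMode=yes "$host" test -x "$ORTED"; then
            echo "$host: ok"
        else
            echo "$host: FAILED (no executable $ORTED, or host unreachable)"
            failed=1
        fi
    done
    return "$failed"
}
```

Usage: source the functions and run "check_orted list" from the directory that holds the hostfile; any host marked FAILED is a candidate for the node returning the error.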

On Thu, Mar 31, 2011 at 7:30 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:

> The error message seems to imply that you don't have OMPI installed on all
> your nodes (because it didn't find /opt/SUNWhpc/HPC8.2.1c/sun/bin/orted on a
> remote node).
>
>
> On Mar 30, 2011, at 4:24 PM, Nehemiah Dacres wrote:
>
> > I am trying to figure out why my jobs aren't getting distributed and need
> > some help. I have an install of Sun HPC ClusterTools on Rocks Cluster 5.2
> > (essentially CentOS 4u2). This user's account has its home dir shared via
> > NFS. I am getting some strange errors. Here's an example run:
> >
> >
> > [jian_at_therock ~]$ /opt/SUNWhpc/HPC8.2.1c/sun/bin/mpirun -np 3 -hostfile list ./job2.sh
> > bash: /opt/SUNWhpc/HPC8.2.1c/sun/bin/orted: No such file or directory
> >
> > --------------------------------------------------------------------------
> > A daemon (pid 20362) died unexpectedly with status 127 while attempting
> > to launch so we are aborting.
> >
> > There may be more information reported by the environment (see above).
> >
> > This may be because the daemon was unable to find all the needed shared
> > libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> > location of the shared libraries on the remote nodes and this will
> > automatically be forwarded to the remote nodes.
> >
> > --------------------------------------------------------------------------
> >
> > --------------------------------------------------------------------------
> > mpirun noticed that the job aborted, but has no info as to the process
> > that caused that situation.
> >
> > --------------------------------------------------------------------------
> > mpirun: clean termination accomplished
> >
> > [jian_at_therock ~]$ /opt/SUNWhpc/HPC8.2.1c/sun/
> > bin/ examples/ instrument/ man/
> > etc/ include/ lib/ share/
> > [jian_at_therock ~]$ /opt/SUNWhpc/HPC8.2.1c/sun/bin/orte
> > orte-clean orted orte-iof orte-ps orterun
> > [jian_at_therock ~]$ /opt/SUNWhpc/HPC8.2.1c/sun/bin/orted
> > [therock.slu.loc:20365] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 125
> >
> > --------------------------------------------------------------------------
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> > orte_ess_base_select failed
> > --> Returned value Not found (-13) instead of ORTE_SUCCESS
> >
> > --------------------------------------------------------------------------
> > [therock.slu.loc:20365] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file orted/orted_main.c at line 325
> > [jian_at_therock ~]$
> >
> >
> > --
> > Nehemiah I. Dacres
> > System Administrator
> > Advanced Technology Group Saint Louis University
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
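For checking the library paths: once orted is present on a node, the other hint in the error text (shared libraries missing on the remote node) can be checked per node with ldd. This is a sketch assuming passwordless ssh and ldd installed on the nodes; unresolved_libs is a hypothetical name. ldd runs here in a non-interactive ssh session, which does not include anything mpirun itself forwards (such as LD_LIBRARY_PATH), so an empty result is a good sign rather than proof.

```bash
#!/usr/bin/env bash
# Hypothetical check: list shared libraries that orted needs but the
# dynamic loader on a remote node cannot resolve.
# Assumes passwordless (BatchMode) ssh and ldd available on the node.

ORTED=${ORTED:-/opt/SUNWhpc/HPC8.2.1c/sun/bin/orted}

# Print "host: libname" for every library ldd reports as "not found".
# Prints nothing if all libraries resolve (or if ldd produced no such lines).
unresolved_libs() {
    local host=$1
    ssh -n -o BatchMode=yes "$host" ldd "$ORTED" 2>&1 |
        awk -v h="$host" '/not found/ { print h ": " $1 }'
}
```

Usage: "for h in node1 node2; do unresolved_libs "$h"; done", substituting the hosts from the hostfile.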

-- 
Nehemiah I. Dacres
System Administrator
Advanced Technology Group Saint Louis University