
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] mtt IBM SPAWN error
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2008-06-30 09:30:17


I saw it. But I think it is something else, since it works when I run it with
a host list:

#mpirun -np 3 -H witch2,witch3 dynamic/spawn
#
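Since the "orted: command not found" error below points at the remote PATH, here is a minimal sketch of how one might verify it. This assumes mpirun is using the ssh launcher, which starts orted through a non-interactive shell (often a shorter PATH than a login shell); the hostname and install path are the ones from the report below.

```shell
# Check whether the failing node can find orted in a non-interactive
# shell, i.e. the environment mpirun's ssh launcher actually gets.
ssh witch3 'command -v orted || echo "orted NOT in non-interactive PATH"'

# If orted is missing from that PATH, one common workaround is to pass
# the install root to mpirun via --prefix (path taken from the report):
#   mpirun --prefix /home/USERS/lenny/OMPI_ORTE_18772 \
#          -np 3 -hostfile hostfile2 dynamic/spawn
```

The `command -v ... || echo` pattern prints a marker only when the binary is absent, so a silent run (or a path printed) means orted is reachable.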

On Mon, Jun 30, 2008 at 4:03 PM, Ralph H Castain <rhc_at_[hidden]> wrote:

> Well, that error indicates that it was unable to launch the daemon on
> witch3
> for some reason. If you look at the error reported by bash, you will see
> that the "orted" binary wasn't found!
>
> Sounds like a path error - you might check to see if witch3 has the
> binaries
> installed, and if they are where you told the system to look...
>
> Ralph
>
>
>
> On 6/30/08 5:21 AM, "Lenny Verkhovsky" <lenny.verkhovsky_at_[hidden]> wrote:
>
> > I am not familiar with the IBM spawn test, but maybe this is the right
> > behavior: if the spawn test allocates 3 ranks on the node and then
> > allocates another 3, the test is supposed to fail due to max_slots=4.
> >
> > But it fails with the following hostfile as well, BUT WITH A DIFFERENT
> > ERROR.
> >
> > #cat hostfile2
> > witch2 slots=4 max_slots=4
> > witch3 slots=4 max_slots=4
> > witch1:/home/BENCHMARKS/IBM #
> /home/USERS/lenny/OMPI_ORTE_18772/bin/mpirun -np
> > 3 -hostfile hostfile2 dynamic/spawn
> > bash: orted: command not found
> > [witch1:22789]
> >
> --------------------------------------------------------------------------
> > A daemon (pid 22791) died unexpectedly with status 127 while attempting
> > to launch so we are aborting.
> > There may be more information reported by the environment (see above).
> > This may be because the daemon was unable to find all the needed shared
> > libraries on the remote node. You may set your LD_LIBRARY_PATH to have
> the
> > location of the shared libraries on the remote nodes and this will
> > automatically be forwarded to the remote nodes.
> >
> --------------------------------------------------------------------------
> > [witch1:22789]
> >
> --------------------------------------------------------------------------
> > mpirun was unable to cleanly terminate the daemons on the nodes shown
> > below. Additional manual cleanup may be required - please refer to
> > the "orte-clean" tool for assistance.
> >
> --------------------------------------------------------------------------
> > witch3 - daemon did not report back when launched
> >
> > On Mon, Jun 30, 2008 at 9:38 AM, Lenny Verkhovsky <
> lenny.verkhovsky_at_[hidden]>
> > wrote:
> >> Hi,
> >> while trying to run MTT, the IBM spawn test fails. It fails only when
> >> using a hostfile, not when using a host list.
> >> ( OMPI from TRUNK )
> >>
> >> This is working :
> >> #mpirun -np 3 -H witch2 dynamic/spawn
> >>
> >> This Fails:
> >> # cat hostfile
> >> witch2 slots=4 max_slots=4
> >> #mpirun -np 3 -hostfile hostfile dynamic/spawn
> >> [witch1:12392]
> >>
> --------------------------------------------------------------------------
> >> There are not enough slots available in the system to satisfy the 3
> slots
> >> that were requested by the application:
> >> dynamic/spawn
> >>
> >> Either request fewer slots for your application, or make more slots
> available
> >> for use.
> >>
> --------------------------------------------------------------------------
> >> [witch1:12392]
> >>
> --------------------------------------------------------------------------
> >> A daemon (pid unknown) died unexpectedly on signal 1 while attempting
> to
> >> launch so we are aborting.
> >>
> >> There may be more information reported by the environment (see above).
> >>
> >> This may be because the daemon was unable to find all the needed shared
> >> libraries on the remote node. You may set your LD_LIBRARY_PATH to have
> the
> >> location of the shared libraries on the remote nodes and this will
> >> automatically be forwarded to the remote nodes.
> >>
> --------------------------------------------------------------------------
> >> mpirun: clean termination accomplished
> >>
> >>
> >> Using hostfile1 also works:
> >> #cat hostfile1
> >> witch2
> >> witch2
> >> witch2
> >>
> >>
> >> Best Regards
> >> Lenny.
> >>
> >
>
>
>
>