Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] mtt IBM SPAWN error
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2008-06-30 07:21:30


I am not familiar with the IBM spawn test, but this may be the correct
behavior: if the spawn test allocates 3 ranks on the node and then allocates
another 3, the test is supposed to fail because of max_slots=4.
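
To make the arithmetic concrete, here is a minimal sketch of the slot accounting I have in mind. It is not Open MPI's actual allocator; the function name and the assumption that the test spawns 3 additional ranks are mine.

```python
def fits_in_node(max_slots: int, used: int, requested: int) -> bool:
    """Return True if `requested` more ranks fit under the node's hard limit."""
    return used + requested <= max_slots


MAX_SLOTS = 4       # from the hostfile line: witch2 slots=4 max_slots=4
INITIAL_RANKS = 3   # from: mpirun -np 3
SPAWNED_RANKS = 3   # assumption: the spawn test asks for 3 more ranks

# The initial launch fits: 0 + 3 <= 4.
initial_ok = fits_in_node(MAX_SLOTS, 0, INITIAL_RANKS)

# The spawn would not fit: 3 + 3 > 4.
spawn_ok = fits_in_node(MAX_SLOTS, INITIAL_RANKS, SPAWNED_RANKS)

print(initial_ok, spawn_ok)
```

If that assumption holds, the initial launch is fine and only the spawn step exceeds max_slots.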

It also fails with the following hostfile, but WITH A DIFFERENT ERROR.

#cat hostfile2
witch2 slots=4 max_slots=4
witch3 slots=4 max_slots=4
witch1:/home/BENCHMARKS/IBM # /home/USERS/lenny/OMPI_ORTE_18772/bin/mpirun -np 3 -hostfile hostfile2 dynamic/spawn
bash: orted: command not found
[witch1:22789]
--------------------------------------------------------------------------
A daemon (pid 22791) died unexpectedly with status 127 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
[witch1:22789]
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
        witch3 - daemon did not report back when launched
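
The status 127 in this output is consistent with the "bash: orted: command not found" line: a POSIX shell returns 127 when it cannot find the command it was asked to run. A minimal sketch that reproduces this locally, assuming a POSIX `sh` is available; the command name below is a made-up placeholder, not a real Open MPI binary.

```python
import subprocess

# Hypothetical command name, chosen so that it is not found on PATH.
MISSING_COMMAND = "orted_placeholder_not_installed"


def exit_status_of(command: str) -> int:
    """Run a command line through sh and return the shell's exit status."""
    completed = subprocess.run(
        ["sh", "-c", command],
        capture_output=True,  # keep the "not found" message off the terminal
        text=True,
    )
    return completed.returncode


status = exit_status_of(MISSING_COMMAND)
print(status)
```

So the second failure looks like the remote shell not finding orted on its PATH, rather than a slot-count problem.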

On Mon, Jun 30, 2008 at 9:38 AM, Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]> wrote:

> Hi,
> While trying to run MTT, I failed to run the IBM spawn test. It fails only
> when using a hostfile, not when using a host list.
> (OMPI from trunk)
>
> This works:
> #mpirun -np 3 -H witch2 dynamic/spawn
>
> This fails:
> # cat hostfile
> witch2 slots=4 max_slots=4
>
> #mpirun -np 3 -hostfile hostfile dynamic/spawn
> [witch1:12392]
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 3 slots
> that were requested by the application:
> dynamic/spawn
>
> Either request fewer slots for your application, or make more slots
> available
> for use.
> --------------------------------------------------------------------------
> [witch1:12392]
> --------------------------------------------------------------------------
> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
> launch so we are aborting.
>
> There may be more information reported by the environment (see above).
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
>
> Using hostfile1 also works
> #cat hostfile1
> witch2
> witch2
> witch2
>
>
> Best Regards
> Lenny.
>