Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] mtt IBM SPAWN error
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2008-09-04 10:31:34


Isn't this related to https://svn.open-mpi.org/trac/ompi/ticket/1469?

On 6/30/08, Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]> wrote:
>
> I am not familiar with the IBM spawn test, but this may be the correct
> behavior: if the spawn test allocates 3 ranks on the node and then
> allocates another 3, the test is supposed to fail because max_slots=4.
>
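
To make the slot arithmetic in the paragraph above concrete, here is a minimal Python sketch. It is not how Open MPI actually maps processes; it only adds up the hard limits from a hostfile and compares them with the number of ranks. The helper names (`parse_hostfile`, `hard_capacity`, `fits`) are hypothetical, the "3 spawned ranks" figure is the guess from the paragraph above, and treating a repeated host line as one extra slot is an assumption.

```python
def parse_hostfile(text):
    """Return {host: (slots, max_slots)}; max_slots is None when not given."""
    hosts = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, *opts = line.split()
        kv = dict(opt.split("=", 1) for opt in opts)
        slots, max_slots = hosts.get(name, (0, None))
        slots += int(kv.get("slots", 1))     # assumption: a bare line is 1 slot
        if "max_slots" in kv:
            max_slots = int(kv["max_slots"])
        hosts[name] = (slots, max_slots)
    return hosts


def hard_capacity(hosts):
    """Sum of max_slots, or None if any host has no hard limit."""
    limits = [max_slots for _, max_slots in hosts.values()]
    if any(limit is None for limit in limits):
        return None
    return sum(limits)


def fits(hosts, initial, spawned):
    """True if initial + spawned ranks stay within the hard limit."""
    capacity = hard_capacity(hosts)
    return capacity is None or initial + spawned <= capacity


hostfile = "witch2 slots=4 max_slots=4"
print(fits(parse_hostfile(hostfile), initial=3, spawned=3))  # False: 6 > 4
```

Under these assumptions, a single node capped at max_slots=4 cannot hold 3 + 3 ranks, while two such nodes (8 slots in total) or a hostfile without max_slots can.
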
> But it also fails with the following hostfile, although with a different
> error.
>
> # cat hostfile2
> witch2 slots=4 max_slots=4
> witch3 slots=4 max_slots=4
> witch1:/home/BENCHMARKS/IBM # /home/USERS/lenny/OMPI_ORTE_18772/bin/mpirun
> -np 3 -hostfile hostfile2 dynamic/spawn
> bash: orted: command not found
> [witch1:22789]
> --------------------------------------------------------------------------
> A daemon (pid 22791) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
> There may be more information reported by the environment (see above).
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> [witch1:22789]
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
> witch3 - daemon did not report back when launched
>
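
A note on the "status 127" in the log above: in a POSIX shell, 127 is the exit status for a command that could not be found, which matches the "bash: orted: command not found" line. It suggests the shell started on the remote node did not have the directory containing orted in its PATH. The small Python sketch below only reproduces that exit status locally; the command name is a made-up placeholder and `run_via_shell` is a hypothetical helper, not part of Open MPI.

```python
import subprocess


def run_via_shell(command):
    """Run a command line through sh and return (exit status, stderr text)."""
    result = subprocess.run(
        ["sh", "-c", command], capture_output=True, text=True
    )
    return result.returncode, result.stderr


# Placeholder name chosen so that it does not exist on PATH.
status, err = run_via_shell("orted_placeholder_that_does_not_exist")
print(status)  # 127: the shell could not find the command
```
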
> On Mon, Jun 30, 2008 at 9:38 AM, Lenny Verkhovsky <
> lenny.verkhovsky_at_[hidden]> wrote:
>
>> Hi,
>> While trying to run MTT, I could not get the IBM spawn test to run. It
>> fails only when using a hostfile, not when using a host list.
>> (OMPI from trunk)
>>
>> This works:
>> #mpirun -np 3 -H witch2 dynamic/spawn
>>
>> This fails:
>> # cat hostfile
>> witch2 slots=4 max_slots=4
>>
>> #mpirun -np 3 -hostfile hostfile dynamic/spawn
>> [witch1:12392]
>> --------------------------------------------------------------------------
>> There are not enough slots available in the system to satisfy the 3 slots
>> that were requested by the application:
>> dynamic/spawn
>>
>> Either request fewer slots for your application, or make more slots
>> available
>> for use.
>> --------------------------------------------------------------------------
>> [witch1:12392]
>> --------------------------------------------------------------------------
>> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
>> launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>>
>>
>> Using hostfile1 also works:
>> #cat hostfile1
>> witch2
>> witch2
>> witch2
>>
>>
>> Best Regards
>> Lenny.
>>
>
>