Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] mtt IBM SPAWN error
From: Ralph H Castain (rhc_at_[hidden])
Date: 2008-06-30 09:03:27


Well, that error indicates that it was unable to launch the daemon on witch3
for some reason. If you look at the error reported by bash, you will see
that the "orted" binary wasn't found!

Sounds like a path error - you might check to see if witch3 has the binaries
installed, and if they are where you told the system to look...
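
One way to run that check is sketched below. It is only an illustration, not something taken from the thread: the helper name check_orted and the LAUNCHER variable are hypothetical, and it assumes passwordless ssh to the node and a Bourne-style login shell there (the "bash:" prefix in the error suggests that is the case on witch3). It asks the remote non-interactive shell, which is roughly the environment an ssh-launched daemon sees, whether it can resolve "orted".

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): report whether `orted` can be
# resolved on a node by the shell that a non-interactive remote login starts.
# LAUNCHER is an assumed knob for this sketch; it defaults to plain ssh.
check_orted() {
    node=$1
    launcher=${LAUNCHER:-ssh}
    # `command -v` is the POSIX way to ask a shell where it finds a command.
    if $launcher "$node" 'command -v orted' >/dev/null 2>&1; then
        echo "$node: orted found"
    else
        echo "$node: orted NOT found in non-interactive PATH"
        return 1
    fi
}

# Example (requires ssh access to the nodes named in the hostfile):
#   check_orted witch2
#   check_orted witch3
```

If the binary is installed on witch3 but simply not on the non-interactive PATH, mpirun's --prefix option is one way to tell the launcher where the installation lives; whether that is the right fix for this trunk build is not confirmed here.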

Ralph

On 6/30/08 5:21 AM, "Lenny Verkhovsky" <lenny.verkhovsky_at_[hidden]> wrote:

> I am not familiar with the IBM spawn test, but maybe this is the right
> behavior: if the spawn test allocates 3 ranks on the node and then allocates
> another 3, then the test is supposed to fail due to max_slots=4.
>
> But it fails with the following hostfile as well, BUT WITH A DIFFERENT ERROR.
>
> #cat hostfile2
> witch2 slots=4 max_slots=4
> witch3 slots=4 max_slots=4
> witch1:/home/BENCHMARKS/IBM # /home/USERS/lenny/OMPI_ORTE_18772/bin/mpirun -np
> 3 -hostfile hostfile2 dynamic/spawn
> bash: orted: command not found
> [witch1:22789]
> --------------------------------------------------------------------------
> A daemon (pid 22791) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
> There may be more information reported by the environment (see above).
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> [witch1:22789]
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
> witch3 - daemon did not report back when launched
>
> On Mon, Jun 30, 2008 at 9:38 AM, Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]>
> wrote:
>> Hi,
>> While trying to run mtt, I failed to run the IBM spawn test. It fails only
>> when using a hostfile, and not when using a host list.
>> (OMPI from trunk)
>>
>> This works:
>> #mpirun -np 3 -H witch2 dynamic/spawn
>>
>> This fails:
>> # cat hostfile
>> witch2 slots=4 max_slots=4
>> #mpirun -np 3 -hostfile hostfile dynamic/spawn
>> [witch1:12392]
>> --------------------------------------------------------------------------
>> There are not enough slots available in the system to satisfy the 3 slots
>> that were requested by the application:
>> dynamic/spawn
>>
>> Either request fewer slots for your application, or make more slots available
>> for use.
>> --------------------------------------------------------------------------
>> [witch1:12392]
>> --------------------------------------------------------------------------
>> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
>> launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>>
>>
>> Using hostfile1 also works
>> #cat hostfile1
>> witch2
>> witch2
>> witch2
>>
>>
>> Best Regards
>> Lenny.
>>
>