Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] --rankfile
From: Nulik Nol (nuliknol_at_[hidden])
Date: 2009-08-19 09:26:24


Thanks a lot, it worked.
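
For anyone finding this thread in the archive, here is a minimal sketch of the change, assuming the hostfile from the original post quoted below. The sed one-liner and the output file name neat.hostfile.fixed are illustrative assumptions and not something from the thread; editing the file by hand does the same thing.

```shell
# Recreate the hostfile exactly as shown in the original post.
cat > neat.hostfile <<'EOF'
n64 max-slots=1 slots=1
master max-slots=1 slots=1
EOF

# Drop every "max-slots=N" entry and keep everything else on the line.
# The result goes to a new file (hypothetical name) so the original is untouched.
sed -E 's/[[:space:]]+max[-_]slots=[0-9]+//g' neat.hostfile > neat.hostfile.fixed

# Show the resulting hostfile.
cat neat.hostfile.fixed
```

The same mpirun command from the original post can then be pointed at the edited hostfile (--hostfile neat.hostfile.fixed) together with the unchanged rankfile.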

On Wed, Aug 19, 2009 at 1:27 AM, jody<jody.xha_at_[hidden]> wrote:
> Hi
> I had a similar problem.
> Following a suggestion from Lenny,
> I removed the "max-slots" entries from
> my hostfile and it worked.
>
> It seems that there are still some minor bugs in the rankfile mechanism.
> See the post
>
> http://www.open-mpi.org/community/lists/users/2009/08/10384.php
>
>
> Jody
>
>
> On Tue, Aug 18, 2009 at 10:53 PM, Nulik Nol<nuliknol_at_[hidden]> wrote:
>> Hi,
>> I get this error when I use --rankfile:
>> "There are not enough slots available in the system to satisfy the 2 slots"
>> What could be the problem? I have tried using '*' for the 'slot' param and
>> many other configs without any luck. Without --rankfile everything
>> works fine. I would appreciate any help.
>>
>> master waver # cat neat.hostfile
>> n64 max-slots=1 slots=1
>> master max-slots=1 slots=1
>> master waver # cat neat.rankfile
>> rank 0=n64 slot=0
>> rank 1=master slot=0
>> master waver # mpirun --rankfile neat.rankfile --hostfile neat.hostfile -n 2 /tmp/neat
>> --------------------------------------------------------------------------
>> There are not enough slots available in the system to satisfy the 2 slots
>> that were requested by the application:
>> /tmp/neat
>>
>> Either request fewer slots for your application, or make more slots available
>> for use.
>>
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
>> launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>>
>> master waver # mpirun --hostfile neat.hostfile -n 2 /tmp/neat
>> entering master main loop
>> recieved msg from 1
>> unknown message 0
>> ^Cmpirun: killing job...
>>
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 1 with PID 13064 on node master
>> exited on signal 0 (Unknown signal 0).
>> --------------------------------------------------------------------------
>> 2 total processes killed (some possibly by mpirun during cleanup)
>> mpirun: clean termination accomplished
>>
>> master waver #
>>
>>
>> --
>> ==================================
>> The power of zero is infinite
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
