Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] --rankfile
From: jody (jody.xha_at_[hidden])
Date: 2009-08-19 02:27:14


Hi
I had a similar problem.
Following a suggestion from Lenny,
I removed the "max-slots" entries from
my hostfile and it worked.
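
For reference, a rough sketch of what the trimmed hostfile could look like,
using the same host names and slot counts as the hostfile quoted below
(just an illustration, not tested on your setup):

# hostfile with the max-slots entries removed
n64 slots=1
master slots=1

The rankfile itself should be able to stay exactly as you have it.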

It seems that there are still some minor bugs in the rankfile mechanism.
See this post:

http://www.open-mpi.org/community/lists/users/2009/08/10384.php

Jody

On Tue, Aug 18, 2009 at 10:53 PM, Nulik Nol<nuliknol_at_[hidden]> wrote:
> Hi,
> I get this error when I use --rankfile:
> "There are not enough slots available in the system to satisfy the 2 slots"
> What could be the problem? I have tried using '*' for the 'slot' param and
> many other configs without any luck. Without --rankfile everything
> works fine. I would appreciate any help.
>
> master waver # cat neat.hostfile
> n64 max-slots=1 slots=1
> master max-slots=1 slots=1
> master waver # cat neat.rankfile
> rank 0=n64 slot=0
> rank 1=master slot=0
> master waver # mpirun  --rankfile neat.rankfile --hostfile
> neat.hostfile -n 2 /tmp/neat
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 2 slots
> that were requested by the application:
>    /tmp/neat
>
> Either request fewer slots for your application, or make more slots available
> for use.
>
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
> launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
> master waver # mpirun   --hostfile neat.hostfile -n 2 /tmp/neat
> entering master main loop
> recieved msg from 1
> unknown message 0
> ^Cmpirun: killing job...
>
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 13064 on node master
> exited on signal 0 (Unknown signal 0).
> --------------------------------------------------------------------------
> 2 total processes killed (some possibly by mpirun during cleanup)
> mpirun: clean termination accomplished
>
> master waver #
>
>
> --
> ==================================
> The power of zero is infinite
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>