Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] rank file error: Rankfile claimed...
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2009-08-17 05:20:34


Hi,
This message means that your rankfile refers to the host "plankton",
but that host was not allocated via the hostfile or host list (or its
slots were oversubscribed).
According to your files and command line, though, everything looks fine.
Can you try using the hostname "plankton.uzh.ch" instead of "plankton"?
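
As a quick sanity check before launching, the two conditions named in the
error (host not allocated, or slots oversubscribed) can be tested offline.
The following is only a minimal sketch, not how Open MPI itself resolves
hosts: it compares host names as plain strings, assumes the simple
"rank N=host slot=S" rankfile form and the "host slots=N max-slots=M"
hostfile form shown in this thread, and the function names
(parse_hostfile, parse_rankfile, check) are made up for illustration.

```python
import re
from collections import Counter

# Matches the simple rankfile form used in this thread: "rank 0=host slot=1".
RANK_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot=(\S+)$")


def parse_hostfile(text):
    """Return {host: slot limit or None} from hostfile text.

    The limit is max-slots if given, otherwise slots (an assumption
    made for this sketch).
    """
    hosts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        opts = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        limit = opts.get("max-slots", opts.get("slots"))
        hosts[fields[0]] = int(limit) if limit is not None else None
    return hosts


def parse_rankfile(text):
    """Return a list of (rank, host) pairs; unrecognized lines are skipped."""
    ranks = []
    for line in text.splitlines():
        match = RANK_LINE.match(line.strip())
        if match:
            ranks.append((int(match.group(1)), match.group(2)))
    return ranks


def check(hostfile_text, rankfile_text):
    """Return a list of human-readable problems (empty if none found)."""
    hosts = parse_hostfile(hostfile_text)
    per_host = Counter(host for _, host in parse_rankfile(rankfile_text))
    problems = []
    for host, count in sorted(per_host.items()):
        if host not in hosts:
            # Literal string comparison: "plankton" != "plankton.uzh.ch".
            problems.append(f"{host}: not listed in hostfile")
        elif hosts[host] is not None and count > hosts[host]:
            problems.append(f"{host}: {count} ranks but only {hosts[host]} slots")
    return problems
```

Calling check() with the contents of testhosts and rankfile as posted
returns an empty list, which matches the observation that the files look
consistent. If only one of the two files is switched to "plankton.uzh.ch",
the literal comparison flags the mismatch, so when trying the full name it
is probably safest to use the same spelling in both files.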
Thanks,
Lenny.

On Mon, Aug 17, 2009 at 10:36 AM, jody <jody.xha_at_[hidden]> wrote:

> Hi,
>
> When I use a rankfile, I get an error message which I don't understand:
>
> [jody_at_plankton tests]$ mpirun -np 3 -rf rankfile -hostfile testhosts ./HelloMPI
> --------------------------------------------------------------------------
> Rankfile claimed host plankton that was not allocated or
> oversubscribed it's slots:
>
> --------------------------------------------------------------------------
> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in
> file rmaps_rank_file.c at line 108
> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in
> file base/rmaps_base_map_job.c at line 87
> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in
> file base/plm_base_launch_support.c at line 77
> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in
> file plm_rsh_module.c at line 990
> --------------------------------------------------------------------------
> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
> launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
>
>
> Without the '-rf rankfile' option everything works as expected.
>
> My hostfile:
> [jody_at_plankton tests]$ cat testhosts
> # The following node is a quad-processor machine, and we absolutely
> # want to disallow over-subscribing it:
> plankton slots=3 max-slots=3
> # The following nodes are dual-processor machines:
> nano_00 slots=2 max-slots=2
> nano_01 slots=2 max-slots=2
> nano_02 slots=2 max-slots=2
> nano_03 slots=2 max-slots=2
> nano_04 slots=2 max-slots=2
> nano_05 slots=2 max-slots=2
> nano_06 slots=2 max-slots=2
>
> My rankfile:
> [jody_at_plankton neander]$ cat rankfile
> rank 0=nano_00 slot=1
> rank 1=plankton slot=0
> rank 2=nano_01 slot=1
>
> My Open MPI version: 1.3.2
>
> I get the same error if I use a rankfile which has the single line
> rank 0=plankton slot=0
> (plankton is my local machine) and call mpirun with -np 1.
>
> What does the "Rankfile claimed..." message mean?
> Did I make an error in my rankfile?
> If yes, what would be the correct way to write it?
>
> Thank you,
> Jody