Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] rank file error: Rankfile claimed...
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2009-08-17 06:05:23


I think it has to do with your environment: /etc/hosts, your IT setup, the
value returned by the hostname function, etc.
I am not sure it has anything to do with Open MPI at all.
Lenny.
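
One way to inspect the environment factors mentioned above is to look at what the machine reports as its own name and whether the short and full names both resolve. The following is only a diagnostic sketch in Python using the standard `socket` module; the helper names `reported_names` and `resolve` are made up for illustration, and nothing here claims to reproduce the lookups Open MPI performs internally.

```python
import socket


def reported_names():
    """Return (short_name, fqdn) as this machine reports them."""
    return socket.gethostname(), socket.getfqdn()


def resolve(name):
    """Return the IPv4 address `name` resolves to, or None if lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError
        return None


if __name__ == "__main__":
    short, fqdn = reported_names()
    print("gethostname():", short)
    print("getfqdn():    ", fqdn)
    # Compare how the short and the fully qualified name resolve
    # (via /etc/hosts, DNS, or whatever the resolver is configured to use).
    for name in (short, fqdn):
        print(name, "->", resolve(name))
```

If the two names differ, or only one of them resolves, that may be a hint as to why "plankton" and "plankton.uzh.ch" were treated differently here.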
On Mon, Aug 17, 2009 at 12:59 PM, jody <jody.xha_at_[hidden]> wrote:

> Hi Lenny
>
> Thanks - using the full names makes it work!
> Is there a reason why the rankfile option treats
> host names differently than the hostfile option?
>
> Thanks
> Jody
>
>
>
> On Mon, Aug 17, 2009 at 11:20 AM, Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]> wrote:
> > Hi
> > This message means that you are trying to use host "plankton", which was
> > not allocated via the hostfile or host list.
> > But according to your files and command line, everything seems fine.
> > Can you try using the hostname "plankton.uzh.ch" instead of "plankton"?
> > Thanks,
> > Lenny.
> > On Mon, Aug 17, 2009 at 10:36 AM, jody <jody.xha_at_[hidden]> wrote:
> >>
> >> Hi
> >>
> >> When I use a rankfile, I get an error message which I don't understand:
> >>
> >> [jody_at_plankton tests]$ mpirun -np 3 -rf rankfile -hostfile testhosts ./HelloMPI
> >> --------------------------------------------------------------------------
> >> Rankfile claimed host plankton that was not allocated or
> >> oversubscribed it's slots:
> >>
> >> --------------------------------------------------------------------------
> >> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in
> >> file rmaps_rank_file.c at line 108
> >> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in
> >> file base/rmaps_base_map_job.c at line 87
> >> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in
> >> file base/plm_base_launch_support.c at line 77
> >> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in
> >> file plm_rsh_module.c at line 990
> >> --------------------------------------------------------------------------
> >> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
> >> launch so we are aborting.
> >>
> >> There may be more information reported by the environment (see above).
> >>
> >> This may be because the daemon was unable to find all the needed shared
> >> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> >> location of the shared libraries on the remote nodes and this will
> >> automatically be forwarded to the remote nodes.
> >> --------------------------------------------------------------------------
> >> --------------------------------------------------------------------------
> >> mpirun noticed that the job aborted, but has no info as to the process
> >> that caused that situation.
> >> --------------------------------------------------------------------------
> >> mpirun: clean termination accomplished
> >>
> >>
> >>
> >> Without the '-rf rankfile' option everything works as expected.
> >>
> >> My hostfile :
> >> [jody_at_plankton tests]$ cat testhosts
> >> # The following node is a quad-processor machine, and we absolutely
> >> # want to disallow over-subscribing it:
> >> plankton slots=3 max-slots=3
> >> # The following nodes are dual-processor machines:
> >> nano_00 slots=2 max-slots=2
> >> nano_01 slots=2 max-slots=2
> >> nano_02 slots=2 max-slots=2
> >> nano_03 slots=2 max-slots=2
> >> nano_04 slots=2 max-slots=2
> >> nano_05 slots=2 max-slots=2
> >> nano_06 slots=2 max-slots=2
> >>
> >> my rank file:
> >> [jody_at_plankton neander]$ cat rankfile
> >> rank 0=nano_00 slot=1
> >> rank 1=plankton slot=0
> >> rank 2=nano_01 slot=1
> >>
> >> my Open MPI version: 1.3.2
> >>
> >> I get the same error if I use a rankfile which has the single line
> >> rank 0=plankton slot=0
> >> (plankton is my local machine) and call mpirun with -np 1
> >>
> >> What does the "Rankfile claimed..." message mean?
> >> Did I make an error in my rankfile?
> >> If yes, what would be the correct way to write it?
> >>
> >> Thank You
> >> Jody
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
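
The error message discussed in this thread is about a rankfile host that was not allocated via the hostfile. As a rough illustration of that condition, the Python sketch below compares the host names in a rankfile against those in a hostfile by plain string equality. It handles only the two file layouts quoted in this thread; the function names and the regular expression are assumptions made for this example and are not part of Open MPI.

```python
import re

# Matches rankfile lines of the form quoted in the thread:
#   rank 1=plankton slot=0
RANK_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot=(\S+)$")

HOSTFILE = """\
# want to disallow over-subscribing it:
plankton slots=3 max-slots=3
nano_00 slots=2 max-slots=2
nano_01 slots=2 max-slots=2
"""

RANKFILE = """\
rank 0=nano_00 slot=1
rank 1=plankton slot=0
rank 2=nano_01 slot=1
"""


def hostfile_hosts(text):
    """Collect the first token of every non-comment, non-empty hostfile line."""
    hosts = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.add(line.split()[0])
    return hosts


def rankfile_hosts(text):
    """Map rank number -> host name for every line matching RANK_LINE."""
    mapping = {}
    for line in text.splitlines():
        match = RANK_LINE.match(line.strip())
        if match:
            mapping[int(match.group(1))] = match.group(2)
    return mapping


def unallocated(rankfile_text, hostfile_text):
    """Return {rank: host} for rankfile hosts that the hostfile does not list."""
    allowed = hostfile_hosts(hostfile_text)
    return {rank: host
            for rank, host in rankfile_hosts(rankfile_text).items()
            if host not in allowed}


if __name__ == "__main__":
    # The files from the thread agree literally, so nothing is reported.
    print(unallocated(RANKFILE, HOSTFILE))
    # A name spelled differently in the two files is reported.
    print(unallocated("rank 0=plankton.uzh.ch slot=0", HOSTFILE))
```

Note that the files posted in this thread pass this literal check, yet the job still failed until full host names were used. A comparison like this catches only spelling mismatches between the two files, not differences in how a name is resolved on the machine.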