On Mon, Aug 17, 2009 at 11:20 AM, Lenny Verkhovsky <lenny.verkhovsky@gmail.com> wrote:
> Hi
> This message means that you are trying to use the host "plankton", which
> was not allocated via the hostfile or hostlist.
> But according to the files and the command line, everything seems fine.
> Can you try using the hostname "plankton.uzh.ch" instead of "plankton"?
> Thanks
> Lenny.
> On Mon, Aug 17, 2009 at 10:36 AM, jody <jody.xha@gmail.com> wrote:
>>
>> Hi
>>
>> When I use a rankfile, I get an error message which I don't understand:
>>
>> [jody@plankton tests]$ mpirun -np 3 -rf rankfile -hostfile testhosts ./HelloMPI
>> --------------------------------------------------------------------------
>> Rankfile claimed host plankton that was not allocated or
>> oversubscribed it's slots:
>>
>> --------------------------------------------------------------------------
>> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in
>> file rmaps_rank_file.c at line 108
>> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in
>> file base/rmaps_base_map_job.c at line 87
>> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in
>> file base/plm_base_launch_support.c at line 77
>> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in
>> file plm_rsh_module.c at line 990
>> --------------------------------------------------------------------------
>> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
>> launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>>
>>
>>
>> Without the '-rf rankfile' option everything works as expected.
>>
>> My hostfile:
>> [jody@plankton tests]$ cat testhosts
>> # The following node is a quad-processor machine, and we absolutely
>> # want to disallow over-subscribing it:
>> plankton slots=3 max-slots=3
>> # The following nodes are dual-processor machines:
>> nano_00 slots=2 max-slots=2
>> nano_01 slots=2 max-slots=2
>> nano_02 slots=2 max-slots=2
>> nano_03 slots=2 max-slots=2
>> nano_04 slots=2 max-slots=2
>> nano_05 slots=2 max-slots=2
>> nano_06 slots=2 max-slots=2
>>
>> My rankfile:
>> [jody@plankton neander]$ cat rankfile
>> rank 0=nano_00 slot=1
>> rank 1=plankton slot=0
>> rank 2=nano_01 slot=1
>>
>> My Open MPI version: 1.3.2
>>
>> I get the same error if I use a rankfile which has a single line
>> rank 0=plankton slot=0
>> (plankton is my local machine) and call mpirun with -np 1.
>>
>> What does the "Rankfile claimed..." message mean?
>> Did I make an error in my rankfile?
>> If yes, what would be the correct way to write it?
>>
>> Thank You
>> Jody
>> _______________________________________________
>> users mailing list
>> users@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
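For anyone who wants to rule out a plain naming mismatch before launching, here is a rough sketch of the kind of check the error message describes: every host named in the rankfile should also appear in the hostfile, and no host should get more ranks than it has slots. This is not Open MPI's own logic. The helpers parse_hostfile, parse_rankfile and check are hypothetical names made up for this illustration, host names are compared literally (mpirun may resolve names differently), a host line without slots= is assumed to mean one slot, and the rankfile's slot= value is not examined at all. The two files are the ones quoted above.

```python
import re
from collections import Counter

# Hostfile and rankfile contents as quoted in the thread.
HOSTFILE = """\
# The following node is a quad-processor machine, and we absolutely
# want to disallow over-subscribing it:
plankton slots=3 max-slots=3
# The following nodes are dual-processor machines:
nano_00 slots=2 max-slots=2
nano_01 slots=2 max-slots=2
nano_02 slots=2 max-slots=2
nano_03 slots=2 max-slots=2
nano_04 slots=2 max-slots=2
nano_05 slots=2 max-slots=2
nano_06 slots=2 max-slots=2
"""

RANKFILE = """\
rank 0=nano_00 slot=1
rank 1=plankton slot=0
rank 2=nano_01 slot=1
"""


def parse_hostfile(text):
    """Return {hostname: slots}; text after '#' is treated as a comment."""
    hosts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        slots = 1  # assumption: one slot when slots= is not given
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key == "slots":
                slots = int(value)
        hosts[fields[0]] = slots
    return hosts


def parse_rankfile(text):
    """Return {rank: hostname} from lines like 'rank 1=plankton slot=0'."""
    pattern = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot=\S+")
    ranks = {}
    for line in text.splitlines():
        match = pattern.match(line.strip())
        if match:
            ranks[int(match.group(1))] = match.group(2)
    return ranks


def check(hostfile_text, rankfile_text):
    """Return a list of problems; an empty list means the names line up."""
    hosts = parse_hostfile(hostfile_text)
    ranks = parse_rankfile(rankfile_text)
    problems = []
    # Every host used in the rankfile must be listed in the hostfile.
    for rank, host in sorted(ranks.items()):
        if host not in hosts:
            problems.append(f"rank {rank}: host {host!r} is not in the hostfile")
    # No host may receive more ranks than it has slots.
    for host, n in sorted(Counter(ranks.values()).items()):
        if host in hosts and n > hosts[host]:
            problems.append(f"host {host!r}: {n} ranks but only {hosts[host]} slots")
    return problems


print(check(HOSTFILE, RANKFILE))  # prints []
```

The empty result agrees with Lenny's reading that the files look fine by name, which is why trying the fully qualified name is the next step. Under the literal comparison used here, a rankfile that names plankton.uzh.ch would be reported as missing unless the hostfile uses the same spelling.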