Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] rankfile error on openmpi/1.3.3
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2009-09-01 07:18:19


I changed the error message; I hope it is clearer now.
r21919.

On Tue, Sep 1, 2009 at 2:13 PM, Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]> wrote:

> Please try using the fully qualified hostname (drdb0235.en.desres.deshaw.com)
> in both the hostfile and the rankfile.
> It should help.
> Lenny.
>
> On Mon, Aug 31, 2009 at 7:43 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> I'm afraid the rankfile mapper in 1.3.3 has several known problems that
>> users have described on this list. We hope to have those fixed in the
>> upcoming 1.3.4 release.
>>
>> On Aug 31, 2009, at 10:01 AM, Sacerdoti, Federico wrote:
>>
>> Hi,
>>
>> I am trying to use a rankfile (rankmap.1) to bind a 4-process MPI job to
>> one socket of a two-socket, 8-core machine. However, I'm getting a strange
>> error.
>>
>> CMDS USED
>> orterun --hostfile hostlist.1 -n 4 --mca rmaps_rank_file_path ./rankmap.1 \
>>     desres-netscan -o $OUTDIR
>>
>> $ cat rankmap.1
>> rank 0=drdb0235.en slot=0:0
>> rank 1=drdb0235.en slot=0:1
>> rank 2=drdb0235.en slot=0:2
>> rank 3=drdb0235.en slot=0:3
>>
>> $ cat hostlist.1
>> drdb0235.en slots=8
>>
>> ERROR SEEN
>> --------------------------------------------------------------------------
>> Rankfile claimed host drdb0235.en that was not allocated or oversubscribed
>> it's slots:
>> --------------------------------------------------------------------------
>> [drdb0235.en.desres.deshaw.com:14242] [[37407,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 108
>> [drdb0235.en.desres.deshaw.com:14242] [[37407,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 87
>> [drdb0235.en.desres.deshaw.com:14242] [[37407,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 77
>> [drdb0235.en.desres.deshaw.com:14242] [[37407,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 985
>>
>> From looking at the code in rmaps_rank_file.c, it seems the error occurs
>> when the node-gathering code wraps around the host list twice. However, I
>> don't see why that is happening.
>>
>> If I specify 8 slots in the rankmap, I see a different error:
>> Error, invalid rank (4) in the rankfile (./rankmap.1)
>>
>> Thanks,
>> Federico
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
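A small pre-flight check can catch both problems reported in this thread before orterun is launched. The sketch below is hypothetical: parse_hostfile and check_rankfile are illustrative names, not part of Open MPI, and the code does not reproduce the 1.3.3 rankfile mapper's actual logic. It assumes rankfile lines have the "rank N=host slot=socket:core" form used above and hostfile lines have the "host slots=N" form. It flags ranks outside 0..np-1 (with -n 4, a rank 4 entry is what the "invalid rank (4)" error points to) and, following Lenny's suggestion, host names that are missing from the hostfile or that differ from the node's fully qualified name as printed in the ORTE_ERROR_LOG lines.

```python
import re

# Assumed rankfile line format, as in this thread:
#   rank <N>=<host> slot=<socket>:<core>
RANK_LINE = re.compile(r"^rank\s+(\d+)=(\S+)\s+slot=(\d+):(\d+)\s*$")


def parse_hostfile(text):
    """Return {hostname: slots} from lines like 'host slots=8' (hypothetical helper)."""
    hosts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        slots = 1  # assumption: treat a missing slots= as one slot
        for field in fields[1:]:
            if field.startswith("slots="):
                slots = int(field.split("=", 1)[1])
        hosts[fields[0]] = slots
    return hosts


def check_rankfile(rankfile_text, hostfile_text, np, node_fqdn=None):
    """Return a list of problems found in the rankfile (empty if none).

    np is the value passed to orterun -n; node_fqdn, if given, is the
    fully qualified name the node reports for itself.
    """
    hosts = parse_hostfile(hostfile_text)
    problems = []
    for lineno, line in enumerate(rankfile_text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        m = RANK_LINE.match(line)
        if m is None:
            problems.append(f"line {lineno}: cannot parse {line!r}")
            continue
        rank, host = int(m.group(1)), m.group(2)
        # MPI ranks for -n np run from 0 to np - 1.
        if rank >= np:
            problems.append(f"line {lineno}: rank {rank} is outside 0..{np - 1} for -n {np}")
        # The rankfile host should name a host listed in the hostfile.
        if host not in hosts:
            problems.append(f"line {lineno}: host {host!r} is not in the hostfile")
        # Flag short names when the node knows itself by its full name.
        if node_fqdn is not None and host != node_fqdn:
            problems.append(f"line {lineno}: host {host!r} differs from node name {node_fqdn!r}")
    return problems


if __name__ == "__main__":
    fqdn = "drdb0235.en.desres.deshaw.com"
    rankfile = "".join(f"rank {r}={fqdn} slot=0:{r}\n" for r in range(4))
    hostfile = f"{fqdn} slots=8\n"
    for problem in check_rankfile(rankfile, hostfile, np=4, node_fqdn=fqdn):
        print(problem)
```

With the original files (short name drdb0235.en in both), np=4 and node_fqdn set to drdb0235.en.desres.deshaw.com, every rankfile line is flagged as differing from the node name; with the fully qualified name in both files the list comes back empty. Passing the node name in explicitly, rather than resolving it inside the script, keeps the check independent of local DNS configuration.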