Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] rankfile error on openmpi/1.3.3
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2009-09-01 07:13:46


Please try using the full hostname (drdb0235.en.desres.deshaw.com)
in both the hostfile and the rankfile.
It should help.
Lenny.
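
A minimal sketch of that suggestion, assuming the fully qualified name printed in the ORTE_ERROR_LOG lines quoted below is the one the launcher expects. It regenerates the hostlist.1 and rankmap.1 files from the original report with the long hostname and leaves everything else unchanged. The HOST variable is introduced here for illustration only, and whether this sidesteps the 1.3.3 rank-file mapper problems mentioned below has not been verified.

```shell
#!/bin/sh
# Fully qualified name as printed in the ORTE_ERROR_LOG lines of the report.
# (HOST is a helper variable for this sketch, not part of the original files.)
HOST=drdb0235.en.desres.deshaw.com

# Hostfile: same slot count as the original hostlist.1.
printf '%s slots=8\n' "$HOST" > hostlist.1

# Rankfile: ranks 0-3 with slot=0:0 through slot=0:3, as in the original rankmap.1.
: > rankmap.1
for rank in 0 1 2 3; do
    printf 'rank %s=%s slot=0:%s\n' "$rank" "$HOST" "$rank" >> rankmap.1
done

# Show the resulting files for inspection.
cat hostlist.1 rankmap.1

# Then rerun the original command unchanged:
#   orterun --hostfile hostlist.1 -n 4 --mca rmaps_rank_file_path ./rankmap.1 desres-netscan -o $OUTDIR
```

The point of generating both files from one variable is that the hostfile and the rankfile then cannot disagree on how the node is named.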

On Mon, Aug 31, 2009 at 7:43 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> I'm afraid the rank-file mapper in 1.3.3 has several known problems that
> have been described on the list by users. We hopefully have those fixed in
> the upcoming 1.3.4 release.
>
> On Aug 31, 2009, at 10:01 AM, Sacerdoti, Federico wrote:
>
> Hi,
>
> I am trying to use the rankmap to bind a 4-proc MPI job to one socket of a
> two-socket, 8-core machine. However, I'm getting a strange error.
>
> CMDS USED
> orterun --hostfile hostlist.1 -n 4 --mca rmaps_rank_file_path ./rankmap.1
> desres-netscan -o $OUTDIR
>
> $ cat rankmap.1
> rank 0=drdb0235.en slot=0:0
> rank 1=drdb0235.en slot=0:1
> rank 2=drdb0235.en slot=0:2
> rank 3=drdb0235.en slot=0:3
>
> $ cat hostlist.1
> drdb0235.en slots=8
>
> ERROR SEEN
> --------------------------------------------------------------------------
> Rankfile claimed host drdb0235.en that was not allocated or oversubscribed
> it's slots:
> --------------------------------------------------------------------------
> [drdb0235.en.desres.deshaw.com:14242] [[37407,0],0] ORTE_ERROR_LOG: Bad
> parameter in file rmaps_rank_file.c at line 108
> [drdb0235.en.desres.deshaw.com:14242] [[37407,0],0] ORTE_ERROR_LOG: Bad
> parameter in file base/rmaps_base_map_job.c at line 87
> [drdb0235.en.desres.deshaw.com:14242] [[37407,0],0] ORTE_ERROR_LOG: Bad
> parameter in file base/plm_base_launch_support.c at line 77
> [drdb0235.en.desres.deshaw.com:14242] [[37407,0],0] ORTE_ERROR_LOG: Bad
> parameter in file plm_rsh_module.c at line 985
>
> From looking at the code in rmaps_rank_file.c, it seems the error occurs
> when the node-gathering code wraps twice around the hostlist. However, I don't
> see why that is happening.
>
> If I specify 8 slots in the rankmap, I see a different error: Error,
> invalid rank (4) in the rankfile (./rankmap.1)
>
> Thanks,
> Federico
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users