Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] rankfile error on openmpi/1.3.3
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-08-31 12:43:54


I'm afraid the rank-file mapper in 1.3.3 has several known problems
that users have described on the list. Hopefully we have those fixed
in the upcoming 1.3.4 release.

On Aug 31, 2009, at 10:01 AM, Sacerdoti, Federico wrote:

> Hi,
>
> I am trying to use a rankfile to bind a 4-process MPI job to one
> socket of a two-socket, 8-core machine. However, I'm getting a
> strange error.
>
> CMDS USED
> orterun --hostfile hostlist.1 -n 4 --mca rmaps_rank_file_path ./rankmap.1 desres-netscan -o $OUTDIR
>
> $ cat rankmap.1
> rank 0=drdb0235.en slot=0:0
> rank 1=drdb0235.en slot=0:1
> rank 2=drdb0235.en slot=0:2
> rank 3=drdb0235.en slot=0:3
>
> $ cat hostlist.1
> drdb0235.en slots=8
>
> ERROR SEEN
> --------------------------------------------------------------------------
> Rankfile claimed host drdb0235.en that was not allocated or
> oversubscribed it's slots:
> --------------------------------------------------------------------------
> [drdb0235.en.desres.deshaw.com:14242] [[37407,0],0] ORTE_ERROR_LOG:
> Bad parameter in file rmaps_rank_file.c at line 108
> [drdb0235.en.desres.deshaw.com:14242] [[37407,0],0] ORTE_ERROR_LOG:
> Bad parameter in file base/rmaps_base_map_job.c at line 87
> [drdb0235.en.desres.deshaw.com:14242] [[37407,0],0] ORTE_ERROR_LOG:
> Bad parameter in file base/plm_base_launch_support.c at line 77
> [drdb0235.en.desres.deshaw.com:14242] [[37407,0],0] ORTE_ERROR_LOG:
> Bad parameter in file plm_rsh_module.c at line 985
>
> From looking at the code in rmaps_rank_file.c, it seems the error
> occurs when the node-gathering code wraps twice around the hostlist.
> However, I don't see why that is happening.
>
> If I specify 8 slots in the rankmap, I see a different error: Error,
> invalid rank (4) in the rankfile (./rankmap.1)
>
> Thanks,
> Federico
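
For anyone who wants to rule out a plain inconsistency between the two files before blaming the mapper, here is a minimal sketch in Python. It is not Open MPI's own validation logic and does not reproduce what rmaps_rank_file.c does. The function names are hypothetical, it only understands the "rank N=host slot=S" and "host slots=N" line forms quoted above, and treating a host with no slots= value as having one slot is an assumption made for the sketch.

```python
import re

# Sample inputs copied from the quoted message.
RANKFILE = """\
rank 0=drdb0235.en slot=0:0
rank 1=drdb0235.en slot=0:1
rank 2=drdb0235.en slot=0:2
rank 3=drdb0235.en slot=0:3
"""
HOSTFILE = "drdb0235.en slots=8\n"

# Only the line forms seen in this thread are handled (assumption).
RANK_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)$")
HOST_LINE = re.compile(r"^(\S+)(?:\s+slots\s*=\s*(\d+))?")


def _useful_lines(text):
    """Yield stripped lines, skipping blanks and '#' comments."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            yield line


def parse_rankfile(text):
    """Return {rank: (host, slot_spec)} for each rankfile entry."""
    ranks = {}
    for line in _useful_lines(text):
        match = RANK_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized rankfile line: {line!r}")
        ranks[int(match.group(1))] = (match.group(2), match.group(3))
    return ranks


def parse_hostfile(text):
    """Return {host: slots}; a missing slots= value counts as 1 (assumption)."""
    hosts = {}
    for line in _useful_lines(text):
        match = HOST_LINE.match(line)
        hosts[match.group(1)] = int(match.group(2)) if match.group(2) else 1
    return hosts


def find_problems(ranks, hosts, nprocs):
    """Return human-readable inconsistencies for a job of nprocs processes."""
    problems = []
    missing = sorted(set(range(nprocs)) - set(ranks))
    if missing:
        problems.append(f"ranks with no rankfile entry: {missing}")
    extra = sorted(r for r in ranks if r >= nprocs)
    if extra:
        problems.append(f"rankfile entries beyond -n {nprocs}: {extra}")
    per_host = {}
    for host, _slot in ranks.values():
        per_host[host] = per_host.get(host, 0) + 1
    for host, count in sorted(per_host.items()):
        if host not in hosts:
            problems.append(f"host {host} is not in the hostfile")
        elif count > hosts[host]:
            problems.append(
                f"host {host} has {count} ranks but only {hosts[host]} slots"
            )
    return problems


if __name__ == "__main__":
    issues = find_problems(
        parse_rankfile(RANKFILE), parse_hostfile(HOSTFILE), nprocs=4
    )
    print(issues or "no inconsistencies found")
```

By this simple check, the two files quoted above agree with each other for -n 4. An 8-entry rankfile combined with -n 4 would be flagged for entries 4 through 7, which is at least consistent with the "invalid rank (4)" message, though the sketch cannot confirm that this is the cause.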