Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] why does --rankfile need hostlist?
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-06-19 22:43:23


Having gone around in circles on hostfile-related issues for over five years
now, I honestly have little motivation to re-open the entire discussion.
Supplying a host list doesn't seem to be that daunting a requirement for those
who use rankfiles, so I'm inclined to just leave well enough alone.
:-)

On Fri, Jun 19, 2009 at 2:21 PM, Eugene Loh <Eugene.Loh_at_[hidden]> wrote:

> Ralph Castain wrote:
>
>> The two files have a slightly different format
>
> Agreed.
>
>> and completely different meaning.
>
> Somewhat agreed. They're both related to mapping processes onto a cluster.
>
>> The hostfile specifies how many slots are on a node. The rankfile specifies
>> a rank and what node/slot it is to be mapped onto.
>
> Agreed.
>
>> Rankfiles can use relative node indexing and refer to nodes received from a
>> resource manager - i.e., without any hostfile.
>
> This is the main part I'm concerned about. E.g.,
>
> % cat rankfile
> rank 0=node0 slot=0
> rank 1=node1 slot=0
> % mpirun -np 2 -rf rankfile ./a.out
> --------------------------------------------------------------------------
> Rankfile claimed host node1 that was not allocated or oversubscribed it's
> slots:
>
> --------------------------------------------------------------------------
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
> rmaps_rank_file.c at line 107
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
> base/rmaps_base_map_job.c at line 86
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
> base/plm_base_launch_support.c at line 86
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
> plm_rsh_module.c at line 1016
> % mpirun -np 2 -host node0,node1 -rf rankfile ./a.out
> 0 on node0
> 1 on node1
> done
>
> It seems to me that the rankfile has sufficient information to express what
> I want it to do. But mpirun won't accept this. To fix this, I have to,
> e.g., supply/maintain/specify redundant information in a hostfile or host
> list.
>
>> So the files are intentionally quite different. Trying to combine them
>> would be rather ugly.
>
> Right. And my issue is that I'm forced to use both when I only want
> rankfile functionality.
>
>> On Thu, Jun 18, 2009 at 1:52 PM, Eugene Loh <Eugene.Loh_at_[hidden]> wrote:
>>
>>> In order to use "mpirun --rankfile", I also need to specify
>>> hosts/hostlist. But that information is redundant with what I provide in
>>> the rankfile. So, from a user's point of view, this strikes me as broken.
>>> Yes? Should I file a ticket, or am I missing something here about this
>>> functionality?
>>>
>
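
For readers following the thread, the redundancy being discussed can be made concrete with a minimal Python sketch. It is not part of Open MPI and is not how mpirun behaves; it only shows that the host list passed via `-host` in the working command above could be derived from the rankfile itself. The names `parse_rankfile`, `hosts_from_rankfile`, and `mpirun_args` are hypothetical, and the parser assumes only the simple `rank N=host slot=S` form quoted in the thread (relative node indexing is not handled).

```python
import re

# Matches the simple form used in the thread, e.g. "rank 0=node0 slot=0".
# Assumption: other rankfile syntax (relative node indexing) is ignored.
RANK_LINE = re.compile(r"^\s*rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)\s*$")


def parse_rankfile(text):
    """Return a list of (rank, host, slot) tuples for recognized lines."""
    entries = []
    for line in text.splitlines():
        match = RANK_LINE.match(line)
        if match is None:
            continue  # skip blank or unrecognized lines
        entries.append((int(match.group(1)), match.group(2), match.group(3)))
    return entries


def hosts_from_rankfile(text):
    """Return the distinct hosts named in the rankfile, in first-seen order."""
    hosts = []
    for _rank, host, _slot in parse_rankfile(text):
        if host not in hosts:
            hosts.append(host)
    return hosts


def mpirun_args(rankfile_path, rankfile_text, program):
    """Build the argument list for the command form that worked in the thread."""
    entries = parse_rankfile(rankfile_text)
    hosts = hosts_from_rankfile(rankfile_text)
    return [
        "mpirun",
        "-np", str(len(entries)),
        "-host", ",".join(hosts),
        "-rf", rankfile_path,
        program,
    ]


if __name__ == "__main__":
    example = "rank 0=node0 slot=0\nrank 1=node1 slot=0\n"
    print(" ".join(mpirun_args("rankfile", example, "./a.out")))
```

A wrapper like this only works around the requirement from the user's side; whether mpirun itself should infer the hosts is the open question in the thread.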