Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] why does --rankfile need hostlist?
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-06-19 22:43:23


Having gone around in circles on hostfile-related issues for over five years
now, I honestly have little motivation to reopen the entire discussion.
Supplying a host list doesn't seem to be that daunting a requirement for
those who are using rankfiles, so I'm inclined to leave well enough alone. :-)

On Fri, Jun 19, 2009 at 2:21 PM, Eugene Loh <Eugene.Loh_at_[hidden]> wrote:

> Ralph Castain wrote:
>
> The two files have a slightly different format
>
> Agreed.
>
> and completely different meaning.
>
> Somewhat agreed. They're both related to mapping processes onto a cluster.
>
> The hostfile specifies how many slots are on a node. The rankfile specifies
> a rank and what node/slot it is to be mapped onto.
>
> Agreed.
>
> Rankfiles can use relative node indexing and refer to nodes received from a
> resource manager - i.e., without any hostfile.
>
> This is the main part I'm concerned about. E.g.,
>
> % cat rankfile
> rank 0=node0 slot=0
> rank 1=node1 slot=0
> % mpirun -np 2 -rf rankfile ./a.out
> --------------------------------------------------------------------------
> Rankfile claimed host node1 that was not allocated or oversubscribed it's
> slots:
>
> --------------------------------------------------------------------------
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
> rmaps_rank_file.c at line 107
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
> base/rmaps_base_map_job.c at line 86
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
> base/plm_base_launch_support.c at line 86
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
> plm_rsh_module.c at line 1016
> % mpirun -np 2 -host node0,node1 -rf rankfile ./a.out
> 0 on node0
> 1 on node1
> done
>
> It seems to me that the rankfile has sufficient information to express what
> I want it to do. But mpirun won't accept this. To fix this, I have to,
> e.g., supply/maintain/specify redundant information in a hostfile or host
> list.
>
> So the files are intentionally quite different. Trying to combine them
> would be rather ugly.
>
> Right. And my issue is that I'm forced to use both when I only want
> rankfile functionality.
>
> On Thu, Jun 18, 2009 at 1:52 PM, Eugene Loh <Eugene.Loh_at_[hidden]> wrote:
>
>> In order to use "mpirun --rankfile", I also need to specify
>> hosts/hostlist. But that information is redundant with what I provide in
>> the rankfile. So, from a user's point of view, this strikes me as broken.
>> Yes? Should I file a ticket, or am I missing something here about this
>> functionality?
>>
>
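
The workaround shown in the thread (passing -host alongside -rf) can be
scripted so that the host list is derived from the rankfile rather than
maintained by hand. The following is a minimal sketch, not part of Open MPI:
the helper names hosts_from_rankfile and mpirun_args are hypothetical, and the
parser assumes only the simple "rank N=host slot=S" line form from the example
above. It does not handle relative node indexing or any other rankfile syntax.

```python
import re

# Matches lines such as "rank 0=node0 slot=0" (the form used in the thread).
# Assumption: other rankfile syntax, e.g. relative node indexing, is ignored.
RANK_LINE = re.compile(r"^\s*rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)\s*$")


def hosts_from_rankfile(text):
    """Return (number of rank lines, distinct hosts in first-seen order)."""
    hosts = []
    nranks = 0
    for line in text.splitlines():
        match = RANK_LINE.match(line)
        if match is None:
            continue  # skip blank lines and anything we do not recognize
        nranks += 1
        host = match.group(2)
        if host not in hosts:
            hosts.append(host)
    return nranks, hosts


def mpirun_args(rankfile_path, rankfile_text, program):
    """Build an argument list mirroring the working command in the thread."""
    nranks, hosts = hosts_from_rankfile(rankfile_text)
    return [
        "mpirun",
        "-np", str(nranks),
        "-host", ",".join(hosts),
        "-rf", rankfile_path,
        program,
    ]


if __name__ == "__main__":
    example = "rank 0=node0 slot=0\nrank 1=node1 slot=0\n"
    print(" ".join(mpirun_args("rankfile", example, "./a.out")))
```

The resulting list can be handed to subprocess.run, or joined into a
command line as in the final lines of the sketch. This only automates the
redundancy Eugene describes; it does not change mpirun's requirement that the
hosts be known before the rankfile is applied.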