Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] why does --rankfile need hostlist?
From: Terry Dontje (Terry.Dontje_at_[hidden])
Date: 2009-06-23 10:42:38


Yeah, I've been fighting with myself on whether providing a new
interface that combines the two makes sense or just muddies the water
more. I've also been wondering whether providing a generalized
method/option to describe binding patterns would mitigate this issue
by removing the need for a rankfile at all.

--td

Mike Dubman wrote:
> just an idea: maybe it is worth providing a brand new cmd line option
> to mpirun. This option would accept a filename and support a combined
> syntax for machinefile/hostfile (to define allocations) and rankfile
> (to define placement).
>
> YAML syntax can be used in order to describe file primitives
> (http://www.yaml.org/start.html)
>
> for example:
>
>
> $ mpirun -clusterfile /path/to/clusterfile
> $ cat clusterfile
> hostX:
> slots : int
> maxslots : int
> ranks : rankid[@socket:core]
>
>
> example of clusterfile
> ===============
>
> hostX:
> slots : 4
> maxslots : 4
> ranks : 1,16,22
>
> hostY:
> slots : 8
> maxslots : 8
> ranks : 1@0:*, 3@2-3, 4@0:1, 5
>
>
> By doing so, we keep backwards compatibility.
> After reading the clusterfile, the code should handle the *hostfile*
> and *rankfile* parts as it does today.
>
> what do you think?
> Mike
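
A minimal sketch of how such a clusterfile could be split back into its hostfile and rankfile parts. It is not Open MPI code: `parse_clusterfile`, `hostfile_lines`, and `SAMPLE` are hypothetical names, the parser is a hand-written line reader rather than a real YAML parser (so it ignores indentation), and the sample changes hostX's first rank to 0 because rank 1 appears under both hosts in the example above. How the placements would then be handed to the existing rankfile mapper is left open.

```python
def parse_clusterfile(text):
    """Parse the proposed clusterfile syntax (hypothetical helper).

    Returns (hosts, placements):
      hosts:      {host: {"slots": int, "maxslots": int}}
      placements: [(rank, host, binding)], where binding is the text after
                  "@" (for example "0:*") or None when no binding is given.
    """
    hosts, placements, seen = {}, [], set()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first ":" only, so bindings such as "0:*" stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not value:
            # A bare "hostX:" line opens a new host entry.
            current = key
            hosts[current] = {}
            continue
        if current is None:
            raise ValueError(f"attribute outside a host entry: {line!r}")
        if key in ("slots", "maxslots"):
            hosts[current][key] = int(value)
        elif key == "ranks":
            for item in value.split(","):
                rank_text, _, binding = item.strip().partition("@")
                rank = int(rank_text)
                if rank in seen:
                    raise ValueError(f"rank {rank} listed more than once")
                seen.add(rank)
                placements.append((rank, current, binding or None))
        else:
            raise ValueError(f"unknown attribute: {key!r}")
    return hosts, placements


def hostfile_lines(hosts):
    """Render the allocation part as hostfile-style lines."""
    lines = []
    for host, attrs in hosts.items():
        parts = [host]
        if "slots" in attrs:
            parts.append(f"slots={attrs['slots']}")
        if "maxslots" in attrs:
            parts.append(f"max_slots={attrs['maxslots']}")
        lines.append(" ".join(parts))
    return lines


# Adapted from the example above (hostX's first rank changed from 1 to 0).
SAMPLE = """\
hostX:
  slots : 4
  maxslots : 4
  ranks : 0,16,22

hostY:
  slots : 8
  maxslots : 8
  ranks : 1@0:*, 3@2-3, 4@0:1, 5
"""

if __name__ == "__main__":
    hosts, placements = parse_clusterfile(SAMPLE)
    print("\n".join(hostfile_lines(hosts)))
    for rank, host, binding in placements:
        print(rank, host, binding)
```

Keeping the two outputs separate follows the suggestion that the hostfile and rankfile parts continue to be processed as they are today; only the front-end file format would be new.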
>
>
>
> On Mon, Jun 22, 2009 at 1:30 PM, Terry Dontje <Terry.Dontje_at_[hidden]> wrote:
>
> Let us think about this some more. We'll try and reply later today.
>
> --td
>
> Ralph Castain wrote:
>
> Had a chance to think about how this might be done, and looked
> at it for a while after getting home. I -think- I found a way
> to do it, but there are a couple of caveats:
>
> 1. Len's point about oversubscribing without warning would
> definitely hold true - this would positively be a "user
> beware" option
>
> 2. there could be no RM-provided allocation, hostfile, or
> -host options specified. Basically, I would be adding the
> "read rankfile" option to the end of the current allocation
> determination procedure
>
> I would still allow more procs than shown in the rankfile
> (mapping the rest bynode on the nodes specified in the
> rankfile - can't do byslot because I don't know how many slots
> are on each node), which means the only change in behavior
> would be the forced bynode mapping of unspecified procs.
>
> So use of this option will entail some risks and a slight
> difference in behavior, but would relieve you from the burden
> of having to provide a hostfile. I'm not personally convinced
> it is worth the risk and probable user complaints of "it
> didn't work", but since we don't use this option, I don't have
> a strong opinion on the matter.
>
> Let's just avoid going back-and-forth over wanting it, or how
> it should be implemented - let's get it all ironed out, and
> then implement it once, like we finally did at the end with
> the whole hostfile thing.
>
> Let me know if you want me to do this - it obviously isn't at
> the top of my priority list, but still could be done in the
> next few weeks.
>
> Ralph
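
A rough sketch of the fallback Ralph describes: ranks listed in the rankfile keep their placement, and any remaining ranks are spread bynode over the nodes named in the rankfile. This is an illustration, not the ORTE mapper. `parse_rankfile` and `map_with_bynode_fallback` are hypothetical names, only the simple `rank N=host slot=S` form shown in this thread is parsed, and starting the round-robin at the first node named in the file is an assumption.

```python
import re

# Matches the rankfile form used in this thread: "rank 0=node0 slot=0".
RANK_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)$")


def parse_rankfile(text):
    """Return {rank: (host, slot)} for each rankfile line (hypothetical helper)."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = RANK_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized rankfile line: {line!r}")
        mapping[int(match.group(1))] = (match.group(2), match.group(3))
    return mapping


def map_with_bynode_fallback(rankfile_text, nprocs):
    """Place ranks 0..nprocs-1, mapping ranks not in the rankfile bynode.

    Slots per node are unknown without a hostfile, so unspecified ranks get
    slot None and are simply dealt out one node at a time.
    """
    placed = parse_rankfile(rankfile_text)
    # Unique hosts in order of first appearance in the rankfile.
    nodes = list(dict.fromkeys(host for host, _ in placed.values()))
    if not nodes:
        raise ValueError("rankfile names no nodes")
    result = {}
    next_node = 0
    for rank in range(nprocs):
        if rank in placed:
            result[rank] = placed[rank]
        else:
            result[rank] = (nodes[next_node % len(nodes)], None)
            next_node += 1
    return result


if __name__ == "__main__":
    rankfile = "rank 0=node0 slot=0\nrank 1=node1 slot=0\n"
    for rank, (host, slot) in map_with_bynode_fallback(rankfile, 4).items():
        print(rank, host, slot)
```

Nothing in the sketch can tell whether a node has room for the extra ranks, which is the "user beware" oversubscription caveat in point 1.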
>
>
> On Jun 21, 2009, at 9:00 AM, Lenny Verkhovsky wrote:
>
> Sorry for the delay in responding. I totally agree with
> Ralph that it's not as easy as it seems:
> 1. The rankfile mapper uses already-allocated machines (allocated
> by the scheduler or a hostfile). By using the rankfile as a
> hostfile we can run into problems when trying to use unallocated
> nodes, which can hang the run.
> 2. We can't define the number of slots on each machine in the
> rankfile, which means oversubscribing can take place
> without any warning.
> 3. I personally don't see any problem with using a hostfile, even
> if it has redundant info. The hostfile and rankfile belong to
> different layers in the system and solve different
> problems. The original hostfile (if I recall correctly)
> could bind a rank to a node, but the syntax wasn't very
> flexible or clear.
> Lenny.
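
Points 1 and 2 can be illustrated with a small check of rankfile placements against an allocation. This is only a sketch under assumptions: `check_placements` is a hypothetical name, and the allocation is modelled as a plain dict from host to slot count, with `None` meaning the slot count is unknown (the situation if the rankfile itself were used as the host list).

```python
from collections import Counter


def check_placements(placements, allocation):
    """Report rankfile placements that do not fit the allocation.

    placements: list of (rank, host) pairs taken from a rankfile.
    allocation: {host: slots}, where slots is an int or None if unknown.
    Returns a list of human-readable problem strings (empty if none found).
    """
    problems = []
    ranks_per_host = Counter(host for _, host in placements)
    for host, count in ranks_per_host.items():
        if host not in allocation:
            # Point 1: the rankfile names a node that was never allocated.
            problems.append(f"{host}: not in allocation")
        elif allocation[host] is None:
            # Point 2: without a slot count, oversubscription goes unnoticed.
            continue
        elif count > allocation[host]:
            problems.append(f"{host}: {count} ranks for {allocation[host]} slots")
    return problems


if __name__ == "__main__":
    placements = [(0, "node0"), (1, "node1")]
    # Only node0 is allocated, as when no hostfile or -host list is given.
    print(check_placements(placements, {"node0": 1}))
    # Both nodes allocated, as with "-host node0,node1".
    print(check_placements(placements, {"node0": 1, "node1": 1}))
```

The first call corresponds to the "Rankfile claimed host node1 that was not allocated" case reported at the start of the thread; the `None` branch shows why a rankfile-only allocation could not warn about oversubscription.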
>
> On Sun, Jun 21, 2009 at 5:15 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
> Let me suggest a two-step process, then:
>
> 1. let's change the error message, as this is easily done and thus
> can be done now
>
> 2. I can look at how to eat the rankfile as a hostfile. This may
> not even be possible - the problem is that the entire system is
> predicated on certain ordering due to our framework architecture.
> So we get an allocation, and then do a mapping against that
> allocation, filtering the allocation through hostfiles, -host,
> and other options.
>
> By the time we reach the rankfile mapper, we have already
> determined that we don't have an allocation and have to abort. It
> is the rankfile mapper itself that looks for the -rankfile
> option, so the system can have no knowledge that someone has
> specified that option before that point - and thus, even if I
> could parse the rankfile, I don't know it was given!
>
> What will take time is to figure out a way to either:
>
> (a) allow us to run the mapper even though we don't have any
> nodes we know about, and allow the mapper to insert the nodes
> itself - without causing non-rankfile uses to break (which could
> be a major feat); or
>
> (b) have the overall system check for the rankfile option and
> pass it as a hostfile as well, assuming that a hostfile wasn't
> also given, no RM-based allocation exists, etc. - which breaks
> our abstraction rules and also opens a possible can of worms.
>
> Either way, I also then have to teach the hostfile parser how to
> realize it is a rankfile format and convert the info in it into
> what we expected to receive from a hostfile - another non-trivial
> problem.
>
> I'm willing to give it a try - just trying to make clear why my
> response was negative. It isn't as simple as it sounds...which is
> why Len and I didn't pursue it when this was originally developed.
>
> Ralph
>
>
> On Sun, Jun 21, 2009 at 5:28 AM, Terry Dontje <Terry.Dontje_at_[hidden]> wrote:
>
> Having been a part of these discussions, I can understand your
> reticence to reopen this discussion. However, I think this
> is a major usability issue with this feature, which is actually
> fairly important for getting things to run performantly, and
> that IMO is important.
>
> That being said, I think one of two things could be done to
> mitigate the issue:
>
> 1. Eliminate the element of surprise by changing mpirun
> to eat the rankfile without the hostfile.
> 2. Change the error message to something understandable
> by the user, such that they know they might be missing the
> hostfile option.
>
> Again, I understand this topic is frustrating and there are
> some boundaries in the design that make these two options
> orthogonal to each other, but I really believe we need to make
> the rankfile option something that is easily usable by our users.
>
>
> --td
>
> Ralph Castain wrote:
>
> Having gone around in circles on hostfile-related issues
> for over five years now, I honestly have little
> motivation to re-open the entire discussion again. It
> doesn't seem to be that daunting a requirement for those
> who are using it, so I'm inclined to just leave well
> enough alone.
>
> :-)
>
>
> On Fri, Jun 19, 2009 at 2:21 PM, Eugene Loh <Eugene.Loh_at_[hidden]> wrote:
>
> Ralph Castain wrote:
>
> The two files have a slightly different format
>
> Agreed.
>
> and completely different meaning.
>
> Somewhat agreed. They're both related to mapping processes onto a
> cluster.
>
> The hostfile specifies how many slots are on a node. The rankfile
> specifies a rank and what node/slot it is to be mapped onto.
>
> Agreed.
>
> Rankfiles can use relative node indexing and refer to nodes
> received from a resource manager - i.e., without any hostfile.
>
> This is the main part I'm concerned about. E.g.,
>
> % cat rankfile
> rank 0=node0 slot=0
> rank 1=node1 slot=0
> % mpirun -np 2 -rf rankfile ./a.out
> --------------------------------------------------------------------------
> Rankfile claimed host node1 that was not allocated or
> oversubscribed it's slots:
>
> --------------------------------------------------------------------------
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 107
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 86
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 86
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 1016
> % mpirun -np 2 -host node0,node1 -rf rankfile ./a.out
> 0 on node0
> 1 on node1
> done
>
> It seems to me that the rankfile has sufficient information to
> express what I want it to do. But mpirun won't accept this. To
> fix this, I have to, e.g., supply/maintain/specify redundant
> information in a hostfile or host list.
>
> So the files are intentionally quite different. Trying to combine
> them would be rather ugly.
>
> Right. And my issue is that I'm forced to use both when I only
> want rankfile functionality.
>
> On Thu, Jun 18, 2009 at 1:52 PM, Eugene Loh <Eugene.Loh_at_[hidden]> wrote:
>
> In order to use "mpirun --rankfile", I also need to specify
> hosts/hostlist. But that information is redundant with what
> I provide in the rankfile. So, from a user's point of view,
> this strikes me as broken. Yes? Should I file a ticket, or
> am I missing something here about this functionality?
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel