Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] why does --rankfile need hostlist?
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2009-06-22 04:22:32


I personally prefer the way it is now.
This way guarantees me total control over mapping and allocating slots.
When I am using the rankfile mapper, I know exactly what I am putting where.
The OS can easily oversubscribe my CPU with processes not mapped by the rankfile. I am
also not sure how it would affect users that have schedulers.
I am also not sure that users who are used to working with a hostfile would
change their scripts according to the mapper.
Lenny.
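
The slot-accounting concern can be made concrete with a minimal sketch. It is not Open MPI code: `check_slots` and its dictionaries are hypothetical names, and the slot counts stand in for whatever a hostfile or scheduler would supply. With slot counts available, an over-committed node can be flagged; with none, as when only a rankfile is given, nothing can be checked.

```python
from collections import Counter


def check_slots(rank_to_node, slots_per_node):
    """Return (oversubscribed nodes, nodes whose slot count is unknown)."""
    # Count how many ranks the mapping places on each node.
    counts = Counter(rank_to_node.values())
    over = sorted(
        node for node, used in counts.items()
        if node in slots_per_node and used > slots_per_node[node]
    )
    # Nodes with no declared slot count cannot be checked at all.
    unknown = sorted(node for node in counts if node not in slots_per_node)
    return over, unknown


# Hypothetical mapping: three ranks on node0, one on node1.
rank_to_node = {0: "node0", 1: "node0", 2: "node0", 3: "node1"}

# Slot counts known (as from a hostfile): node0 is flagged.
print(check_slots(rank_to_node, {"node0": 2, "node1": 2}))

# No slot counts (rankfile only): nothing can be flagged.
print(check_slots(rank_to_node, {}))
```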

On Mon, Jun 22, 2009 at 1:23 AM, Ralph Castain <rhc_at_[hidden]> wrote:

> Had a chance to think about how this might be done, and looked at it for
> a while after getting home. I -think- I found a way to do it, but there are a
> couple of caveats:
> 1. Len's point about oversubscribing without warning would definitely hold
> true - this would positively be a "user beware" option
>
> 2. there could be no RM-provided allocation, hostfile, or -host options
> specified. Basically, I would be adding the "read rankfile" option to the
> end of the current allocation determination procedure
>
> I would still allow more procs than shown in the rankfile (mapping the rest
> bynode on the nodes specified in the rankfile - can't do byslot because I
> don't know how many slots are on each node), which means the only change in
> behavior would be the forced bynode mapping of unspecified procs.
>
> So use of this option will entail some risks and a slight difference in
> behavior, but would relieve you from the burden of having to provide a
> hostfile. I'm not personally convinced it is worth the risk and probable
> user complaints of "it didn't work", but since we don't use this option, I
> don't have a strong opinion on the matter.
>
> Let's just avoid going back-and-forth over wanting it, or how it should be
> implemented - let's get it all ironed out, and then implement it once, like
> we finally did at the end with the whole hostfile thing.
>
> Let me know if you want me to do this - it obviously isn't at the top of my
> priority list, but still could be done in the next few weeks.
>
> Ralph
>
>
> On Jun 21, 2009, at 9:00 AM, Lenny Verkhovsky wrote:
>
> Sorry for the delay in responding.
> I totally agree with Ralph that it's not as easy as it seems:
> 1. The rankfile mapper uses already allocated machines (by scheduler or
> hostfile). By using the rankfile as a hostfile we can run into problems when
> trying to use unallocated nodes, which can hang the run.
> 2. We can't define the number of slots on each machine in the rankfile, which
> means oversubscribing can take place without any warning.
> 3. I personally don't see any problem with using a hostfile, even if it has
> redundant info. Hostfile and rankfile belong to different layers in the
> system and solve different problems. The original hostfile (if I recall
> correctly) could bind a rank to a node, but the syntax wasn't very flexible
> or clear.
> Lenny.
>
> On Sun, Jun 21, 2009 at 5:15 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> Let me suggest a two-step process, then:
>> 1. let's change the error message as this is easily done and thus can be
>> done now
>>
>> 2. I can look at how to eat the rankfile as a hostfile. This may not even
>> be possible - the problem is that the entire system is predicated on certain
>> ordering due to our framework architecture. So we get an allocation, and
>> then do a mapping against that allocation, filtering the allocation through
>> hostfiles, -host, and other options.
>>
>> By the time we reach the rankfile mapper, we have already determined that
>> we don't have an allocation and have to abort. It is the rankfile mapper
>> itself that looks for the -rankfile option, so the system can have no
>> knowledge that someone has specified that option before that point - and
>> thus, even if I could parse the rankfile, I don't know it was given!
>>
>> What will take time is to figure out a way to either:
>>
>> (a) allow us to run the mapper even though we don't have any nodes we know
>> about, and allow the mapper to insert the nodes itself - without causing
>> non-rankfile uses to break (which could be a major feat); or
>>
>> (b) have the overall system check for the rankfile option and pass it as a
>> hostfile as well, assuming that a hostfile wasn't also given, no RM-based
>> allocation exists, etc. - which breaks our abstraction rules and also opens
>> a possible can of worms.
>>
>> Either way, I also then have to teach the hostfile parser how to realize
>> it is a rankfile format and convert the info in it into what we expected to
>> receive from a hostfile - another non-trivial problem.
>>
>> I'm willing to give it a try - just trying to make clear why my response
>> was negative. It isn't as simple as it sounds...which is why Len and I
>> didn't pursue it when this was originally developed.
>>
>> Ralph
>>
>>
>> On Sun, Jun 21, 2009 at 5:28 AM, Terry Dontje <Terry.Dontje_at_[hidden]> wrote:
>>
>>> Being a part of these discussions I can understand your reticence to
>>> reopen this discussion. However, I think this is a major usability issue
>>> with a feature that is actually fairly important for getting things
>>> to run performantly, which IMO matters.
>>>
>>> That being said, I think one of two things could be done to
>>> mitigate the issue.
>>>
>>> 1. Eliminate the element of surprise by changing mpirun to eat the
>>> rankfile without the hostfile.
>>> 2. Change the error message to something the user can understand, so
>>> that they know they might be missing the hostfile option.
>>>
>>> Again, I understand this topic is frustrating and there are some
>>> boundaries with the design that make these two options orthogonal to each
>>> other, but I really believe we need to make the rankfile option something
>>> that is easily usable by our users.
>>>
>>>
>>> --td
>>>
>>> Ralph Castain wrote:
>>>
>>>> Having gone around in circles on hostfile-related issues for over five
>>>> years now, I honestly have little motivation to re-open the entire
>>>> discussion again. It doesn't seem to be that daunting a requirement for
>>>> those who are using it, so I'm inclined to just leave well enough alone.
>>>>
>>>> :-)
>>>>
>>>>
>>>> On Fri, Jun 19, 2009 at 2:21 PM, Eugene Loh <Eugene.Loh_at_[hidden]> wrote:
>>>>
>>>> Ralph Castain wrote:
>>>>
>>>>> The two files have a slightly different format
>>>>>
>>>> Agreed.
>>>>
>>>>> and completely different meaning.
>>>>>
>>>> Somewhat agreed. They're both related to mapping processes onto a
>>>> cluster.
>>>>
>>>>> The hostfile specifies how many slots are on a node. The rankfile
>>>>> specifies a rank and what node/slot it is to be mapped onto.
>>>>>
>>>> Agreed.
>>>>
>>>>> Rankfiles can use relative node indexing and refer to nodes
>>>>> received from a resource manager - i.e., without any hostfile.
>>>>>
>>>> This is the main part I'm concerned about. E.g.,
>>>>
>>>> % cat rankfile
>>>> rank 0=node0 slot=0
>>>> rank 1=node1 slot=0
>>>> % mpirun -np 2 -rf rankfile ./a.out
>>>>
>>>> --------------------------------------------------------------------------
>>>> Rankfile claimed host node1 that was not allocated or
>>>> oversubscribed it's slots:
>>>>
>>>>
>>>> --------------------------------------------------------------------------
>>>> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>> rmaps_rank_file.c at line 107
>>>> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>> base/rmaps_base_map_job.c at line 86
>>>> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>> base/plm_base_launch_support.c at line 86
>>>> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
>>>> plm_rsh_module.c at line 1016
>>>> % mpirun -np 2 -host node0,node1 -rf rankfile ./a.out
>>>> 0 on node0
>>>> 1 on node1
>>>> done
>>>>
>>>> It seems to me that the rankfile has sufficient information to
>>>> express what I want it to do. But mpirun won't accept this. To
>>>> fix this, I have to, e.g., supply/maintain/specify redundant
>>>> information in a hostfile or host list.
>>>>
>>>>> So the files are intentionally quite different. Trying to combine
>>>>> them would be rather ugly.
>>>>>
>>>> Right. And my issue is that I'm forced to use both when I only
>>>> want rankfile functionality.
>>>>
>>>>> On Thu, Jun 18, 2009 at 1:52 PM, Eugene Loh <Eugene.Loh_at_[hidden]> wrote:
>>>>>
>>>>> In order to use "mpirun --rankfile", I also need to specify
>>>>> hosts/hostlist. But that information is redundant with what
>>>>> I provide in the rankfile. So, from a user's point of view,
>>>>> this strikes me as broken. Yes? Should I file a ticket, or
>>>>> am I missing something here about this functionality?
>>>>>
>>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
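
The redundancy Eugene describes at the start of the thread can be worked around by deriving the host list from the rankfile itself. The following is a minimal sketch, not part of Open MPI: `hosts_from_rankfile` is a hypothetical helper, it assumes only the `rank N=host slot=S` line format shown in the thread, and it does not handle relative node indexing.

```python
import re

# Rankfile line format as shown in the thread: "rank 0=node0 slot=0".
RANK_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)$")


def hosts_from_rankfile(text):
    """Return the distinct host names in a rankfile, in order of first use."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = RANK_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized rankfile line: {line!r}")
        host = match.group(2)
        if host not in hosts:
            hosts.append(host)
    return hosts


# The rankfile from Eugene's example.
rankfile = """\
rank 0=node0 slot=0
rank 1=node1 slot=0
"""

host_list = ",".join(hosts_from_rankfile(rankfile))
print(host_list)  # node0,node1
```

The resulting string could then be passed to `-host`, as in the working `mpirun -np 2 -host node0,node1 -rf rankfile ./a.out` invocation quoted in the thread. Note that this only removes the duplicated typing; it does not supply slot counts, so the oversubscription caveat raised by Lenny and Ralph still applies.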