
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] The hostfile option
From: Rolf vandeVaart (rvandevaart_at_[hidden])
Date: 2012-07-30 23:07:45


>-----Original Message-----
>From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]]
>On Behalf Of Ralph Castain
>Sent: Monday, July 30, 2012 9:29 AM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] The hostfile option
>
>
>On Jul 30, 2012, at 2:37 AM, George Bosilca wrote:
>
>> I think that as long as there is a single home area per cluster the difference
>between the different approaches might seem irrelevant to most people.
>
>Yeah, I agree - after thinking about it, it probably didn't accomplish much.
>
>>
>> My problem is twofold. First, I have a common home area across several
>different development clusters. Thus I have direct access through ssh to any
>machine. If I create a single large machinefile, it turns out that every mpirun
>will spawn a daemon on every single node, even if I only run a ping-pong test.
>
>That shouldn't happen if you specify the hosts you want to use, either via
>-host or -hostfile. I assume you are specifying nothing and so you get that
>behavior?
>
>> Second, while I usually run my apps on the same set of resources, I regularly
>need to switch nodes for a few tests.
>>
>> What I was hoping to achieve is a machinefile containing the "default"
>development cluster (i.e., the cluster where I'm almost alone, so my daemons
>have minimal chance of disturbing other people's experience), and then use a
>machinefile to sporadically change the cluster where I run for smaller tests.
>Unfortunately, this doesn't work due to the filtering behavior described in my
>original email.
>
>Why not just set the default hostfile to point to the new machinefile via the
>"--default-hostfile foo" option to mpirun, or you can use the corresponding
>MCA param?
>
>I'm not trying to re-open the hostfile discussion, but I would be interested to
>hear how you feel -hostfile should work. I kinda gather you feel it should
>override the default hostfile instead of filter it, yes? My point being that I
>don't particularly know if anyone would disagree with that behavior, so we
>might decide to modify things if you want to propose it.
>
>Ralph
>

I wrote up the whole description in the Wiki a long while ago because there was a lot of confusion about
how things should behave with a resource manager. The general view was that people treat hostfile
and host as filters when running with a resource manager.

I never wrote anything about the case you are describing, where the hostfile filters the default hostfile.
I would have assumed that the hostfile precedence you want was already the way things work.
Therefore, I am fine if we change the behavior with respect to default hostfile and hostfile.

The wiki reference is here: https://svn.open-mpi.org/trac/ompi/wiki/HostFilePlan

>>
>>
>> On Jul 28, 2012, at 19:24 , Ralph Castain wrote:
>>
>>> It's been a while, but I vaguely remember the discussion. IIRC, the rationale
>was that the default hostfile was equivalent to an RM allocation and should be
>treated the same. So -hostfile and -host become filters in that case.
>>>
>>> FWIW, I believe the discussion was split on that question. I added a "none"
>option to the default hostfile MCA param so it would be ignored in the case
>where (a) the sys admin has given a default hostfile, but (b) someone wants
>to use hosts outside of it.
>>>
>>> MCA orte: parameter "orte_default_hostfile" (current value: <none>, data source: default value)
>>>           Name of the default hostfile (relative or absolute path, "none" to ignore environmental or default MCA param setting)
>>>
>>> That said, I can see a use-case argument for behaving somewhat
>differently. We've even had cases where users have gotten an allocation from
>an RM, but want to add hosts that are external to the cluster to the job.
>>>
>>> It would be rather trivial to modify the logic:
>>>
>>> 1. read the default hostfile or RM allocation for our baseline
>>>
>>> 2. remove any hosts on that list that are *not* in the given hostfile
>>>
>>> 3. add any hosts that are in the given hostfile, but weren't in the default
>hostfile
>>>
>>> And subsequently do the same for -host. I think that would retain the spirit
>of the discussion, but provide more flexibility and provide a tad more
>"expected" behavior.
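
The three steps above, applied first for -hostfile and then for -host, can be sketched in Python. This is only an illustration of the proposed logic, not Open MPI code: the function name `merge_hosts` and the host names are hypothetical, and hosts are assumed to be plain strings with no slot counts.

```python
def merge_hosts(baseline, requested):
    """Apply the proposed logic to a baseline host list.

    baseline  -- hosts from the default hostfile or RM allocation (step 1)
    requested -- hosts named in the given hostfile (or via -host)
    """
    requested_set = set(requested)

    # Step 2: drop baseline hosts that are *not* in the given hostfile.
    result = [h for h in baseline if h in requested_set]

    # Step 3: add hosts that are in the given hostfile but were not in
    # the baseline, keeping the order in which they were requested.
    seen = set(result)
    for h in requested:
        if h not in seen:
            result.append(h)
            seen.add(h)
    return result


# Hypothetical inputs, for illustration only.
default_hosts = ["node1", "node2", "node3"]
hostfile_hosts = ["node3", "ext1"]   # "ext1" is outside the default hostfile
dash_host = ["ext1"]

after_hostfile = merge_hosts(default_hosts, hostfile_hosts)
final = merge_hosts(after_hostfile, dash_host)

print(after_hostfile)  # ['node3', 'ext1']
print(final)           # ['ext1']
```

Note that under this sketch the resulting set of hosts always equals the requested set; the baseline only influences ordering, which is why the proposal behaves like an override rather than a filter.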
>>>
>>> I don't have an iron in this fire as I don't use hostfiles, so I'm happy to
>implement whatever the community would like to see.
>>> Ralph
>>>
>>> On Jul 27, 2012, at 6:30 PM, George Bosilca wrote:
>>>
>>>> I'm somewhat puzzled by the behavior of the -hostfile in Open MPI.
>Based on the FAQ it is supposed to provide a list of resources to be used by
>the launcher (in my case ssh) to start the processes. Makes sense so far.
>>>>
>>>> However, if the configuration file contains a value for
>orte_default_hostfile, then the behavior of the hostfile option changes
>drastically, and the option becomes a filter (the machines must be on the
>original list or a cryptic error message is displayed).
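
The filtering behavior described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the actual Open MPI implementation: the function name `filter_hosts`, the host names, and the error text are all hypothetical.

```python
def filter_hosts(default_hosts, requested):
    """Treat the requested hosts as a filter on the default hostfile.

    Every requested host must already appear in the default list;
    otherwise the request is rejected with an error.
    """
    allowed = set(default_hosts)
    unknown = [h for h in requested if h not in allowed]
    if unknown:
        # Hypothetical message; the real error text is not reproduced here.
        raise ValueError("hosts not in default hostfile: " + ", ".join(unknown))
    return list(requested)


# Hypothetical inputs, for illustration only.
default_hosts = ["node1", "node2", "node3"]

print(filter_hosts(default_hosts, ["node2"]))  # ['node2']

try:
    filter_hosts(default_hosts, ["other1"])
except ValueError as err:
    print("rejected:", err)
```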
>>>>
>>>> Overall, we have a well-defined, [mostly] consistent behavior for
>parameters in Open MPI. We have a clearly defined order of precedence for the
>sources of MCA parameters, which makes it straightforward to understand where
>a value comes from. I'm absolutely certain there was a group discussion about this
>unique "eccentricity" regarding the hostfile option, but I fail to remember
>the reason we decided to go this way. Can I have a quick refresher,
>please?
>>>>
>>>> Thanks,
>>>> george.
>>>>
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel