Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] The hostfile option
From: George Bosilca (bosilca_at_[hidden])
Date: 2012-07-31 09:30:41


On Jul 30, 2012, at 15:29 , Ralph Castain wrote:

>
> On Jul 30, 2012, at 2:37 AM, George Bosilca wrote:
>
>> I think that as long as there is a single home area per cluster, the difference between the approaches might seem irrelevant to most people.
>
> Yeah, I agree - after thinking about it, it probably didn't accomplish much.
>
>>
>> My problem is twofold. First, I have a common home area across several different development clusters. Thus I have direct access through ssh to any machine. If I create a single large machinefile, it turns out that every mpirun will spawn a daemon on every single node, even if I only run a ping-pong test.
>
> That shouldn't happen if you specify the hosts you want to use, either via -host or -hostfile. I assume you are specifying nothing and so you get that behavior?
>
>> Second, while I usually run my apps on the same set of resources, I regularly need to switch nodes for a few tests.
>>
>> What I was hoping to achieve is a machinefile containing the "default" development cluster (a.k.a. the cluster where I'm almost alone, so my daemons have minimal chances to disturb other people's experience), and then use a machinefile to sporadically change the cluster where I run for smaller tests. Unfortunately, this doesn't work due to the filtering behavior described in my original email.
>
> Why not just set the default hostfile to point to the new machinefile via the "--default-hostfile foo" option to mpirun, or you can use the corresponding MCA param?

I confirm: if I use --default-hostfile instead of -machinefile, I get the behavior I expected (it overrides the default).

> I'm not trying to re-open the hostfile discussion, but I would be interested to hear how you feel -hostfile should work. I kinda gather you feel it should override the default hostfile instead of filter it, yes? My point being that I don't particularly know if anyone would disagree with that behavior, so we might decide to modify things if you want to propose it.

Right, I would have expected it to work the same way as almost all the other MCA parameters, by overriding the variants with lower priority. But I don't mind typing "--default-hostfile" instead of "-machinefile" to get the behavior I like.

  george.

>
> Ralph
>
>
>>
>> george.
>>
>>
>> On Jul 28, 2012, at 19:24 , Ralph Castain wrote:
>>
>>> It's been a while, but I vaguely remember the discussion. IIRC, the rationale was that the default hostfile was equivalent to an RM allocation and should be treated the same. So -hostfile and -host become filters in that case.
>>>
>>> FWIW, I believe the discussion was split on that question. I added a "none" option to the default hostfile MCA param so it would be ignored in the case where (a) the sys admin has given a default hostfile, but (b) someone wants to use hosts outside of it.
>>>
>>> MCA orte: parameter "orte_default_hostfile" (current value: <none>, data source: default value)
>>> Name of the default hostfile (relative or absolute path, "none" to ignore environmental or default MCA param setting)
>>>
>>> That said, I can see a use-case argument for behaving somewhat differently. We've even had cases where users have gotten an allocation from an RM, but want to add hosts that are external to the cluster to the job.
>>>
>>> It would be rather trivial to modify the logic:
>>>
>>> 1. read the default hostfile or RM allocation for our baseline
>>>
>>> 2. remove any hosts on that list that are *not* in the given hostfile
>>>
>>> 3. add any hosts that are in the given hostfile, but weren't in the default hostfile
>>>
>>> And subsequently do the same for -host. I think that would retain the spirit of the discussion, but provide more flexibility and provide a tad more "expected" behavior.
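
The three steps above, next to the filtering behavior described earlier in the thread, can be sketched as follows. This is an illustrative Python sketch only, not Open MPI's implementation: the function names `filter_hosts` and `merge_hosts`, the plain list-of-hostnames representation, and the choice to return the baseline unchanged when no hostfile is given are all assumptions made for the example.

```python
def filter_hosts(baseline, requested):
    """Filtering behavior as described in the thread (hypothetical helper).

    Every requested host must already be in the baseline (default hostfile
    or RM allocation); otherwise an error is raised.
    """
    unknown = [h for h in requested if h not in baseline]
    if unknown:
        raise ValueError(f"hosts not in baseline: {unknown}")
    # Keep baseline order, restricted to the requested hosts.
    return [h for h in baseline if h in requested]


def merge_hosts(baseline, requested):
    """Proposed three-step logic (hypothetical helper).

    Step 1: `baseline` is the default hostfile or RM allocation.
    Step 2: drop baseline hosts that are not in the given hostfile.
    Step 3: add hosts from the given hostfile that the baseline lacked.
    """
    if not requested:
        # Assumption: with no hostfile given, the baseline is used as is.
        return list(baseline)
    kept = [h for h in baseline if h in requested]
    added = [h for h in requested if h not in baseline]
    return kept + added


if __name__ == "__main__":
    default = ["node1", "node2", "node3"]
    print(merge_hosts(default, ["node2", "other1"]))
```

With this sketch, the resulting set of hosts is always exactly the set named in the given hostfile; what the baseline contributes is the ordering of the hosts it shares with that file (and, in a real implementation, presumably any per-host details such as slot counts).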
>>>
>>> I don't have an iron in this fire as I don't use hostfiles, so I'm happy to implement whatever the community would like to see.
>>> Ralph
>>>
>>> On Jul 27, 2012, at 6:30 PM, George Bosilca wrote:
>>>
>>>> I'm somewhat puzzled by the behavior of the -hostfile option in Open MPI. Based on the FAQ, it is supposed to provide a list of resources to be used by the launcher (in my case ssh) to start the processes. Makes sense so far.
>>>>
>>>> However, if the configuration file contains a value for orte_default_hostfile, then the behavior of the hostfile option changes drastically, and the option becomes a filter (the machines must be on the original list or a cryptic error message is displayed).
>>>>
>>>> Overall, we have a well defined, [mostly] consistent behavior for parameters in Open MPI. We have a clearly defined order of precedence among the sources of MCA parameters, which makes it straightforward to understand where a value comes from. I'm absolutely certain there was a group discussion about this unique "eccentricity" regarding the hostfile option, but I fail to remember why we decided to go this way. Can I have a quick refresher, please?
>>>>
>>>> Thanks,
>>>> george.
>>>>
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel