Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [RFC] Default hostfile MCA param
From: Ralph H Castain (rhc_at_[hidden])
Date: 2008-03-04 08:40:12


On 3/4/08 5:51 AM, "Tim Prins" <tprins_at_[hidden]> wrote:

> We have used '^' elsewhere to indicate "not", so maybe the syntax could
> simply be: if you put '^' at the beginning of a line, that node is not used.
>
> So we could have:
> n0
> n1
> ^headnode
> n3
>

That works for me and sounds like the right solution.
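
To make the proposed syntax concrete, here is a minimal Python sketch of how such a hostfile could be read. It is only an illustration, not the Open MPI parser: the function name parse_hostfile is hypothetical, and skipping '#' comment lines and ignoring trailing tokens such as slot counts are assumptions on my part.

```python
def parse_hostfile(text):
    """Split hostfile text into (included, excluded) lists of node names.

    Hypothetical helper: a line starting with '^' names a node that must
    not be used; any other non-empty line names a usable node.
    """
    included, excluded = [], []
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comment lines are skipped.
        if not line or line.startswith("#"):
            continue
        if line.startswith("^"):
            tokens = line[1:].split()
            if tokens:
                excluded.append(tokens[0])
        else:
            # Assumption: only the first token is the node name; anything
            # after it (e.g. a slot count) is ignored in this sketch.
            included.append(line.split()[0])
    return included, excluded


example = "n0\nn1\n^headnode\nn3\n"
included, excluded = parse_hostfile(example)
print(included)  # ['n0', 'n1', 'n3']
print(excluded)  # ['headnode']
```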

> I understand the idea of having a flag to indicate that all nodes below
> a certain point should be ignored, but I think this might get confusing,
> and I'm unsure how useful it would be. I just see the usefulness of this
> to block out a couple of nodes by default. Besides, if you do want to
> block out many nodes, any reasonable text editor allows you to insert
> '^' in front of any number of lines easily.
>
> Alternatively, for the particular situation that Edgar mentions, it may
> be good enough just to set rmaps_base_no_schedule_local in the mca
> params default file.
>
> One question though: If I am in a slurm allocation which contains n1,
> and there is a default hostfile that contains "^n1", will I run on 'n1'?

According to the precedence rules in the wiki, you would -not- run on n1.
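
As a rough sketch of that precedence (again in Python, and again only an illustration): the default hostfile acts as a filter on the nodes the resource manager reports, so an excluded node is dropped even if it is part of the allocation. The function name filter_allocation is hypothetical, and treating a default hostfile with no positive entries as "keep every allocated node except the excluded ones" is my assumption, not something the wiki is quoted as saying here.

```python
def filter_allocation(allocated, default_hostfile):
    """Return the allocated nodes that survive the default hostfile filter.

    Hypothetical helper: 'allocated' is the list of node names discovered
    from the resource manager, 'default_hostfile' is the file's text.
    """
    listed, excluded = [], set()
    for raw in default_hostfile.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("^"):
            excluded.add(line[1:].strip())
        else:
            listed.append(line)
    # Assumption: if the file has no positive entries, every allocated
    # node is a candidate; otherwise only the listed nodes are.
    candidates = [n for n in allocated if not listed or n in listed]
    # Exclusions always win, so '^n1' removes n1 even when it is allocated.
    return [n for n in candidates if n not in excluded]


print(filter_allocation(["n1", "n2"], "^n1\n"))  # ['n2']
```

If the result is empty (for example, the allocation is only n1), mpirun would have nothing to run on and could report an error, which is the behavior Edgar asked about.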

>
> I'm not sure what the answer is, I know we talked about the precedence
> earlier...
>
> Tim
>
> Ralph H Castain wrote:
>> I personally have no objection, but I would ask then that the wiki be
>> modified to cover this case. All I require is that someone define the syntax
>> to be used to indicate "this is a node I do -not- want used", or
>> alternatively a flag that indicates "all nodes below are -not- to be used".
>>
>> Implementation isn't too hard once I have that...
>>
>>
>> On 3/3/08 9:44 AM, "Edgar Gabriel" <gabriel_at_[hidden]> wrote:
>>
>>> Ralph,
>>>
>>> could this mechanism be used also to exclude a node, indicating to never
>>> run a job there? Here is the problem that I face quite often: students
>>> working on the homework forget to allocate a partition on the cluster,
>>> and just type mpirun. Because of that, all jobs end up running on the
>>> front-end node.
>>>
>>> If we now had the ability to specify in a default hostfile that a job
>>> should never run on a given node (e.g. the front-end node), users would
>>> get an error message when trying to do that. I am aware that that's a
>>> little ugly...
>>>
>>> Thanks
>>> edgar
>>>
>>> Ralph Castain wrote:
>>>> I forget all the formatting we are supposed to use, so I hope you'll all
>>>> just bear with me.
>>>>
>>>> George brought up the fact that we used to have an MCA param to specify a
>>>> hostfile to use for a job. The hostfile behavior described on the wiki,
>>>> however, doesn't provide for that option. It associates a hostfile with a
>>>> specific app_context, and provides a detailed hierarchical layout of how
>>>> mpirun is to interpret that information.
>>>>
>>>> What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile"
>>>> to replace the deprecated capability. If found, the system's behavior will
>>>> be:
>>>>
>>>> 1. in a managed environment, the default hostfile will be used to filter
>>>> the discovered nodes to define the available node pool. Any hostfile
>>>> and/or dash host options provided to an app_context will be used to
>>>> further filter the node pool to define the specific nodes for use by that
>>>> app_context. Thus, nodes in the hostfile and dash host options given to an
>>>> app_context -must- also be in the default hostfile in order to be
>>>> available for use by that app_context - any nodes in the app_context
>>>> options that are not in the default hostfile will be ignored.
>>>>
>>>> 2. in an unmanaged environment, the default hostfile will be used to define
>>>> the available node pool. Any hostfile and/or dash host options provided to
>>>> an app_context will be used to filter the node pool to define the specific
>>>> nodes for use by that app_context, subject to the previous caveat. However,
>>>> add-hostfile and add-host options will add nodes to the node pool for use
>>>> -only- by the associated app_context.
>>>>
>>>>
>>>> I believe this proposed behavior is consistent with that described on the
>>>> wiki, and would be relatively easy to implement. If nobody objects, I will
>>>> do so by end-of-day 3/6.
>>>>
>>>> Comments, suggestions, objections - all are welcome!
>>>> Ralph
>>>>
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel