Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [RFC] Default hostfile MCA param
From: Tim Prins (tprins_at_[hidden])
Date: 2008-03-04 07:51:57


We have used '^' elsewhere to indicate "not", so maybe the syntax could simply
be: if you put '^' at the beginning of a line, that node is not used.

So we could have:
n0
n1
^headnode
n3

I understand the idea of having a flag to indicate that all nodes below
a certain point should be ignored, but I think this might get confusing,
and I'm unsure how useful it would be. I mainly see this being useful for
blocking out a couple of nodes by default. Besides, if you do want to
block out many nodes, any reasonable text editor allows you to insert
'^' in front of any number of lines easily.

Alternatively, for the particular situation that Edgar mentions, it may
be enough simply to set rmaps_base_no_schedule_local in the default MCA
params file.

One question though: If I am in a slurm allocation which contains n1,
and there is a default hostfile that contains "^n1", will I run on 'n1'?

I'm not sure what the answer is, I know we talked about the precedence
earlier...

Tim

Ralph H Castain wrote:
> I personally have no objection, but I would ask then that the wiki be
> modified to cover this case. All I require is that someone define the syntax
> to be used to indicate "this is a node I do -not- want used", or
> alternatively a flag that indicates "all nodes below are -not- to be used".
>
> Implementation isn't too hard once I have that...
>
>
> On 3/3/08 9:44 AM, "Edgar Gabriel" <gabriel_at_[hidden]> wrote:
>
>> Ralph,
>>
>> could this mechanism also be used to exclude a node, i.e., to indicate that
>> a job should never run there? Here is a problem I face quite often: students
>> working on homework forget to allocate a partition on the cluster and just
>> type mpirun. As a result, all of their jobs end up running on the front-end
>> node.
>>
>> If we now had the ability to specify in a default hostfile that jobs must
>> never run on a particular node (e.g., the front-end node), users would get
>> an error message when trying to do that. I am aware that that's a little
>> ugly...
>>
>> Thanks
>> edgar
>>
>> Ralph Castain wrote:
>>> I forget all the formatting we are supposed to use, so I hope you'll all
>>> just bear with me.
>>>
>>> George brought up the fact that we used to have an MCA param to specify a
>>> hostfile to use for a job. The hostfile behavior described on the wiki,
>>> however, doesn't provide for that option. It associates a hostfile with a
>>> specific app_context, and provides a detailed hierarchical layout of how
>>> mpirun is to interpret that information.
>>>
>>> What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile"
>>> to replace the deprecated capability. If found, the system's behavior will
>>> be:
>>>
>>> 1. in a managed environment, the default hostfile will be used to filter the
>>> discovered nodes to define the available node pool. Any hostfile and/or dash
>>> host options provided to an app_context will be used to further filter the
>>> node pool to define the specific nodes for use by that app_context. Thus,
>>> nodes in the hostfile and dash host options given to an app_context -must-
>>> also be in the default hostfile in order to be available for use by that
>>> app_context - any nodes in the app_context options that are not in the
>>> default hostfile will be ignored.
>>>
>>> 2. in an unmanaged environment, the default hostfile will be used to define
>>> the available node pool. Any hostfile and/or dash host options provided to
>>> an app_context will be used to filter the node pool to define the specific
>>> nodes for use by that app_context, subject to the previous caveat. However,
>>> add-hostfile and add-host options will add nodes to the node pool for use
>>> -only- by the associated app_context.
>>>
>>>
>>> I believe this proposed behavior is consistent with that described on the
>>> wiki, and would be relatively easy to implement. If nobody objects, I will
>>> do so by end-of-day 3/6.
>>>
>>> Comments, suggestions, objections - all are welcome!
>>> Ralph
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>