Open MPI Development Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI devel] [RFC] Default hostfile MCA param
From: Edgar Gabriel (gabriel_at_[hidden])
Date: 2008-03-04 11:05:07

Tim Prins wrote:
> We have used '^' elsewhere to indicate not, so maybe just have the
> syntax be if you put '^' at the beginning of a line, that node is not used.
> So we could have:
> n0
> n1
> ^headnode
> n3
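To make the proposed syntax concrete, here is a minimal Python sketch of how such a file could be read. It is only an illustration of the '^' convention suggested above, not Open MPI's actual hostfile parser; the function name parse_hostfile, the '#' comment handling, and the rule that an exclusion wins over an inclusion are all assumptions.

```python
def parse_hostfile(text):
    """Split hostfile text into (included, excluded) node lists.

    Assumed syntax (the proposal in this thread, not an implemented format):
    one node name per line; a leading '^' marks a node that must not be used.
    """
    included, excluded = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumption)
        if line.startswith("^"):
            name = line[1:].strip()
            if name:
                excluded.append(name)
        else:
            # Keep only the node name; ignore anything after it on the line.
            included.append(line.split()[0])
    # Assumption: an exclusion wins if a node is listed both ways.
    included = [n for n in included if n not in excluded]
    return included, excluded


example = "n0\nn1\n^headnode\nn3\n"
included, excluded = parse_hostfile(example)
print("use:", included)
print("never use:", excluded)
```

With the four-line example from above, headnode ends up in the excluded list and the other three nodes remain usable.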

This sounds fine to me.

> I understand the idea of having a flag to indicate that all nodes below
> a certain point should be ignored, but I think this might get confusing,
> and I'm unsure how useful it would be. I just see the usefulness of this
> to block out a couple of nodes by default. Besides, if you do want to
> block out many nodes, any reasonable text editor allows you to insert
> '^' in front of any number of lines easily.
> Alternatively, for the particular situation that Edgar mentions, it may
> be good enough just to set rmaps_base_no_schedule_local in the mca
> params default file.

Hm, OK, that is another flag I was not aware of. Anyway, I can
think of other scenarios where this feature could be useful, e.g. when
hunting down performance problems on a cluster and you would like to
avoid having to get a new allocation or do a major rewrite of the
hostfile every time. Another is including an I/O node in an allocation
(in order to have it exclusively) while making sure that no MPI process
gets scheduled onto that node.


> One question though: If I am in a slurm allocation which contains n1,
> and there is a default hostfile that contains "^n1", will I run on 'n1'?
> I'm not sure what the answer is, I know we talked about the precedence
> earlier...
> Tim
> Ralph H Castain wrote:
>> I personally have no objection, but I would ask then that the wiki be
>> modified to cover this case. All I require is that someone define the syntax
>> to be used to indicate "this is a node I do -not- want used", or
>> alternatively a flag that indicates "all nodes below are -not- to be used".
>> Implementation isn't too hard once I have that...
>> On 3/3/08 9:44 AM, "Edgar Gabriel" <gabriel_at_[hidden]> wrote:
>>> Ralph,
>>> could this mechanism also be used to exclude a node, indicating that a
>>> job should never run there? Here is the problem that I face quite often:
>>> students working on their homework forget to allocate a partition on the
>>> cluster, and just type mpirun. Because of that, all jobs end up running
>>> on the front-end node.
>>> If we now had the ability to specify in a default hostfile that a job
>>> must never run on a given node (e.g. the front-end node), users
>>> would get an error message when trying to do that. I am aware that
>>> that's a little ugly...
>>> Thanks
>>> edgar
>>> Ralph Castain wrote:
>>>> I forget all the formatting we are supposed to use, so I hope you'll all
>>>> just bear with me.
>>>> George brought up the fact that we used to have an MCA param to specify a
>>>> hostfile to use for a job. The hostfile behavior described on the wiki,
>>>> however, doesn't provide for that option. It associates a hostfile with a
>>>> specific app_context, and provides a detailed hierarchical layout of how
>>>> mpirun is to interpret that information.
>>>> What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile"
>>>> to replace the deprecated capability. If found, the system's behavior will
>>>> be:
>>>> 1. in a managed environment, the default hostfile will be used to filter the
>>>> discovered nodes to define the available node pool. Any hostfile and/or dash
>>>> host options provided to an app_context will be used to further filter the
>>>> node pool to define the specific nodes for use by that app_context. Thus,
>>>> nodes in the hostfile and dash host options given to an app_context -must-
>>>> also be in the default hostfile in order to be available for use by that
>>>> app_context - any nodes in the app_context options that are not in the
>>>> default hostfile will be ignored.
>>>> 2. in an unmanaged environment, the default hostfile will be used to define
>>>> the available node pool. Any hostfile and/or dash host options provided to
>>>> an app_context will be used to filter the node pool to define the specific
>>>> nodes for use by that app_context, subject to the previous caveat. However,
>>>> add-hostfile and add-host options will add nodes to the node pool for use
>>>> -only- by the associated app_context.
>>>> I believe this proposed behavior is consistent with that described on the
>>>> wiki, and would be relatively easy to implement. If nobody objects, I will
>>>> do so by end-of-day 3/6.
>>>> Comments, suggestions, objections - all are welcome!
>>>> Ralph
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
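The two cases in Ralph's proposal above can be sketched in Python as follows. This is a rough model of the described filtering, not the mpirun implementation; the function names available_pool and nodes_for_app and their parameters are hypothetical, and the sketch does not restrict add-host handling to the unmanaged case even though the proposal only mentions it there.

```python
def available_pool(default_hosts, discovered=None):
    """Node pool after applying the default hostfile.

    discovered: nodes reported by the resource manager (managed environment),
                or None in an unmanaged environment.
    """
    if discovered is None:
        # Unmanaged: the default hostfile defines the pool.
        return list(default_hosts)
    # Managed: the default hostfile filters the discovered nodes.
    allowed = set(default_hosts)
    return [n for n in discovered if n in allowed]


def nodes_for_app(pool, requested=None, added=None):
    """Nodes usable by one app_context.

    requested: that app_context's hostfile and dash-host entries, or None.
    added: add-hostfile / add-host entries, visible only to this app_context.
    """
    if requested is None:
        nodes = list(pool)
    else:
        wanted = set(requested)
        # Requested nodes that are not in the pool are silently ignored.
        nodes = [n for n in pool if n in wanted]
    for n in added or []:
        if n not in nodes:
            nodes.append(n)
    return nodes


# Managed example: the allocation has n1..n3, the default hostfile lists n0, n1, n3.
pool = available_pool(["n0", "n1", "n3"], discovered=["n1", "n2", "n3"])
print(pool)
print(nodes_for_app(pool, requested=["n3", "n5"]))
```

An empty result from nodes_for_app is the point at which a launcher could report an error, which is the behavior asked for in the front-end-node scenario.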

Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335