Tim Prins wrote:
> We have used '^' elsewhere to indicate not, so maybe just have the
> syntax be if you put '^' at the beginning of a line, that node is not used.
> So we could have:
This sounds fine to me.
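To make the proposed syntax concrete, here is a minimal sketch in Python of how a hostfile with '^' lines might be read. It is not Open MPI code: the function name `parse_hostfile` is made up for illustration, and it assumes one node per line, '#' starting a comment, and any trailing fields (such as slot counts) being irrelevant to the include/exclude decision.

```python
def parse_hostfile(text):
    """Split hostfile text into (included, excluded) lists of node names.

    Hypothetical helper: a line whose node name starts with '^' marks a
    node that must not be used; every other non-empty line names a node
    that may be used.
    """
    included, excluded = [], []
    for raw in text.splitlines():
        # Assumption: '#' starts a comment that runs to the end of the line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Only the first field is the node name; ignore trailing fields.
        name = line.split()[0]
        if name.startswith("^"):
            node = name[1:]
            if node:  # skip a bare '^' with no node name attached
                excluded.append(node)
        else:
            included.append(name)
    return included, excluded


if __name__ == "__main__":
    sample = "n1 slots=4\n^n2\n# front end, never use\n^frontend\nn3\n"
    print(parse_hostfile(sample))
```

The sketch only separates the two kinds of lines. What happens when an excluded node also shows up in an allocation is a precedence question it deliberately leaves open.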
> I understand the idea of having a flag to indicate that all nodes below
> a certain point should be ignored, but I think this might get confusing,
> and I'm unsure how useful it would be. I just see the usefulness of this
> to block out a couple of nodes by default. Besides, if you do want to
> block out many nodes, any reasonable text editor allows you to insert
> '^' in front of any number of lines easily.
> Alternatively, for the particular situation that Edgar mentions, it may
> be good enough just to set rmaps_base_no_schedule_local in the mca
> params default file.
Hm, OK, that is another flag I was not aware of. Still, I can
think of other scenarios where this feature could be useful, e.g. when
hunting down performance problems on a cluster, where you would like to
avoid having to get a new allocation or do a major rewrite of the
hostfile every time. Another is including an I/O node in an allocation
(in order to have it exclusively) while making sure that no MPI process
gets scheduled onto that node.
> One question though: If I am in a slurm allocation which contains n1,
> and there is a default hostfile that contains "^n1", will I run on 'n1'?
> I'm not sure what the answer is, I know we talked about the precedence
> Ralph H Castain wrote:
>> I personally have no objection, but I would ask then that the wiki be
>> modified to cover this case. All I require is that someone define the syntax
>> to be used to indicate "this is a node I do -not- want used", or
>> alternatively a flag that indicates "all nodes below are -not- to be used".
>> Implementation isn't too hard once I have that...
>> On 3/3/08 9:44 AM, "Edgar Gabriel" <gabriel_at_[hidden]> wrote:
>>> Could this mechanism also be used to exclude a node, indicating that a
>>> job should never run there? Here is the problem that I face quite often:
>>> students working on their homework forget to allocate a partition on the
>>> cluster and just type mpirun. Because of that, all jobs end up running
>>> on the front-end node.
>>> If we had the ability to specify in a default hostfile that jobs must
>>> never run on a given node (e.g. the front-end node), users would get an
>>> error message when trying to do that. I am aware that that's a little
>>> ugly...
>>> Ralph Castain wrote:
>>>> I forget all the formatting we are supposed to use, so I hope you'll all
>>>> just bear with me.
>>>> George brought up the fact that we used to have an MCA param to specify a
>>>> hostfile to use for a job. The hostfile behavior described on the wiki,
>>>> however, doesn't provide for that option. It associates a hostfile with a
>>>> specific app_context, and provides a detailed hierarchical layout of how
>>>> mpirun is to interpret that information.
>>>> What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile"
>>>> to replace the deprecated capability. If found, the system's behavior will
>>>> be:
>>>> 1. in a managed environment, the default hostfile will be used to filter the
>>>> discovered nodes to define the available node pool. Any hostfile and/or dash
>>>> host options provided to an app_context will be used to further filter the
>>>> node pool to define the specific nodes for use by that app_context. Thus,
>>>> nodes in the hostfile and dash host options given to an app_context -must-
>>>> also be in the default hostfile in order to be available for use by that
>>>> app_context - any nodes in the app_context options that are not in the
>>>> default hostfile will be ignored.
>>>> 2. in an unmanaged environment, the default hostfile will be used to define
>>>> the available node pool. Any hostfile and/or dash host options provided to
>>>> an app_context will be used to filter the node pool to define the specific
>>>> nodes for use by that app_context, subject to the previous caveat. However,
>>>> add-hostfile and add-host options will add nodes to the node pool for use
>>>> -only- by the associated app_context.
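The two cases above can be sketched in a few lines of Python. This is only an illustration of the proposal as worded, not the mpirun implementation: the names `node_pool` and `app_context_nodes` are hypothetical, nodes are plain strings, and the sketch assumes that an app_context with no hostfile or dash-host options gets the whole pool. The add-host handling follows what is described for the unmanaged case.

```python
def node_pool(default_hostfile, discovered=None):
    """Return the available node pool.

    discovered: nodes reported by the resource manager in a managed
    environment, or None in an unmanaged environment (hypothetical
    convention for this sketch).
    """
    if discovered is None:
        # Unmanaged: the default hostfile defines the pool.
        return list(default_hostfile)
    # Managed: the default hostfile filters the discovered nodes.
    allowed = set(default_hostfile)
    return [n for n in discovered if n in allowed]


def app_context_nodes(pool, requested=None, added=()):
    """Return the nodes one app_context may use.

    requested: nodes from that app_context's hostfile / dash-host options.
    added: nodes from its add-hostfile / add-host options.
    """
    if requested is None:
        # Assumption: with no options given, the whole pool is usable.
        nodes = list(pool)
    else:
        wanted = set(requested)
        # Requested nodes that are not in the pool are ignored.
        nodes = [n for n in pool if n in wanted]
    # Added nodes are visible only to this app_context, not to the pool.
    for n in added:
        if n not in nodes:
            nodes.append(n)
    return nodes


if __name__ == "__main__":
    managed = node_pool(["n1", "n2", "n3"], discovered=["n2", "n3", "n4"])
    print(managed, app_context_nodes(managed, requested=["n1", "n3"]))
    unmanaged = node_pool(["n1", "n2"])
    print(app_context_nodes(unmanaged, requested=["n2"], added=["n9"]))
```

Because `app_context_nodes` returns a new list and never modifies `pool`, nodes added for one app_context stay invisible to the others.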
>>>> I believe this proposed behavior is consistent with that described on the
>>>> wiki, and would be relatively easy to implement. If nobody objects, I will
>>>> do so by end-of-day 3/6.
>>>> Comments, suggestions, objections - all are welcome!
>>>> devel mailing list
Parallel Software Technologies Lab http://pstl.cs.uh.edu
Department of Computer Science University of Houston
Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335