I personally have no objection, but I would ask then that the wiki be
modified to cover this case. All I require is that someone define the syntax
to be used to indicate "this is a node I do -not- want used", or
alternatively a flag that indicates "all nodes below are -not- to be used".
Implementation isn't too hard once I have that...
On 3/3/08 9:44 AM, "Edgar Gabriel" <gabriel_at_[hidden]> wrote:
> could this mechanism be used also to exclude a node, indicating to never
> run a job there? Here is the problem that I face quite often: students
> working on the homework forget to allocate a partition on the cluster,
> and just type mpirun. Because of that, all jobs end up running on the
> front-end node.
> If we now had the ability to specify in a default hostfile that jobs must
> never run on a given node (e.g. the front-end node), users would get an
> error message when trying to do that. I am aware that that's a little
> ugly...
> Ralph Castain wrote:
>> I forget all the formatting we are supposed to use, so I hope you'll all
>> just bear with me.
>> George brought up the fact that we used to have an MCA param to specify a
>> hostfile to use for a job. The hostfile behavior described on the wiki,
>> however, doesn't provide for that option. It associates a hostfile with a
>> specific app_context, and provides a detailed hierarchical layout of how
>> mpirun is to interpret that information.
>> What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile"
>> to replace the deprecated capability. If found, the system's behavior will
>> be:
>> 1. In a managed environment, the default hostfile will be used to filter the
>> discovered nodes to define the available node pool. Any hostfile and/or dash
>> host options provided to an app_context will be used to further filter the
>> node pool to define the specific nodes for use by that app_context. Thus,
>> nodes in the hostfile and dash host options given to an app_context -must-
>> also be in the default hostfile in order to be available for use by that
>> app_context - any nodes in the app_context options that are not in the
>> default hostfile will be ignored.
>> 2. In an unmanaged environment, the default hostfile will be used to define
>> the available node pool. Any hostfile and/or dash host options provided to
>> an app_context will be used to filter the node pool to define the specific
>> nodes for use by that app_context, subject to the previous caveat. However,
>> add-hostfile and add-host options will add nodes to the node pool for use
>> -only- by the associated app_context.
>> I believe this proposed behavior is consistent with that described on the
>> wiki, and would be relatively easy to implement. If nobody objects, I will
>> do so by end-of-day 3/6.
>> Comments, suggestions, objections - all are welcome!
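For reference, the two cases in the quoted proposal can be sketched as a small filtering routine. This is only an illustration of the described behavior, not Open MPI code: the function name `resolve_nodes` and all of its parameters are hypothetical, hostfiles are modeled as plain lists of node names, and it assumes the add-host/add-hostfile rule applies only in the unmanaged case, since that is where the proposal mentions it.

```python
def resolve_nodes(default_hostfile, app_hosts=None, discovered=None, added_hosts=None):
    """Return the nodes one app_context may use (illustrative sketch only).

    default_hostfile: node names listed in the default hostfile
    app_hosts:        nodes from the app_context's hostfile / dash host options
    discovered:       nodes reported by a resource manager; None means unmanaged
    added_hosts:      nodes from add-hostfile / add-host (unmanaged case only,
                      which is an assumption of this sketch)
    """
    managed = discovered is not None
    if managed:
        # Case 1: the default hostfile filters the discovered nodes.
        allowed = set(default_hostfile)
        pool = [n for n in discovered if n in allowed]
    else:
        # Case 2: the default hostfile itself defines the node pool.
        pool = list(default_hostfile)

    if app_hosts is not None:
        # app_context options only narrow the pool; nodes that are not in
        # the pool are silently ignored.
        wanted = set(app_hosts)
        nodes = [n for n in pool if n in wanted]
    else:
        nodes = list(pool)

    if not managed and added_hosts:
        # Added nodes are visible to this app_context only.
        for n in added_hosts:
            if n not in nodes:
                nodes.append(n)
    return nodes
```

With a default hostfile that omits the front-end node, a call such as `resolve_nodes(["n1", "n2"], app_hosts=["frontend"], discovered=["frontend", "n1", "n2"])` yields an empty list, which is the point at which mpirun could report an error instead of running on the front end.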