Could this mechanism also be used to exclude a node, i.e. to indicate that
a job should never run there? Here is a problem I face quite often:
students working on their homework forget to allocate a partition on the
cluster and just type mpirun. Because of that, all jobs end up running on
the front end node.
If we now had the ability to specify in a default hostfile that a job
should never run on a given node (e.g. the front end node), users would
get an error message when trying to do that. I am aware that that's a
little ugly...
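The idea can be sketched as a simple pre-launch check. This is a minimal Python illustration under stated assumptions, not existing Open MPI behavior or syntax: `check_placement`, `ExcludedNodeError`, and an exclusion list read from the default hostfile are all hypothetical, and `frontend` is a placeholder node name.

```python
class ExcludedNodeError(RuntimeError):
    """Hypothetical error raised when a job would land on an excluded node."""


def check_placement(requested_nodes, excluded_nodes):
    # Hypothetical check: refuse to launch if any requested node appears in
    # the (hypothetical) exclusion list taken from the default hostfile.
    blocked = sorted(set(requested_nodes) & set(excluded_nodes))
    if blocked:
        raise ExcludedNodeError(
            "refusing to run on excluded node(s): " + ", ".join(blocked))
    return list(requested_nodes)


# A student forgets to allocate a partition, so mpirun falls back to the
# local (front end) node, which the default hostfile marks as excluded.
excluded = {"frontend"}
try:
    check_placement(["frontend"], excluded)
except ExcludedNodeError as err:
    print(err)  # refusing to run on excluded node(s): frontend
```

Raising an error instead of silently dropping the node matches the goal of telling users about the mistake rather than quietly running on the front end.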
Ralph Castain wrote:
> I forget all the formatting we are supposed to use, so I hope you'll all
> just bear with me.
> George brought up the fact that we used to have an MCA param to specify a
> hostfile to use for a job. The hostfile behavior described on the wiki,
> however, doesn't provide for that option. It associates a hostfile with a
> specific app_context, and provides a detailed hierarchical layout of how
> mpirun is to interpret that information.
> What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile"
> to replace the deprecated capability. If found, the system's behavior will be:
> 1. in a managed environment, the default hostfile will be used to filter the
> discovered nodes to define the available node pool. Any hostfile and/or dash
> host options provided to an app_context will be used to further filter the
> node pool to define the specific nodes for use by that app_context. Thus,
> nodes in the hostfile and dash host options given to an app_context -must-
> also be in the default hostfile in order to be available for use by that
> app_context - any nodes in the app_context options that are not in the
> default hostfile will be ignored.
> 2. in an unmanaged environment, the default hostfile will be used to define
> the available node pool. Any hostfile and/or dash host options provided to
> an app_context will be used to filter the node pool to define the specific
> nodes for use by that app_context, subject to the previous caveat. However,
> add-hostfile and add-host options will add nodes to the node pool for use
> -only- by the associated app_context.
> I believe this proposed behavior is consistent with that described on the
> wiki, and would be relatively easy to implement. If nobody objects, I will
> do so by end-of-day 3/6.
> Comments, suggestions, objections - all are welcome!
> devel mailing list
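The two rules in Ralph's proposal (managed vs. unmanaged environment) can be modeled as plain list filtering. This is a minimal sketch, not Open MPI code: the function names `node_pool` and `app_context_nodes` are hypothetical, hostfiles are reduced to lists of node names (slot counts ignored), and an app_context's hostfile and dash-host options are merged into one `requested` list, since the proposal does not say how those two combine with each other.

```python
def node_pool(default_hostfile, discovered=None):
    """Return the available node pool (hypothetical model of the proposal)."""
    allowed = set(default_hostfile)
    if discovered is not None:
        # Managed environment: the default hostfile filters the discovered nodes.
        return [n for n in discovered if n in allowed]
    # Unmanaged environment: the default hostfile defines the pool directly.
    return list(default_hostfile)


def app_context_nodes(pool, requested=None, added=None, managed=True):
    """Return the nodes one app_context may use.

    requested: nodes named by the app_context's hostfile and/or dash-host
    options (merged into one list here, an assumption of this sketch).
    added: nodes from add-hostfile / add-host (honored only when unmanaged).
    """
    if requested:
        wanted = set(requested)
        # Requested nodes that are not in the pool are silently ignored.
        nodes = [n for n in pool if n in wanted]
    else:
        nodes = list(pool)
    if not managed and added:
        # Extra nodes apply to this app_context only; the shared pool is untouched.
        for n in added:
            if n not in nodes:
                nodes.append(n)
    return nodes


# Managed: resource manager discovers n2..n4, default hostfile lists n1..n3.
pool = node_pool(["n1", "n2", "n3"], discovered=["n2", "n3", "n4"])
print(pool)                                             # ['n2', 'n3']
print(app_context_nodes(pool, requested=["n3", "n4"]))  # ['n3']

# Unmanaged: default hostfile defines the pool; add-host extends one app_context.
upool = node_pool(["n1", "n2"])
print(app_context_nodes(upool, requested=["n2"], added=["n5"], managed=False))  # ['n2', 'n5']
print(upool)                                            # ['n1', 'n2']
```

The sketch ignores `added` when `managed=True`, because the proposal only describes add-hostfile and add-host for the unmanaged case.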
Parallel Software Technologies Lab http://pstl.cs.uh.edu
Department of Computer Science University of Houston
Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335