Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] using hnp_always_use_plm
From: Damien Guinier (damien.guinier_at_[hidden])
Date: 2009-12-18 11:10:31


Sorry, I was not clear. That's true: the same PLM is used on all nodes.
My parameter name is misleading; "mpirun_not_as_orted" would be better. My
problem is simple:
- I don't want "mpirun" to have the "orted" launch feature.
- To create processes on the "mpirun node", I want the "plm" to launch an
"orted" (on this "mpirun node") and then ask that orted to create the processes.

This way, process trackers and debug tools see no difference between
nodes.
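The two launch paths described above can be modelled in a few lines of C. This is a hypothetical sketch only: `launch_path_t`, `launch_path()` and `path_name()` are illustrative names, not ORTE code, and the `always_use_plm` argument simply stands in for the proposed switch.

```c
/* Hypothetical model of how application processes reach each node.
 * LAUNCH_FORK:  mpirun fork()s the processes itself (current behavior
 *               on the mpirun node).
 * LAUNCH_ORTED: the plm starts an orted, which then creates the processes. */
typedef enum { LAUNCH_FORK, LAUNCH_ORTED } launch_path_t;

/* always_use_plm stands in for the proposed switch (illustrative only):
 * 0 keeps the current behavior, non-zero routes the mpirun node through
 * an orted as well, so every node uses the same path. */
static launch_path_t launch_path(int is_mpirun_node, int always_use_plm)
{
    if (is_mpirun_node && !always_use_plm) {
        return LAUNCH_FORK;
    }
    return LAUNCH_ORTED;
}

/* Human-readable label for a launch path. */
static const char *path_name(launch_path_t p)
{
    return p == LAUNCH_FORK ? "fork() by mpirun" : "orted via plm";
}
```

For example, `launch_path(1, 0)` yields `LAUNCH_FORK` for the mpirun node, while `launch_path(1, 1)` yields `LAUNCH_ORTED`, so tools inspecting any node would see the same launch path.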

Sorry for the confusion.
Damien

Ralph Castain wrote:
> It isn't necessary. The orted already will open and use the local plm
> if you simply set OMPI_MCA_plm=foo in its environment. The rsh, tm,
> and slurm plm modules already do this so that they can execute a
> tree-like spawn (for rsh) and because I needed ssh on the backend
> nodes to locally launch "slaves" on RoadRunner and other machines.
>
> The required code (already in those modules) is:
>
> /* enable local launch by the orteds */
> var = mca_base_param_environ_variable("plm", NULL, NULL);
> opal_setenv(var, "rsh", true, &env);
> free(var);
>
>
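The quoted snippet relies on ORTE's `mca_base_param_environ_variable()` and `opal_setenv()`. As a standalone approximation, the sketch below shows what adding `OMPI_MCA_plm=rsh` to a child's NULL-terminated environment array amounts to. `env_set()` is a hypothetical stand-in, and its overwrite semantics are an assumption modelled on the `true` argument in the quoted call; it is not the real `opal_setenv()`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for opal_setenv(): set NAME=VALUE in a
 * NULL-terminated, heap-allocated environment array, growing it as needed.
 * Returns 0 on success, -1 on allocation failure, and 1 if NAME already
 * exists and overwrite is 0 (the array is then left unchanged). */
static int env_set(const char *name, const char *value, int overwrite, char ***env)
{
    size_t nlen = strlen(name);
    size_t len = nlen + 1 + strlen(value) + 1;   /* "NAME=VALUE\0" */
    char *entry = malloc(len);
    if (entry == NULL) {
        return -1;
    }
    snprintf(entry, len, "%s=%s", name, value);

    size_t n = 0;
    if (*env != NULL) {
        for (; (*env)[n] != NULL; n++) {
            /* Match "NAME=" exactly, so OMPI_MCA_plm does not match
             * a longer name such as OMPI_MCA_plm_rsh_agent. */
            if (strncmp((*env)[n], name, nlen) == 0 && (*env)[n][nlen] == '=') {
                if (!overwrite) {
                    free(entry);
                    return 1;
                }
                free((*env)[n]);
                (*env)[n] = entry;
                return 0;
            }
        }
    }

    /* Not found: append the entry and keep the array NULL-terminated. */
    char **grown = realloc(*env, (n + 2) * sizeof(char *));
    if (grown == NULL) {
        free(entry);
        return -1;
    }
    grown[n] = entry;
    grown[n + 1] = NULL;
    *env = grown;
    return 0;
}
```

Calling `env_set("OMPI_MCA_plm", "rsh", 1, &env)` on an empty array leaves `env` holding the single entry `OMPI_MCA_plm=rsh`, which is the variable the orted would then see at startup.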
> You don't want the orted using the hnp ess module as it will then try
> to track its own launches and totally forget that it is a remote orted
> with slightly different responsibilities.
>
> If you need it to execute a different plm on the backend, please let
> me know - it is a trivial change to allow specification of remote
> launch agents, and we should do it for them all if we do.
>
> Ralph
>
> On Dec 18, 2009, at 7:43 AM, Damien Guinier wrote:
>
>> Hi Ralph
>>
>> In Open MPI, I am working on a small new feature: hnp_always_use_plm.
>> - To create the final application processes, mpirun uses an "orted via
>> plm" (PLM: Process Lifecycle Management module) on remote nodes, but a
>> plain fork() locally. So the first compute node does not use the same
>> method as the other compute nodes. Some debug tools (padb ...) and
>> management tools (squeue -s ...) are affected by this difference.
>> To simplify the use of these cluster tools, I propose adding the
>> option to use "orted via plm" both remotely and locally.
>>
>> I made a patch that adds the parameter
>> "OMPI_MCA_ess_hnp_always_use_plm" to use the "plm" module
>> everywhere. With my patch, nothing changes by default (no impact).
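The parameter name follows the usual `OMPI_MCA_<framework>_<component>_<param>` pattern for environment settings, here framework `ess`, component `hnp`, parameter `always_use_plm`. The sketch below composes that name and reads an integer value with a default, mirroring the patch's default of 0. Both helpers are hypothetical; in the real code, `mca_base_param_reg_int()` performs the registration and lookup.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: compose the environment-variable name for an MCA
 * parameter as OMPI_MCA_<framework>_<component>_<param>.
 * Returns the name length on success, or -1 if buf is too small. */
static int mca_env_name(char *buf, size_t size, const char *framework,
                        const char *component, const char *param)
{
    int n = snprintf(buf, size, "OMPI_MCA_%s_%s_%s", framework, component, param);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: read an integer parameter from the environment,
 * falling back to default_value when it is unset, empty, or not a
 * plain base-10 integer. */
static int mca_env_int(const char *name, int default_value)
{
    const char *s = getenv(name);
    if (s == NULL || *s == '\0') {
        return default_value;
    }
    char *end;
    long v = strtol(s, &end, 10);
    return (*end == '\0') ? (int)v : default_value;
}
```

With nothing set in the environment, `mca_env_int(name, 0)` returns 0, so the default behavior is unchanged; exporting `OMPI_MCA_ess_hnp_always_use_plm=1` before running mpirun would flip it.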
>>
>> Can you tell me whether this feature (and the patch) is good?
>>
>> Thank you
>>
>> Damien
>>
>> diff orte/mca/ess/hnp/ess_hnp.h
>> --- a/orte/mca/ess/hnp/ess_hnp.h Tue Dec 15 15:31:24 2009 +0100
>> +++ b/orte/mca/ess/hnp/ess_hnp.h Tue Dec 15 18:19:18 2009 +0100
>> @@ -27,7 +27,7 @@
>> int orte_ess_hnp_component_open(void);
>> int orte_ess_hnp_component_close(void);
>> int orte_ess_hnp_component_query(mca_base_module_t **module, int *priority);
>> -
>> +extern int mca_ess_hnp_always_use_plm;
>>
>> ORTE_MODULE_DECLSPEC extern orte_ess_base_component_t mca_ess_hnp_component;
>>
>> diff orte/mca/ess/hnp/ess_hnp_component.c
>> --- a/orte/mca/ess/hnp/ess_hnp_component.c Tue Dec 15 15:31:24 2009 +0100
>> +++ b/orte/mca/ess/hnp/ess_hnp_component.c Tue Dec 15 18:19:18 2009 +0100
>> @@ -33,6 +33,7 @@
>> #include "orte/mca/ess/hnp/ess_hnp.h"
>>
>> extern orte_ess_base_module_t orte_ess_hnp_module;
>> +int mca_ess_hnp_always_use_plm = 0;
>>
>> /*
>> * Instantiate the public struct with all of our public information
>> @@ -63,6 +64,10 @@
>> int
>> orte_ess_hnp_component_open(void)
>> {
>> +    mca_base_param_reg_int(&mca_ess_hnp_component.base_version,
>> +        "always_use_plm",
>> +        "Used to force plm on all machine",
>> +        false, false, mca_ess_hnp_always_use_plm, &mca_ess_hnp_always_use_plm);
>> return ORTE_SUCCESS;
>> }
>>
>> diff orte/mca/ess/hnp/ess_hnp_module.c
>> --- a/orte/mca/ess/hnp/ess_hnp_module.c Tue Dec 15 15:31:24 2009 +0100
>> +++ b/orte/mca/ess/hnp/ess_hnp_module.c Tue Dec 15 18:19:18 2009 +0100
>> @@ -442,9 +442,12 @@
>> * node object
>> */
>> OBJ_RETAIN(proc); /* keep accounting straight */
>> + if(mca_ess_hnp_always_use_plm==0)
>> + {
>> node->daemon = proc;
>> node->daemon_launched = true;
>> node->state = ORTE_NODE_STATE_UP;
>> + }
>>
>> /* record that the daemon job is running */
>> jdata->num_procs = 1;
>>
>
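The effect of the `ess_hnp_module.c` hunk can be sketched with simplified structures. `node_t`, `proc_t`, `hnp_register_local_node()` and `needs_daemon_launch()` are hypothetical simplifications of the ORTE objects, and the idea that the launcher starts an orted on any node with no recorded daemon is an assumption, not something stated in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the ORTE node/proc objects
 * touched by the patch; the real structures have many more fields. */
typedef enum { NODE_STATE_UNKNOWN, NODE_STATE_UP } node_state_t;
typedef struct { int vpid; } proc_t;
typedef struct {
    proc_t *daemon;         /* daemon responsible for this node, or NULL */
    bool daemon_launched;
    node_state_t state;
} node_t;

/* Mirrors the patched block: by default the HNP (mpirun) records itself
 * as the daemon of its own node; with always_use_plm set, the node is
 * left without a daemon. */
static void hnp_register_local_node(node_t *node, proc_t *hnp, int always_use_plm)
{
    if (always_use_plm == 0) {
        node->daemon = hnp;
        node->daemon_launched = true;
        node->state = NODE_STATE_UP;
    }
}

/* Assumption: a node with no recorded daemon is one the launcher would
 * start an orted on, which is how the mpirun node would get its own. */
static bool needs_daemon_launch(const node_t *node)
{
    return node->daemon == NULL;
}
```

Under this model, the default path marks the mpirun node as already served by mpirun itself, while setting the parameter leaves it looking like any other node that still needs an orted.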