Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r21513
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-06-24 19:48:32


Just to be specific, here is how we handle the orte_launch_agent in
rsh that makes it work:

     /* now get the orted cmd - as specified by user - into our tmp array.
      * The function returns the location where the actual orted command is
      * located - usually in the final spot, but someone could
      * have added options. For example, it should be legal for them to use
      * "orted --debug-devel" so they get debug output from the orteds, but
      * not from mpirun. Also, they may have a customized version of orted
      * that takes arguments in addition to the std ones we already support
      */
     orted_argc = 0;
     orted_argv = NULL;
     orted_index = orte_plm_base_setup_orted_cmd(&orted_argc, &orted_argv);

     /* look at the returned orted cmd argv to check several cases:
      *
      * - only "orted" was given. This is the default and thus most common
      * case. In this situation, there is nothing we need to do
      *
      * - something was given that doesn't include "orted" - i.e., someone
      * has substituted their own daemon. There isn't anything we can
      * do here, so we want to avoid adding prefixes to the cmd
      *
      * - something was given that precedes "orted". For example, someone
      * may have specified "valgrind [options] orted". In this case, we
      * need to separate out that "orted_prefix" section so it can be
      * treated separately below
      *
      * - something was given that follows "orted". An example was given above.
      * In this case, we need to construct the effective "orted_cmd" so it
      * can be treated properly below
      *
      * Obviously, the latter two cases can be combined - just to make it
      * even more interesting! Gotta love rsh/ssh...
      */
     if (0 == orted_index) {
         /* this is the default scenario, but there could be options specified
          * so we need to account for that possibility
          */
         orted_cmd = opal_argv_join(orted_argv, ' ');
         orted_prefix = NULL;
     } else if (0 > orted_index) {
         /* no "orted" was included */
         orted_cmd = NULL;
         orted_prefix = opal_argv_join(orted_argv, ' ');
     } else {
         /* okay, so the "orted" cmd is somewhere in this array, with
          * something preceding it and perhaps things following it.
          */
         orted_prefix = opal_argv_join_range(orted_argv, 0, orted_index, ' ');
         orted_cmd = opal_argv_join_range(orted_argv, orted_index, opal_argv_count(orted_argv), ' ');
     }
     opal_argv_free(orted_argv); /* done with this */

     /* we now need to assemble the actual cmd that will be executed - this depends
      * upon whether or not a prefix directory is being used
      */
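
For anyone reading along without the OPAL sources at hand, the three
branches above can be sketched in plain C. This is only an illustration,
not the rsh module's code: join_range() and split_agent() are hypothetical
names, and join_range() merely approximates opal_argv_join_range() (it
returns NULL for an empty range, a convention chosen for this sketch).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for opal_argv_join_range(): join argv[start..end-1]
 * with single spaces. Returns a malloc'd string the caller must free, or
 * NULL if the range is empty or allocation fails. */
char *join_range(char **argv, int start, int end)
{
    size_t len = 0;
    char *out;
    int i;

    assert(NULL != argv && start >= 0);
    if (start >= end) {
        return NULL;
    }
    for (i = start; i < end; ++i) {
        len += strlen(argv[i]) + 1;   /* token plus separator or terminator */
    }
    out = malloc(len);
    if (NULL == out) {
        return NULL;
    }
    out[0] = '\0';
    for (i = start; i < end; ++i) {
        strcat(out, argv[i]);
        if (i + 1 < end) {
            strcat(out, " ");
        }
    }
    return out;
}

/* Split a tokenized launch agent into prefix and cmd, given the index of
 * the "orted" token (negative if no such token is present). */
void split_agent(char **argv, int argc, int orted_index,
                 char **prefix, char **cmd)
{
    if (0 == orted_index) {
        /* "orted [options]": everything is the cmd */
        *prefix = NULL;
        *cmd = join_range(argv, 0, argc);
    } else if (0 > orted_index) {
        /* no "orted" at all: everything is treated as prefix */
        *prefix = join_range(argv, 0, argc);
        *cmd = NULL;
    } else {
        /* "something orted [options]": split at the orted token */
        *prefix = join_range(argv, 0, orted_index);
        *cmd = join_range(argv, orted_index, argc);
    }
}
```

With the tokens "valgrind", "--leak-check=full", "orted", "--debug-devel"
and an index of 2, this yields the prefix "valgrind --leak-check=full" and
the cmd "orted --debug-devel".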

As noted in prior email:

int orte_plm_base_setup_orted_cmd(int *argc, char ***argv)
{
     int i, loc;
     char **tmpv;

     /* set default location */
     loc = -1;
     /* split the command apart in case it is multi-word */
     tmpv = opal_argv_split(orte_launch_agent, ' ');
     for (i = 0; NULL != tmpv && NULL != tmpv[i]; ++i) {
         if (0 == strcmp(tmpv[i], "orted")) {
             loc = i;
         }
         opal_argv_append(argc, argv, tmpv[i]);
     }
     opal_argv_free(tmpv);

     return loc;
}
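
The same lookup can be tried outside of ORTE with a small standalone
sketch. find_orted_index() is a hypothetical name; it uses strtok() where
the function above uses opal_argv_split(), so it assumes that runs of
spaces simply separate tokens, and it only reports the index rather than
also building an argv.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the index of the last token that is exactly "orted" in a
 * space-separated agent string, or -1 if there is no such token
 * (or if memory could not be allocated). */
int find_orted_index(const char *agent)
{
    char *copy, *tok;
    int i = 0, loc = -1;

    assert(NULL != agent);
    copy = malloc(strlen(agent) + 1);   /* strtok() modifies its input */
    if (NULL == copy) {
        return -1;
    }
    strcpy(copy, agent);
    for (tok = strtok(copy, " "); NULL != tok; tok = strtok(NULL, " ")) {
        if (0 == strcmp(tok, "orted")) {
            loc = i;
        }
        ++i;
    }
    free(copy);
    return loc;
}
```

Because the comparison is an exact strcmp() against "orted", an agent such
as "/opt/bin/orted -s" (an illustrative path) returns -1 here, the same as
a fully substituted daemon, while "valgrind orted --debug-devel" returns 1.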

So as you can see, we deliberately split the cmd apart and reassemble
it to allow for any variation of the orted cmd you might like to use.
This was done because we can't support it generically across all
environments - every variant we tried failed in at least one
environment, with either not enough quotes or too many.
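
The quoting problem is easier to see with a small experiment. The sketch
below is POSIX only; run_via_shell() and run_direct() are hypothetical
helpers, and the printf utility stands in for a daemon. It passes the same
quoted value once through a shell, roughly what happens on the far side of
rsh/ssh, and once by direct exec with no shell in between, which is an
assumption about how some other launchers start their children.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Launch through /bin/sh: the shell consumes the quotes and keeps the
 * quoted words together as one argument. Captures stdout into out. */
int run_via_shell(const char *cmdline, char *out, size_t outlen)
{
    FILE *fp;
    size_t n;

    assert(NULL != cmdline && NULL != out && outlen > 0);
    fp = popen(cmdline, "r");
    if (NULL == fp) {
        return -1;
    }
    n = fread(out, 1, outlen - 1, fp);
    out[n] = '\0';
    return (0 == pclose(fp)) ? 0 : -1;
}

/* Launch by direct exec: every argv entry, quotes included, reaches the
 * child exactly as given. Captures stdout into out. */
int run_direct(char *const argv[], char *out, size_t outlen)
{
    int fds[2], status;
    pid_t pid;
    ssize_t n;
    size_t total = 0;

    assert(NULL != argv && NULL != out && outlen > 0);
    if (0 != pipe(fds)) {
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (0 == pid) {                 /* child: stdout -> pipe, then exec */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if exec failed */
    }
    close(fds[1]);
    while (total < outlen - 1 &&
           (n = read(fds[0], out + total, outlen - 1 - total)) > 0) {
        total += (size_t)n;
    }
    out[total] = '\0';
    close(fds[0]);
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return (WIFEXITED(status) && 0 == WEXITSTATUS(status)) ? 0 : -1;
}
```

Through the shell, the command line printf '[%s]' "orted -s" prints
[orted -s]: the quotes are consumed and the words stay together. By direct
exec with the single argument "orted -s" wrapped in literal quote
characters, the child prints ["orted -s"]: the quotes arrive as part of
the value, which is the "too many quotes" case.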

We didn't do this just for the heck of it. Several of us spent a bunch
of time testing all environments, trying to find a way to support this
capability. After a lot of pain, we finally developed this method that
has been working for well over a year.

I really would rather not waste a lot of my time going through this
rather lengthy demonstration/argument cycle again. For the purposes of
your tree spawn, the existing capability (prior to your commit) should
meet all requirements. You may have to do some work to ensure that the
child daemons properly flow through the provided code, but you most
certainly don't need the change made to the base functions.

So why don't we revert just that piece out for now so it quits
breaking existing functionality? You will find similar code already
exists in the rsh launcher anyway - see lines 673 and following. All
you have to do is enable those lines for daemons as well as the HNP so
that the params get passed to your tree children.

We can then continue this argument at leisure while you take us
through all the prior attempts and show how we were wrong.

I would just rather not derail everything I'm doing to go through this
yet again - especially when it isn't necessary.

Thanks
Ralph

On Jun 24, 2009, at 4:05 PM, George Bosilca wrote:

> Just for the sake of it. A funny command line to try:
>
> [bosilca_at_dancer ~]$ mpirun --mca routed_base_verbose 0 --leave-session-attached -np 1 --mca orte_launch_agent "orted --mca routed_base_verbose 1" uptime
>
> [node03:22355] [[14661,0],1] routed_linear: init routes for daemon job [14661,0]
> hnp_uri 960823296.0;tcp://192.168.1.254:58135;tcp://192.168.0.2:58135
> 18:02:59 up 26 days, 17:41, 0 users, load average: 0.97, 0.50, 0.53
> [bosilca_at_dancer ~]$ [node03:22355] [[14661,0],1] routed_linear_get([[14661,0],0]) --> [[14661,0],0]
> [node03:22355] [[14661,0],1] routed_linear: init routes for daemon job [14661,0]
> hnp_uri 960823296.0;tcp://192.168.1.254:58135;tcp://192.168.0.2:58135
> [node03:22355] [[14661,0],1] routed_linear_get([[14661,0],0]) --> [[14661,0],0]
> [node03:22355] [[14661,0],1] routed_linear_get([[14661,0],0]) --> [[14661,0],0]
> [node03:22355] [[14661,0],1] routed_linear_get([[14661,0],0]) --> [[14661,0],0]
>
> This sets routed_base_verbose to 0 for the HNP, and to 1 for
> everybody else. As you can see from the output, the orted printed
> routed information, which means it correctly interpreted the
> multi-word argument.
>
> george.
>
> On Jun 24, 2009, at 17:52 , George Bosilca wrote:
>
>>
>> On Jun 24, 2009, at 17:41 , Jeff Squyres wrote:
>>
>>> -----
>>> [14:38] svbu-mpi:~/svn/ompi/orte % mpirun --mca plm_base_verbose 100 --leave-session-attached -np 1 --mca orte_launch_agent "$bogus/bin/orted -s" uptime
>>> ...lots of output...
>>> srun --nodes=1 --ntasks=1 --kill-on-bad-exit --nodelist=svbu-mpi062 /home/jsquyres/bogus/bin/orted -s -mca ess slurm -mca orte_ess_jobid 3195142144 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "3195142144.0;tcp://172.29.218.140:34489;tcp://10.10.20.250:34489;tcp://10.10.30.250:34489;tcp://192.168.183.1:34489;tcp://192.168.184.1:34489" -mca orte_nodelist svbu-mpi062 --mca plm_base_verbose 100 --mca orte_launch_agent "/home/jsquyres/bogus/bin/orted -s"
>>> ...
>>> -----
>>>
>>> and it hangs, because the argv[0]
>>>
>>> "/home/jsquyres/bogus/bin/orted -s"
>>>
>>> (including the quotes!) cannot be exec'ed.
>>
>> OK so maybe the -s option was a bad example (it's the one I use
>> regularly). It blocks the orted, so you have to log on to each node,
>> attach gdb to the orted, and release it by doing a "set
>> orted_spin_flag=0".
>>
>> george.
>>
>>>
>>>
>>>
>>>
>>> On Jun 24, 2009, at 5:15 PM, George Bosilca wrote:
>>>
>>>> I can't guarantee this for all PLMs, but I can confirm that rsh and
>>>> slurm (1.3.12) work well with this.
>>>>
>>>> We tried with and without Open MPI, and the outcome is the same.
>>>>
>>>> [bosilca_at_dancer c]$ srun -n 4 echo "1 2 3 4 5 it works"
>>>> 1 2 3 4 5 it works
>>>> 1 2 3 4 5 it works
>>>> 1 2 3 4 5 it works
>>>> 1 2 3 4 5 it works
>>>>
>>>> [bosilca_at_dancer c]$ srun -N 2 -c 2 mpirun --mca plm slurm --mca orte_launch_agent "orted -s" --mca plm_rsh_tree_spawn 1 --bynode --mca pml ob1 --mca orte_daemon_spin 0 ./hello
>>>> Hello, world, I am 0 of 2 on node03
>>>> Hello, world, I am 1 of 2 on node04
>>>>
>>>> *after releasing the orted from their spin.
>>>>
>>>> In fact, what I find strange is the old behavior. Dropping arguments
>>>> without even letting the user know about it is certainly not a
>>>> desirable approach.
>>>>
>>>> george.
>>>>
>>>> On Jun 24, 2009, at 16:15 , Ralph Castain wrote:
>>>>
>>>> > Yo George
>>>> >
>>>> > This commit is going to break non-rsh launchers. While it is true
>>>> > that the rsh launcher may handle multi-word options by putting them
>>>> > in quotes, we specifically avoided it here because it breaks SLURM,
>>>> > Torque, and others.
>>>> >
>>>> > This is why we specifically put the inclusion of multi-word options
>>>> > in the rsh plm module, and not here. Would you please move it back
>>>> > there?
>>>> >
>>>> > Thanks
>>>> > Ralph
>>>> >
>>>> >
>>>> > On Wed, Jun 24, 2009 at 1:51 PM, <bosilca_at_[hidden]> wrote:
>>>> > Author: bosilca
>>>> > Date: 2009-06-24 15:51:52 EDT (Wed, 24 Jun 2009)
>>>> > New Revision: 21513
>>>> > URL: https://svn.open-mpi.org/trac/ompi/changeset/21513
>>>> >
>>>> > Log:
>>>> > When we get a report from an orted about its state, don't use the sender of
>>>> > the message to update the structures, but instead use the information from
>>>> > the URI. The reason is that even the launch report messages can get routed.
>>>> >
>>>> > Deal with the orted_cmd_line in a single location.
>>>> >
>>>> > Text files modified:
>>>> > trunk/orte/mca/plm/base/plm_base_launch_support.c | 69 +++++++++++++++++++++++----------------
>>>> > 1 files changed, 41 insertions(+), 28 deletions(-)
>>>> >
>>>> > Modified: trunk/orte/mca/plm/base/plm_base_launch_support.c
>>>> > ===================================================================
>>>> > --- trunk/orte/mca/plm/base/plm_base_launch_support.c (original)
>>>> > +++ trunk/orte/mca/plm/base/plm_base_launch_support.c 2009-06-24 15:51:52 EDT (Wed, 24 Jun 2009)
>>>> > @@ -433,7 +433,8 @@
>>>> > {
>>>> > orte_message_event_t *mev = (orte_message_event_t*)data;
>>>> > opal_buffer_t *buffer = mev->buffer;
>>>> > - char *rml_uri;
>>>> > + orte_process_name_t peer;
>>>> > + char *rml_uri = NULL;
>>>> > int rc, idx;
>>>> > int32_t arch;
>>>> > orte_node_t **nodes;
>>>> > @@ -442,19 +443,11 @@
>>>> > int64_t setupsec, setupusec;
>>>> > int64_t startsec, startusec;
>>>> >
>>>> > - OPAL_OUTPUT_VERBOSE((5, orte_plm_globals.output,
>>>> > - "%s plm:base:orted_report_launch from daemon %s",
>>>> > - ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
>>>> > - ORTE_NAME_PRINT(&mev->sender)));
>>>> > -
>>>> > /* see if we need to timestamp this receipt */
>>>> > if (orte_timing) {
>>>> > gettimeofday(&recvtime, NULL);
>>>> > }
>>>> >
>>>> > - /* update state */
>>>> > - pdatorted[mev->sender.vpid]->state = ORTE_PROC_STATE_RUNNING;
>>>> > -
>>>> > /* unpack its contact info */
>>>> > idx = 1;
>>>> > if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &rml_uri, &idx, OPAL_STRING))) {
>>>> > @@ -466,13 +459,26 @@
>>>> > /* set the contact info into the hash table */
>>>> > if (ORTE_SUCCESS != (rc = orte_rml.set_contact_info(rml_uri))) {
>>>> > ORTE_ERROR_LOG(rc);
>>>> > - free(rml_uri);
>>>> > orted_failed_launch = true;
>>>> > goto CLEANUP;
>>>> > }
>>>> > - /* lookup and record this daemon's contact info */
>>>> > - pdatorted[mev->sender.vpid]->rml_uri = strdup(rml_uri);
>>>> > - free(rml_uri);
>>>> > +
>>>> > + rc = orte_rml_base_parse_uris(rml_uri, &peer, NULL );
>>>> > + if( ORTE_SUCCESS != rc ) {
>>>> > + ORTE_ERROR_LOG(rc);
>>>> > + orted_failed_launch = true;
>>>> > + goto CLEANUP;
>>>> > + }
>>>> > +
>>>> > + OPAL_OUTPUT_VERBOSE((5, orte_plm_globals.output,
>>>> > + "%s plm:base:orted_report_launch from daemon %s via %s",
>>>> > + ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
>>>> > + ORTE_NAME_PRINT(&peer),
>>>> > + ORTE_NAME_PRINT(&mev->sender)));
>>>> > +
>>>> > + /* update state and record for this daemon contact info */
>>>> > + pdatorted[peer.vpid]->state = ORTE_PROC_STATE_RUNNING;
>>>> > + pdatorted[peer.vpid]->rml_uri = rml_uri;
>>>> >
>>>> > /* get the remote arch */
>>>> > idx = 1;
>>>> > @@ -555,31 +561,33 @@
>>>> >
>>>> > /* lookup the node */
>>>> > nodes = (orte_node_t**)orte_node_pool->addr;
>>>> > - if (NULL == nodes[mev->sender.vpid]) {
>>>> > + if (NULL == nodes[peer.vpid]) {
>>>> > ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
>>>> > orted_failed_launch = true;
>>>> > goto CLEANUP;
>>>> > }
>>>> > /* store the arch */
>>>> > - nodes[mev->sender.vpid]->arch = arch;
>>>> > + nodes[peer.vpid]->arch = arch;
>>>> >
>>>> > /* if a tree-launch is underway, send the cmd back */
>>>> > if (NULL != orte_tree_launch_cmd) {
>>>> > - orte_rml.send_buffer(&mev->sender, orte_tree_launch_cmd, ORTE_RML_TAG_DAEMON, 0);
>>>> > + orte_rml.send_buffer(&peer, orte_tree_launch_cmd, ORTE_RML_TAG_DAEMON, 0);
>>>> > }
>>>> >
>>>> > CLEANUP:
>>>> >
>>>> > OPAL_OUTPUT_VERBOSE((5, orte_plm_globals.output,
>>>> > - "%s plm:base:orted_report_launch %s for daemon %s at contact %s",
>>>> > + "%s plm:base:orted_report_launch %s for daemon %s (via %s) at contact %s",
>>>> > ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
>>>> > orted_failed_launch ? "failed" : "completed",
>>>> > - ORTE_NAME_PRINT(&mev->sender), pdatorted[mev->sender.vpid]->rml_uri));
>>>> > + ORTE_NAME_PRINT(&peer),
>>>> > + ORTE_NAME_PRINT(&mev->sender), pdatorted[peer.vpid]->rml_uri));
>>>> >
>>>> > /* release the message */
>>>> > OBJ_RELEASE(mev);
>>>> >
>>>> > if (orted_failed_launch) {
>>>> > + if( NULL != rml_uri ) free(rml_uri);
>>>> > orte_errmgr.incomplete_start(ORTE_PROC_MY_NAME->jobid, ORTE_ERROR_DEFAULT_EXIT_CODE);
>>>> > } else {
>>>> > orted_num_callback++;
>>>> > @@ -1133,18 +1141,23 @@
>>>> > * being sure to "purge" any that would cause problems
>>>> > * on backend nodes
>>>> > */
>>>> > - if (ORTE_PROC_IS_HNP) {
>>>> > + if (ORTE_PROC_IS_HNP || ORTE_PROC_IS_DAEMON) {
>>>> > cnt = opal_argv_count(orted_cmd_line);
>>>> > for (i=0; i < cnt; i+=3) {
>>>> > - /* if the specified option is more than one word, we don't
>>>> > - * have a generic way of passing it as some environments ignore
>>>> > - * any quotes we add, while others don't - so we ignore any
>>>> > - * such options. In most cases, this won't be a problem as
>>>> > - * they typically only apply to things of interest to the HNP.
>>>> > - * Individual environments can add these back into the cmd line
>>>> > - * as they know if it can be supported
>>>> > - */
>>>> > - if (NULL != strchr(orted_cmd_line[i+2], ' ')) {
>>>> > + /* in the rsh environment, we can append multi-word arguments
>>>> > + * by enclosing them in quotes. Check for any multi-word
>>>> > + * mca params passed to mpirun and include them
>>>> > + */
>>>> > + if (NULL != strchr(orted_cmd_line[i+2], ' ')) {
>>>> > + char* param;
>>>> > +
>>>> > + /* must add quotes around it */
>>>> > + asprintf(&param, "\"%s\"", orted_cmd_line[i+2]);
>>>> > + /* now pass it along */
>>>> > + opal_argv_append(argc, argv, orted_cmd_line[i]);
>>>> > + opal_argv_append(argc, argv, orted_cmd_line[i+1]);
>>>> > + opal_argv_append(argc, argv, param);
>>>> > + free(param);
>>>> > continue;
>>>> > }
>>>> > /* The daemon will attempt to open the PLM on the remote
>>>> > _______________________________________________
>>>> > svn mailing list
>>>> > svn_at_[hidden]
>>>> > http://www.open-mpi.org/mailman/listinfo.cgi/svn
>>>> >
>>>> > _______________________________________________
>>>> > devel mailing list
>>>> > devel_at_[hidden]
>>>> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems