Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] Multiworld MCA parameter values broken
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-11-19 10:11:35


On Nov 19, 2007, at 10:01 AM, Ralph H Castain wrote:

> Unfortunately, it -is- a significant problem when passing the params
> on to
> the orteds, as Tim has eloquently pointed out in his original
> posting. I
> guess I don't see why it would be a significant problem to just have
> opal_cmd_line_parse do the "right thing" - if a string param is
> quoted, then
> just remove the quotes.

How would opal_cmd_line_parse know when to remove quotes and when not
remove quotes? I.e., what if an MCA param (or some other value, such
as a user application argv) required quotes? This is not uncommon for
the wrapper compilers, for example (passing -D's to the C preprocessor
that take string arguments).

AFAIK, opal_cmd_line_parse() does exactly what I intended it to. :-)
It performs no shell-like quoting because it expects to be given a
list of "final" string tokens.

> Problem solved, and I wouldn't have to add a bunch of code to figure
> out
> when to assemble argv arrays in different ways.

Er... I don't understand: I joined this thread because I thought there
was still something to fix...?

I would be against changing opal_cmd_line_parse() in the manner that
you have described because it will break other things (e.g., the
wrapper compilers).

>
>
> On 11/19/07 7:52 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>
>> I guess I don't see why this is an opal_cmd_line_parse() problem.
>>
>> If you invoke executables through system(), then a shell is used and
>> quoting is necessary to preserve the individual string tokens (i.e.,
>> "my beloved string" would be passed to the application as one argv
>> token, without the quotes).
>>
>> But if you're building up an array of argv and calling some form of
>> exec(), then no shell is involved and quoting should not be
>> necessary. Specifically:
>>
>> opal_append_argv(&argc, &argv, "my beloved string");
>>
>> will be passed as one string token to the application.
>>
>> opal_cmd_line_parse() is passed an array of argv, meaning that the
>> command line have already been split into individual string
>> tokens. I
>> guess the question is whether these come in directly from the
>> arguments to main() or whether we are getting a single string and
>> breaking it up into tokens. If the latter is true, then we need to
>> re-
>> evaluate our break-into-tokens algorithm. I have a dim recollection
>> that these come in from the arguments to main(), though.
>>
>> I guess I can see where this would get complicated for rsh/ssh
>> invocations, because *both* models are used. (i.e., you exec locally
>> but it turns into a system-like invocation on the remote side). In
>> this case, I think you'll need to quote extended strings (e.g., those
>> containing spaces) for the non-local invocations not not quote it for
>> local invocations.
>>
>>
>>
>>
>> On Nov 19, 2007, at 9:19 AM, Ralph H Castain wrote:
>>
>>> My, you are joining late! ;-)
>>>
>>> The problem is with MCA params that take string arguments. If we
>>> pass the following:
>>>
>>> -mca foo "my beloved string"
>>>
>>> on the command line of an orted, we get a value for foo that
>>> includes the quote marks. I verified this rather painfully when
>>> attempting to pass a command line mca param for a uri. I eventually
>>> had to add specific code to clean the paramater up.
>>>
>>> This appears to be somehow related to the precise method used to
>>> register the param. For example, the following deprecated method
>>> works fine:
>>>
>>> On the setup end:
>>>
>>> opal_argv_append(argc, argv, "--gprreplica");
>>> if (NULL != orte_process_info.gpr_replica_uri) {
>>> contact_info = strdup(orte_process_info.gpr_replica_uri);
>>> } else {
>>> contact_info = orte_rml.get_contact_info();
>>> }
>>> asprintf(&param, "\"%s\"", contact_info);
>>> opal_argv_append(argc, argv, param);
>>>
>>> And on the receiving end:
>>> id = mca_base_param_register_string("gpr", "replica", "uri",
>>> NULL, orte_process_info.gpr_replica_uri);
>>> mca_base_param_lookup_string(id,
>>> &(orte_process_info.gpr_replica_uri));
>>> mca_base_param_set_internal(id, true);
>>>
>>>
>>> However, the following does NOT work cleanly:
>>>
>>> On the setup end:
>>> rml_uri = orte_rml.get_contact_info();
>>> asprintf(&param, "\"%s\"", rml_uri);
>>> opal_argv_append(argc, argv, "--hnp-uri");
>>> opal_argv_append(argc, argv, param);
>>>
>>> On the receiving end:
>>> mca_base_param_reg_string_name("orte", "hnp_uri",
>>> "HNP contact info",
>>> true, false, NULL, &uri);
>>>
>>> Thereby necessitating the addition of the following code to clean it
>>> up:
>>> if (NULL != uri) {
>>> /* the uri value passed to us will have quote marks around
>>> it to protect
>>> * the value if passed on the command line. We must remove
>>> those
>>> * to have a correct uri string
>>> */
>>> if ('"' == uri[0]) {
>>> /* if the first char is a quote, then so will the last
>>> one be */
>>> ptr = &uri[1];
>>> len = strlen(ptr) - 1;
>>> } else {
>>> ptr = &uri[0];
>>> len = strlen(uri);
>>> }
>>>
>>> /* we have to copy the string by hand as strndup is a GNU
>>> extension
>>> * and may not be generally available
>>> */
>>> orte_process_info.my_hnp_uri = (char*)malloc(len+1);
>>> for (i=0; i < len; i++) {
>>> orte_process_info.my_hnp_uri[i] = ptr[i];
>>> }
>>> orte_process_info.my_hnp_uri[len] = '\0'; /* NULL terminate
>>> */
>>> free(uri);
>>>
>>> }
>>>
>>> It was my understanding that you wanted us to move away from the
>>> deprecated interface – hence my comment that we cannot just quote
>>> all the strings as we would have to add this code all over the
>>> place, or (better) fix opal_cmd_line_parse.
>>>
>>> Hope that helps
>>> Ralph
>>>
>>>
>>> On 11/19/07 7:01 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>>>
>>>> Sorry -- I'm just joining this conversation late: what's the
>>>> problem
>>>> with opal_cmd_line_parse?
>>>>
>>>> It should obey all quoting from shells, etc. I.e., it shouldn't
>>> care
>>>> about tokens with special characters (to include spaces) because
>>>> the
>>>> shell divides all of that stuff up -- it just gets a char*[] that
>>>> it
>>>> treats as discrete tokens.
>>>>
>>>> Is it doing something wrong?
>>>>
>>>>
>>>>
>>>> On Nov 19, 2007, at 8:39 AM, Ralph H Castain wrote:
>>>>
>>>>> I'm not sure it is really necessary - the problem is solely within
>>>>> opal_cmd_line_parse and (if someone can parse that code ;-)) is
>>>>> truly simple
>>>>> to fix. The overly long cmd line issue is due to a bug that Josh
>>> was
>>>>> going
>>>>> to look at (may already have done so while I was out of touch).
>>>>>
>>>>> Ralph
>>>>>
>>>>>
>>>>>
>>>>> On 11/9/07 5:10 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>>>>>
>>>>>> Should there be another option for passing MCA parameters between
>>>>>> processes, such as via stdin (or any file descriptor)? I.e.,
>>> during
>>>>>> the command line parsing to check for command line MCA params,
>>>>>> perhaps
>>>>>> a new argument could be introduced: -mcauri <uri>, where <uri>
>>> could
>>>>>> be a few different forms:
>>>>>>
>>>>>> - file://stdin: (note the 2 //, not 3, so "stdin" would never
>>>>>> conflict
>>>>>> with a real file named /stdin) Read the parameters in off stdin.
>>>>>>
>>>>>> - rml://...rml contact info...: read in the MCA params via the
>>>>>> RML
>>>>>> (although I assume that reading via the RML would be *waaaay* to
>>> late
>>>>>> during the MCA setup process -- I mentioned this option for
>>>>>> completeness, even though I don't think it'll work)
>>>>>>
>>>>>> - ip://ipaddress:port: open a socket back and read the MCA
>>> params in
>>>>>> over a socket. This could have some scalability issues...? But
>>> who
>>>>>> knows; it could be tied into the hierarchical startup such that
>>>>>> we
>>>>>> wouldn't have to have an all-to-one connection scheme.
>>> Certainly it
>>>>>> would cause scalability problems when paired with today's all-to-
>>> one
>>>>>> RML connection scheme for the OOB.
>>>>>>
>>>>>> I'm not sure that the rml: and ip: schemes are worthwhile.
>>> Maybe a
>>>>>> file://stdin kind of approach could work? Or perhaps some other
>>> kind
>>>>>> of URI/IPC...? (I really haven't thought through the issues --
>>> this
>>>>>> is off the top of my head)
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Nov 8, 2007, at 2:36 PM, Ralph H Castain wrote:
>>>>>>
>>>>>>> Might I suggest:
>>>>>>>
>>>>>>> https://svn.open-mpi.org/trac/ompi/ticket/1073
>>>>>>>
>>>>>>> It deals with some of these issues and explains the boundaries
>>>>>>> of
>>>>>>> the
>>>>>>> problem. As for what a string param can contain, I have no
>>> opinion.
>>>>>>> I only
>>>>>>> note that it must handle special characters such as ';', '/',
>>> etc.
>>>>>>> that are
>>>>>>> typically found in uri's. I cannot think of any reason it should
>>>>>>> have a
>>>>>>> quote in it.
>>>>>>>
>>>>>>> Ralph
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 11/8/07 12:25 PM, "Tim Prins" <tprins_at_[hidden]> wrote:
>>>>>>>
>>>>>>>> The alias option you presented does not work. I think we do
>>>>>>>> some
>>>>>>>> weird
>>>>>>>> things to find the absolute path for ssh, instead of just
>>> issuing
>>>>>>>> the
>>>>>>>> command.
>>>>>>>>
>>>>>>>> I would spend some time fixing this, but I don't want to do it
>>>>>>>> wrong. We
>>>>>>>> could quote all the param values, and change the parser to
>>> remove
>>>>>>>> the
>>>>>>>> quotes, but this is assuming that a mca param does not contain
>>>>>>>> quotes.
>>>>>>>>
>>>>>>>> So I guess there are 2 questions that need to be answered
>>> before a
>>>>>>>> fix
>>>>>>>> is made:
>>>>>>>>
>>>>>>>> 1. What exactly can a string mca param contain? Can it have
>>>>>>>> quotes or
>>>>>>>> spaces or?
>>>>>>>>
>>>>>>>> 2. Which mca parameters should be forwarded? Should it be just
>>> the
>>>>>>>> ones
>>>>>>>> from the command line? From the environment? From config files?
>>>>>>>>
>>>>>>>> Tim
>>>>>>>>
>>>>>>>> Ralph Castain wrote:
>>>>>>>>> What changed is that we never passed mca params to the orted
>>>>>>>>> before - they
>>>>>>>>> always went to the app, but it's the orted that has the issue.
>>>>>>>>> There is a
>>>>>>>>> bug ticket thread on this subject - I forget the number
>>>>>>>>> immediately.
>>>>>>>>>
>>>>>>>>> Basically, the problem was that we cannot generally pass the
>>> local
>>>>>>>>> environment to the orteds when we launch them. However, people
>>>>>>>>> needed
>>>>>>>>> various mca params to get to the orteds to control their
>>> behavior.
>>>>>>>>> The only
>>>>>>>>> way to resolve that problem was to pass the params via the
>>> command
>>>>>>>>> line,
>>>>>>>>> which is what was done.
>>>>>>>>>
>>>>>>>>> Except for a very few cases, all of our mca params are single
>>>>>>>>> values that do
>>>>>>>>> not include spaces, so this is not a problem that is causing
>>>>>>>>> widespread
>>>>>>>>> issues. As I said, I already had to deal with one special case
>>>>>>>>> that didn't
>>>>>>>>> involve spaces, but did have special characters that required
>>>>>>>>> quoting, which
>>>>>>>>> identified the larger problem of dealing with quoted strings.
>>>>>>>>>
>>>>>>>>> I have no objection to a more general fix. Like I said in my
>>> note,
>>>>>>>>> though,
>>>>>>>>> the general fix will take a larger effort. If someone is
>>> willing
>>>>>>>>> to do so,
>>>>>>>>> that is fine with me - I was only offering solutions that
>>>>>>>>> would
>>>>>>>>> fill the
>>>>>>>>> interim time as I haven't heard anyone step up to say they
>>> would
>>>>>>>>> fix it
>>>>>>>>> anytime soon.
>>>>>>>>>
>>>>>>>>> Please feel free to jump in and volunteer! ;-) I'm willing to
>>> put
>>>>>>>>> the quotes
>>>>>>>>> around things if you will fix the mca cmd line parser to
>>> cleanly
>>>>>>>>> remove them
>>>>>>>>> on the other end.
>>>>>>>>>
>>>>>>>>> Ralph
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/7/07 5:50 PM, "Tim Prins" <tprins_at_[hidden]> wrote:
>>>>>>>>>
>>>>>>>>>> I'm curious what changed to make this a problem. How were we
>>>>>>>>>> passing mca
>>>>>>>>>> param
>>>>>>>>>> from the base to the app before, and why did it change?
>>>>>>>>>>
>>>>>>>>>> I think that options 1 & 2 below are no good, since we, in
>>>>>>>>>> general, allow
>>>>>>>>>> string mca params to have spaces (as far as I understand
>>> it). So
>>>>>>>>>> a more
>>>>>>>>>> general approach is needed.
>>>>>>>>>>
>>>>>>>>>> Tim
>>>>>>>>>>
>>>>>>>>>> On Wednesday 07 November 2007 10:40:45 am Ralph H Castain
>>> wrote:
>>>>>>>>>>> Sorry for delay - wasn't ignoring the issue.
>>>>>>>>>>>
>>>>>>>>>>> There are several fixes to this problem - ranging in order
>>> from
>>>>>>>>>>> least to
>>>>>>>>>>> most work:
>>>>>>>>>>>
>>>>>>>>>>> 1. just alias "ssh" to be "ssh -Y" and run without setting
>>> the
>>>>>>>>>>> mca param.
>>>>>>>>>>> It won't affect anything on the backend because the daemon/
>>> procs
>>>>>>>>>>> don't use
>>>>>>>>>>> ssh.
>>>>>>>>>>>
>>>>>>>>>>> 2. include "pls_rsh_agent" in the array of mca params not
>>> to be
>>>>>>>>>>> passed to
>>>>>>>>>>> the orted in orte/mca/pls/base/
>>> pls_base_general_support_fns.c,
>>>>>>>>>>> the
>>>>>>>>>>> orte_pls_base_orted_append_basic_args function. This would
>>> fix
>>>>>>>>>>> the specific
>>>>>>>>>>> problem cited here, but I admit that listing every such
>>> param by
>>>>>>>>>>> name would
>>>>>>>>>>> get tedious.
>>>>>>>>>>>
>>>>>>>>>>> 3. we could easily detect that a "problem" character was in
>>> the
>>>>>>>>>>> mca param
>>>>>>>>>>> value when we add it to the orted's argv, and then put ""
>>> around
>>>>>>>>>>> it. The
>>>>>>>>>>> problem, however, is that the mca param parser on the far
>>>>>>>>>>> end
>>>>>>>>>>> doesn't
>>>>>>>>>>> remove those "" from the resulting string. At least, I spent
>>>>>>>>>>> over a day
>>>>>>>>>>> fighting with a problem only to discover that was happening.
>>>>>>>>>>> Could be an
>>>>>>>>>>> error in the way I was doing things, or could be a real
>>>>>>>>>>> characteristic of
>>>>>>>>>>> the parser. Anyway, we would have to ensure that the parser
>>>>>>>>>>> removes any
>>>>>>>>>>> surrounding "" before passing along the param value or this
>>>>>>>>>>> won't work.
>>>>>>>>>>>
>>>>>>>>>>> Ralph
>>>>>>>>>>>
>>>>>>>>>>> On 11/5/07 12:10 PM, "Tim Prins" <tprins_at_[hidden]>
>>> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Commit 16364 broke things when using multiword mca param
>>>>>>>>>>>> values. For
>>>>>>>>>>>> instance:
>>>>>>>>>>>>
>>>>>>>>>>>> mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca
>>>>>>>>>>>> pls_rsh_agent
>>>>>>>>>>>> "ssh -Y" xterm
>>>>>>>>>>>>
>>>>>>>>>>>> Will crash and burn, because the value "ssh -Y" is being
>>> stored
>>>>>>>>>>>> into the
>>>>>>>>>>>> argv orted_cmd_line in orterun.c:1506. This is then added
>>>>>>>>>>>> to
>>>>>>>>>>>> the launch
>>>>>>>>>>>> command for the orted:
>>>>>>>>>>>>
>>>>>>>>>>>> /usr/bin/ssh -Y odin004 PATH=/san/homedirs/tprins/usr/rsl/
>>> bin:
>>>>>>>>>>>> $PATH ;
>>>>>>>>>>>> export PATH ;
>>>>>>>>>>>> LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:
>>>>>>>>>>>> $LD_LIBRARY_PATH ;
>>>>>>>>>>>> export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/
>>> orted
>>>>>>>>>>>> --debug
>>>>>>>>>>>> --debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 --
>>>>>>>>>>>> nodename
>>>>>>>>>>>> odin004 --universe tprins_at_[hidden]:default-
>>>>>>>>>>>> universe-27872
>>>>>>>>>>>> --nsreplica
>>>>>>>>>>>> "0.0;tcp://
>>>>>>>>>>>> 129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:
>>> 21a0
>>>>>>>>>>>> :4090 8"
>>>>>>>>>>>> --gprreplica
>>>>>>>>>>>> "0.0;tcp://
>>>>>>>>>>>> 129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:
>>> 21a0
>>>>>>>>>>>> :4090 8"
>>>>>>>>>>>> -mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca
>>>>>>>>>>>> mca_base_param_file_path
>>>>>>>>>>>> /u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/
>>> homedirs/
>>>>>>>>>>>> tprins/rsl/
>>>>>>>>>>>> examp les
>>>>>>>>>>>> -mca mca_base_param_file_path_force /san/homedirs/tprins/
>>> rsl/
>>>>>>>>>>>> examples
>>>>>>>>>>>>
>>>>>>>>>>>> Notice that in this command we now have "-mca
>>> pls_rsh_agent ssh
>>>>>>>>>>>> -Y". So
>>>>>>>>>>>> the quotes have been lost, as we die a horrible death.
>>>>>>>>>>>>
>>>>>>>>>>>> So we need to add the quotes back in somehow, or pass these
>>>>>>>>>>>> options
>>>>>>>>>>>> differently. I'm not sure what the best way to fix this.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Tim
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> devel mailing list
>>>>>>> devel_at_[hidden]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>
>>>>
>>

-- 
Jeff Squyres
Cisco Systems