Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Multiworld MCA parameter values broken
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-11-19 09:01:19


Sorry -- I'm just joining this conversation late: what's the problem
with opal_cmd_line_parse?

It should obey all quoting from shells, etc. I.e., it shouldn't care
about tokens with special characters (including spaces) because the
shell has already split all of that stuff up -- it just gets a char*[]
that it treats as discrete tokens.
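
A minimal sketch of that point, in Python rather than the actual OPAL C
code. It assumes shlex.split is a fair stand-in for how a POSIX shell
tokenizes a command line, and the space-joining step is only a guess at
where quoting could get lost when a launch command is rebuilt as a single
string (the command itself is taken from Tim's report quoted below):

```python
import shlex

# Command line as typed at the shell prompt (from the original report).
typed = 'mpirun -mca pls rsh -mca pls_rsh_agent "ssh -Y" xterm'

# The shell hands the program discrete tokens; "ssh -Y" is ONE token.
argv = shlex.split(typed)

# Assumed failure mode: flattening argv back into a string with plain
# spaces throws the token boundary away.
naive = " ".join(argv)

# Quoting each token before joining keeps the boundary intact.
safe = " ".join(shlex.quote(tok) for tok in argv)

print(argv)
print(shlex.split(naive))  # "ssh" and "-Y" are now separate tokens
print(shlex.split(safe))   # identical to argv again
```

If that is what is happening, the parser is seeing exactly what it was
given, and the token boundary was lost before the tokens ever reached it.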

Is it doing something wrong?

On Nov 19, 2007, at 8:39 AM, Ralph H Castain wrote:

> I'm not sure it is really necessary - the problem is solely within
> opal_cmd_line_parse and (if someone can parse that code ;-)) is truly
> simple to fix. The overly long cmd line issue is due to a bug that
> Josh was going to look at (may already have done so while I was out
> of touch).
>
> Ralph
>
>
>
> On 11/9/07 5:10 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>
>> Should there be another option for passing MCA parameters between
>> processes, such as via stdin (or any file descriptor)? I.e., during
>> the command line parsing to check for command line MCA params,
>> perhaps a new argument could be introduced: -mcauri <uri>, where
>> <uri> could take a few different forms:
>>
>> - file://stdin: (note the 2 //, not 3, so "stdin" would never
>> conflict with a real file named /stdin) Read the parameters in off
>> stdin.
>>
>> - rml://...rml contact info...: read in the MCA params via the RML
>> (although I assume that reading via the RML would be *waaaay* too late
>> during the MCA setup process -- I mentioned this option for
>> completeness, even though I don't think it'll work)
>>
>> - ip://ipaddress:port: open a socket back and read the MCA params in
>> over a socket. This could have some scalability issues...? But who
>> knows; it could be tied into the hierarchical startup such that we
>> wouldn't have to have an all-to-one connection scheme. Certainly it
>> would cause scalability problems when paired with today's all-to-one
>> RML connection scheme for the OOB.
>>
>> I'm not sure that the rml: and ip: schemes are worthwhile. Maybe a
>> file://stdin kind of approach could work? Or perhaps some other kind
>> of URI/IPC...? (I really haven't thought through the issues -- this
>> is off the top of my head)
>>
>>
>>
>> On Nov 8, 2007, at 2:36 PM, Ralph H Castain wrote:
>>
>>> Might I suggest:
>>>
>>> https://svn.open-mpi.org/trac/ompi/ticket/1073
>>>
>>> It deals with some of these issues and explains the boundaries of
>>> the problem. As for what a string param can contain, I have no
>>> opinion. I only note that it must handle special characters such as
>>> ';', '/', etc. that are typically found in URIs. I cannot think of
>>> any reason it should have a quote in it.
>>>
>>> Ralph
>>>
>>>
>>>
>>> On 11/8/07 12:25 PM, "Tim Prins" <tprins_at_[hidden]> wrote:
>>>
>>>> The alias option you presented does not work. I think we do some
>>>> weird things to find the absolute path for ssh, instead of just
>>>> issuing the command.
>>>>
>>>> I would spend some time fixing this, but I don't want to do it
>>>> wrong. We could quote all the param values, and change the parser
>>>> to remove the quotes, but this assumes that an mca param does not
>>>> contain quotes.
>>>>
>>>> So I guess there are 2 questions that need to be answered before a
>>>> fix is made:
>>>>
>>>> 1. What exactly can a string mca param contain? Can it have quotes
>>>> or spaces or?
>>>>
>>>> 2. Which mca parameters should be forwarded? Should it be just the
>>>> ones from the command line? From the environment? From config
>>>> files?
>>>>
>>>> Tim
>>>>
>>>> Ralph Castain wrote:
>>>>> What changed is that we never passed mca params to the orted
>>>>> before - they always went to the app, but it's the orted that has
>>>>> the issue. There is a bug ticket thread on this subject - I forget
>>>>> the number immediately.
>>>>>
>>>>> Basically, the problem was that we cannot generally pass the local
>>>>> environment to the orteds when we launch them. However, people
>>>>> needed various mca params to get to the orteds to control their
>>>>> behavior. The only way to resolve that problem was to pass the
>>>>> params via the command line, which is what was done.
>>>>>
>>>>> Except for a very few cases, all of our mca params are single
>>>>> values that do not include spaces, so this is not a problem that
>>>>> is causing widespread issues. As I said, I already had to deal
>>>>> with one special case that didn't involve spaces, but did have
>>>>> special characters that required quoting, which identified the
>>>>> larger problem of dealing with quoted strings.
>>>>>
>>>>> I have no objection to a more general fix. Like I said in my note,
>>>>> though, the general fix will take a larger effort. If someone is
>>>>> willing to do so, that is fine with me - I was only offering
>>>>> solutions that would fill the interim time as I haven't heard
>>>>> anyone step up to say they would fix it anytime soon.
>>>>>
>>>>> Please feel free to jump in and volunteer! ;-) I'm willing to put
>>>>> the quotes around things if you will fix the mca cmd line parser
>>>>> to cleanly remove them on the other end.
>>>>>
>>>>> Ralph
>>>>>
>>>>>
>>>>>
>>>>> On 11/7/07 5:50 PM, "Tim Prins" <tprins_at_[hidden]> wrote:
>>>>>
>>>>>> I'm curious what changed to make this a problem. How were we
>>>>>> passing mca params from the base to the app before, and why did
>>>>>> it change?
>>>>>>
>>>>>> I think that options 1 & 2 below are no good, since we, in
>>>>>> general, allow string mca params to have spaces (as far as I
>>>>>> understand it). So a more general approach is needed.
>>>>>>
>>>>>> Tim
>>>>>>
>>>>>> On Wednesday 07 November 2007 10:40:45 am Ralph H Castain wrote:
>>>>>>> Sorry for delay - wasn't ignoring the issue.
>>>>>>>
>>>>>>> There are several fixes to this problem - ranging in order from
>>>>>>> least to most work:
>>>>>>>
>>>>>>> 1. just alias "ssh" to be "ssh -Y" and run without setting the
>>>>>>> mca param. It won't affect anything on the backend because the
>>>>>>> daemon/procs don't use ssh.
>>>>>>>
>>>>>>> 2. include "pls_rsh_agent" in the array of mca params not to be
>>>>>>> passed to the orted in
>>>>>>> orte/mca/pls/base/pls_base_general_support_fns.c, the
>>>>>>> orte_pls_base_orted_append_basic_args function. This would fix
>>>>>>> the specific problem cited here, but I admit that listing every
>>>>>>> such param by name would get tedious.
>>>>>>>
>>>>>>> 3. we could easily detect that a "problem" character was in the
>>>>>>> mca param value when we add it to the orted's argv, and then put
>>>>>>> "" around it. The problem, however, is that the mca param parser
>>>>>>> on the far end doesn't remove those "" from the resulting
>>>>>>> string. At least, I spent over a day fighting with a problem
>>>>>>> only to discover that was happening. Could be an error in the
>>>>>>> way I was doing things, or could be a real characteristic of the
>>>>>>> parser. Anyway, we would have to ensure that the parser removes
>>>>>>> any surrounding "" before passing along the param value or this
>>>>>>> won't work.
>>>>>>>
>>>>>>> Ralph
>>>>>>>
>>>>>>> On 11/5/07 12:10 PM, "Tim Prins" <tprins_at_[hidden]> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Commit 16364 broke things when using multiword mca param
>>>>>>>> values. For instance:
>>>>>>>>
>>>>>>>> mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca pls_rsh_agent "ssh -Y" xterm
>>>>>>>>
>>>>>>>> Will crash and burn, because the value "ssh -Y" is being stored
>>>>>>>> into the argv orted_cmd_line in orterun.c:1506. This is then
>>>>>>>> added to the launch command for the orted:
>>>>>>>>
>>>>>>>> /usr/bin/ssh -Y odin004 PATH=/san/homedirs/tprins/usr/rsl/bin:$PATH ;
>>>>>>>> export PATH ;
>>>>>>>> LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:$LD_LIBRARY_PATH ;
>>>>>>>> export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/orted --debug
>>>>>>>> --debug-daemons --name 0.1 --num_procs 2 --vpid_start 0
>>>>>>>> --nodename odin004
>>>>>>>> --universe tprins_at_[hidden]:default-universe-27872
>>>>>>>> --nsreplica
>>>>>>>> "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908"
>>>>>>>> --gprreplica
>>>>>>>> "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908"
>>>>>>>> -mca orte_debug 1 -mca pls_rsh_agent ssh -Y
>>>>>>>> -mca mca_base_param_file_path
>>>>>>>> /u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/homedirs/tprins/rsl/examples
>>>>>>>> -mca mca_base_param_file_path_force /san/homedirs/tprins/rsl/examples
>>>>>>>>
>>>>>>>> Notice that in this command we now have "-mca pls_rsh_agent ssh
>>>>>>>> -Y". So the quotes have been lost, and we die a horrible death.
>>>>>>>>
>>>>>>>> So we need to add the quotes back in somehow, or pass these
>>>>>>>> options differently. I'm not sure what the best way to fix this
>>>>>>>> is.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Tim

-- 
Jeff Squyres
Cisco Systems