Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Multiworld MCA parameter values broken
From: Ralph H Castain (rhc_at_[hidden])
Date: 2007-11-08 14:36:06


Might I suggest:

https://svn.open-mpi.org/trac/ompi/ticket/1073

It deals with some of these issues and explains the boundaries of the
problem. As for what a string param can contain, I have no opinion. I only
note that it must handle special characters such as ';', '/', etc. that are
typically found in URIs. I cannot think of any reason it should have a
quote in it.

Ralph
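
For readers following the quoting idea discussed in this thread, here is a minimal sketch of the round trip: the sender puts "" around a value that contains a problem character, and the receiver removes one surrounding pair. This is not Open MPI code. The names wrap_value, unwrap_value and PROBLEM_CHARS are hypothetical, and the character set is an assumption chosen for illustration.

```python
# Illustrative only: none of these names exist in Open MPI.
# Assumed set of characters that force quoting (space and ';' here).
PROBLEM_CHARS = set(" ;")


def wrap_value(value):
    """Sender side: put "" around a value containing a problem character."""
    if any(ch in PROBLEM_CHARS for ch in value):
        return '"' + value + '"'
    return value


def unwrap_value(value):
    """Receiver side: remove one surrounding pair of "" if present."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


if __name__ == "__main__":
    for original in ["ssh -Y", "0.0;tcp://129.79.240.100:40907", "1"]:
        sent = wrap_value(original)
        received = unwrap_value(sent)
        print(repr(original), "->", repr(sent), "->", repr(received))

    # A value that legitimately begins and ends with a quote is not wrapped
    # (it has no problem character), yet the receiver still strips it.
    literal = '"quoted"'
    print(unwrap_value(wrap_value(literal)) == literal)
```

The last case shows why the scheme only works if string params are assumed never to start and end with a quote of their own.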

On 11/8/07 12:25 PM, "Tim Prins" <tprins_at_[hidden]> wrote:

> The alias option you presented does not work. I think we do some weird
> things to find the absolute path for ssh, instead of just issuing the
> command.
>
> I would spend some time fixing this, but I don't want to do it wrong. We
> could quote all the param values, and change the parser to remove the
> quotes, but this assumes that an mca param does not contain quotes.
>
> So I guess there are 2 questions that need to be answered before a fix
> is made:
>
> 1. What exactly can a string mca param contain? Can it have quotes or
> spaces, or anything else?
>
> 2. Which mca parameters should be forwarded? Should it be just the ones
> from the command line? From the environment? From config files?
>
> Tim
>
> Ralph Castain wrote:
>> What changed is that we never passed mca params to the orted before - they
>> always went to the app, but it's the orted that has the issue. There is a
>> bug ticket thread on this subject - I forget the number offhand.
>>
>> Basically, the problem was that we cannot generally pass the local
>> environment to the orteds when we launch them. However, people needed
>> various mca params to get to the orteds to control their behavior. The only
>> way to resolve that problem was to pass the params via the command line,
>> which is what was done.
>>
>> Except for a very few cases, all of our mca params are single values that do
>> not include spaces, so this is not a problem that is causing widespread
>> issues. As I said, I already had to deal with one special case that didn't
>> involve spaces, but did have special characters that required quoting, which
>> identified the larger problem of dealing with quoted strings.
>>
>> I have no objection to a more general fix. Like I said in my note, though,
>> the general fix will take a larger effort. If someone is willing to do so,
>> that is fine with me - I was only offering solutions that would fill the
>> interim time as I haven't heard anyone step up to say they would fix it
>> anytime soon.
>>
>> Please feel free to jump in and volunteer! ;-) I'm willing to put the quotes
>> around things if you will fix the mca cmd line parser to cleanly remove them
>> on the other end.
>>
>> Ralph
>>
>>
>>
>> On 11/7/07 5:50 PM, "Tim Prins" <tprins_at_[hidden]> wrote:
>>
>>> I'm curious what changed to make this a problem. How were we passing mca
>>> params from the base to the app before, and why did it change?
>>>
>>> I think that options 1 & 2 below are no good, since we, in general, allow
>>> string mca params to have spaces (as far as I understand it). So a more
>>> general approach is needed.
>>>
>>> Tim
>>>
>>> On Wednesday 07 November 2007 10:40:45 am Ralph H Castain wrote:
>>>> Sorry for delay - wasn't ignoring the issue.
>>>>
>>>> There are several fixes to this problem - ranging in order from least to
>>>> most work:
>>>>
>>>> 1. just alias "ssh" to be "ssh -Y" and run without setting the mca param.
>>>> It won't affect anything on the backend because the daemon/procs don't use
>>>> ssh.
>>>>
>>>> 2. include "pls_rsh_agent" in the array of mca params not to be passed to
>>>> the orted in orte/mca/pls/base/pls_base_general_support_fns.c, the
>>>> orte_pls_base_orted_append_basic_args function. This would fix the specific
>>>> problem cited here, but I admit that listing every such param by name would
>>>> get tedious.
>>>>
>>>> 3. we could easily detect that a "problem" character was in the mca param
>>>> value when we add it to the orted's argv, and then put "" around it. The
>>>> problem, however, is that the mca param parser on the far end doesn't
>>>> remove those "" from the resulting string. At least, I spent over a day
>>>> fighting with a problem only to discover that was happening. Could be an
>>>> error in the way I was doing things, or could be a real characteristic of
>>>> the parser. Anyway, we would have to ensure that the parser removes any
>>>> surrounding "" before passing along the param value or this won't work.
>>>>
>>>> Ralph
>>>>
>>>> On 11/5/07 12:10 PM, "Tim Prins" <tprins_at_[hidden]> wrote:
>>>>> Hi,
>>>>>
>>>>> Commit 16364 broke things when using multiword mca param values. For
>>>>> instance:
>>>>>
>>>>> mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca pls_rsh_agent
>>>>> "ssh -Y" xterm
>>>>>
>>>>> Will crash and burn, because the value "ssh -Y" is being stored into the
>>>>> argv orted_cmd_line in orterun.c:1506. This is then added to the launch
>>>>> command for the orted:
>>>>>
>>>>> /usr/bin/ssh -Y odin004 PATH=/san/homedirs/tprins/usr/rsl/bin:$PATH ;
>>>>> export PATH ;
>>>>> LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:$LD_LIBRARY_PATH ;
>>>>> export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/orted --debug
>>>>> --debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 --nodename
>>>>> odin004 --universe tprins_at_[hidden]:default-universe-27872
>>>>> --nsreplica
>>>>> "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908"
>>>>> --gprreplica
>>>>> "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908"
>>>>> -mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca
>>>>> mca_base_param_file_path
>>>>> /u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/homedirs/tprins/rsl/examples
>>>>> -mca mca_base_param_file_path_force /san/homedirs/tprins/rsl/examples
>>>>>
>>>>> Notice that in this command we now have "-mca pls_rsh_agent ssh -Y". So
>>>>> the quotes have been lost, and we die a horrible death.
>>>>>
>>>>> So we need to add the quotes back in somehow, or pass these options
>>>>> differently. I'm not sure what the best way to fix this is.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Tim
>>>
>>
>
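
The failure in the original report can be reproduced outside Open MPI. The sketch below is a hedged illustration, not the orterun code: orted_argv is a shortened, hypothetical stand-in for the real argv, and shlex.split only approximates how the remote shell re-tokenizes the command string that ssh hands it.

```python
import shlex

# Hypothetical, shortened stand-in for the orted argv built by mpirun.
orted_argv = ["orted", "-mca", "orte_debug", "1",
              "-mca", "pls_rsh_agent", "ssh -Y"]


def naive_join(argv):
    """Join with single spaces: word boundaries inside values are lost."""
    return " ".join(argv)


def quoted_join(argv):
    """Quote each argument so a POSIX shell splits it back into the same argv."""
    return " ".join(shlex.quote(arg) for arg in argv)


# shlex.split approximates the tokenization done by the remote shell.
broken = shlex.split(naive_join(orted_argv))
intact = shlex.split(quoted_join(orted_argv))

if __name__ == "__main__":
    print(naive_join(orted_argv))
    print(broken[-2:])   # the agent value arrives as two separate words
    print(quoted_join(orted_argv))
    print(intact[-1:])   # the agent value arrives as one word
```

With shell-level quoting like this, the receiving parser sees the value as a single argv entry and has no quotes to strip. Whether that fits the way the rsh launcher assembles its command is a separate question this sketch does not answer.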