Open MPI Development Mailing List Archives

From: Ralph H Castain (rhc_at_[hidden])
Date: 2007-11-07 10:40:45


Sorry for the delay - I wasn't ignoring the issue.

There are several fixes to this problem - ranging in order from least to
most work:

1. just alias "ssh" to be "ssh -Y" and run without setting the mca param. It
won't affect anything on the backend because the daemon/procs don't use ssh.

2. include "pls_rsh_agent" in the array of mca params not to be passed to
the orted in orte/mca/pls/base/pls_base_general_support_fns.c, the
orte_pls_base_orted_append_basic_args function. This would fix the specific
problem cited here, but I admit that listing every such param by name would
get tedious.

3. we could easily detect that a "problem" character was in the mca param
value when we add it to the orted's argv, and then put "" around it. The
problem, however, is that the mca param parser on the far end doesn't remove
those "" from the resulting string. At least, I spent over a day fighting
with a problem only to discover that was happening. Could be an error in the
way I was doing things, or could be a real characteristic of the parser.
Anyway, we would have to ensure that the parser removes any surrounding ""
before passing along the param value or this won't work.
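
To make option 3 concrete, here is a rough sketch in Python (not the ORTE
code, which is C). It only models the idea: joining argv with spaces loses
the grouping, wrapping the value in "" keeps it, and the receiving side may
have to strip the "" itself. The helper names are made up for illustration,
shlex.split stands in for the remote shell's word splitting, and values that
already contain a " are not handled.

```python
import shlex


def join_naive(argv):
    """Join argv with single spaces, as when building a remote command line."""
    return " ".join(argv)


def quote_if_needed(value):
    """Wrap value in "" if it contains whitespace (the 'problem' character here)."""
    if any(ch.isspace() for ch in value):
        return '"' + value + '"'
    return value


def strip_surrounding_quotes(value):
    """Remove one pair of surrounding "" if present; otherwise return value as is."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


orted_args = ["-mca", "pls_rsh_agent", "ssh -Y"]

# Without quoting, the value is re-split into two words on the far end.
lost = shlex.split(join_naive(orted_args))

# With quoting, the value survives word splitting as a single argument.
quoted = join_naive([quote_if_needed(arg) for arg in orted_args])
kept = shlex.split(quoted)

# If the "" reach the param parser intact, it has to strip them itself.
param_value = strip_surrounding_quotes('"ssh -Y"')

print(lost)
print(quoted)
print(kept)
print(param_value)
```

Whether the "" actually survive to the parser depends on how the command is
run on the remote side, so the stripping step is a safeguard and not
something this sketch proves is needed.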

Ralph

On 11/5/07 12:10 PM, "Tim Prins" <tprins_at_[hidden]> wrote:

> Hi,
>
> Commit 16364 broke things when using multiword mca param values. For
> instance:
>
> mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca pls_rsh_agent
> "ssh -Y" xterm
>
> Will crash and burn, because the value "ssh -Y" is being stored into the
> argv orted_cmd_line in orterun.c:1506. This is then added to the launch
> command for the orted:
>
> /usr/bin/ssh -Y odin004 PATH=/san/homedirs/tprins/usr/rsl/bin:$PATH ;
> export PATH ;
> LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:$LD_LIBRARY_PATH ;
> export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/orted --debug
> --debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 --nodename
> odin004 --universe tprins_at_[hidden]:default-universe-27872
> --nsreplica
> "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908"
> --gprreplica
> "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908"
> -mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca
> mca_base_param_file_path
> /u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/homedirs/tprins/rsl/examples
> -mca mca_base_param_file_path_force /san/homedirs/tprins/rsl/examples
>
> Notice that in this command we now have "-mca pls_rsh_agent ssh -Y". So
> the quotes have been lost, and we die a horrible death.
>
> So we need to add the quotes back in somehow, or pass these options
> differently. I'm not sure what the best way to fix this is.
>
> Thanks,
>
> Tim