Open MPI Development Mailing List Archives

From: Tim Prins (tprins_at_[hidden])
Date: 2007-11-07 19:50:48


I'm curious what changed to make this a problem. How were we passing mca params
from the base to the app before, and why did it change?

I think that options 1 & 2 below are no good, since we, in general, allow
string mca params to have spaces (as far as I understand it). So a more
general approach is needed.

Tim

On Wednesday 07 November 2007 10:40:45 am Ralph H Castain wrote:
> Sorry for the delay - wasn't ignoring the issue.
>
> There are several fixes to this problem - ranging in order from least to
> most work:
>
> 1. just alias "ssh" to be "ssh -Y" and run without setting the mca param.
> It won't affect anything on the backend because the daemon/procs don't use
> ssh.
>
> 2. include "pls_rsh_agent" in the array of mca params not to be passed to
> the orted in orte/mca/pls/base/pls_base_general_support_fns.c, the
> orte_pls_base_orted_append_basic_args function. This would fix the specific
> problem cited here, but I admit that listing every such param by name would
> get tedious.
>
> 3. we could easily detect that a "problem" character was in the mca param
> value when we add it to the orted's argv, and then put "" around it. The
> problem, however, is that the mca param parser on the far end doesn't
> remove those "" from the resulting string. At least, I spent over a day
> fighting with a problem only to discover that was happening. Could be an
> error in the way I was doing things, or could be a real characteristic of
> the parser. Anyway, we would have to ensure that the parser removes any
> surrounding "" before passing along the param value or this won't work.
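Option 3 above can be sketched roughly as follows. This is only an illustration in Python, not the Open MPI code: the function names, the PROBLEM_CHARS set, and the choice of double quotes are all assumptions made for the example. It shows the two halves that have to agree: the sender wraps values containing a "problem" character, and the receiving parser removes one pair of surrounding quotes before using the value.

```python
# Hypothetical sketch of option 3; none of these names exist in Open MPI.

# Assumption: characters that would cause a value to be split or
# misinterpreted when placed on the orted command line.
PROBLEM_CHARS = set(" \t;")


def quote_if_needed(value: str) -> str:
    """Sender side: wrap the value in "" if it contains a problem character."""
    if any(ch in PROBLEM_CHARS for ch in value):
        return '"' + value + '"'
    return value


def strip_surrounding_quotes(value: str) -> str:
    """Receiver side: remove one pair of surrounding "" if present."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


sent = quote_if_needed("ssh -Y")
received = strip_surrounding_quotes(sent)
print(sent)
print(received)
```

The sketch does not handle values that themselves contain a double quote; a real fix would need an escaping rule for that case.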
>
> Ralph
>
> On 11/5/07 12:10 PM, "Tim Prins" <tprins_at_[hidden]> wrote:
> > Hi,
> >
> > Commit 16364 broke things when using multiword mca param values. For
> > instance:
> >
> > mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca pls_rsh_agent
> > "ssh -Y" xterm
> >
> > Will crash and burn, because the value "ssh -Y" is being stored into the
> > argv orted_cmd_line in orterun.c:1506. This is then added to the launch
> > command for the orted:
> >
> > /usr/bin/ssh -Y odin004 PATH=/san/homedirs/tprins/usr/rsl/bin:$PATH ;
> > export PATH ;
> > LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:$LD_LIBRARY_PATH ;
> > export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/orted --debug
> > --debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 --nodename
> > odin004 --universe tprins_at_[hidden]:default-universe-27872
> > --nsreplica
> > "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908"
> > --gprreplica
> > "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908"
> > -mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca
> > mca_base_param_file_path
> > /u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/homedirs/tprins/rsl/examples
> > -mca mca_base_param_file_path_force /san/homedirs/tprins/rsl/examples
> >
> > Notice that in this command we now have "-mca pls_rsh_agent ssh -Y". So
> > the quotes have been lost, and we die a horrible death.
> >
> > So we need to add the quotes back in somehow, or pass these options
> > differently. I'm not sure what the best way to fix this is.
> >
> > Thanks,
> >
> > Tim
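
The loss of quotes described in the original report can be reproduced in miniature. The sketch below is in Python rather than C and does not use any Open MPI code. It assumes that the argv is flattened into a single command string and then word-split again by a shell on the remote side, with shlex.split standing in for that shell. It compares a naive join with a join that re-quotes each element.

```python
import shlex

# Argument vector as mpirun sees it: the local shell has already removed the
# quotes, so "ssh -Y" arrives as a single element.
orted_args = ["-mca", "orte_debug", "1", "-mca", "pls_rsh_agent", "ssh -Y"]

# Naive flattening into one command string loses the element boundary.
naive = " ".join(orted_args)

# Re-quoting each element before joining preserves the boundary.
quoted = " ".join(shlex.quote(arg) for arg in orted_args)

# shlex.split stands in for the word splitting done by the remote shell.
naive_roundtrip = shlex.split(naive)
quoted_roundtrip = shlex.split(quoted)

print(naive_roundtrip)   # "ssh" and "-Y" come back as separate words
print(quoted_roundtrip)  # "ssh -Y" comes back as one word
```

With the naive join, -Y is no longer part of the pls_rsh_agent value and is seen as a separate argument, which matches the "-mca pls_rsh_agent ssh -Y" fragment in the launch command above.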