Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Ssh tunnelling broken in trunk?
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-04-02 21:04:10


Hmmm...something isn't making sense. Can I see the command line you used to
generate this?

I'll tell you why I'm puzzled. If orte_debug_flag is set, then the
"--daemonize" should NOT be there, and you should see "--debug" on that
command line. What I see is the reverse, which implies to me that
orte_debug_flag is NOT being set to "true".
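
To check a captured "plm:rsh: executing:" line mechanically, the rule above can be sketched in Python. This is only an illustration: the function name launch_mode and its return labels are hypothetical, and it looks only for the two flags discussed in this thread, not at anything else orted accepts.

```python
def launch_mode(executing_line):
    """Classify an orted launch line by the two flags discussed above.

    Returns "debug" if --debug is present without --daemonize,
    "daemonized" if --daemonize is present without --debug,
    and "unclear" otherwise. The labels are illustrative only.
    """
    # Keep only the text after the "executing:" marker when it is present.
    _, marker, cmd = executing_line.partition("executing:")
    text = cmd if marker else executing_line

    # Split on whitespace and drop the brackets, parentheses and quotes
    # that the verbose output wraps around the command.
    tokens = {tok.strip('[]()"') for tok in text.split()}

    has_daemonize = "--daemonize" in tokens
    has_debug = "--debug" in tokens

    if has_debug and not has_daemonize:
        return "debug"
    if has_daemonize and not has_debug:
        return "daemonized"
    return "unclear"
```

A result of "daemonized" on a run that was started with -d would match the symptom described here: the flag did not reach the launcher.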

When I tested here and on odin, though, I found that the -d option correctly
set the flag and everything works just fine.

So there is something in your environment or setup that is messing up that
orte_debug_flag. I have no idea what it could be - the command line should
override anything in your environment, but you could check. Otherwise, if
this diagnostic output came from a command line that included -d or
--debug-devel, or had OMPI_MCA_orte_debug=1 in the environment, then I am at
a loss - everywhere I've tried it, it works fine.
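
One way to do the environment check is to list every OMPI_MCA_* variable visible to the shell that runs mpirun. The Python sketch below does only that. The helper name ompi_mca_settings is hypothetical, and it does not look at MCA parameter files, which would need to be checked separately.

```python
import os


def ompi_mca_settings(environ=None):
    """Return the OMPI_MCA_* variables found in the given environment.

    With no argument, the current process environment is inspected.
    """
    env = os.environ if environ is None else environ
    return {k: v for k, v in env.items() if k.startswith("OMPI_MCA_")}


if __name__ == "__main__":
    # Print whatever is set, e.g. OMPI_MCA_orte_debug=1 if it was exported.
    for name, value in sorted(ompi_mca_settings().items()):
        print(f"{name}={value}")
```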

Ralph

On 4/2/08 5:41 PM, "Jon Mason" <jon_at_[hidden]> wrote:

> On Wednesday 02 April 2008 05:04:47 pm Ralph Castain wrote:
>> Here's a real simple diagnostic you can do: set -mca plm_base_verbose 1 and
>> look at the cmd line being executed (send it here). It will look like:
>>
>> [[xxx,1],0] plm:rsh: executing: jjkljks;jldfsaj;
>>
>> If the cmd line has --daemonize on it, then the ssh will close and xterm
>> won't work.
>
> [vic20:01863] [[40388,0],0] plm:rsh: executing: (//usr/bin/ssh) [/usr/bin/ssh
> vic12 orted --daemonize -mca ess env -mca orte_ess_jobid 2646867968
> -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri
> "2646867968.0;tcp://192.168.70.150:39057;tcp://10.10.0.150:39057;tcp://86.75.30.10:39057"
> --nodename vic12 -mca btl openib,self
> --mca btl_openib_receive_queues P,65536,256,128,128
> -mca plm_base_verbose 1
> -mca mca_base_param_file_path /usr/mpi/gcc/ompi-trunk/share/openmpi/amca-param-sets:/root
> -mca mca_base_param_file_path_force /root]
>
>
> It looks like what you say is happening. Is this configured somewhere, so
> that I can remove it?
>
> Thanks,
> Jon
>
>> Ralph
>>
>> On 4/2/08 3:14 PM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>>> Can you diagnose a little further:
>>>
>>> 1. in the case where it works, can you verify that the ssh to launch
>>> the orteds is still running?
>>>
>>> 2. in the case where it doesn't work, can you verify that the ssh to
>>> launch the orteds has actually died?
>>>
>>> On Apr 2, 2008, at 4:58 PM, Jon Mason wrote:
>>>> On Wednesday 02 April 2008 01:21:31 pm Jon Mason wrote:
>>>>> On Wednesday 02 April 2008 11:54:50 am Ralph H Castain wrote:
>>>>>> I remember that someone had found a bug that caused
>>>>>> orte_debug_flag to not
>>>>>> get properly set (local var covering over a global one) - could be
>>>>>> that
>>>>>> your tmp-public branch doesn't have that patch in it.
>>>>>>
>>>>>> You might try updating to the latest trunk
>>>>>
>>>>> I updated my ompi-trunk tree, did a clean build, and I still see
>>>>> the same
>>>>> problem. I regressed trunk to rev 17589 and everything works as I
>>>>> expect.
>>>>> So I think the problem is still there in the top of trunk.
>>>>
>>>> I stepped through the revs of trunk and found the first failing rev
>>>> to be
>>>> 17632. It's a big patch, so I'll defer to those more in the know to
>>>> determine
>>>> what is breaking in there.
>>>>
>>>>> I don't discount user error, but I don't think I am doing anything
>>>>> different.
>>>>> Did some setting change that perhaps I did not modify?
>>>>>
>>>>> Thanks,
>>>>> Jon
>>>>>
>>>>>> On 4/2/08 10:41 AM, "George Bosilca" <bosilca_at_[hidden]> wrote:
>>>>>>> I'm using this feature on the trunk with the version from
>>>>>>> yesterday.
>>>>>>> It works without problems ...
>>>>>>>
>>>>>>> george.
>>>>>>>
>>>>>>> On Apr 2, 2008, at 12:14 PM, Jon Mason wrote:
>>>>>>>> On Wednesday 02 April 2008 11:07:18 am Jeff Squyres wrote:
>>>>>>>>> Are these r numbers relevant on the /tmp-public branch, or the
>>>>>>>>> trunk?
>>>>>>>>
>>>>>>>> I pulled it out of the command used to update the branch, which
>>>>>>>> was:
>>>>>>>> svn merge -r 17590:17917 https://svn.open-mpi.org/svn/ompi/trunk .
>>>>>>>>
>>>>>>>> In the cpc tmp branch, it happened at r17920.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jon
>>>>>>>>
>>>>>>>>> On Apr 2, 2008, at 11:59 AM, Jon Mason wrote:
>>>>>>>>>> I regressed my tree and it looks like it happened between
>>>>>>>>>> 17590:17917
>>>>>>>>>>
>>>>>>>>>> On Wednesday 02 April 2008 10:22:52 am Jon Mason wrote:
>>>>>>>>>>> I am noticing that ssh seems to be broken on trunk (and my cpc
>>>>>>>>>>> branch, as
>>>>>>>>>>> it is based on trunk). When I try to use xterm and gdb to
>>>>>>>>>>> debug, I
>>>>>>>>>>> only
>>>>>>>>>>> successfully get 1 xterm. I have tried this on 2 different
>>>>>>>>>>> setups. I can
>>>>>>>>>>> successfully get the xterms on the 1.2 svn branch.
>>>>>>>>>>>
>>>>>>>>>>> I am running the following command:
>>>>>>>>>>> mpirun --n 2 --host vic12,vic20 -mca btl tcp,self -d xterm -e
>>>>>>>>>>> gdb /usr/mpi/gcc/openmpi-1.2-svn/tests/IMB-3.0/IMB-MPI1
>>>>>>>>>>>
>>>>>>>>>>> Is anyone else seeing this problem?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jon
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> devel mailing list
>>>>>>>>>>> devel_at_[hidden]
>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel