Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Ssh tunnelling broken in trunk?
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-04-02 21:11:17


There is one other thing you can check - check for stale libraries on your
backend nodes. The options on the daemons changed. They used to always
daemonize unless told otherwise. They now do NOT daemonize unless told to do
so.

If the orted executables back there are "stale", then you will get the
incorrect behavior. I don't think that is the problem here as your command
line looks simply wrong per my comments below, but it might be worth
checking out anyway.
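
The check described in the quoted messages below can be sketched as a small script. This is only an illustration, not part of Open MPI: the function name `check_orted_launch` and its `debug_requested` argument are made up for this example, and it assumes the verbose output has the `plm:rsh: executing:` form shown later in this thread. It encodes the expectation that when debugging is requested (-d, --debug-devel, or OMPI_MCA_orte_debug=1), the orted command line should carry "--debug" and not "--daemonize".

```python
MARKER = "plm:rsh: executing:"


def check_orted_launch(log_line, debug_requested):
    """Return a list of problems found in one plm_base_verbose launch line.

    Hypothetical helper for illustration; it only inspects the text of the
    logged command and knows nothing about Open MPI internals.
    """
    _, found, command = log_line.partition(MARKER)
    if not found:
        raise ValueError("not a plm:rsh launch line")

    # Strip the brackets and parentheses that wrap the logged command.
    tokens = [tok.strip("[]()") for tok in command.split()]

    problems = []
    if debug_requested:
        if "--daemonize" in tokens:
            problems.append("--daemonize is present, so the ssh session will close")
        if "--debug" not in tokens:
            problems.append("--debug is missing")
    return problems


# Abbreviated copy of the launch line reported in this thread.
sample = (
    "[vic20:01863] [[40388,0],0] plm:rsh: executing: (//usr/bin/ssh) "
    "[/usr/bin/ssh vic12 orted --daemonize -mca ess env]"
)

for problem in check_orted_launch(sample, debug_requested=True):
    print(problem)
```

Running this against the launch line from a failing run should flag both conditions. An empty result for a run that was started with -d would suggest the flag is reaching the daemons and the cause lies elsewhere, for example in stale orted executables.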

Ralph

On 4/2/08 7:04 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:

> Hmmm...something isn't making sense. Can I see the command line you used to
> generate this?
>
> I'll tell you why I'm puzzled. If orte_debug_flag is set, then the
> "--daemonize" should NOT be there, and you should see "--debug" on that
> command line. What I see is the reverse, which implies to me that
> orte_debug_flag is NOT being set to "true".
>
> When I tested here and on odin, though, I found that the -d option correctly
> set the flag and everything works just fine.
>
> So there is something in your environment or setup that is messing up that
> orte_debug_flag. I have no idea what it could be - the command line should
> override anything in your environment, but you could check. Otherwise, if
> this diagnostic output came from a command line that included -d or
> --debug-devel, or had OMPI_MCA_orte_debug=1 in the environment, then I am at
> a loss - everywhere I've tried it, it works fine.
>
> Ralph
>
>
>
> On 4/2/08 5:41 PM, "Jon Mason" <jon_at_[hidden]> wrote:
>
>> On Wednesday 02 April 2008 05:04:47 pm Ralph Castain wrote:
>>> Here's a real simple diagnostic you can do: set -mca plm_base_verbose 1 and
>>> look at the cmd line being executed (send it here). It will look like:
>>>
>>> [[xxx,1],0] plm:rsh: executing: jjkljks;jldfsaj;
>>>
>>> If the cmd line has --daemonize on it, then the ssh will close and xterm
>>> won't work.
>>
>> [vic20:01863] [[40388,0],0] plm:rsh: executing: (//usr/bin/ssh) [/usr/bin/ssh
>> vic12 orted --daemonize -mca ess env -mca orte_ess_jobid 2646867968 -mca
>> orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri
>> "2646867968.0;tcp://192.168.70.150:39057;tcp://10.10.0.150:39057;tcp://86.75.30.10:39057"
>> --nodename vic12 -mca btl openib,self --mca btl_openib_receive_queues
>> P,65536,256,128,128 -mca plm_base_verbose 1 -mca mca_base_param_file_path
>> /usr/mpi/gcc/ompi-trunk/share/openmpi/amca-param-sets:/root -mca
>> mca_base_param_file_path_force /root]
>>
>>
>> It looks like what you say is happening. Is this configured somewhere, so
>> that I can remove it?
>>
>> Thanks,
>> Jon
>>
>>> Ralph
>>>
>>> On 4/2/08 3:14 PM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>>>> Can you diagnose a little further:
>>>>
>>>> 1. in the case where it works, can you verify that the ssh to launch
>>>> the orteds is still running?
>>>>
>>>> 2. in the case where it doesn't work, can you verify that the ssh to
>>>> launch the orteds has actually died?
>>>>
>>>> On Apr 2, 2008, at 4:58 PM, Jon Mason wrote:
>>>>> On Wednesday 02 April 2008 01:21:31 pm Jon Mason wrote:
>>>>>> On Wednesday 02 April 2008 11:54:50 am Ralph H Castain wrote:
>>>>>>> I remember that someone had found a bug that caused orte_debug_flag
>>>>>>> to not get properly set (local var covering over a global one) - could
>>>>>>> be that your tmp-public branch doesn't have that patch in it.
>>>>>>>
>>>>>>> You might try updating to the latest trunk
>>>>>>
>>>>>> I updated my ompi-trunk tree, did a clean build, and I still see the
>>>>>> same problem. I regressed trunk to rev 17589 and everything works as I
>>>>>> expect. So I think the problem is still there in the top of trunk.
>>>>>
>>>>> I stepped through the revs of trunk and found the first failing rev to
>>>>> be 17632. It's a big patch, so I'll defer to those more in the know to
>>>>> determine what is breaking in there.
>>>>>
>>>>>> I don't discount user error, but I don't think I am doing anything
>>>>>> different. Did some setting change that perhaps I did not modify?
>>>>>>
>>>>>> Thanks,
>>>>>> Jon
>>>>>>
>>>>>>> On 4/2/08 10:41 AM, "George Bosilca" <bosilca_at_[hidden]> wrote:
>>>>>>>> I'm using this feature on the trunk with the version from yesterday.
>>>>>>>> It works without problems ...
>>>>>>>>
>>>>>>>> george.
>>>>>>>>
>>>>>>>> On Apr 2, 2008, at 12:14 PM, Jon Mason wrote:
>>>>>>>>> On Wednesday 02 April 2008 11:07:18 am Jeff Squyres wrote:
>>>>>>>>>> Are these r numbers relevant on the /tmp-public branch, or the trunk?
>>>>>>>>>
>>>>>>>>> I pulled it out of the command used to update the branch, which was:
>>>>>>>>> svn merge -r 17590:17917 https://svn.open-mpi.org/svn/ompi/trunk .
>>>>>>>>>
>>>>>>>>> In the cpc tmp branch, it happened at r17920.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Jon
>>>>>>>>>
>>>>>>>>>> On Apr 2, 2008, at 11:59 AM, Jon Mason wrote:
>>>>>>>>>>> I regressed my tree and it looks like it happened between 17590:17917
>>>>>>>>>>>
>>>>>>>>>>> On Wednesday 02 April 2008 10:22:52 am Jon Mason wrote:
>>>>>>>>>>>> I am noticing that ssh seems to be broken on trunk (and my cpc
>>>>>>>>>>>> branch, as it is based on trunk). When I try to use xterm and gdb
>>>>>>>>>>>> to debug, I only successfully get 1 xterm. I have tried this on 2
>>>>>>>>>>>> different setups. I can successfully get the xterms on the 1.2 svn
>>>>>>>>>>>> branch.
>>>>>>>>>>>>
>>>>>>>>>>>> I am running the following command:
>>>>>>>>>>>> mpirun --n 2 --host vic12,vic20 -mca btl tcp,self -d xterm -e
>>>>>>>>>>>> gdb /usr/mpi/gcc/openmpi-1.2-svn/tests/IMB-3.0/IMB-MPI1
>>>>>>>>>>>>
>>>>>>>>>>>> Is anyone else seeing this problem?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Jon
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> devel mailing list
>>>>>>>>>>>> devel_at_[hidden]
>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel