Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-12-04 15:10:42


FWIW, I'll add an enhancement ticket for this issue (ability to leave
ssh sessions open without all the other debugging gorp).

As Ralph indicated, we can't promise when this will be done, but
it'll at least be on the list.

On Dec 4, 2006, at 1:56 PM, Dave Grote wrote:

>
> OK - I'll live with it for now. Fortunately, the extra output only
> occurs at the start and end of the run and doesn't interfere with
> the output of my code.
>
> An obvious suggestion for when you get to revamping that part of
> the code is to add a new command line flag to keep the ssh sessions
> running without turning on the debugging output. I know that others
> have the same XForwarding problem and this would offer a general
> solution.
> Thanks for all of your help!!
> Dave
>
> Ralph Castain wrote:
>> I’m afraid that would be a rather significant job, as that code
>> plays a major role in the ssh startup procedure. We have plans to
>> revamp that portion of the code, but without someone who knows
>> exactly what is going on and where, you are more likely to break
>> it than revise it.
>>
>> If you can live with it as-is for now, I would strongly suggest
>> doing so until we get back to that area.
>>
>> Just my $0.02.
>> Ralph
>>
>>
>>
>> On 12/1/06 4:51 PM, "Dave Grote" <dpgrote_at_[hidden]> wrote:
>>
>>
>> Is there a place where I can hack the openmpi code to force it to
>> keep the ssh sessions open without the -d option? I looked through
>> some of the code, including orterun.c and a few other places, but
>> don't have the familiarity with the code to find the place.
>> Thanks!
>> Dave
>>
>> Galen Shipman wrote:
>> -d leaves the ssh session open.
>> Try using:
>>
>> mpirun -d -host boxtop2 -mca pls_rsh_agent "ssh -X -n" xterm -e cat
>>
>> Note the "ssh -X -n"; this will tell ssh not to open stdin.
>>
>> You should then be able to type characters in the resulting xterm
>> and have them echoed back correctly.
>>
>> - Galen
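
Since the agent string "ssh -X -n" has to reach mpirun as a single argument, one way to avoid shell-quoting mistakes is to assemble the command as an argument list. The following is only a sketch: the helper name build_mpirun_argv and its parameters are made up for illustration, and it uses nothing beyond the flags already shown in this thread (-np, -d, -host, -mca pls_rsh_agent).

```python
import shlex


def build_mpirun_argv(program, host=None, np=None,
                      rsh_agent="ssh -X -n", keep_sessions_open=True):
    """Assemble an mpirun argument list from the flags used in this thread.

    Hypothetical helper: it only builds the list, it does not run anything.
    """
    argv = ["mpirun"]
    if np is not None:
        argv += ["-np", str(np)]
    if keep_sessions_open:
        # In this thread, -d is the only known way to keep ssh sessions open.
        argv.append("-d")
    if host is not None:
        argv += ["-host", host]
    # The agent string stays one list element, so no extra quoting is needed.
    argv += ["-mca", "pls_rsh_agent", rsh_agent]
    argv += list(program)
    return argv


if __name__ == "__main__":
    argv = build_mpirun_argv(["xterm", "-e", "cat"], host="boxtop2")
    # Print a copy-and-paste form for a shell; subprocess.run(argv) would
    # take the list directly.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Passing the list to subprocess.run without shell=True would hand "ssh -X -n" to mpirun unchanged, which is the point of building it this way.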
>>
>> On Dec 1, 2006, at 11:48 AM, Dave Grote wrote:
>>
>>
>>
>> Thanks for the suggestion, but it doesn't fix my problem. I did
>> the same thing you did and was able to get xterms open when using
>> the -d option. But when I run my code, the -d option seems to play
>> havoc with stdin. My code normally reads stdin from one processor
>> and it broadcasts it to the others. This failed when using the -d
>> option and the code wouldn't take input commands properly.
>>
>> But, since -d did get the X windows working, it must be doing
>> something differently. What is it about the -d option that allows
>> the windows to open? If I knew that, it would be the fix to my
>> problem.
>> Dave
>>
>> Galen Shipman wrote:
>>
>> I think this might be as simple as adding "-d" to the mpirun
>> command line.
>>
>> If I run:
>>
>> mpirun -np 2 -d -mca pls_rsh_agent "ssh -X" xterm -e gdb ./mpi-ping
>>
>> All is well, I get the xterms up.
>>
>> If I run:
>>
>> mpirun -np 2 -mca pls_rsh_agent "ssh -X" xterm -e gdb ./mpi-ping
>>
>> I get the following:
>>
>> /usr/bin/xauth: error in locking authority file /home/gshipman/.Xauthority
>> xterm Xt error: Can't open display: localhost:10.0
>>
>> Have you tried adding "-d"?
>>
>> Thanks,
>>
>> Galen
>>
>> On Nov 30, 2006, at 2:42 PM, Dave Grote wrote:
>>
>> I don't think that that is the problem. As far as I can tell, the
>> DISPLAY environment variable is being set properly on the slave
>> (it will sometimes have a different value than in the shell where
>> mpirun was executed).
>> Dave
>>
>> Ralph H Castain wrote:
>> Actually, I believe at least some of this may be a bug on our
>> part. We currently pick up the local environment and forward it on
>> to the remote nodes as the environment for use by the backend
>> processes. I have seen quite a few environment variables in that
>> list, including DISPLAY, which would create the problem you are
>> seeing.
>>
>> I’ll have to chat with folks here to understand what part of the
>> environment we absolutely need to carry forward, and what parts we
>> need to “cleanse” before passing it along.
>>
>> Ralph
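
The "cleansing" Ralph describes could look roughly like the sketch below: copy the launcher's environment and drop variables that only make sense on the launching host before handing it to remote processes. This is not Open MPI's code. The function name and the list of variables to drop are assumptions made for illustration, and which variables really need to be carried forward is exactly the open question in the message above.

```python
import os

# Hypothetical list of variables that describe the launching host's own
# session; it is an illustration, not taken from Open MPI.
SESSION_LOCAL_VARS = frozenset({"DISPLAY", "XAUTHORITY", "SSH_AUTH_SOCK"})


def cleanse_environment(env, drop=SESSION_LOCAL_VARS, keep=()):
    """Return a copy of env without session-local variables.

    Names listed in keep are preserved even if they also appear in drop.
    The input mapping is left unmodified.
    """
    keep = set(keep)
    return {
        name: value
        for name, value in env.items()
        if name not in drop or name in keep
    }


if __name__ == "__main__":
    forwarded = cleanse_environment(os.environ)
    # Show which variables would be withheld from the remote side.
    print(sorted(set(os.environ) - set(forwarded)))
```

With DISPLAY withheld, the remote process would see whatever value "ssh -X" sets on the remote node instead of the value from the shell where mpirun was started.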
>>
>>
>> On 11/30/06 10:50 AM, "Dave Grote" <dpgrote_at_[hidden]>
>> <mailto:dpgrote_at_[hidden]> wrote:
>>
>>
>>
>> I'm using caos linux (developed at LBL), which has the wrapper
>> wwmpirun around mpirun, so my command is something like
>> wwmpirun -np 8 -- -x PYTHONPATH --mca pls_rsh_agent '"ssh -X"' /usr/local/bin/pyMPI
>> This is essentially the same as
>> mpirun -np 8 -x PYTHONPATH --mca pls_rsh_agent '"ssh -X"' /usr/local/bin/pyMPI
>> but wwmpirun does the scheduling, for example looking for idle
>> nodes and creating the host file.
>> My system is set up with a master/login node which is running a
>> full version of linux and slave nodes that run a reduced linux
>> (that includes access to the X libraries). wwmpirun always picks
>> the slave nodes to run on. I've also tried "ssh -Y" and it
>> doesn't help. I've set xhost for the slave nodes in my login shell
>> on the master and that didn't work. XForwarding is enabled on all
>> of the nodes, so that's not the problem.
>>
>> I am able to get it to work by having wwmpirun do the command "ssh
>> -X nodennnn xclock" before starting the parallel program on that
>> same node, but this only works for the first person who logs into
>> the master and gets DISPLAY=localhost:10. When someone else tries
>> to run a parallel job, it seems that DISPLAY is set to localhost:10
>> on the slaves and tries to forward through that other person's
>> login with the same display number, and the connection is refused
>> because of wrong authentication. This seems like very odd
>> behavior. I'm aware that this may be an issue with the X server
>> (xorg) or with the version of linux, so I am also seeking help
>> from the person who maintains caos linux. If it matters, the
>> machine uses myrinet for the interconnects.
>> Thanks!
>> Dave
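
A DISPLAY value has the form host:display.screen, and "localhost" is resolved by whichever machine reads it. A value such as localhost:10 therefore says nothing about whose ssh tunnel sits behind display 10 on that node, which may be how two users end up pointing at the same forwarded display. The sketch below only splits the string so that two values can be compared. The function names are hypothetical, and it handles just the plain host:display[.screen] form seen in this thread.

```python
import re

# Matches values such as "localhost:10", "localhost:10.0" or ":0".
_DISPLAY_RE = re.compile(
    r"^(?P<host>[^:]*):(?P<display>\d+)(?:\.(?P<screen>\d+))?$"
)


def parse_display(value):
    """Split a DISPLAY string into (host, display number, screen number).

    A missing screen part is reported as screen 0.
    """
    match = _DISPLAY_RE.match(value)
    if match is None:
        raise ValueError("unrecognized DISPLAY value: %r" % (value,))
    screen = match.group("screen")
    return (
        match.group("host"),
        int(match.group("display")),
        int(screen) if screen is not None else 0,
    )


def same_display(a, b):
    """True if two DISPLAY strings name the same host and display number."""
    return parse_display(a)[:2] == parse_display(b)[:2]


if __name__ == "__main__":
    print(parse_display("localhost:10.0"))
    print(same_display("localhost:10", "localhost:10.0"))
```

Comparing the value seen in the login shell with the one seen by a process on a slave node is a quick way to check whether the remote side inherited the master's DISPLAY or was given its own by ssh.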
>>
>> Galen Shipman wrote:
>>
>>
>> what does your command line look like?
>>
>> - Galen
>>
>> On Nov 29, 2006, at 7:53 PM, Dave Grote wrote:
>>
>> I cannot get X11 forwarding to work using mpirun. I've tried all of
>> the standard methods, such as setting pls_rsh_agent = ssh -X, using
>> xhost, and a few other things, but nothing works in general. In the
>> FAQ, http://www.open-mpi.org/faq/?category=running#mpirun-gui, a
>> reference is made to other methods, but "they involve sophisticated
>> X forwarding through mpirun", and no further explanation is given.
>> Can someone tell me what these other methods are or point me to
>> where I can find info on them? I've done lots of google searching
>> and haven't found anything useful. This is a major issue since my
>> parallel code heavily depends on having the ability to open X
>> windows on the remote machine. Any and all help would be
>> appreciated!
>> Thanks!
>> Dave
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems