Actually, I believe at least some of this may be a bug on our part. We
currently pick up the local environment and forward it on to the remote nodes
as the environment for use by the backend processes. I have seen quite a few
environment variables in that list, including DISPLAY, which would create
the problem you are seeing.
I'll have to chat with folks here to understand what part of the environment
we absolutely need to carry forward, and what parts we need to "cleanse"
before passing it along.
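
To illustrate the kind of cleansing meant here, the following is a minimal Python sketch. It is not the Open MPI implementation: the function name `cleanse_environment` and the list of blocked variables are assumptions chosen for illustration, and which variables really need to be dropped is exactly the open question above.

```python
import os

# Hypothetical list of variables tied to the local login session.
# This is an assumption for illustration, not a list taken from Open MPI.
SESSION_ONLY_VARS = frozenset(
    {"DISPLAY", "XAUTHORITY", "SSH_CLIENT", "SSH_CONNECTION", "SSH_TTY"}
)


def cleanse_environment(env, blocked=SESSION_ONLY_VARS):
    """Return a copy of env without the session-specific variables.

    The input mapping is left unmodified.
    """
    return {name: value for name, value in env.items() if name not in blocked}


if __name__ == "__main__":
    # Show which variable names would still be forwarded to the remote nodes.
    forwarded = cleanse_environment(dict(os.environ))
    for name in sorted(forwarded):
        print(name)
```

With DISPLAY filtered out like this, the value set up by "ssh -X" on each remote node would no longer be overwritten by the one from the login shell on the master.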
On 11/30/06 10:50 AM, "Dave Grote" <dpgrote_at_[hidden]> wrote:
> I'm using caos linux (developed at LBL), which has the wrapper wwmpirun around
> mpirun, so my command is something like
> wwmpirun -np 8 -- -x PYTHONPATH --mca pls_rsh_agent '"ssh -X"'
> This is essentially the same as
> mpirun -np 8 -x PYTHONPATH --mca pls_rsh_agent '"ssh -X"' /usr/local/bin/pyMPI
> but wwmpirun does the scheduling, for example looking for idle nodes and
> creating the host file.
> My system is set up with a master/login node which is running a full version of
> linux and slave nodes that run a reduced linux (that includes access to the X
> libraries). wwmpirun always picks the slave nodes to run on. I've also tried
> "ssh -Y" and it doesn't help. I've set xhost for the slave nodes in my login
> shell on the master and that didn't work. XForwarding is enabled on all of the
> nodes, so that's not the problem.
> I am able to get it to work by having wwmpirun do the command "ssh -X nodennnn
> xclock" before starting the parallel program on that same node, but this only
> works for the first person who logs into the master and gets
> DISPLAY=localhost:10. When someone else tries to run a parallel job, it seems
> that DISPLAY is set to localhost:10 on the slaves and tries to forward through
> that other person's login with the same display number and the connection is
> refused because of wrong authentication. This seems like very odd behavior.
> I'm aware that this may be an issue with the X server (xorg) or with the
> version of linux, so I am also seeking help from the person who maintains caos
> linux. If it matters, the machine uses myrinet for the interconnects.
> Galen Shipman wrote:
>> what does your command line look like?
>> - Galen
>> On Nov 29, 2006, at 7:53 PM, Dave Grote wrote:
>>> I cannot get X11 forwarding to work using mpirun. I've tried all of the
>>> standard methods, such as setting pls_rsh_agent = ssh -X, using xhost,
>>> and a few other things, but nothing works in general. In the FAQ,
>>> http://www.open-mpi.org/faq/?category=running#mpirun-gui, a reference is
>>> made to other methods, but "they involve sophisticated X forwarding
>>> through mpirun", and no further explanation is given. Can someone tell
>>> me what these other methods are or point me to where I can find info on
>>> them? I've done lots of Google searching and haven't found anything
>>> useful. This is a major issue since my parallel code heavily depends on
>>> having the ability to open X windows on the remote machine. Any and
>>> all help would be appreciated!
>>> users mailing list