This page is part of a frozen web archive of this mailing list.
You can still navigate around this archive, but know that no new mails
have been added to it since July of 2016.
Actually, I believe at least some of this may be a bug on our part. We
currently pick up the local environment and forward it to the remote nodes
as the environment for the backend processes. I have seen quite a few
environment variables in that list, including DISPLAY, which would create
the problem you are seeing.
I'll have to chat with folks here to understand what part of the environment
we absolutely need to carry forward, and what parts we need to "cleanse"
before passing it along.
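
To make the "cleanse" idea concrete, here is a minimal Python sketch of
filtering an environment before it is handed to remote processes. This is not
how Open MPI does it, just an illustration under stated assumptions: the names
cleanse_env, LOCAL_ONLY and LOCAL_ONLY_PREFIXES are hypothetical, and apart
from DISPLAY (the variable discussed in this thread) the list of variables to
drop is a guess. The keep argument stands in for variables the user asked for
explicitly, as with -x PYTHONPATH.

```python
import os

# Variables that describe the launching host's session, not the remote one.
# DISPLAY is the case from this thread; the other entries are assumptions.
LOCAL_ONLY = ("DISPLAY", "XAUTHORITY")
LOCAL_ONLY_PREFIXES = ("SSH_",)


def cleanse_env(env, keep=()):
    """Return a copy of env without variables tied to the local session.

    Names listed in keep are always forwarded, mirroring an explicit
    request such as `-x PYTHONPATH` on the mpirun command line.
    """
    cleaned = {}
    for name, value in env.items():
        if name in keep:
            # Explicitly requested by the user: always forward.
            cleaned[name] = value
        elif name in LOCAL_ONLY or name.startswith(LOCAL_ONLY_PREFIXES):
            # Session-specific: let the remote side supply its own value.
            continue
        else:
            cleaned[name] = value
    return cleaned


if __name__ == "__main__":
    remote_env = cleanse_env(dict(os.environ), keep=("PYTHONPATH",))
    print("DISPLAY forwarded:", "DISPLAY" in remote_env)
```

With DISPLAY left out of the forwarded set, a value set on the remote node by
"ssh -X" would no longer be overwritten by the one from the login node.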
On 11/30/06 10:50 AM, "Dave Grote" <dpgrote_at_[hidden]> wrote:
> I'm using caos linux (developed at LBL), which has the wrapper wwmpirun around
> mpirun, so my command is something like
> wwmpirun -np 8 -- -x PYTHONPATH --mca pls_rsh_agent '"ssh -X"'
> This is essentially the same as
> mpirun -np 8 -x PYTHONPATH --mca pls_rsh_agent '"ssh -X"' /usr/local/bin/pyMPI
> but wwmpirun does the scheduling, for example looking for idle nodes and
> creating the host file.
> My system is set up with a master/login node which is running a full version of
> linux and slave nodes that run a reduced linux (that includes access to the X
> libraries). wwmpirun always picks the slave nodes to run on. I've also tried
> "ssh -Y" and it doesn't help. I've set xhost for the slave nodes in my login
> shell on the master and that didn't work. XForwarding is enabled on all of the
> nodes, so that's not the problem.
> I am able to get it to work by having wwmpirun do the command "ssh -X nodennnn
> xclock" before starting the parallel program on that same node, but this only
> works for the first person who logs into the master and gets
> DISPLAY=localhost:10. When someone else tries to run a parallel job, it seems
> that DISPLAY is set to localhost:10 on the slaves and the job tries to forward through
> that other person's login with the same display number and the connection is
> refused because of wrong authentication. This seems like very odd behavior.
> I'm aware that this may be an issue with the X server (xorg) or with the
> version of linux, so I am also seeking help from the person who maintains caos
> linux. If it matters, the machine uses myrinet for the interconnects.
> Galen Shipman wrote:
>> what does your command line look like?
>> - Galen
>> On Nov 29, 2006, at 7:53 PM, Dave Grote wrote:
>>> I cannot get X11 forwarding to work using mpirun. I've tried all of the
>>> standard methods, such as setting pls_rsh_agent = ssh -X, using xhost,
>>> and a few other things, but nothing works in general. In the FAQ,
>>> http://www.open-mpi.org/faq/?category=running#mpirun-gui, a reference is
>>> made to other methods, but "they involve sophisticated X forwarding
>>> through mpirun", and no further explanation is given. Can someone tell
>>> me what these other methods are or point me to where I can find info on
>>> them? I've done lots of Google searching and haven't found anything
>>> useful. This is a major issue since my parallel code heavily depends on
>>> having the ability to open X windows on the remote machine. Any and
>>> all help would be appreciated!