Open MPI User's Mailing List Archives

From: Ralph H Castain (rhc_at_[hidden])
Date: 2006-11-30 14:30:20

Actually, I believe at least some of this may be a bug on our part. We
currently pick up the local environment and forward it on to the remote nodes
as the environment for use by the backend processes. I have seen quite a few
environment variables in that list, including DISPLAY, which would create
the problem you are seeing.
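
To make the failure mode concrete, here is a toy model in Python. It is not
Open MPI code: the function name, the sample values, and the merge order are
all assumptions for illustration. It assumes the forwarded environment simply
overrides whatever the remote login set up, so a DISPLAY created by "ssh -X"
on the remote node would be replaced by the one from the launching shell.

```python
def backend_environment(remote_env, forwarded_env):
    """Toy model: forwarded variables override the remote node's own values."""
    merged = dict(remote_env)      # what the remote login (e.g. ssh -X) set up
    merged.update(forwarded_env)   # assumption: the launcher's copy wins
    return merged


# Made-up values, for illustration only.
remote_env = {"DISPLAY": "localhost:11.0", "HOME": "/home/dave"}
forwarded_env = {"DISPLAY": "localhost:10.0", "PYTHONPATH": "/opt/lib"}

env = backend_environment(remote_env, forwarded_env)
print(env["DISPLAY"])  # the launching shell's DISPLAY, not the remote one
```

Under that assumption, every backend process ends up pointing at the display
number of the login that launched the job, whatever the remote node set up.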

I'll have to chat with folks here to understand what part of the environment
we absolutely need to carry forward, and what parts we need to "cleanse"
before passing it along.
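
As a rough idea of what such "cleansing" could look like, here is a minimal
sketch in Python. It is purely illustrative and not how Open MPI does or will
do it: the function name, the `keep` parameter, and the list of variables to
drop are hypothetical. Which variables really need to be dropped is exactly
the open question above.

```python
import os

# Hypothetical guess at variables that only make sense on the launching host.
LOCAL_ONLY = frozenset(
    {"DISPLAY", "XAUTHORITY", "SSH_CLIENT", "SSH_CONNECTION", "SSH_TTY"}
)


def cleanse_environment(env, drop=LOCAL_ONLY, keep=()):
    """Return a copy of env without host-specific variables.

    Names listed in keep are forwarded even if they also appear in drop,
    loosely like exporting a variable explicitly with mpirun's -x option.
    """
    keep = set(keep)
    return {k: v for k, v in env.items() if k in keep or k not in drop}


if __name__ == "__main__":
    forwarded = cleanse_environment(os.environ, keep=["PYTHONPATH"])
    print("DISPLAY" in forwarded)  # False: DISPLAY is never forwarded here
```

With DISPLAY left out of the forwarded set, whatever value the remote login
established (for example through "ssh -X") would be left untouched.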


On 11/30/06 10:50 AM, "Dave Grote" <dpgrote_at_[hidden]> wrote:

> I'm using caos linux (developed at LBL), which has the wrapper wwmpirun around
> mpirun, so my command is something like
> wwmpirun -np 8 -- -x PYTHONPATH --mca pls_rsh_agent '"ssh -X"'
> /usr/local/bin/pyMPI
> This is essentially the same as
> mpirun -np 8 -x PYTHONPATH --mca pls_rsh_agent '"ssh -X"' /usr/local/bin/pyMPI
> but wwmpirun does the scheduling, for example looking for idle nodes and
> creating the host file.
> My system is set up with a master/login node which is running a full version of
> linux and slave nodes that run a reduced linux (that includes access to the X
> libraries). wwmpirun always picks the slave nodes to run on. I've also tried
> "ssh -Y" and it doesn't help. I've set xhost for the slave nodes in my login
> shell on the master and that didn't work. X forwarding is enabled on all of the
> nodes, so that's not the problem.
> I am able to get it to work by having wwmpirun do the command "ssh -X nodennnn
> xclock" before starting the parallel program on that same node, but this only
> works for the first person who logs into the master and gets
> DISPLAY=localhost:10. When someone else tries to run a parallel job, it seems
> that DISPLAY is set to localhost:10 on the slaves, so X tries to forward through
> that other person's login with the same display number, and the connection is
> refused because of wrong authentication. This seems like very odd behavior.
> I'm aware that this may be an issue with the X server (xorg) or with the
> version of linux, so I am also seeking help from the person who maintains caos
> linux. If it matters, the machine uses myrinet for the interconnects.
> Thanks!
> Dave
> Galen Shipman wrote:
>> what does your command line look like?
>> - Galen
>> On Nov 29, 2006, at 7:53 PM, Dave Grote wrote:
>>> I cannot get X11 forwarding to work using mpirun. I've tried all of the
>>> standard methods, such as setting pls_rsh_agent = ssh -X, using xhost,
>>> and a few other things, but nothing works in general. In the FAQ, a
>>> reference is made to other methods, but "they involve sophisticated X
>>> forwarding through mpirun", and no further explanation is given. Can
>>> someone tell me what these other methods are or point me to where I can
>>> find info on them? I've done lots of Google searching and haven't found
>>> anything useful. This is a major issue since my parallel code heavily
>>> depends on having the ability to open X windows on the remote machine.
>>> Any and all help would be appreciated!
>>> Thanks!
>>> Dave
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]