Some good news and some bad news. Following the information at http://www.open-mpi.org/faq/?category=running, I have:
enabled X11 forwarding on all remote nodes,
added the directory containing mpirun, "/usr/local/bin", to the PATH on all nodes,
run "xhost +" on my local host, and
set DISPLAY on all remote nodes to the DISPLAY value of my local host.
Then I used "mpirun -n [numberPros] --hostfile [filename] arg .." to start the job from my local host, but it still did not work.
But when I explicitly added "--prefix /usr/local" and "-x DISPLAY=[localDISPLAYvalue]" to the command line, everything worked: all X windows opened by the remote nodes were displayed on my local machine. I was so excited! Moreover, with the same --prefix and -x options, running mpirun through XGrid (i.e., without --hostfile) also worked!
Could you please tell me what I missed when setting up the remote nodes as described above, so that I don't have to type all these options every time I start this job?
(I also tried adding the prefix "/usr/local" to the PATH on each remote node, but it still did not work without the --prefix option.)
Thanks for any help.
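One way to avoid retyping the options is a small wrapper that always adds them. The sketch below is in Python purely for illustration; the helper name build_mpirun_cmd, the default prefix /usr/local (taken from the message above), and the rule that DISPLAY must carry a host name are assumptions of this sketch, not Open MPI features. Open MPI's mpirun(1) man page also states that invoking mpirun by its absolute path (e.g. /usr/local/bin/mpirun) is equivalent to passing --prefix with that installation directory, which may make --prefix unnecessary; it is worth checking the man page of the installed version.

```python
import os
from typing import List, Optional, Sequence


def build_mpirun_cmd(nprocs: int, program: str, args: Sequence[str] = (),
                     hostfile: Optional[str] = None,
                     prefix: str = "/usr/local",
                     display: Optional[str] = None) -> List[str]:
    """Build an mpirun argv that always passes --prefix and -x DISPLAY=...

    Hypothetical helper; prefix defaults to /usr/local as in the thread.
    """
    if display is None:
        # Fall back to the DISPLAY of the launching (local) machine.
        display = os.environ.get("DISPLAY", "")
    # Assumption: a host-less value such as ":0" would make each remote
    # process try its own display, so require e.g. "myhost.local:0".
    if not display or display.startswith(":"):
        raise ValueError("DISPLAY must name the local host, e.g. 'myhost:0', "
                         f"got {display!r}")
    cmd = ["mpirun", "-n", str(nprocs),
           "--prefix", prefix,             # where remote nodes find Open MPI
           "-x", "DISPLAY=" + display]     # export DISPLAY to every process
    if hostfile is not None:
        cmd += ["--hostfile", hostfile]    # omitted when launching via XGrid
    cmd.append(program)
    cmd.extend(args)
    return cmd
```

For example, `subprocess.call(build_mpirun_cmd(4, "./my_x11_job", display="myhost.local:0"))` (after `import subprocess`; the program and host names are placeholders) would launch through XGrid, and passing `hostfile="hosts"` switches to a hostfile launch.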
XGrid does not forward X11 credentials, so you would have to set up an
X11 environment yourself. Using ssh or a local starter does
forward X11 credentials, which is why it works in that case.
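Because XGrid starts the processes without any X11 forwarding, each remote process must reach the X server on your machine directly: DISPLAY has to name your machine rather than ":0" or an ssh-forwarded "localhost:10.0", and the server must accept the connection (which is what "xhost +" permits). A small, hypothetical checker for the first condition is sketched below; parse_display and usable_from_remote_node are illustrative names, and the rule that a TCP display listens on port 6000 plus the display number comes from the X11 protocol, not from XGrid.

```python
from typing import NamedTuple


class DisplaySpec(NamedTuple):
    host: str      # "" (or a socket path) means a connection local to the machine
    display: int   # display number
    screen: int    # screen number, 0 if omitted

    @property
    def tcp_port(self) -> int:
        # An X server reachable over TCP listens on port 6000 + display number.
        return 6000 + self.display


def parse_display(value: str) -> DisplaySpec:
    """Split a DISPLAY value of the form [host]:display[.screen]."""
    host, sep, rest = value.rpartition(":")
    if not sep or not rest:
        raise ValueError(f"not a valid DISPLAY value: {value!r}")
    number, _, screen = rest.partition(".")
    return DisplaySpec(host, int(number), int(screen) if screen else 0)


def usable_from_remote_node(value: str) -> bool:
    """True if a process on another node would reach *this* machine's X server.

    Hypothetical helper: ":0", "unix:0", and ssh-forwarded "localhost:10.0"
    all refer to the node the process runs on, so they fail under XGrid.
    """
    host = parse_display(value).host
    # Assumption: a path before the colon denotes a local socket.
    if host == "" or host.startswith("/"):
        return False
    return host not in ("localhost", "127.0.0.1", "unix")
```

Checking the exported DISPLAY value this way before launching can catch the ":0" case, which otherwise only shows up as XOpenDisplay() returning NULL on the remote nodes.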
On Oct 25, 2007, at 10:23 PM, Jinhui Qin wrote:
> Hi Brian,
> I have run into another problem running an MPI job through XGrid.
> During execution, this MPI job calls Xlib functions
> (e.g., XOpenDisplay()) to open an X window. The XOpenDisplay()
> call failed (returned NULL); it could not open a display no
> matter how many processors I requested.
> However, when I turned off the XGrid controller and used "mpirun -n 4"
> to start the job again, four X windows opened properly, but all four
> processes ran on the local machine instead of on any
> remote nodes.
> I have also tested using "ssh -X" from a terminal on my local
> machine to log in to another node in the cluster and run the job
> there (every node has a copy of the same job at the same
> path), and the X window displays on my local machine properly. I
> understand that the "-X" option sets up the environment needed to open
> the X window; with plain "ssh" (without "-X"), it does not work.
> I am wondering why the X window cannot open when the job is started
> through XGrid. How does the XGrid controller contact each agent?
> Has anyone seen a similar problem?
> I have installed X11 and Open MPI on all 8 Mac mini nodes in my
> cluster, and I have also tested running an MPI job that makes no X
> window calls through XGrid; it worked perfectly on
> all nodes.
> Thanks a lot for any suggestions!
> devel mailing list