
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] delay in launch?
From: Reuti (reuti_at_[hidden])
Date: 2009-01-16 17:57:12


Am 16.01.2009 um 23:06 schrieb Reuti:

> Am 16.01.2009 um 22:20 schrieb Jeff Dusenberry:
>> Reuti wrote:
>>> Am 15.01.2009 um 16:20 schrieb Jeff Dusenberry:
>>>> I'm trying to launch multiple xterms under OpenMPI 1.2.8 and the
>>>> SGE job scheduler for purposes of running a serial debugger.
>>>> I'm experiencing file-locking problems on the .Xauthority file.
>>>>
>>>> I tried to fix this by asking for a delay between successive
>>>> launches, to reduce the chances of contention for the lock by:
>>>>
>>>> ~$ qrsh -pe mpi 4 -P CIS /share/apps/openmpi/bin/mpiexec --mca
>>>> pls_rsh_debug 1 --mca pls_rsh_delay 5 xterm
>>>>
>>>> The 'pls_rsh_delay 5' parameter seems to have no effect. I
>>>> tried replacing 'pls_rsh_debug 1' with 'orte_debug 1', which
>>>> gave me additional debugging output, but didn't fix the file
>>>> locking problem.
>>>>
>>>> Sometimes the above commands will work and I will get all 4
>>>> xterms, but more often I will get an error:
>>>>
>>>> /usr/bin/X11/xauth: error in locking authority file /export/
>>>> home/duse/.Xauthority
>>>>
>>>> followed by
>>>>
>>>> X11 connection rejected because of wrong authentication.
>>>> xterm Xt error: Can't open display: localhost:11.0
>>>>
>>>> and one or more of the xterms will fail to open.
>>>>
>>>> Am I missing something? Is there another debug flag I need to
>>>> set? Any suggestions for a better way to do this would be
>>>> appreciated.
>>> You are right that it's neither Open MPI's nor SGE's fault, but
>>> a race condition in the SSH startup. You defined SSH with X11
>>> forwarding in SGE (qconf -mconf), right? Then you have first an
>>> ssh connection from your workstation to the login machine, then
>>> one from the login machine to the node where mpiexec runs, and
>>> then one for each slave node (meaning an additional one on the
>>> machine where mpiexec is already running).
>>
>> Yes, that's all correct. Clearly not very efficient, but I
>> haven't had any luck getting xauth or xhost to work more directly.
>>
>>> Although it might be possible to give every started sshd a
>>> unique .Xauthority file, it's not straightforward to implement
>>> due to SGE's startup of the daemons, and you would need a
>>> sophisticated ~/.ssh/rc to create the files at different
>>> locations and use them in the forthcoming xterm.
>>
>> Thanks, that helped a lot, but I still can't quite get it to
>> work. I do want the xterms to run mpi jobs.
>
> Do you need the X11 forwarding then for your application, and xterm
> was just an example?
>
>> I tried this sshrc script (modified from the sshd man page):
>>
>> XAUTHORITY=/local/$USER/.Xauthority${SSH_TTY##*/}
>> export XAUTHORITY
>> if read proto cookie && [ -n "$DISPLAY" ]; then
>>   if [ `echo $DISPLAY | cut -c1-10` = 'localhost:' ]; then
>>     # X11UseLocalhost=yes
>>     echo add unix:`echo $DISPLAY | cut -c11-` $proto $cookie
>>   else
>>     # X11UseLocalhost=no
>>     echo add $DISPLAY $proto $cookie
>>   fi | xauth -q -
>> fi
>
> Yes, but the created session also needs it. I mean: you log in to a
> node with the above script. Then in the shell you execute:
>
> $ xauth list
>
> and you will get the default ~/.Xauthority. In the shell you also
> need to export the above variable so that the specially created
> Xauthority file is read from the correct location. You can add:
>
> export XAUTHORITY=/local/$USER/.Xauthority${SSH_TTY##*/}
>
> to .bashrc and .profile (for non-interactive [mpiexec] and
> interactive use).
>
> For the SGE SSH_TTY issue I mentioned: it's not straightforward.
> When the SSH starts, nothing is defined by SGE yet. You could try
> to look in the process chain (to see whether it's running under
> SGE), but it doesn't look nice. I'll look into another solution
> and let you know when I find something.

What might be used instead of SSH_TTY is a way to send and accept an
environment variable. I.e. in SGE's setup:

rsh_command /usr/bin/ssh -o SendEnv=rank

and in the sshd_config:

AcceptEnv rank

Now the environment variable rank must be set for each MPI process,
and it should work.
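To sketch the receiving end (everything here is an untested assumption:
the per-rank file location, the variable name, and the idea that each
MPI process sees its own rank value): once a distinct rank is visible
to each process, the SSH_TTY suffix in the earlier XAUTHORITY export
could be replaced by it, e.g.:

```shell
#!/bin/sh
# Hypothetical fragment for ~/.ssh/rc, .bashrc and .profile:
# key the per-process authority file on the forwarded "rank"
# variable instead of SSH_TTY, so concurrent xauth calls on the
# same node no longer contend for one ~/.Xauthority lock.
rank="${rank:-0}"   # fall back to 0 if the variable is unset
XAUTHORITY="/local/$USER/.Xauthority.$rank"
export XAUTHORITY
```

The fallback to 0 is only there so an interactive login without the
variable still gets a usable file.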

-- Reuti

> -- Reuti
>
>
>> and I am successful in creating a unique .Xauthority for each
>> process locally on each node when I log in via ssh directly.
>> Unfortunately, I do have to provide another definition of
>> XAUTHORITY somewhere in my startup scripts - the one above does
>> not get seen outside of the sshrc execution.
>>
>> When I try to run this under qrsh/mpiexec, it acts as if it
>> doesn't have the SSH_TTY environment variable (is that due to
>> SGE?), and we're back to a race condition. Is there another
>> variable I can use in the sge/mpi context? I also don't
>> understand where I would define the XAUTHORITY variable when
>> running under mpiexec.
>> I'm not sure this is the best way to approach this - I was
>> originally hoping that the mpiexec call would have a way to
>> introduce a delay between successive launches but that doesn't
>> seem to be working either.
>>
>> Jeff
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>