Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How does authentication between nodes work without password? (Newbie alert on)
From: David Zhang (solarbikedz_at_[hidden])
Date: 2011-02-10 01:58:07


I don't really know what the problem is; it looks like you're doing things
correctly. I'm almost sure you've already done both of the following, but
just to be sure:
each machine's ssh public key is in the other machine's authorized_keys file
the ssh keys were generated without passphrases
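
Both items can be checked from the shell. What follows is only a sketch: the helper names check_no_passphrase and check_login are made up for illustration, and it assumes a stock OpenSSH client on each instance.

```shell
#!/bin/sh
# Sketch only: check_no_passphrase and check_login are hypothetical helpers,
# not part of OpenSSH or Open MPI.

# Succeeds only if the private key in $1 can be read with an empty passphrase.
# ssh-keygen -y re-derives the public key; -P "" supplies the empty passphrase
# so the command fails instead of prompting when the key is encrypted.
check_no_passphrase() {
    ssh-keygen -y -P "" -f "$1" >/dev/null 2>&1
}

# Succeeds only if key $1 logs in to host $2 without any prompt.
# BatchMode=yes makes ssh fail rather than ask for a password or passphrase.
check_login() {
    ssh -o BatchMode=yes -i "$1" "$2" true
}
```

For example, `check_no_passphrase ~/.ssh/tsakai && check_login ~/.ssh/tsakai domU-12-31-39-07-35-21` should finish silently when both conditions hold. Because of BatchMode, a problem shows up as a non-zero exit status and an error message rather than a password prompt.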

On Wed, Feb 9, 2011 at 10:08 PM, Tena Sakai <tsakai_at_[hidden]> wrote:

> Hi,
>
> I have made a bit of progress(?)...
> I made a config file in my .ssh directory on the cloud. It looks like:
> # machine A
> Host domU-12-31-39-07-35-21.compute-1.internal
> HostName domU-12-31-39-07-35-21
> BatchMode yes
> IdentityFile /home/tsakai/.ssh/tsakai
> ChallengeResponseAuthentication no
> IdentitiesOnly yes
>
> # machine B
> Host domU-12-31-39-06-74-E2.compute-1.internal
> HostName domU-12-31-39-06-74-E2
> BatchMode yes
> IdentityFile /home/tsakai/.ssh/tsakai
> ChallengeResponseAuthentication no
> IdentitiesOnly yes
>
> This file exists on both machine A and machine B.
>
> Now when I issue the mpirun command as below:
> [tsakai_at_domU-12-31-39-06-74-E2 ~]$ mpirun -app app.ac2
>
> It hangs. I control-C out of it and I get:
>
> mpirun: killing job...
>
>
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
>
> --------------------------------------------------------------------------
>
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
>
> --------------------------------------------------------------------------
> domU-12-31-39-07-35-21.compute-1.internal - daemon did not report
> back when launched
>
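One way to get more information out of a launch that hangs like this is to rerun it with Open MPI's diagnostic switches turned on. The wrapper below is a sketch under assumptions: debug_launch is a hypothetical helper name, the MPIRUN variable exists only so the command can be swapped out, and how much detail plm_base_verbose prints depends on the Open MPI version.

```shell
#!/bin/sh
# Sketch only: debug_launch is a hypothetical wrapper around the mpirun call
# used in this thread. Set MPIRUN to substitute another binary.
debug_launch() {
    # --debug-daemons keeps the remote daemons' output visible on the terminal;
    # plm_base_verbose raises the verbosity of the process-launch framework.
    "${MPIRUN:-mpirun}" --debug-daemons --mca plm_base_verbose 5 -app "$1"
}
```

Calling `debug_launch app.ac2` should then show whether the daemon on the second node was started at all, or was started but could not connect back to mpirun.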
> Am I making progress?
>
> Does this mean I am past authentication and something else is the problem?
> Does someone have an example .ssh/config file I can look at? There are so
> many keyword-argument pairs for this config file, and I would like to look
> at a very basic one that works.
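A very basic config could look like the sketch below. It is a guess at this setup rather than a known-good file: the wildcard patterns, the SSH_DIR scratch directory, and the StrictHostKeyChecking setting are all assumptions. The last one trades away host-key verification so that a freshly launched instance does not fail on an unknown host key, which BatchMode would otherwise turn into an error because ssh cannot ask the yes/no question.

```shell
#!/bin/sh
# Sketch only. SSH_DIR defaults to a scratch directory so nothing real is
# overwritten; point it at ~/.ssh once the stanza looks right.
SSH_DIR="${SSH_DIR:-$(mktemp -d)}"

# One wildcard stanza instead of one per instance. Both spellings of the
# prefix are listed because ssh reports these host names in lower case.
cat > "$SSH_DIR/config" <<'EOF'
Host domU-* domu-*
    IdentityFile ~/.ssh/tsakai
    IdentitiesOnly yes
    BatchMode yes
    StrictHostKeyChecking no
EOF

# ssh rejects a config file that is writable by group or others.
chmod 600 "$SSH_DIR/config"
```

With a stanza like this in place on both machines, a plain `ssh domU-12-31-39-07-35-21 true` (no -i option) should succeed without any prompt, which is the form of ssh call mpirun depends on when it starts its remote daemons.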
>
>
> Thank you.
>
> Tena Sakai
> tsakai_at_[hidden]
>
> On 2/9/11 7:52 PM, "Tena Sakai" <tsakai_at_[hidden]> wrote:
>
> Hi
>
> I have an app.ac1 file like below:
> [tsakai_at_vixen local]$ cat app.ac1
> -H vixen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 5
> -H vixen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 6
> -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 7
> -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 8
>
> The program I run is
> Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R x
> where x is in [5..8]. The machines vixen and blitzen each perform 2 runs.
>
> Here’s the program fib.R:
> [tsakai_at_vixen local]$ cat fib.R
> # fib() computes, given index n, fibonacci number iteratively
> # here's the first dozen sequence (indexed from 0..11)
> # 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
>
> fib <- function( n ) {
>     a <- 0
>     b <- 1
>     for ( i in 1:n ) {
>         t <- b
>         b <- a
>         a <- a + t
>     }
>     a
> }
>
> arg <- commandArgs( TRUE )
> myHost <- system( 'hostname', intern=TRUE )
> cat( fib(arg), myHost, '\n' )
>
> It reads an argument from command line and produces a fibonacci number that
> corresponds to that index, followed by the machine name. Pretty simple
> stuff.
>
> Here’s the run output:
> [tsakai_at_vixen local]$ mpirun -app app.ac1
> 5 vixen.egcrc.org
> 8 vixen.egcrc.org
> 13 blitzen.egcrc.org
> 21 blitzen.egcrc.org
>
> Which is exactly what I expect. So far so good.
>
> Now I want to run the same thing on the cloud. I launch 2 instances of the
> same virtual machine, which I reach with:
> [tsakai_at_vixen local]$ ssh -A -i ~/.ssh/tsakai machine-instance-A-public-dns
>
> Now I am on machine A:
> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>
> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ # and I can go to machine B without password authentication,
> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ # i.e., use public/private key
> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ hostname
> domU-12-31-39-00-D1-F2
> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ ssh -i .ssh/tsakai domU-12-31-39-0C-C8-01
> Last login: Wed Feb 9 20:51:48 2011 from 10.254.214.4
> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ # I am now on machine B
> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ hostname
> domU-12-31-39-0C-C8-01
> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ # now show I can get to machine A without using password
> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ ssh -i .ssh/tsakai domU-12-31-39-00-D1-F2
> The authenticity of host 'domu-12-31-39-00-d1-f2 (10.254.214.4)' can't be established.
> RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
> Are you sure you want to continue connecting (yes/no)? yes
> Warning: Permanently added 'domu-12-31-39-00-d1-f2' (RSA) to the list of known hosts.
> Last login: Wed Feb 9 20:49:34 2011 from 10.215.203.239
> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ hostname
> domU-12-31-39-00-D1-F2
> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ exit
> logout
> Connection to domU-12-31-39-00-D1-F2 closed.
> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ exit
> logout
> Connection to domU-12-31-39-0C-C8-01 closed.
> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ # back at machine A
> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ hostname
> domU-12-31-39-00-D1-F2
>
> As you can see, neither machine asks for a password; both authenticate with
> a public/private key pair. There is no problem (that I can see) with ssh
> invocation from one machine to the other, because each instance has a copy
> of both the public key and the private key.
>
> The app.ac file is identical, except the node names:
> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ cat app.ac1
> -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 5
> -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 6
> -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 7
> -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 8
>
> Here’s what happens with mpirun:
>
> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ mpirun -app app.ac1
> tsakai_at_domu-12-31-39-0c-c8-01's password:
> Permission denied, please try again.
> tsakai_at_domu-12-31-39-0c-c8-01's password: mpirun: killing job...
>
>
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
>
> --------------------------------------------------------------------------
>
> mpirun: clean termination accomplished
>
> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>
> mpirun (or somebody else?) asks me for a password, which I don’t have.
> I end up typing control-C.
>
> Here’s my question:
> How can I get past authentication by mpirun where there is no password?
>
> I would appreciate your help/insight greatly.
>
> Thank you.
>
> Tena Sakai
> tsakai_at_[hidden]

-- 
David Zhang
University of California, San Diego