Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How does authentication between nodes work without password? (Newbie alert on)
From: Reuti (reuti_at_[hidden])
Date: 2011-02-10 05:27:55


Hi,

Your local machine is Linux-like, but the execution hosts are Macs? I saw /Users/tsakai/... in your output.

a) Does executing a command on them also work? E.g.: ssh domU-12-31-39-07-35-21 ls
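
A minimal sketch of such a check for several hosts at once. The helper name check_host and the SSH_CMD override are made up for this illustration; BatchMode=yes makes ssh fail with an error instead of prompting for a password, so a broken key setup shows up immediately:

```shell
#!/bin/sh
# Hypothetical helper, not part of Open MPI or OpenSSH.
# SSH_CMD can be overridden for a dry run; it defaults to the real ssh.
SSH_CMD="${SSH_CMD:-ssh}"

check_host() {
    # -n: do not read stdin; BatchMode=yes: never prompt, fail instead.
    if "$SSH_CMD" -n -o BatchMode=yes -o ConnectTimeout=10 "$1" true; then
        echo "$1: ok"
    else
        echo "$1: FAILED"
    fi
}

# Check every host name given on the command line.
for host in "$@"; do
    check_host "$host"
done
```

Saved as, say, check.sh, it would be run on each node as: sh check.sh domU-12-31-39-07-35-21 domU-12-31-39-06-74-E2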

On 10.02.2011 at 07:08, Tena Sakai wrote:

> Hi,
>
> I have made a bit of progress(?)...
> I made a config file in my .ssh directory on the cloud. It looks like:
> # machine A
> Host domU-12-31-39-07-35-21.compute-1.internal

The name after "Host" above is just an abbreviation or nickname. The settings in that block are only applied when you specify exactly this name on the ssh command line (or in the mpirun host list). If the settings are the same for all machines anyway, you can use:

Host *
    IdentityFile /home/tsakai/.ssh/tsakai
    IdentitiesOnly yes
    BatchMode yes

instead.
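
To illustrate why the exact name matters, here is a rough sketch that mimics the matching with shell globbing. This is only an approximation of ssh's pattern rules (it ignores negated patterns, and case handling may differ), and the function names host_matches and describe are made up for this example:

```shell
#!/bin/sh
host_matches() {
    # $1: pattern from a "Host" line; $2: name given to ssh on the command line.
    # $1 is left unquoted on purpose so that "*" and "?" act as wildcards.
    case "$2" in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

describe() {
    if host_matches "$1" "$2"; then
        echo "'$2' matches Host $1"
    else
        echo "'$2' does NOT match Host $1"
    fi
}

# The short name does not match a block written for the long name ...
describe 'domU-12-31-39-07-35-21.compute-1.internal' 'domU-12-31-39-07-35-21'
# ... the long name does, and "Host *" matches either.
describe 'domU-12-31-39-07-35-21.compute-1.internal' 'domU-12-31-39-07-35-21.compute-1.internal'
describe '*' 'domU-12-31-39-07-35-21'
```

On recent OpenSSH releases, "ssh -G name" prints the options ssh would actually use for a given name, which is a more direct way to check the real configuration.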

Is this a private cluster (or at least private interfaces)? Another option would be host-based authentication, which avoids having to set up a known_hosts file or passphrase-less ssh keys for each user.
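
For reference, a rough sketch of what host-based authentication involves with OpenSSH. The file locations are assumptions (they are the usual ones on Linux and differ between systems), and the host names are simply the two from the quoted config:

```
# On every node, in the server config (commonly /etc/ssh/sshd_config):
HostbasedAuthentication yes

# On every node, in the system-wide client config (commonly /etc/ssh/ssh_config),
# placed before any Host block:
EnableSSHKeysign yes
HostbasedAuthentication yes

# On every node, in /etc/ssh/shosts.equiv (hosts allowed to log in this way;
# the names must match what sshd resolves for the connecting client):
domU-12-31-39-07-35-21.compute-1.internal
domU-12-31-39-06-74-E2.compute-1.internal
```

The public host keys of all nodes also have to be listed in the system-wide ssh_known_hosts file (commonly /etc/ssh/ssh_known_hosts), and sshd has to reload its configuration after the change.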

-- Reuti

> HostName domU-12-31-39-07-35-21
> BatchMode yes
> IdentityFile /home/tsakai/.ssh/tsakai
> ChallengeResponseAuthentication no
> IdentitiesOnly yes
>
> # machine B
> Host domU-12-31-39-06-74-E2.compute-1.internal
> HostName domU-12-31-39-06-74-E2
> BatchMode yes
> IdentityFile /home/tsakai/.ssh/tsakai
> ChallengeResponseAuthentication no
> IdentitiesOnly yes
>
> This file exists on both machine A and machine B.
>
> Now when I issue the mpirun command as below:
> [tsakai_at_domU-12-31-39-06-74-E2 ~]$ mpirun -app app.ac2
>
> It hangs. I control-C out of it and I get:
> mpirun: killing job...
>
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
> domU-12-31-39-07-35-21.compute-1.internal - daemon did not report back when launched
>
> Am I making progress?
>
> Does this mean I am past authentication and something else is the problem?
> Does someone have an example .ssh/config file I can look at? There are so
> many keyword-argument pairs for this config file and I would like to look at
> a very basic one that works.
>
> Thank you.
>
> Tena Sakai
> tsakai_at_[hidden]
>
> On 2/9/11 7:52 PM, "Tena Sakai" <tsakai_at_[hidden]> wrote:
>
>> Hi
>>
>> I have an app.ac1 file like below:
>> [tsakai_at_vixen local]$ cat app.ac1
>> -H vixen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 5
>> -H vixen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 6
>> -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 7
>> -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 8
>>
>> The program I run is
>> Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R x
>> where x is in [5..8]. The machines vixen and blitzen each handle 2 runs.
>>
>> Here’s the program fib.R:
>> [tsakai_at_vixen local]$ cat fib.R
>> # fib() computes, given index n, fibonacci number iteratively
>> # here's the first dozen sequence (indexed from 0..11)
>> # 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
>>
>> fib <- function( n ) {
>>     a <- 0
>>     b <- 1
>>     for ( i in 1:n ) {
>>         t <- b
>>         b <- a
>>         a <- a + t
>>     }
>>     a
>> }
>>
>> arg <- commandArgs( TRUE )
>> myHost <- system( 'hostname', intern=TRUE )
>> cat( fib(arg), myHost, '\n' )
>>
>> It reads an argument from command line and produces a fibonacci number that
>> corresponds to that index, followed by the machine name. Pretty simple stuff.
>>
>> Here’s the run output:
>> [tsakai_at_vixen local]$ mpirun -app app.ac1
>> 5 vixen.egcrc.org
>> 8 vixen.egcrc.org
>> 13 blitzen.egcrc.org
>> 21 blitzen.egcrc.org
>>
>> Which is exactly what I expect. So far so good.
>>
>> Now I want to run the same thing on the cloud. I launch 2 instances of the same
>> virtual machine, which I reach by:
>> [tsakai_at_vixen local]$ ssh -A -i ~/.ssh/tsakai machine-instance-A-public-dns
>>
>> Now I am on machine A:
>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ # and I can go to machine B without password authentication,
>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ # i.e., use public/private key
>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ hostname
>> domU-12-31-39-00-D1-F2
>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ ssh -i .ssh/tsakai domU-12-31-39-0C-C8-01
>> Last login: Wed Feb 9 20:51:48 2011 from 10.254.214.4
>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ # I am now on machine B
>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ hostname
>> domU-12-31-39-0C-C8-01
>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ # now show I can get to machine A without using password
>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ ssh -i .ssh/tsakai domU-12-31-39-00-D1-F2
>> The authenticity of host 'domu-12-31-39-00-d1-f2 (10.254.214.4)' can't be established.
>> RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>> Are you sure you want to continue connecting (yes/no)? yes
>> Warning: Permanently added 'domu-12-31-39-00-d1-f2' (RSA) to the list of known hosts.
>> Last login: Wed Feb 9 20:49:34 2011 from 10.215.203.239
>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ hostname
>> domU-12-31-39-00-D1-F2
>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ exit
>> logout
>> Connection to domU-12-31-39-00-D1-F2 closed.
>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ exit
>> logout
>> Connection to domU-12-31-39-0C-C8-01 closed.
>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ # back at machine A
>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ hostname
>> domU-12-31-39-00-D1-F2
>>
>> As you can see, neither machine uses a password for authentication; each uses
>> public/private key pairs. There is no problem (that I can see) for ssh invocation
>> from one machine to the other. This is so because I have a copy of public key
>> and a copy of private key on each instance.
>>
>> The app.ac1 file is identical except for the node names:
>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ cat app.ac1
>> -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 5
>> -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 6
>> -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 7
>> -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 8
>>
>> Here’s what happens with mpirun:
>>
>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ mpirun -app app.ac1
>> tsakai_at_domu-12-31-39-0c-c8-01's password:
>> Permission denied, please try again.
>> tsakai_at_domu-12-31-39-0c-c8-01's password: mpirun: killing job...
>>
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>>
>> mpirun: clean termination accomplished
>>
>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>
>> mpirun (or somebody else?) asks me for a password, which I don’t have.
>> I end up typing control-C.
>>
>> Here’s my question:
>> How can I get past authentication by mpirun where there is no password?
>>
>> I would appreciate your help/insight greatly.
>>
>> Thank you.
>>
>> Tena Sakai
>> tsakai_at_[hidden]