
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] How does authentication between nodes work without password? (Newbie alert on)
From: Reuti (reuti_at_[hidden])
Date: 2011-02-10 13:57:27


Hi,

On 10.02.2011 at 19:11, Tena Sakai wrote:

>> your local machine is Linux like, but the execution hosts
>> are Macs? I saw the /Users/tsakai/... in your output.
>
> No, my environment is entirely Linux. The path to my home
> directory on one host (blitzen) has been known as /Users/tsakai,
> even though it is an NFS mount from vixen (which is known to
> itself as /home/tsakai). For historical reasons, I have
> chosen to give a symbolic link named /Users to vixen's /home,
> so that I can use a consistent path for both vixen and blitzen.

Okay. Sometimes the permissions on the home directory must be adjusted too, but since ssh already works for you from the command line, this shouldn't be an issue.
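
For reference, the permission issue works like this: with its default StrictModes setting, sshd ignores authorized_keys when the home directory, ~/.ssh, or the file itself is writable by group or others. A minimal sketch of tightening them, assuming the conventional ~/.ssh/authorized_keys location (the function name tighten_ssh_perms is made up for illustration):

```shell
#!/bin/sh
# tighten_ssh_perms HOME_DIR: remove group/other write access from the home
# directory and restrict .ssh and authorized_keys to the owner.
# Assumption: keys live in HOME_DIR/.ssh/authorized_keys (the sshd default).
tighten_ssh_perms() {
    home=$1
    chmod go-w "$home"
    if [ -d "$home/.ssh" ]; then
        chmod 700 "$home/.ssh"
        if [ -f "$home/.ssh/authorized_keys" ]; then
            chmod 600 "$home/.ssh/authorized_keys"
        fi
    fi
}

# Example: tighten_ssh_perms "$HOME"
```

It would have to be run on the node being logged in to, since that is where sshd performs the check.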

>> Is this a private cluster (or at least private interfaces)?
>> It would also be an option to use hostbased authentication,
>> which will avoid setting any known_hosts file or passphraseless
>> ssh-keys for each user.
>
> No, it is not a private cluster. It is Amazon EC2. When I
> ssh from my local machine (vixen) I use its public interface,
> but to address from one Amazon cluster node to the other I
> use the nodes' private DNS names: domU-12-31-39-07-35-21 and
> domU-12-31-39-06-74-E2. Both public and private DNS names
> change from one launch to another. I am using passphraseless
> ssh-keys for authentication in all cases, i.e., from vixen to
> Amazon node A, from Amazon node A to Amazon node B, and from
> Amazon node B back to A. (Please see my initial post. There
> is a session dialogue for this.) They all work without an
> authentication dialogue, except a brief initial dialogue:
> The authenticity of host 'domu-xx-xx-xx-xx-xx-x (10.xx.xx.xx)'
> can't be established.
> RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
> Are you sure you want to continue connecting (yes/no)?
> to which I say "yes."
> But I am unclear about what you mean by "hostbased authentication."
> Doesn't that mean with a password? If so, it is not an option.

No. It's convenient inside a private cluster, as it won't fill each user's known_hosts file and you don't need to create any ssh-keys. It uses the hostkeys (private and public) of the machines, so it works for all users. But when the hostname changes on every launch, new hostkeys might be created each time as well. Just for reference:

http://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html

You could look into it later.
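
For orientation, hostbased authentication comes down to a handful of OpenSSH settings. The sketch below only writes sample fragments into a scratch directory so they can be reviewed; it does not touch /etc/ssh. The option names (HostbasedAuthentication, EnableSSHKeysign) and the shosts.equiv file are standard OpenSSH. The scratch directory, the fragment file names, and the reuse of the two node names from this thread are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: write example hostbased-authentication fragments into a
# scratch directory for review. Nothing under /etc/ssh is modified.
set -eu

out=$(mktemp -d)   # hypothetical scratch location

# Server side (would go into /etc/ssh/sshd_config on every node).
cat > "$out/sshd_config.fragment" <<'EOF'
HostbasedAuthentication yes
EOF

# Client side (would go into /etc/ssh/ssh_config on every node).
# EnableSSHKeysign lets ssh-keysign sign the request with the host key.
cat > "$out/ssh_config.fragment" <<'EOF'
Host *
    HostbasedAuthentication yes
    EnableSSHKeysign yes
EOF

# Hosts trusted to vouch for their users (would go into
# /etc/ssh/shosts.equiv). The names are the two nodes from this thread.
cat > "$out/shosts.equiv" <<'EOF'
domU-12-31-39-07-35-21
domU-12-31-39-06-74-E2
EOF

echo "fragments written to $out"
```

In addition, each server needs the public hostkeys of the other nodes in /etc/ssh/ssh_known_hosts, which is the part that becomes awkward when names and keys change with every launch.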

==

- Can you try to run a command when connecting from A to B? E.g. `ssh domU-12-31-39-06-74-E2 ls`. Does this work too?

- What about putting:

LogLevel DEBUG3

in your ~/.ssh/config? In verbose mode we may be able to see what it is trying to negotiate before it fails.
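
Both checks can be combined into one non-interactive test. The sketch below defines a helper, check_node (a hypothetical name), that runs a trivial remote command with BatchMode enabled, so ssh fails instead of prompting. That is closer to how a launcher such as mpirun invokes ssh than an interactive login is. The DEBUG3 trace is kept in a log file; the log location is an assumption, while the options themselves are standard OpenSSH client options.

```shell
#!/bin/sh
# check_node HOST: run `ls` on HOST without any interactive prompt and
# keep the DEBUG3 trace of the negotiation for later inspection.
check_node() {
    host=$1
    log="${TMPDIR:-/tmp}/ssh-debug-$host.log"   # hypothetical log location

    # BatchMode=yes   : never ask for a password or passphrase
    # LogLevel=DEBUG3 : same effect as the LogLevel line in ~/.ssh/config
    if ssh -o BatchMode=yes -o LogLevel=DEBUG3 "$host" ls >/dev/null 2>"$log"
    then
        echo "$host: ok"
    else
        echo "$host: failed, see $log"
        return 1
    fi
}

# Example (node name taken from this thread):
# check_node domU-12-31-39-06-74-E2
```

If this fails while an interactive login succeeds, the log should show which authentication methods were offered and which one was rejected.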

-- Reuti

> Regards,
>
> Tena
>
>
> On 2/10/11 2:27 AM, "Reuti" <reuti_at_[hidden]> wrote:
>
>> Hi,
>>
>> your local machine is Linux like, but the execution hosts are Macs? I saw the
>> /Users/tsakai/... in your output.
>>
>> a) Does executing a command on them also work? E.g.: ssh
>> domU-12-31-39-07-35-21 ls
>>
>> On 10.02.2011 at 07:08, Tena Sakai wrote:
>>
>>> Hi,
>>>
>>> I have made a bit of progress(?)...
>>> I made a config file in my .ssh directory on the cloud. It looks like:
>>> # machine A
>>> Host domU-12-31-39-07-35-21.compute-1.internal
>>
>> This is just an abbreviation or nickname above. To use the specified settings,
>> it's necessary to specify exactly this name. When the settings are the same
>> anyway for all machines, you can use:
>>
>> Host *
>> IdentityFile /home/tsakai/.ssh/tsakai
>> IdentitiesOnly yes
>> BatchMode yes
>>
>> instead.
>>
>> Is this a private cluster (or at least private interfaces)? It would also be
>> an option to use hostbased authentication, which will avoid setting any
>> known_hosts file or passphraseless ssh-keys for each user.
>>
>> -- Reuti
>>
>>
>>> HostName domU-12-31-39-07-35-21
>>> BatchMode yes
>>> IdentityFile /home/tsakai/.ssh/tsakai
>>> ChallengeResponseAuthentication no
>>> IdentitiesOnly yes
>>>
>>> # machine B
>>> Host domU-12-31-39-06-74-E2.compute-1.internal
>>> HostName domU-12-31-39-06-74-E2
>>> BatchMode yes
>>> IdentityFile /home/tsakai/.ssh/tsakai
>>> ChallengeResponseAuthentication no
>>> IdentitiesOnly yes
>>>
>>> This file exists on both machine A and machine B.
>>>
>>> Now when I issue the mpirun command as below:
>>> [tsakai_at_domU-12-31-39-06-74-E2 ~]$ mpirun -app app.ac2
>>>
>>> It hangs. I control-C out of it and I get:
>>> mpirun: killing job...
>>>
>>>
>>> --------------------------------------------------------------------------
>>> mpirun noticed that the job aborted, but has no info as to the process
>>> that caused that situation.
>>>
>>> --------------------------------------------------------------------------
>>>
>>> --------------------------------------------------------------------------
>>> mpirun was unable to cleanly terminate the daemons on the nodes shown
>>> below. Additional manual cleanup may be required - please refer to
>>> the "orte-clean" tool for assistance.
>>>
>>> --------------------------------------------------------------------------
>>> domU-12-31-39-07-35-21.compute-1.internal - daemon did not report back when launched
>>>
>>> Am I making progress?
>>>
>>> Does this mean I am past authentication and something else is the problem?
>>> Does someone have an example .ssh/config file I can look at? There are so
>>> many keyword-argument pairs for this config file and I would like to look at
>>> some very basic one that works.
>>>
>>> Thank you.
>>>
>>> Tena Sakai
>>> tsakai_at_[hidden]
>>>
>>> On 2/9/11 7:52 PM, "Tena Sakai" <tsakai_at_[hidden]> wrote:
>>>
>>>> Hi
>>>>
>>>> I have an app.ac1 file like below:
>>>> [tsakai_at_vixen local]$ cat app.ac1
>>>> -H vixen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 5
>>>> -H vixen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 6
>>>> -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 7
>>>> -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 8
>>>>
>>>> The program I run is
>>>> Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R x
>>>> where x is in [5..8]. The machines vixen and blitzen each do 2 runs.
>>>>
>>>> Here’s the program fib.R:
>>>> [tsakai_at_vixen local]$ cat fib.R
>>>> # fib() computes, given index n, fibonacci number iteratively
>>>> # here's the first dozen sequence (indexed from 0..11)
>>>> # 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
>>>>
>>>> fib <- function( n ) {
>>>>     a <- 0
>>>>     b <- 1
>>>>     for ( i in 1:n ) {
>>>>         t <- b
>>>>         b <- a
>>>>         a <- a + t
>>>>     }
>>>>     a
>>>> }
>>>>
>>>> arg <- commandArgs( TRUE )
>>>> myHost <- system( 'hostname', intern=TRUE )
>>>> cat( fib(arg), myHost, '\n' )
>>>>
>>>> It reads an argument from command line and produces a fibonacci number that
>>>> corresponds to that index, followed by the machine name. Pretty simple
>>>> stuff.
>>>>
>>>> Here’s the run output:
>>>> [tsakai_at_vixen local]$ mpirun -app app.ac1
>>>> 5 vixen.egcrc.org
>>>> 8 vixen.egcrc.org
>>>> 13 blitzen.egcrc.org
>>>> 21 blitzen.egcrc.org
>>>>
>>>> Which is exactly what I expect. So far so good.
>>>>
>>>> Now I want to run the same thing on cloud. I launch 2 instances of the same
>>>> virtual machine, to which I get to by:
>>>> [tsakai_at_vixen local]$ ssh -A -i ~/.ssh/tsakai machine-instance-A-public-dns
>>>>
>>>> Now I am on machine A:
>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ # and I can go to machine B without password authentication,
>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ # i.e., use public/private key
>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ hostname
>>>> domU-12-31-39-00-D1-F2
>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ ssh -i .ssh/tsakai domU-12-31-39-0C-C8-01
>>>> Last login: Wed Feb 9 20:51:48 2011 from 10.254.214.4
>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ # I am now on machine B
>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ hostname
>>>> domU-12-31-39-0C-C8-01
>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ # now show I can get to machine A without using password
>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ ssh -i .ssh/tsakai domU-12-31-39-00-D1-F2
>>>> The authenticity of host 'domu-12-31-39-00-d1-f2 (10.254.214.4)' can't be established.
>>>> RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>>>> Are you sure you want to continue connecting (yes/no)? yes
>>>> Warning: Permanently added 'domu-12-31-39-00-d1-f2' (RSA) to the list of known hosts.
>>>> Last login: Wed Feb 9 20:49:34 2011 from 10.215.203.239
>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ hostname
>>>> domU-12-31-39-00-D1-F2
>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ exit
>>>> logout
>>>> Connection to domU-12-31-39-00-D1-F2 closed.
>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ exit
>>>> logout
>>>> Connection to domU-12-31-39-0C-C8-01 closed.
>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ # back at machine A
>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ hostname
>>>> domU-12-31-39-00-D1-F2
>>>>
>>>> As you can see, neither machine uses a password for authentication; they use
>>>> public/private key pairs. There is no problem (that I can see) with ssh
>>>> invocation from one machine to the other. This is so because I have a copy
>>>> of the public key and a copy of the private key on each instance.
>>>>
>>>> The app.ac file is identical, except the node names:
>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ cat app.ac1
>>>> -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 5
>>>> -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 6
>>>> -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 7
>>>> -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 8
>>>>
>>>> Here’s what happens with mpirun:
>>>>
>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ mpirun -app app.ac1
>>>> tsakai_at_domu-12-31-39-0c-c8-01's password:
>>>> Permission denied, please try again.
>>>> tsakai_at_domu-12-31-39-0c-c8-01's password: mpirun: killing job...
>>>>
>>>>
>>>> --------------------------------------------------------------------------
>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>> that caused that situation.
>>>>
>>>> --------------------------------------------------------------------------
>>>>
>>>> mpirun: clean termination accomplished
>>>>
>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>>
>>>> mpirun (or something else?) asks me for a password, which I don’t have.
>>>> I end up typing control-C.
>>>>
>>>> Here’s my question:
>>>> How can I get past authentication by mpirun where there is no password?
>>>>
>>>> I would appreciate your help/insight greatly.
>>>>
>>>> Thank you.
>>>>
>>>> Tena Sakai
>>>> tsakai_at_[hidden]
>>>>
>>>>
>>>>
>>>>
>>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users