Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How does authentication between nodes work without password? (Newbie alert on)
From: Tena Sakai (tsakai_at_[hidden])
Date: 2011-02-10 16:03:00


Hi Reuti,

Thanks for suggesting "LogLevel DEBUG3." I did so, and the complete
session is captured in the attached file.

What I did is much the same as what I have done before: verify
that ssh works and then run the mpirun command. In my somewhat lengthy
session log, there are two sets of output from "LogLevel DEBUG3": the
first from an scp invocation and the second from the mpirun invocation.
They both say
    debug1: Authentication succeeded (publickey).
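
A quick way to confirm this in a long DEBUG3 capture is to pull out just the authentication result lines. The following is a minimal sketch, not part of the original session: the function name auth_methods is made up for illustration, and the sample input simply repeats lines quoted in this message.

```shell
# Hypothetical helper (not from the original session): read ssh debug
# output on stdin and print the method named in each
# "Authentication succeeded (...)" line.
auth_methods() {
    sed -n 's/.*Authentication succeeded (\([^)]*\)).*/\1/p'
}

# Sample input: the success line quoted above plus one unrelated
# debug line, to show that only the former is reported.
printf '%s\n' \
    'debug1: Authentication succeeded (publickey).' \
    'debug1: Sending command: orted bla bla bla' |
    auth_methods
```

Against a real capture, the saved log could be fed to the same function on stdin. A method other than publickey, or no output at all, would point back to an authentication problem.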

From the mpirun invocation, I see a line:

    debug1: Sending command: orted --daemonize -mca ess env -mca orte_ess_jobid 3344891904 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "3344891904.0;tcp://10.194.95.239:54256"

The IP address at the end of the line is indeed that of machine B.
After that it hung and I control-C'ed out of it, which
gave me more lines. But the lines after
    debug1: Sending command: orted bla bla bla
don't look good to me. In truth, though, I have no idea what they
mean.
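
For reference, the --hnp-uri value in the orted command line quoted above appears to carry the address and TCP port on which mpirun waits for the remote daemon to report back. The following is a minimal sketch of splitting such a value into its two parts. It assumes the value always has the single-endpoint form seen in this log, and the function name parse_hnp_uri is hypothetical.

```shell
# Hypothetical helper: split an --hnp-uri value such as
#   3344891904.0;tcp://10.194.95.239:54256
# into the address and TCP port it names.
# Assumes exactly one tcp:// endpoint, as in the log line above.
parse_hnp_uri() {
    uri=$1
    endpoint=${uri#*tcp://}    # drop everything up to and including "tcp://"
    host=${endpoint%%:*}       # text before the first colon
    port=${endpoint##*:}       # text after the last colon
    printf '%s %s\n' "$host" "$port"
}

parse_hnp_uri '3344891904.0;tcp://10.194.95.239:54256'
```

Knowing the address and port makes it possible to check, from the other node and while mpirun is still waiting, whether that port is reachable at all. An unreachable port would be one possible explanation for a hang after authentication succeeds, although the log excerpt alone does not confirm it.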

If you could shed some light, I would appreciate it very much.

Regards,

Tena

On 2/10/11 10:57 AM, "Reuti" <reuti_at_[hidden]> wrote:

> Hi,
>
> On 10.02.2011 at 19:11, Tena Sakai wrote:
>
>>> your local machine is Linux like, but the execution hosts
>>> are Macs? I saw the /Users/tsakai/... in your output.
>>
>> No, my environment is entirely Linux. The path to my home
>> directory on one host (blitzen) has been known as /Users/tsakai,
>> even though it is an NFS mount from vixen (which knows it
>> as /home/tsakai). For historical reasons, I have
>> chosen to create a symbolic link named /Users to vixen's /home,
>> so that I can use a consistent path for both vixen and blitzen.
>
> Okay. Sometimes the permissions on the home directory must be adjusted too, but
> as you can do it from the command line this shouldn't be an issue.
>
>
>>> Is this a private cluster (or at least private interfaces)?
>>> It would also be an option to use hostbased authentication,
>>> which will avoid setting any known_hosts file or passphraseless
>>> ssh-keys for each user.
>>
>> No, it is not a private cluster. It is Amazon EC2. When I
>> ssh from my local machine (vixen) I use its public interface,
>> but to address one Amazon cluster node from the other I
>> use the nodes' private DNS names: domU-12-31-39-07-35-21 and
>> domU-12-31-39-06-74-E2. Both public and private DNS names
>> change from one launch to another. I am using passphraseless
>> ssh keys for authentication in all cases, i.e., from vixen to
>> Amazon node A, from Amazon node A to Amazon node B, and from
>> Amazon node B back to A. (Please see my initial post. There
>> is a session dialogue for this.) They all work without an
>> authentication dialogue, except for a brief initial dialogue:
>> The authenticity of host 'domu-xx-xx-xx-xx-xx-x (10.xx.xx.xx)'
>> can't be established.
>> RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>> Are you sure you want to continue connecting (yes/no)?
>> to which I say "yes."
>> But I am unclear about what you mean by "hostbased authentication."
>> Doesn't that mean with a password? If so, it is not an option.
>
> No. It's convenient inside a private cluster as it won't fill each user's
> known_hosts file and you don't need to create any ssh-keys. But when the
> hostname changes every time it might also create new hostkeys. It uses
> hostkeys (private and public), so it works for all users. Just for
> reference:
>
> http://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html
>
> You could look into it later.
>
> ==
>
> - Can you try to run a command when connecting from A to B? E.g.
> `ssh domU-12-31-39-06-74-E2 ls`. Is this working too?
>
> - What about putting:
>
> LogLevel DEBUG3
>
> in your ~/.ssh/config? Maybe we can see what it's trying to negotiate before
> it fails in verbose mode.
>
>
> -- Reuti
>
>
>
>> Regards,
>>
>> Tena
>>
>>
>> On 2/10/11 2:27 AM, "Reuti" <reuti_at_[hidden]> wrote:
>>
>>> Hi,
>>>
>>> your local machine is Linux like, but the execution hosts are Macs? I saw
>>> the /Users/tsakai/... in your output.
>>>
>>> a) Is executing a command on them also working, e.g.:
>>> ssh domU-12-31-39-07-35-21 ls
>>>
>>> On 10.02.2011 at 07:08, Tena Sakai wrote:
>>>
>>>> Hi,
>>>>
>>>> I have made a bit of progress(?)...
>>>> I made a config file in my .ssh directory on the cloud. It looks like:
>>>> # machine A
>>>> Host domU-12-31-39-07-35-21.compute-1.internal
>>>
>>> The name above is just an abbreviation or nickname. To use the specified
>>> settings, it's necessary to specify exactly this name. When the settings
>>> are the same for all machines anyway, you can use:
>>>
>>> Host *
>>> IdentityFile /home/tsakai/.ssh/tsakai
>>> IdentitiesOnly yes
>>> BatchMode yes
>>>
>>> instead.
>>>
>>> Is this a private cluster (or at least private interfaces)? It would also be
>>> an option to use hostbased authentication, which will avoid setting any
>>> known_hosts file or passphraseless ssh-keys for each user.
>>>
>>> -- Reuti
>>>
>>>
>>>> HostName domU-12-31-39-07-35-21
>>>> BatchMode yes
>>>> IdentityFile /home/tsakai/.ssh/tsakai
>>>> ChallengeResponseAuthentication no
>>>> IdentitiesOnly yes
>>>>
>>>> # machine B
>>>> Host domU-12-31-39-06-74-E2.compute-1.internal
>>>> HostName domU-12-31-39-06-74-E2
>>>> BatchMode yes
>>>> IdentityFile /home/tsakai/.ssh/tsakai
>>>> ChallengeResponseAuthentication no
>>>> IdentitiesOnly yes
>>>>
>>>> This file exists on both machine A and machine B.
>>>>
>>>> Now when I issue the mpirun command as below:
>>>> [tsakai_at_domU-12-31-39-06-74-E2 ~]$ mpirun -app app.ac2
>>>>
>>>> It hangs. I control-C out of it and I get:
>>>> mpirun: killing job...
>>>>
>>>>
>>>> --------------------------------------------------------------------------
>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>> that caused that situation.
>>>>
>>>> --------------------------------------------------------------------------
>>>>
>>>> --------------------------------------------------------------------------
>>>> mpirun was unable to cleanly terminate the daemons on the nodes shown
>>>> below. Additional manual cleanup may be required - please refer to
>>>> the "orte-clean" tool for assistance.
>>>>
>>>> --------------------------------------------------------------------------
>>>> domU-12-31-39-07-35-21.compute-1.internal - daemon did not report
>>>> back when launched
>>>>
>>>> Am I making progress?
>>>>
>>>> Does this mean I am past authentication and something else is the problem?
>>>> Does someone have an example .ssh/config file I can look at? There are so
>>>> many keyword-argument pairs for this config file and I would like to look
>>>> at a very basic one that works.
>>>>
>>>> Thank you.
>>>>
>>>> Tena Sakai
>>>> tsakai_at_[hidden]
>>>>
>>>> On 2/9/11 7:52 PM, "Tena Sakai" <tsakai_at_[hidden]> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I have an app.ac1 file like below:
>>>>> [tsakai_at_vixen local]$ cat app.ac1
>>>>> -H vixen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 5
>>>>> -H vixen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 6
>>>>> -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 7
>>>>> -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 8
>>>>>
>>>>> The program I run is
>>>>> Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R x
>>>>> where x is in [5..8]. The machines vixen and blitzen each do 2 runs.
>>>>>
>>>>> Here's the program fib.R:
>>>>> [tsakai_at_vixen local]$ cat fib.R
>>>>> # fib() computes, given index n, fibonacci number iteratively
>>>>> # here's the first dozen sequence (indexed from 0..11)
>>>>> # 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
>>>>>
>>>>> fib <- function( n ) {
>>>>>     a <- 0
>>>>>     b <- 1
>>>>>     for ( i in 1:n ) {
>>>>>         t <- b
>>>>>         b <- a
>>>>>         a <- a + t
>>>>>     }
>>>>>     a
>>>>> }
>>>>>
>>>>> arg <- commandArgs( TRUE )
>>>>> myHost <- system( 'hostname', intern=TRUE )
>>>>> cat( fib(arg), myHost, '\n' )
>>>>>
>>>>> It reads an argument from the command line and produces the fibonacci number
>>>>> that corresponds to that index, followed by the machine name. Pretty simple
>>>>> stuff.
>>>>>
>>>>> Here's the run output:
>>>>> [tsakai_at_vixen local]$ mpirun -app app.ac1
>>>>> 5 vixen.egcrc.org
>>>>> 8 vixen.egcrc.org
>>>>> 13 blitzen.egcrc.org
>>>>> 21 blitzen.egcrc.org
>>>>>
>>>>> Which is exactly what I expect. So far so good.
>>>>>
>>>>> Now I want to run the same thing on the cloud. I launch 2 instances of the
>>>>> same virtual machine, which I get to by:
>>>>> [tsakai_at_vixen local]$ ssh -A -i ~/.ssh/tsakai machine-instance-A-public-dns
>>>>>
>>>>> Now I am on machine A:
>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ # and I can go to machine B without password authentication,
>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ # i.e., use public/private key
>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>> domU-12-31-39-00-D1-F2
>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ ssh -i .ssh/tsakai domU-12-31-39-0C-C8-01
>>>>> Last login: Wed Feb 9 20:51:48 2011 from 10.254.214.4
>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ # I am now on machine B
>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ hostname
>>>>> domU-12-31-39-0C-C8-01
>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ # now show I can get to machine A without using password
>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ ssh -i .ssh/tsakai domU-12-31-39-00-D1-F2
>>>>> The authenticity of host 'domu-12-31-39-00-d1-f2 (10.254.214.4)' can't be established.
>>>>> RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>>>>> Are you sure you want to continue connecting (yes/no)? yes
>>>>> Warning: Permanently added 'domu-12-31-39-00-d1-f2' (RSA) to the list of known hosts.
>>>>> Last login: Wed Feb 9 20:49:34 2011 from 10.215.203.239
>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>> domU-12-31-39-00-D1-F2
>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ exit
>>>>> logout
>>>>> Connection to domU-12-31-39-00-D1-F2 closed.
>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ exit
>>>>> logout
>>>>> Connection to domU-12-31-39-0C-C8-01 closed.
>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ # back at machine A
>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>> domU-12-31-39-00-D1-F2
>>>>>
>>>>> As you can see, neither machine uses a password for authentication; each uses
>>>>> public/private key pairs. There is no problem (that I can see) with ssh
>>>>> invocation from one machine to the other. This is so because I have a copy
>>>>> of the public key and a copy of the private key on each instance.
>>>>>
>>>>> The app.ac1 file is identical, except for the node names:
>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ cat app.ac1
>>>>> -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 5
>>>>> -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 6
>>>>> -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 7
>>>>> -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 8
>>>>>
>>>>> Here's what happens with mpirun:
>>>>>
>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ mpirun -app app.ac1
>>>>> tsakai_at_domu-12-31-39-0c-c8-01's password:
>>>>> Permission denied, please try again.
>>>>> tsakai_at_domu-12-31-39-0c-c8-01's password: mpirun: killing job...
>>>>>
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>>> that caused that situation.
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> mpirun: clean termination accomplished
>>>>>
>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>>>
>>>>> mpirun (or somebody else?) asks me for a password, which I don't have.
>>>>> I end up typing control-C.
>>>>>
>>>>> Here's my question:
>>>>> How can I get past authentication by mpirun where there is no password?
>>>>>
>>>>> I would appreciate your help/insight greatly.
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Tena Sakai
>>>>> tsakai_at_[hidden]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users