Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How does authentication between nodes work without password? (Newbie alert on)
From: Gus Correa (gus_at_[hidden])
Date: 2011-02-11 13:07:32


Hi Tena

Since root can but you can't,
is it a directory permission problem perhaps?
Check the execution directory permissions (on both machines,
if this is not an NFS-mounted dir).
I am not sure, but IIRC Open MPI also uses /tmp for
under-the-hood stuff, so it's worth checking permissions there too.
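For example, something along these lines (just a sketch; point it at
whatever directory you actually run mpirun from):

   # check ownership and modes of the working directory and /tmp
   ls -ld /home/tsakai /tmp
   # /tmp should normally be drwxrwxrwt (world-writable + sticky bit);
   # then a quick write test as the unprivileged user:
   touch /tmp/ompi-write-test && rm /tmp/ompi-write-test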
Just a naive guess.

Congrats on all the progress with the cloudy MPI!

Gus Correa

Tena Sakai wrote:
> Hi,
>
> I have made a bit more progress. I think I can say the ssh
> authentication problem is behind me now. I am still having a problem
> running mpirun, but the latest discovery, which I can reproduce, is
> that I can run mpirun as root. Here's the session log:
>
> [tsakai_at_vixen ec2]$ 2ec2 ec2-184-73-104-242.compute-1.amazonaws.com
> Last login: Fri Feb 11 00:41:11 2011 from 10.100.243.195
> [tsakai_at_ip-10-195-198-31 ~]$
> [tsakai_at_ip-10-195-198-31 ~]$ ll
> total 8
> -rw-rw-r-- 1 tsakai tsakai 274 Feb 11 00:47 app.ac
> -rwxr-xr-x 1 tsakai tsakai 379 Feb 11 00:48 fib.R
> [tsakai_at_ip-10-195-198-31 ~]$
> [tsakai_at_ip-10-195-198-31 ~]$ ll .ssh
> total 16
> -rw------- 1 tsakai tsakai 232 Feb 5 23:19 authorized_keys
> -rw------- 1 tsakai tsakai 102 Feb 11 00:34 config
> -rw-r--r-- 1 tsakai tsakai 1302 Feb 11 00:36 known_hosts
> -rw------- 1 tsakai tsakai 887 Feb 8 22:03 tsakai
> [tsakai_at_ip-10-195-198-31 ~]$
> [tsakai_at_ip-10-195-198-31 ~]$ ssh ip-10-100-243-195.ec2.internal
> Last login: Fri Feb 11 00:36:20 2011 from 10.195.198.31
> [tsakai_at_ip-10-100-243-195 ~]$
> [tsakai_at_ip-10-100-243-195 ~]$ # I am on machine B
> [tsakai_at_ip-10-100-243-195 ~]$ hostname
> ip-10-100-243-195
> [tsakai_at_ip-10-100-243-195 ~]$
> [tsakai_at_ip-10-100-243-195 ~]$ ll
> total 8
> -rw-rw-r-- 1 tsakai tsakai 274 Feb 11 00:44 app.ac
> -rwxr-xr-x 1 tsakai tsakai 379 Feb 11 00:47 fib.R
> [tsakai_at_ip-10-100-243-195 ~]$
> [tsakai_at_ip-10-100-243-195 ~]$
> [tsakai_at_ip-10-100-243-195 ~]$ cat app.ac
> -H ip-10-195-198-31.ec2.internal -np 1 Rscript /home/tsakai/fib.R 5
> -H ip-10-195-198-31.ec2.internal -np 1 Rscript /home/tsakai/fib.R 6
> -H ip-10-100-243-195.ec2.internal -np 1 Rscript /home/tsakai/fib.R 7
> -H ip-10-100-243-195.ec2.internal -np 1 Rscript /home/tsakai/fib.R 8
> [tsakai_at_ip-10-100-243-195 ~]$
> [tsakai_at_ip-10-100-243-195 ~]$ # go back to machine A
> [tsakai_at_ip-10-100-243-195 ~]$
> [tsakai_at_ip-10-100-243-195 ~]$ exit
> logout
> Connection to ip-10-100-243-195.ec2.internal closed.
> [tsakai_at_ip-10-195-198-31 ~]$
> [tsakai_at_ip-10-195-198-31 ~]$ hostname
> ip-10-195-198-31
> [tsakai_at_ip-10-195-198-31 ~]$
> [tsakai_at_ip-10-195-198-31 ~]$ # Execute mpirun
> [tsakai_at_ip-10-195-198-31 ~]$
> [tsakai_at_ip-10-195-198-31 ~]$ mpirun -app app.ac
> --------------------------------------------------------------------------
> mpirun was unable to launch the specified application as it encountered an
> error:
>
> Error: pipe function call failed when setting up I/O forwarding subsystem
> Node: ip-10-195-198-31
>
> while attempting to start process rank 0.
> --------------------------------------------------------------------------
> [tsakai_at_ip-10-195-198-31 ~]$
> [tsakai_at_ip-10-195-198-31 ~]$ # try it as root
> [tsakai_at_ip-10-195-198-31 ~]$
> [tsakai_at_ip-10-195-198-31 ~]$ sudo su
> bash-3.2#
> bash-3.2# pwd
> /home/tsakai
> bash-3.2#
> bash-3.2# ls -l /root/.ssh/config
> -rw------- 1 root root 103 Feb 11 00:56 /root/.ssh/config
> bash-3.2#
> bash-3.2# cat /root/.ssh/config
> Host *
> IdentityFile /root/.ssh/.derobee/.kagi
> IdentitiesOnly yes
> BatchMode yes
> bash-3.2#
> bash-3.2# pwd
> /home/tsakai
> bash-3.2#
> bash-3.2# ls -l
> total 8
> -rw-rw-r-- 1 tsakai tsakai 274 Feb 11 00:47 app.ac
> -rwxr-xr-x 1 tsakai tsakai 379 Feb 11 00:48 fib.R
> bash-3.2#
> bash-3.2# # now is the time for mpirun
> bash-3.2#
> bash-3.2# mpirun --app ./app.ac
> 13 ip-10-100-243-195
> 21 ip-10-100-243-195
> 5 ip-10-195-198-31
> 8 ip-10-195-198-31
> bash-3.2#
> bash-3.2# # It works (being root)!
> bash-3.2#
> bash-3.2# exit
> exit
> [tsakai_at_ip-10-195-198-31 ~]$
> [tsakai_at_ip-10-195-198-31 ~]$ # try it one more time as tsakai
> [tsakai_at_ip-10-195-198-31 ~]$
> [tsakai_at_ip-10-195-198-31 ~]$ mpirun --app app.ac
> --------------------------------------------------------------------------
> mpirun was unable to launch the specified application as it encountered an
> error:
>
> Error: pipe function call failed when setting up I/O forwarding subsystem
> Node: ip-10-195-198-31
>
> while attempting to start process rank 0.
> --------------------------------------------------------------------------
> [tsakai_at_ip-10-195-198-31 ~]$
> [tsakai_at_ip-10-195-198-31 ~]$ # I don't get it.
> [tsakai_at_ip-10-195-198-31 ~]$
> [tsakai_at_ip-10-195-198-31 ~]$ exit
> logout
> [tsakai_at_vixen ec2]$
>
> So, why does it say "pipe function call failed when setting up
> I/O forwarding subsystem Node: ip-10-195-198-31" ?
> The node it is referring to is not the remote machine; it is
> what I call machine A. I first thought maybe this was a problem
> with the PATH variable, but I don't think so. I compared root's
> PATH to tsakai's, made them identical, and retried.
> I got the same behavior.
>
> If you could enlighten me as to why this is happening, I would really
> appreciate it.
>
> Thank you.
>
> Tena
>
>
> On 2/10/11 4:12 PM, "Tena Sakai" <tsakai_at_[hidden]> wrote:
>
>> Hi jeff,
>>
>> Thanks for the firewall tip. I tried it while allowing all tcp traffic
>> and got an interesting and perplexing result. Here's what's interesting
>> (BTW, I got rid of "LogLevel DEBUG3" from ~/.ssh/config on this run):
>>
>> [tsakai_at_ip-10-203-21-132 ~]$
>> [tsakai_at_ip-10-203-21-132 ~]$ mpirun --app app.ac2
>> Host key verification failed.
>>
>> --------------------------------------------------------------------------
>> A daemon (pid 2743) died unexpectedly with status 255 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>>
>> --------------------------------------------------------------------------
>>
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>>
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>>
>> [tsakai_at_ip-10-203-21-132 ~]$
>> [tsakai_at_ip-10-203-21-132 ~]$ env | grep LD_LIB
>> [tsakai_at_ip-10-203-21-132 ~]$
>> [tsakai_at_ip-10-203-21-132 ~]$ # Let's set LD_LIBRARY_PATH to /usr/local/lib
>> [tsakai_at_ip-10-203-21-132 ~]$
>> [tsakai_at_ip-10-203-21-132 ~]$
>> [tsakai_at_ip-10-203-21-132 ~]$ export LD_LIBRARY_PATH='/usr/local/lib'
>> [tsakai_at_ip-10-203-21-132 ~]$
>> [tsakai_at_ip-10-203-21-132 ~]$ # I'd better do this on machine B as well
>> [tsakai_at_ip-10-203-21-132 ~]$
>> [tsakai_at_ip-10-203-21-132 ~]$ ssh -i tsakai ip-10-195-171-159
>> Warning: Identity file tsakai not accessible: No such file or directory.
>> Last login: Thu Feb 10 18:31:20 2011 from 10.203.21.132
>> [tsakai_at_ip-10-195-171-159 ~]$
>> [tsakai_at_ip-10-195-171-159 ~]$ export LD_LIBRARY_PATH='/usr/local/lib'
>> [tsakai_at_ip-10-195-171-159 ~]$
>> [tsakai_at_ip-10-195-171-159 ~]$ env | grep LD_LIB
>> LD_LIBRARY_PATH=/usr/local/lib
>> [tsakai_at_ip-10-195-171-159 ~]$
>> [tsakai_at_ip-10-195-171-159 ~]$ # OK, now go back to machine A
>> [tsakai_at_ip-10-195-171-159 ~]$ exit
>> logout
>> Connection to ip-10-195-171-159 closed.
>> [tsakai_at_ip-10-203-21-132 ~]$
>> [tsakai_at_ip-10-203-21-132 ~]$ hostname
>> ip-10-203-21-132
>> [tsakai_at_ip-10-203-21-132 ~]$ # try mpirun again
>> [tsakai_at_ip-10-203-21-132 ~]$
>> [tsakai_at_ip-10-203-21-132 ~]$ mpirun --app app.ac2
>> Host key verification failed.
>>
>> --------------------------------------------------------------------------
>> A daemon (pid 2789) died unexpectedly with status 255 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>>
>> --------------------------------------------------------------------------
>>
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>>
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>>
>> [tsakai_at_ip-10-203-21-132 ~]$
>> [tsakai_at_ip-10-203-21-132 ~]$ # I thought openmpi library was in /usr/local/lib...
>> [tsakai_at_ip-10-203-21-132 ~]$
>> [tsakai_at_ip-10-203-21-132 ~]$ ll -t /usr/local/lib | less
>> total 16604
>> lrwxrwxrwx 1 root root 16 Feb 8 23:06 libfuse.so -> libfuse.so.2.8.5
>> lrwxrwxrwx 1 root root 16 Feb 8 23:06 libfuse.so.2 -> libfuse.so.2.8.5
>> lrwxrwxrwx 1 root root 25 Feb 8 23:06 libmca_common_sm.so -> libmca_common_sm.so.1.0.0
>> lrwxrwxrwx 1 root root 25 Feb 8 23:06 libmca_common_sm.so.1 -> libmca_common_sm.so.1.0.0
>> lrwxrwxrwx 1 root root 15 Feb 8 23:06 libmpi.so -> libmpi.so.0.0.2
>> lrwxrwxrwx 1 root root 15 Feb 8 23:06 libmpi.so.0 -> libmpi.so.0.0.2
>> lrwxrwxrwx 1 root root 19 Feb 8 23:06 libmpi_cxx.so -> libmpi_cxx.so.0.0.1
>> lrwxrwxrwx 1 root root 19 Feb 8 23:06 libmpi_cxx.so.0 -> libmpi_cxx.so.0.0.1
>> lrwxrwxrwx 1 root root 19 Feb 8 23:06 libmpi_f77.so -> libmpi_f77.so.0.0.1
>> lrwxrwxrwx 1 root root 19 Feb 8 23:06 libmpi_f77.so.0 -> libmpi_f77.so.0.0.1
>> lrwxrwxrwx 1 root root 19 Feb 8 23:06 libmpi_f90.so -> libmpi_f90.so.0.0.1
>> lrwxrwxrwx 1 root root 19 Feb 8 23:06 libmpi_f90.so.0 -> libmpi_f90.so.0.0.1
>> lrwxrwxrwx 1 root root 20 Feb 8 23:06 libopen-pal.so -> libopen-pal.so.0.0.0
>> lrwxrwxrwx 1 root root 20 Feb 8 23:06 libopen-pal.so.0 -> libopen-pal.so.0.0.0
>> lrwxrwxrwx 1 root root 20 Feb 8 23:06 libopen-rte.so -> libopen-rte.so.0.0.0
>> lrwxrwxrwx 1 root root 20 Feb 8 23:06 libopen-rte.so.0 -> libopen-rte.so.0.0.0
>> lrwxrwxrwx 1 root root 26 Feb 8 23:06 libopenmpi_malloc.so -> libopenmpi_malloc.so.0.0.0
>> lrwxrwxrwx 1 root root 26 Feb 8 23:06 libopenmpi_malloc.so.0 -> libopenmpi_malloc.so.0.0.0
>> lrwxrwxrwx 1 root root 20 Feb 8 23:06 libulockmgr.so -> libulockmgr.so.1.0.1
>> lrwxrwxrwx 1 root root 20 Feb 8 23:06 libulockmgr.so.1 -> libulockmgr.so.1.0.1
>> lrwxrwxrwx 1 root root 16 Feb 8 23:06 libxml2.so -> libxml2.so.2.7.2
>> lrwxrwxrwx 1 root root 16 Feb 8 23:06 libxml2.so.2 -> libxml2.so.2.7.2
>> -rw-r--r-- 1 root root 385912 Jan 26 01:00 libvt.a
>> [tsakai_at_ip-10-203-21-132 ~]$
>> [tsakai_at_ip-10-203-21-132 ~]$ # Now, I am really confused...
>> [tsakai_at_ip-10-203-21-132 ~]$
>>
>> Do you know why it's complaining about shared libraries?
>>
>> Thank you.
>>
>> Tena
>>
>>
>> On 2/10/11 1:05 PM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>>
>>> Your prior mails were about ssh issues, but this one sounds like you might
>>> have firewall issues.
>>>
>>> That is, the "orted" command attempts to open a TCP socket back to mpirun for
>>> various command and control reasons. If it is blocked from doing so by a
>>> firewall, Open MPI won't run. In general, you can either disable your
>>> firewall or you can set up a trust relationship for TCP connections
>>> within your cluster.
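>>>
>>> For example, one way on a generic Linux node is to accept all TCP from
>>> the cluster's private address range (a sketch; assuming iptables and the
>>> 10.x.x.x internal network that shows up in your logs):
>>>
>>>    # allow any TCP connection originating in the private subnet
>>>    iptables -I INPUT -p tcp -s 10.0.0.0/8 -j ACCEPT
>>>
>>> On EC2, the same effect is usually achieved by allowing all TCP between
>>> instances in the same security group.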
>>>
>>>
>>>
>>> On Feb 10, 2011, at 1:03 PM, Tena Sakai wrote:
>>>
>>>> Hi Reuti,
>>>>
>>>> Thanks for suggesting "LogLevel DEBUG3." I did so and complete
>>>> session is captured in the attached file.
>>>>
>>>> What I did is very similar to what I have done before: verify
>>>> that ssh works and then run the mpirun command. In my somewhat lengthy
>>>> session log, there are two responses from "LogLevel DEBUG3." First
>>>> from an scp invocation and then from mpirun invocation. They both
>>>> say
>>>> debug1: Authentication succeeded (publickey).
>>>>
>>>> From mpirun invocation, I see a line:
>>>> debug1: Sending command: orted --daemonize -mca ess env -mca
>>>> orte_ess_jobid 3344891904 -mca orte_ess_vpid 1 -mca orte_ess_num_procs
>>>> 2 --hnp-uri "3344891904.0;tcp://10.194.95.239:54256"
>>>> The IP address at the end of the line is indeed that of machine B.
>>>> After that it hung and I control-C'ed out of it, which
>>>> gave me more lines. But the lines after
>>>> debug1: Sending command: orted bla bla bla
>>>> don't look good to me. But, in truth, I have no idea what they
>>>> mean.
>>>>
>>>> If you could shed some light, I would appreciate it very much.
>>>>
>>>> Regards,
>>>>
>>>> Tena
>>>>
>>>>
>>>> On 2/10/11 10:57 AM, "Reuti" <reuti_at_[hidden]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Am 10.02.2011 um 19:11 schrieb Tena Sakai:
>>>>>
>>>>>>> your local machine is Linux like, but the execution hosts
>>>>>>> are Macs? I saw the /Users/tsakai/... in your output.
>>>>>> No, my environment is entirely linux. The path to my home
>>>>>> directory on one host (blitzen) has been known as /Users/tsakai,
>>>>>> though it is an NFS mount from vixen (which is known to
>>>>>> itself as /home/tsakai). For historical reasons, I have
>>>>>> chosen to give a symbolic link named /Users to vixen's /home,
>>>>>> so that I can use a consistent path for both vixen and blitzen.
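>>>>>> (i.e., something like "ln -s /home /Users" on vixen)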
>>>>> okay. Sometimes the protection of the home directory must be adjusted
>>>>> too, but as you can do it from the command line this shouldn't be an
>>>>> issue.
>>>>>
>>>>>
>>>>>>> Is this a private cluster (or at least private interfaces)?
>>>>>>> It would also be an option to use hostbased authentication,
>>>>>>> which will avoid setting any known_hosts file or passphraseless
>>>>>>> ssh-keys for each user.
>>>>>> No, it is not a private cluster. It is Amazon EC2. When I
>>>>>> ssh from my local machine (vixen) I use its public interface,
>>>>>> but to address from one amazon cluster node to the other I
>>>>>> use the nodes' private dns names: domU-12-31-39-07-35-21 and
>>>>>> domU-12-31-39-06-74-E2. Both public and private dns names
>>>>>> change from one launch to another. I am using passphraseless
>>>>>> ssh-keys for authentication in all cases, i.e., from vixen to
>>>>>> amazon node A, from amazon node A to amazon node B, and from
>>>>>> amazon node B back to A. (Please see my initial post. There
>>>>>> is a session dialogue for this.) They all work without an
>>>>>> authentication dialogue, except a brief initial dialogue:
>>>>>> The authenticity of host 'domu-xx-xx-xx-xx-xx-x (10.xx.xx.xx)'
>>>>>> can't be established.
>>>>>> RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>>>>>> Are you sure you want to continue connecting (yes/no)?
>>>>>> to which I say "yes."
>>>>>> But I am unclear about what you mean by "hostbased authentication".
>>>>>> Doesn't that mean with a password? If so, it is not an option.
>>>>> No. It's convenient inside a private cluster as it won't fill each user's
>>>>> known_hosts file and you don't need to create any ssh-keys. But when the
>>>>> hostname changes every time, it might also create new hostkeys. It uses
>>>>> hostkeys (private and public), this way it works for all users. Just for
>>>>> reference:
>>>>>
>>>>> http://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html
>>>>>
>>>>> You could look into it later.
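>>>>>
>>>>> As a minimal sketch of what it involves (assuming OpenSSH on all nodes):
>>>>> in each node's /etc/ssh/sshd_config
>>>>>
>>>>> HostbasedAuthentication yes
>>>>>
>>>>> and in each node's /etc/ssh/ssh_config
>>>>>
>>>>> HostbasedAuthentication yes
>>>>> EnableSSHKeysign yes
>>>>>
>>>>> plus the peer hostnames in /etc/ssh/shosts.equiv and their host keys in
>>>>> /etc/ssh/ssh_known_hosts; the howto above walks through the details.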
>>>>>
>>>>> ==
>>>>>
>>>>> - Can you try to use a command when connecting from A to B? E.g. `ssh
>>>>> domU-12-31-39-06-74-E2 ls`. Is this working too?
>>>>>
>>>>> - What about putting:
>>>>>
>>>>> LogLevel DEBUG3
>>>>>
>>>>> in your ~/.ssh/config? Maybe we can see what it's trying to negotiate
>>>>> before it fails in verbose mode.
>>>>>
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>
>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Tena
>>>>>>
>>>>>>
>>>>>> On 2/10/11 2:27 AM, "Reuti" <reuti_at_[hidden]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> your local machine is Linux like, but the execution hosts are Macs? I
>>>>>>> saw the /Users/tsakai/... in your output.
>>>>>>>
>>>>>>> a) executing a command on them is also working, e.g.: ssh
>>>>>>> domU-12-31-39-07-35-21 ls
>>>>>>>
>>>>>>> Am 10.02.2011 um 07:08 schrieb Tena Sakai:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have made a bit of progress(?)...
>>>>>>>> I made a config file in my .ssh directory on the cloud. It looks like:
>>>>>>>> # machine A
>>>>>>>> Host domU-12-31-39-07-35-21.compute-1.internal
>>>>>>> This is just an abbreviation or nickname above. To use the specified
>>>>>>> settings, it's necessary to specify exactly this name. When the
>>>>>>> settings are the same anyway for all machines, you can use:
>>>>>>>
>>>>>>> Host *
>>>>>>> IdentityFile /home/tsakai/.ssh/tsakai
>>>>>>> IdentitiesOnly yes
>>>>>>> BatchMode yes
>>>>>>>
>>>>>>> instead.
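>>>>>>>
>>>>>>> To check what actually gets applied, you can run something like
>>>>>>> "ssh -v domU-12-31-39-07-35-21.compute-1.internal hostname" and watch
>>>>>>> in the debug output which IdentityFile is offered.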
>>>>>>>
>>>>>>> Is this a private cluster (or at least private interfaces)? It would
>>>>>>> also be an option to use hostbased authentication, which will avoid
>>>>>>> setting any known_hosts file or passphraseless ssh-keys for each user.
>>>>>>>
>>>>>>> -- Reuti
>>>>>>>
>>>>>>>
>>>>>>>> HostName domU-12-31-39-07-35-21
>>>>>>>> BatchMode yes
>>>>>>>> IdentityFile /home/tsakai/.ssh/tsakai
>>>>>>>> ChallengeResponseAuthentication no
>>>>>>>> IdentitiesOnly yes
>>>>>>>>
>>>>>>>> # machine B
>>>>>>>> Host domU-12-31-39-06-74-E2.compute-1.internal
>>>>>>>> HostName domU-12-31-39-06-74-E2
>>>>>>>> BatchMode yes
>>>>>>>> IdentityFile /home/tsakai/.ssh/tsakai
>>>>>>>> ChallengeResponseAuthentication no
>>>>>>>> IdentitiesOnly yes
>>>>>>>>
>>>>>>>> This file exists on both machine A and machine B.
>>>>>>>>
>>>>>>>> Now when I issue the mpirun command as below:
>>>>>>>> [tsakai_at_domU-12-31-39-06-74-E2 ~]$ mpirun -app app.ac2
>>>>>>>>
>>>>>>>> It hangs. I control-C out of it and I get:
>>>>>>>> mpirun: killing job...
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>>>>>> that caused that situation.
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> mpirun was unable to cleanly terminate the daemons on the nodes shown
>>>>>>>> below. Additional manual cleanup may be required - please refer to
>>>>>>>> the "orte-clean" tool for assistance.
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> domU-12-31-39-07-35-21.compute-1.internal - daemon did not report
>>>>>>>> back when launched
>>>>>>>>
>>>>>>>> Am I making progress?
>>>>>>>>
>>>>>>>> Does this mean I am past authentication and something else is the
>>>>>>>> problem?
>>>>>>>> Does someone have an example .ssh/config file I can look at? There are
>>>>>>>> so many keyword-argument pairs for this config file and I would like
>>>>>>>> to look at some very basic one that works.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> Tena Sakai
>>>>>>>> tsakai_at_[hidden]
>>>>>>>>
>>>>>>>> On 2/9/11 7:52 PM, "Tena Sakai" <tsakai_at_[hidden]> wrote:
>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>> I have an app.ac1 file like below:
>>>>>>>>> [tsakai_at_vixen local]$ cat app.ac1
>>>>>>>>> -H vixen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 5
>>>>>>>>> -H vixen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 6
>>>>>>>>> -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 7
>>>>>>>>> -H blitzen.egcrc.org -np 1 Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R 8
>>>>>>>>>
>>>>>>>>> The program I run is
>>>>>>>>> Rscript /Users/tsakai/Notes/R/parallel/Rmpi/local/fib.R x
>>>>>>>>> where x is [5..8]. The machines vixen and blitzen each handle two runs.
>>>>>>>>>
>>>>>>>>> Here's the program fib.R:
>>>>>>>>> [tsakai_at_vixen local]$ cat fib.R
>>>>>>>>> # fib() computes, given index n, fibonacci number iteratively
>>>>>>>>> # here's the first dozen sequence (indexed from 0..11)
>>>>>>>>> # 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89
>>>>>>>>>
>>>>>>>>> fib <- function( n ) {
>>>>>>>>>     a <- 0
>>>>>>>>>     b <- 1
>>>>>>>>>     for ( i in 1:n ) {
>>>>>>>>>         t <- b
>>>>>>>>>         b <- a
>>>>>>>>>         a <- a + t
>>>>>>>>>     }
>>>>>>>>>     a
>>>>>>>>> }
>>>>>>>>> arg <- commandArgs( TRUE )
>>>>>>>>> myHost <- system( 'hostname', intern=TRUE )
>>>>>>>>> cat( fib(arg), myHost, '\n' )
>>>>>>>>>
>>>>>>>>> It reads an argument from the command line and produces the Fibonacci
>>>>>>>>> number that corresponds to that index, followed by the machine name.
>>>>>>>>> Pretty simple stuff.
>>>>>>>>>
>>>>>>>>> Here's the run output:
>>>>>>>>> [tsakai_at_vixen local]$ mpirun -app app.ac1
>>>>>>>>> 5 vixen.egcrc.org
>>>>>>>>> 8 vixen.egcrc.org
>>>>>>>>> 13 blitzen.egcrc.org
>>>>>>>>> 21 blitzen.egcrc.org
>>>>>>>>>
>>>>>>>>> Which is exactly what I expect. So far so good.
>>>>>>>>>
>>>>>>>>> Now I want to run the same thing in the cloud. I launch 2 instances
>>>>>>>>> of the same virtual machine, which I get to by:
>>>>>>>>> [tsakai_at_vixen local]$ ssh -A -i ~/.ssh/tsakai machine-instance-A-public-dns
>>>>>>>>>
>>>>>>>>> Now I am on machine A:
>>>>>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ # and I can go to machine B without password authentication,
>>>>>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ # i.e., use public/private key
>>>>>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>>>>> domU-12-31-39-00-D1-F2
>>>>>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ ssh -i .ssh/tsakai domU-12-31-39-0C-C8-01
>>>>>>>>> Last login: Wed Feb 9 20:51:48 2011 from 10.254.214.4
>>>>>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
>>>>>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ # I am now on machine B
>>>>>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ hostname
>>>>>>>>> domU-12-31-39-0C-C8-01
>>>>>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
>>>>>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ # now show I can get to machine A without using password
>>>>>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
>>>>>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ ssh -i .ssh/tsakai domU-12-31-39-00-D1-F2
>>>>>>>>> The authenticity of host 'domu-12-31-39-00-d1-f2 (10.254.214.4)' can't be established.
>>>>>>>>> RSA key fingerprint is e3:ad:75:b1:a4:63:7f:0f:c4:0b:10:71:f3:2f:21:81.
>>>>>>>>> Are you sure you want to continue connecting (yes/no)? yes
>>>>>>>>> Warning: Permanently added 'domu-12-31-39-00-d1-f2' (RSA) to the list of known hosts.
>>>>>>>>> Last login: Wed Feb 9 20:49:34 2011 from 10.215.203.239
>>>>>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>>>>> domU-12-31-39-00-D1-F2
>>>>>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ exit
>>>>>>>>> logout
>>>>>>>>> Connection to domU-12-31-39-00-D1-F2 closed.
>>>>>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$
>>>>>>>>> [tsakai_at_domU-12-31-39-0C-C8-01 ~]$ exit
>>>>>>>>> logout
>>>>>>>>> Connection to domU-12-31-39-0C-C8-01 closed.
>>>>>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ # back at machine A
>>>>>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ hostname
>>>>>>>>> domU-12-31-39-00-D1-F2
>>>>>>>>>
>>>>>>>>> As you can see, neither machine uses a password for authentication;
>>>>>>>>> they use public/private key pairs. There is no problem (that I can
>>>>>>>>> see) with ssh invocation from one machine to the other. This is so
>>>>>>>>> because I have a copy of the public key and a copy of the private
>>>>>>>>> key on each instance.
>>>>>>>>>
>>>>>>>>> The app.ac file is identical, except for the node names:
>>>>>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ cat app.ac1
>>>>>>>>> -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 5
>>>>>>>>> -H domU-12-31-39-00-D1-F2 -np 1 Rscript /home/tsakai/fib.R 6
>>>>>>>>> -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 7
>>>>>>>>> -H domU-12-31-39-0C-C8-01 -np 1 Rscript /home/tsakai/fib.R 8
>>>>>>>>>
>>>>>>>>> Here's what happens with mpirun:
>>>>>>>>>
>>>>>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$ mpirun -app app.ac1
>>>>>>>>> tsakai_at_domu-12-31-39-0c-c8-01's password:
>>>>>>>>> Permission denied, please try again.
>>>>>>>>> tsakai_at_domu-12-31-39-0c-c8-01's password: mpirun: killing job...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> mpirun noticed that the job aborted, but has no info as to the
>>>>>>>>> process
>>>>>>>>> that caused that situation.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> mpirun: clean termination accomplished
>>>>>>>>>
>>>>>>>>> [tsakai_at_domU-12-31-39-00-D1-F2 ~]$
>>>>>>>>>
>>>>>>>>> Mpirun (or somebody else?) asks me for a password, which I don't have.
>>>>>>>>> I end up typing control-C.
>>>>>>>>>
>>>>>>>>> Here's my question:
>>>>>>>>> How can I get past authentication by mpirun when there is no password?
>>>>>>>>>
>>>>>>>>> I would appreciate your help/insight greatly.
>>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>>
>>>>>>>>> Tena Sakai
>>>>>>>>> tsakai_at_[hidden]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>> <session4Reuti.text>
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>
>>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users