Open MPI User's Mailing List Archives

Subject: [OMPI users] This must be ssh problem, but I can't figure out what it is...
From: Tena Sakai (tsakai_at_[hidden])
Date: 2011-02-15 18:33:47


Hi,

I am trying to reproduce on Amazon EC2 instances what I was able to
demonstrate last Friday, but I am running into a problem. Last Friday,
as root, I ran this command:
  mpirun -app app.ac
with app.ac being:
  -H dns-entry-A -np 1 (linux command)
  -H dns-entry-A -np 1 (linux command)
  -H dns-entry-B -np 1 (linux command)
  -H dns-entry-B -np 1 (linux command)
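
For reference, an app file of this shape can be generated with a small shell function. This is only a sketch: make_appfile is a made-up helper name (not part of Open MPI), and the host names are the same placeholders as above. It writes two single-process lines per host, matching the layout of app.ac.

```shell
#!/bin/sh
# make_appfile is a hypothetical helper, not an Open MPI tool.
# Usage: make_appfile COMMAND HOST [HOST...]
# Prints two "-H <host> -np 1 <command>" lines for each host given.
make_appfile() {
    cmd=$1
    shift
    for host in "$@"; do
        # Two lines per host, so mpirun starts two processes there.
        printf '%s\n' "-H $host -np 1 $cmd"
        printf '%s\n' "-H $host -np 1 $cmd"
    done
}

# Example with placeholder host names; redirect to a file to use it.
make_appfile /bin/hostname dns-entry-A dns-entry-B
```

Redirecting the output (for example, to app.ac) gives a file that can be passed to mpirun with -app.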

Here’s the config file in root’s .ssh directory:
  Host *
        IdentityFile /root/.ssh/.derobee/.kagi
        IdentitiesOnly yes
        BatchMode yes
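
A quick way to confirm that this config allows logins without a password or passphrase when a command is run remotely (rather than in an interactive login) might look like the sketch below. It assumes that mpirun starts its remote daemons over ssh in this non-interactive way. check_ssh and SSH_CMD are names made up for illustration, and the 10-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# check_ssh is a hypothetical helper, not part of ssh or Open MPI.
# SSH_CMD defaults to ssh; it can be overridden for a dry run.
SSH_CMD=${SSH_CMD:-ssh}

# Usage: check_ssh HOST [HOST...]
# Runs /bin/hostname on each host without allowing any prompt.
check_ssh() {
    for host in "$@"; do
        # -n: do not read stdin; BatchMode=yes: fail instead of prompting;
        # ConnectTimeout=10: give up after 10 seconds if the host is unreachable.
        if $SSH_CMD -n -o BatchMode=yes -o ConnectTimeout=10 "$host" /bin/hostname; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
        fi
    done
}
```

It would be called as, for example, check_ssh dns-entry-A dns-entry-B from each instance in turn.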

Yesterday and today I could not get this to work. I simplified the command
at the end of each line in app.ac (it is now /bin/hostname). Below is the session:

  -bash-3.2#
  -bash-3.2# # I am on instance A, host name for inst A is:
  -bash-3.2# hostname
  domU-12-31-39-09-CD-C2
  -bash-3.2#
  -bash-3.2# nslookup domU-12-31-39-09-CD-C2
  Server: 172.16.0.23
  Address: 172.16.0.23#53

  Non-authoritative answer:
  Name: domU-12-31-39-09-CD-C2.compute-1.internal
  Address: 10.210.210.48

  -bash-3.2# cd .ssh
  -bash-3.2#
  -bash-3.2# cat config
  Host *
          IdentityFile /root/.ssh/.derobee/.kagi
          IdentitiesOnly yes
          BatchMode yes
  -bash-3.2#
  -bash-3.2# ll config
  -rw-r--r-- 1 root root 103 Feb 15 17:18 config
  -bash-3.2#
  -bash-3.2# chmod 600 config
  -bash-3.2#
  -bash-3.2# # show I can go to inst B without password/passphrase
  -bash-3.2#
  -bash-3.2# ssh domU-12-31-39-09-E6-71.compute-1.internal
  Last login: Tue Feb 15 17:18:46 2011 from 10.210.210.48
  -bash-3.2#
  -bash-3.2# hostname
  domU-12-31-39-09-E6-71
  -bash-3.2#
  -bash-3.2# nslookup `hostname`
  Server: 172.16.0.23
  Address: 172.16.0.23#53

  Non-authoritative answer:
  Name: domU-12-31-39-09-E6-71.compute-1.internal
  Address: 10.210.233.123

  -bash-3.2# # and back to inst A is also no problem
  -bash-3.2#
  -bash-3.2# ssh domU-12-31-39-09-CD-C2.compute-1.internal
  Last login: Tue Feb 15 17:36:19 2011 from 63.193.205.1
  -bash-3.2#
  -bash-3.2# hostname
  domU-12-31-39-09-CD-C2
  -bash-3.2#
  -bash-3.2# # log out twice to go back to inst A
  -bash-3.2# exit
  logout
  Connection to domU-12-31-39-09-CD-C2.compute-1.internal closed.
  -bash-3.2#
  -bash-3.2# exit
  logout
  Connection to domU-12-31-39-09-E6-71.compute-1.internal closed.
  -bash-3.2#
  -bash-3.2# hostname
  domU-12-31-39-09-CD-C2
  -bash-3.2#
  -bash-3.2# cd ..
  -bash-3.2#
  -bash-3.2# pwd
  /root
  -bash-3.2#
  -bash-3.2# ll
  total 8
  -rw-r--r-- 1 root root 260 Feb 15 17:24 app.ac
  -rw-r--r-- 1 root root 130 Feb 15 17:34 app.ac2
  -bash-3.2#
  -bash-3.2# cat app.ac
  -H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
  -H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
  -H domU-12-31-39-09-E6-71.compute-1.internal -np 1 /bin/hostname
  -H domU-12-31-39-09-E6-71.compute-1.internal -np 1 /bin/hostname
  -bash-3.2#
  -bash-3.2# # when there is a remote machine (bottom 2 lines) it hangs
  -bash-3.2# mpirun -app app.ac
  mpirun: killing job...

  --------------------------------------------------------------------------
  mpirun noticed that the job aborted, but has no info as to the process
  that caused that situation.
  --------------------------------------------------------------------------
  --------------------------------------------------------------------------
  mpirun was unable to cleanly terminate the daemons on the nodes shown
  below. Additional manual cleanup may be required - please refer to
  the "orte-clean" tool for assistance.
  --------------------------------------------------------------------------
        domU-12-31-39-09-E6-71.compute-1.internal - daemon did not report back when launched
  -bash-3.2#
  -bash-3.2# cat app.ac2
  -H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
  -H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
  -bash-3.2#
  -bash-3.2# # when there is no remote machine, then mpirun works:
  -bash-3.2# mpirun -app app.ac2
  domU-12-31-39-09-CD-C2
  domU-12-31-39-09-CD-C2
  -bash-3.2#
  -bash-3.2# hostname
  domU-12-31-39-09-CD-C2
  -bash-3.2#
  -bash-3.2# # this gotta be ssh problem....
  -bash-3.2#
  -bash-3.2# # show no firewall is used
  -bash-3.2# iptables --list
  Chain INPUT (policy ACCEPT)
  target prot opt source destination

  Chain FORWARD (policy ACCEPT)
  target prot opt source destination

  Chain OUTPUT (policy ACCEPT)
  target prot opt source destination
  -bash-3.2#
  -bash-3.2# exit
  logout
  [tsakai_at_vixen ec2]$

Would someone please point out what I am doing wrong?

Thank you.

Regards,

Tena