Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] This must be ssh problem, but I can't figure out what it is...
From: Gus Correa (gus_at_[hidden])
Date: 2011-02-15 20:09:38


Tena Sakai wrote:
> Hi,
>
> I am trying to reproduce what I was able to show last Friday on Amazon
> EC2 instances, but I am having a problem. What I was able to show last
> Friday as root was with this command:
> mpirun -app app.ac
> with app.ac being:
> -H dns-entry-A -np 1 (linux command)
> -H dns-entry-A -np 1 (linux command)
> -H dns-entry-B -np 1 (linux command)
> -H dns-entry-B -np 1 (linux command)
>
> Here’s the config file in root’s .ssh directory:
> Host *
> IdentityFile /root/.ssh/.derobee/.kagi
> IdentitiesOnly yes
> BatchMode yes
>
> Yesterday and today I can’t get this to work. I made the last part of
> app.ac
> file simpler (it now says /bin/hostname). Below is the session:
>
> -bash-3.2#
> -bash-3.2# # I am on instance A, host name for inst A is:
> -bash-3.2# hostname
> domU-12-31-39-09-CD-C2
> -bash-3.2#
> -bash-3.2# nslookup domU-12-31-39-09-CD-C2
> Server: 172.16.0.23
> Address: 172.16.0.23#53
>
> Non-authoritative answer:
> Name: domU-12-31-39-09-CD-C2.compute-1.internal
> Address: 10.210.210.48
>
> -bash-3.2# cd .ssh
> -bash-3.2#
> -bash-3.2# cat config
> Host *
> IdentityFile /root/.ssh/.derobee/.kagi
> IdentitiesOnly yes
> BatchMode yes
> -bash-3.2#
> -bash-3.2# ll config
> -rw-r--r-- 1 root root 103 Feb 15 17:18 config
> -bash-3.2#
> -bash-3.2# chmod 600 config
> -bash-3.2#
> -bash-3.2# # show I can go to inst B without password/passphrase
> -bash-3.2#
> -bash-3.2# ssh domU-12-31-39-09-E6-71.compute-1.internal
> Last login: Tue Feb 15 17:18:46 2011 from 10.210.210.48
> -bash-3.2#
> -bash-3.2# hostname
> domU-12-31-39-09-E6-71
> -bash-3.2#
> -bash-3.2# nslookup `hostname`
> Server: 172.16.0.23
> Address: 172.16.0.23#53
>
> Non-authoritative answer:
> Name: domU-12-31-39-09-E6-71.compute-1.internal
> Address: 10.210.233.123
>
> -bash-3.2# # and back to inst A is also no problem
> -bash-3.2#
> -bash-3.2# ssh domU-12-31-39-09-CD-C2.compute-1.internal
> Last login: Tue Feb 15 17:36:19 2011 from 63.193.205.1
> -bash-3.2#
> -bash-3.2# hostname
> domU-12-31-39-09-CD-C2
> -bash-3.2#
> -bash-3.2# # log out twice to go back to inst A
> -bash-3.2# exit
> logout
> Connection to domU-12-31-39-09-CD-C2.compute-1.internal closed.
> -bash-3.2#
> -bash-3.2# exit
> logout
> Connection to domU-12-31-39-09-E6-71.compute-1.internal closed.
> -bash-3.2#
> -bash-3.2# hostname
> domU-12-31-39-09-CD-C2
> -bash-3.2#
> -bash-3.2# cd ..
> -bash-3.2#
> -bash-3.2# pwd
> /root
> -bash-3.2#
> -bash-3.2# ll
> total 8
> -rw-r--r-- 1 root root 260 Feb 15 17:24 app.ac
> -rw-r--r-- 1 root root 130 Feb 15 17:34 app.ac2
> -bash-3.2#
> -bash-3.2# cat app.ac
> -H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
> -H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
> -H domU-12-31-39-09-E6-71.compute-1.internal -np 1 /bin/hostname
> -H domU-12-31-39-09-E6-71.compute-1.internal -np 1 /bin/hostname
> -bash-3.2#
> -bash-3.2# # when there is a remote machine (bottom 2 lines) it hangs
> -bash-3.2# mpirun -app app.ac
> mpirun: killing job...
>
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
> domU-12-31-39-09-E6-71.compute-1.internal - daemon did not
> report back when launched
> -bash-3.2#
> -bash-3.2# cat app.ac2
> -H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
> -H domU-12-31-39-09-CD-C2.compute-1.internal -np 1 /bin/hostname
> -bash-3.2#
> -bash-3.2# # when there is no remote machine, then mpirun works:
> -bash-3.2# mpirun -app app.ac2
> domU-12-31-39-09-CD-C2
> domU-12-31-39-09-CD-C2
> -bash-3.2#
> -bash-3.2# hostname
> domU-12-31-39-09-CD-C2
> -bash-3.2#
> -bash-3.2# # this gotta be ssh problem....
> -bash-3.2#
> -bash-3.2# # show no firewall is used
> -bash-3.2# iptables --list
> Chain INPUT (policy ACCEPT)
> target prot opt source destination
>
> Chain FORWARD (policy ACCEPT)
> target prot opt source destination
>
> Chain OUTPUT (policy ACCEPT)
> target prot opt source destination
> -bash-3.2#
> -bash-3.2# exit
> logout
> [tsakai_at_vixen ec2]$
>
> Would someone please point out what I am doing wrong?
>
> Thank you.
>
> Regards,
>
> Tena
>
Hi Tena

Nothing wrong that I can see.
Just another couple of suggestions,
based on somewhat vague possibilities.

A slight difference is that on vixen and dashen you ran the
MPI hostname tests as a regular user, not as root, right?
Not sure if this will make much of a difference,
but it may be worth trying to run it as a regular user in EC2 also.
In general, most people avoid running user applications (MPI programs
included) as root.
Mostly for safety, but I wonder if there are any
implications in the 'rootly powers'
regarding the under-the-hood processes that OpenMPI
launches along with the actual user programs.
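
A rough sketch of the regular-user test, not a tested recipe. The account
name 'mpiuser' is hypothetical, and this assumes the account exists on both
instances, has its own passwordless ssh key set up, and has a copy of app.ac
in its home directory. By default the script only prints what it would do;
set DRY_RUN=0 to actually run the commands as root.

```shell
#!/bin/sh
# Hypothetical names: adjust to your own setup.
MPI_USER="${MPI_USER:-mpiuser}"
APPFILE="${APPFILE:-app.ac}"

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Create the account if it does not exist yet (repeat on each instance).
id "$MPI_USER" >/dev/null 2>&1 || run useradd -m "$MPI_USER"

# 'su -' starts in the user's home directory, so the app file must be there.
run su - "$MPI_USER" -c "mpirun -app $APPFILE"
```

If the same hostname test works as the regular user but still hangs as
root, that would at least narrow the problem down.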

This may make no difference either,
but you could do a 'service iptables status',
to see if the service is running, even though there are
no explicit iptables rules (as per your email).
If the service is not running you get
'Firewall is stopped.' (in CentOS).
I *think* 'iptables --list' loads the iptables module into the
kernel, as a side effect, whereas the service command does not.
So, it may be cleaner (safer?) to use the service version
instead of 'iptables --list'.
I don't know if it will make any difference,
but just in case, if the service is running,
why not do 'service iptables stop',
and perhaps also 'chkconfig iptables off' to be completely
free of iptables?
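
Put together, the sequence above might look like the sketch below. It assumes
a CentOS-style system where the 'service' and 'chkconfig' commands are
available, and it would need to be run as root on each instance. The 'run'
helper and the DRY_RUN variable are only illustrative: by default nothing is
changed and the commands are just printed; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1. Check whether the firewall service is running.
run service iptables status

# 2. Stop it for the duration of the test.
run service iptables stop

# 3. Keep it from starting again at boot.
run chkconfig iptables off
```

Stopping a service that is already stopped should be harmless, so the steps
are simply run in order rather than made conditional on the status output.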

Gus Correa