Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How are IP addresses determined?
From: Barnet Wagman (bw_at_[hidden])
Date: 2011-02-17 13:05:05


Tena,

If I understand you correctly, the configuration you're trying to use is

    submission host[ec2 instance 0] <-> slave [ec2 instance 1]

I haven't tried this yet (although I will in the next few days).

I've tried

    (a) submission host[non-ec2 system with static IP, direct net
    connection] <-> slave [ec2 instance 1]
    (b) submission host[non-ec2 system with local static IP, connected
    to net via router] <-> slave [ec2 instance 1]

(a) works, (b) does not, presumably because Open MPI does not support NAT
(see Jeff Squyres's comments, later in the thread).
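
A quick way to see which address is being handed to the remote orted is to
pull it out of the --hnp-uri string and check whether it falls in a private
range. The following is only a sketch: the function names are made up for
illustration, and the parsing assumes the "jobid;tcp://ip:port" shape seen
in this thread rather than any documented Open MPI format. A private
address is not a failure in itself, since machines on the same private
network can still reach each other; it only suggests the address will not
be reachable from outside that network.

```python
import ipaddress
import re


def parse_hnp_uri(hnp_uri):
    """Return (ip, port) from a string such as
    '1187643392.0;tcp://192.168.1.101:35272'.

    Hypothetical helper: assumes the URI shape shown in this thread.
    """
    match = re.search(r"tcp://(\d+\.\d+\.\d+\.\d+):(\d+)", hnp_uri)
    if match is None:
        raise ValueError(f"no tcp:// address found in {hnp_uri!r}")
    return ipaddress.ip_address(match.group(1)), int(match.group(2))


def is_private_hnp_address(hnp_uri):
    """True if the advertised address is in a private (non-routable) range."""
    ip, _port = parse_hnp_uri(hnp_uri)
    return ip.is_private


if __name__ == "__main__":
    # The two URIs that appear in this thread.
    for uri in ("1187643392.0;tcp://192.168.1.101:35272",
                "1025769472.0;tcp://10.118.23.4:60941"):
        ip, port = parse_hnp_uri(uri)
        print(ip, port, "private" if ip.is_private else "public")
```

Both addresses from the thread are private; the difference is that in case
(b) the remote host sits outside the network that owns the address.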

I notice that you're specifying hosts by their 'internal' EC2 names. This
makes sense in principle, but have you tried using the public/external
names? Presumably Open MPI has to look up these hostnames. I don't know how
that's done, but looking up the internal names might be a problem.
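
One way to test the lookup independently of Open MPI is to resolve each
hostname with the system resolver on every machine involved. This is a
minimal sketch, not a description of how Open MPI resolves names (which is
not established in this thread); the function name and the injectable
resolver argument are assumptions made for illustration.

```python
import socket


def resolve_hosts(hostnames, resolver=socket.gethostbyname):
    """Map each hostname to its IPv4 address, or None if the lookup fails.

    `resolver` defaults to the system resolver; it is a parameter only so
    the function can be exercised without network access.
    """
    results = {}
    for name in hostnames:
        try:
            results[name] = resolver(name)
        except OSError:
            # socket.gaierror (lookup failure) is a subclass of OSError.
            results[name] = None
    return results


if __name__ == "__main__":
    # Hostnames taken from the app.ac file quoted later in this message.
    hosts = ["ip-10-118-23-4.ec2.internal", "ip-10-118-18-172.ec2.internal"]
    for name, address in resolve_hosts(hosts).items():
        print(name, "->", address if address else "lookup failed")
```

Running it on both the submitting host and the remote host shows whether
each side can resolve the names the other side is using.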

If you try this (or anything else), I'd appreciate it if you'd post your
results.

bw

On 2/17/11 4:08 AM, Tena Sakai wrote:
> Hi Barnet,
>
> Allow me to interject.
> Are you saying that you run the master on your local machine and launch
> the Open MPI processes on EC2? You are saying that 1) the TCP endpoint
> tcp://192.168.1.101:35272 is on your local system and 2) the EC2
> instance is trying to connect to your local machine’s port 35272 and
> hanging. Is that correct?
>
> My situation is slightly different. I am running 2 EC2 instances
> and trying to run mpirun across both of them. My ssh debug output looks
> quite similar to yours, and mpirun's behavior is also very similar. Here’s
> what I captured:
> Sending command: orted --daemonize -mca ess env -mca orte_ess_jobid
> 1025769472 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri
> "1025769472.0;tcp://10.118.23.4:60941"
> And here’s what I did on the instance from which I issued mpirun:
> [tsakai_at_ip-10-118-23-4 ~]$ nslookup `hostname`
> Server: 172.16.0.23
> Address: 172.16.0.23#53
>
> Non-authoritative answer:
> Name: ip-10-118-23-4.ec2.internal
> Address: 10.118.23.4
>
> So that TCP address does belong to this instance. Furthermore, the
> connection cannot get into it. No router (which might perform address
> translation) is involved, and it appears that the same thing you
> describe is happening. Incidentally, here’s how I ran mpirun:
> [tsakai_at_ip-10-118-23-4 ~]$ mpirun -app app.ac
> With app.ac file:
> [tsakai_at_ip-10-118-23-4 ~]$ cat app.ac
> -H ip-10-118-23-4.ec2.internal -np 1 /bin/hostname
> -H ip-10-118-23-4.ec2.internal -np 1 /bin/hostname
> -H ip-10-118-18-172.ec2.internal -np 1 /bin/hostname
> -H ip-10-118-18-172.ec2.internal -np 1 /bin/hostname
>
> The first two lines spawn /bin/hostname on this instance
> (ip-10-118-23-4.ec2.internal) and the bottom two lines spawn it on the
> remote instance.
> Here’s the security group used for these instances:
>
> connection  protocol  from  to  source
> ----------  --------  ----  --  ---------
> SSH         tcp       22    22  0.0.0.0/0
>
> Am I making sense?
>
> Regards,
>
> Tena
>
>
>
>
> On 2/16/11 8:56 PM, "Barnet Wagman" <bw_at_[hidden]> wrote:
>
> I've run into a problem involving accessing a remote host via a
> router, and I think I need to understand how Open MPI determines IP
> addresses. If there's anything posted on this subject, please
> point me to it.
>
> Here's the problem:
>
> I've installed Open MPI (1.4.3) on a remote system (an Amazon EC2
> instance). If the local system I'm working on has a static IP
> address (and a direct connection to the internet), there's no
> problem. But if the local system accesses the internet through a
> router (which itself gets its IP via DHCP), the mpirun
> command hangs.
>
> This is not a firewall problem - I've disabled the firewalls on all
> the systems that are involved (and the router).
>
> It is also not an ssh problem. The ssh connection is being made
> and it appears that the application has been launched on the
> remote system. After the mpirun command has been launched
> locally, a ps on the remote system shows a process
>
>
> orted --daemonize -mca ess env -mca orte_ess_jobid 1187643392
> -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri
> 1187643392.0;tcp://192.168.1.101:35272
>
>
>
> While I don't really understand the orted process, I assume this
> indicates that a command to execute an app has been received and
> that Open MPI is trying to run it.
>
> I suspect that the problem is related to the '--hnp-uri ...
> tcp://192.168.1.101' argument. 192.168.1.101 is the address of my
> local system on my local network (attached to the router), which
> of course is not accessible over the net. It appears that Open MPI
> is transmitting the local (static) IP address to the remote host.
>
> It would help to know how Open MPI determines and distributes IP
> addresses, and whether there's any way to control this.
>
> Any thoughts on dealing with this would be greatly appreciated.
>
> Thanks,
>
> bw
>
>
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users