
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Private and public IP mixing.
From: George Bosilca (bosilca_at_[hidden])
Date: 2011-10-10 22:36:55


The current version of Open MPI doesn't handle such situations. You either have to configure your NAT differently or try to get your hands on one of the NAT-aware versions as described here http://www-lipn.univ-paris13.fr/~coti/QosCosGrid/qcgompi.php.

  george.

On Oct 10, 2011, at 12:14 , (.-=Kiwi=-.) wrote:

> I'm confused... my IPs right now are:
>
> Computer 1 (192.168.31.2 internal / 210.1.1.39 external)
> Computer 2 (192.168.31.3 internal / 210.1.1.40 external)
> Computer 3 (210.1.1.137)
>
> I want Computer 1 to launch mpirun and Computer 3 to do the task.
>
> I tried both these commands first on Computer 1 and then also on Computer 3:
>
> ompi_info --mca btl_tcp_if_include "210.0.0.0/8" --mca oob_tcp_if_include "210.0.0.0/8" (didn't work, Computer 3 tries to answer to 192.168.31.2 instead of 210.1.1.39)
> ompi_info --mca btl_tcp_if_include "210.1.1.0/8" --mca oob_tcp_if_include "210.1.1.0/8" (the same, still answering to the wrong IP).
>
> What am I doing wrong?
>
>
>
> On Wed, Oct 5, 2011 at 8:08 PM, George Bosilca <bosilca_at_[hidden]> wrote:
> The real solution is to evict the private addresses from both levels (MPI and ORTE). However, based on the ordering of the interfaces, I guess you cannot do it by name (eth0 has a private address on one side but a public one on the other).
>
> No panic! There is support for this.
>
> Look at the output of "ompi_info --param btl tcp" attached below:
>
> > MCA btl: parameter "btl_tcp_if_include" (current value: <none>, data
> > source: default value)
> > Comma-delimited list of devices or CIDR notation of networks
> > to use for MPI communication (e.g., "eth0,eth1" or
> > "192.168.0.0/16,10.1.4.0/24"). Mutually exclusive with
> > btl_tcp_if_exclude.
> > MCA btl: parameter "btl_tcp_if_exclude" (current value: <lo,sppp>, data
> > source: default value)
> > Comma-delimited list of devices or CIDR notation of networks
> > to NOT use for MPI communication -- all devices not matching
> > these specifications will be used (e.g., "eth0,eth1" or
> > "192.168.0.0/16,10.1.4.0/24"). Mutually exclusive with
> > btl_tcp_if_include.
>
> You can use the [btl|oob]_tcp_if_[include|exclude] either with names or with IP ranges. Add the following to your mpirun:
>
> --mca btl_tcp_if_include "210.0.0.0/8" --mca oob_tcp_if_include "210.0.0.0/8"
>
> and everything should work in all cases.
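Put together, a full launch line would look something like the following sketch (the application name, host list, and process count are placeholders, not from this thread, and the command of course needs a working Open MPI installation on all hosts):

```shell
# Sketch of the corrected launch (hypothetical app name and hosts).
# Restrict both the MPI transport (btl) and the runtime wire-up (oob)
# to the public 210.0.0.0/8 network so no private address is advertised.
mpirun -np 2 -host host1,host3 \
    --mca btl_tcp_if_include "210.0.0.0/8" \
    --mca oob_tcp_if_include "210.0.0.0/8" \
    ./my_mpi_app
```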
>
> george.
>
> On Oct 5, 2011, at 12:13 , Jeff Squyres wrote:
>
> > Check out this FAQ entry:
> >
> > http://www.open-mpi.org/faq/?category=tcp#tcp-selection
> >
> > Note that there are btl_tcp_if_include / btl_tcp_if_exclude: these control MPI-level communications. There's also oob_tcp_if_include / oob_tcp_if_exclude (that take the same kinds of values as btl_tcp_if_include/exclude) that control OMPI's run-time environment communications.
> >
> >
> > On Oct 5, 2011, at 12:01 PM, (.-=Kiwi=-.) wrote:
> >
> >> "OMPI always tries to use the lowest numbered address first - just a natural ordering."
> >>
> >> That doesn't seem to be the reason. We changed the private IPs to 212... (a higher number than the public 210... IPs) and MPI still tries to use the 212... addresses.
> >>
> >> We're reading the oob_tcp and btl_tcp parameters but we're not sure how to do it.
> >>
> >> "But if hello world doesn't even run, then try running with "mpirun --mca oob_tcp_if_include <the interface(s) you want to use> ...", per Ralph's suggestion. If *that* doesn't work, also add "--mca btl_tcp_if_include ..." as well."
> >>
> >> We tried doing from Computer 1:
> >>
> >> orterun -mca oob_tcp_debug 1 -np 1 -host 212...3 ifconfig
> >>
> >> and everything was ok
> >>
> >> We tried doing from Computer 1:
> >>
> >> orterun -mca oob_tcp_debug 1 -np 1 -host 210...101 ifconfig
> >>
> >> and it says:
> >>
> >> There are no allocated resources for the application
> >> ifconfig
> >> that match the requested mapping:
> >>
> >>
> >> Verify that you have mapped the allocated resources properly using the
> >> --host or --hostfile specification.
> >> --------------------------------------------------------------------------
> >> --------------------------------------------------------------------------
> >> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
> >> launch so we are aborting. [...]
> >>
> >> Any other ideas?
> >>
> >>
> >> On Wed, Oct 5, 2011 at 1:54 AM, Ralph Castain <rhc.openmpi_at_[hidden]> wrote:
> >> OMPI always tries to use the lowest numbered address first - just a natural ordering. You need to tell it to use just the public ones for this topology. Use the oob_tcp and btl_tcp parameters to do this. See "ompi_info --param oob tcp" and "ompi_info --param btl tcp" for the exact syntax.
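To find just the interface-selection parameters Ralph mentions, the `ompi_info` output can be filtered (a sketch; on Open MPI releases from the 1.7 series onward, `--level 9` may also be needed to show all parameters):

```shell
# List only the TCP interface include/exclude knobs; grep is just a filter.
ompi_info --param btl tcp | grep tcp_if
ompi_info --param oob tcp | grep tcp_if
```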
> >>
> >>
> >> Sent from my iPad
> >>
> >> On Oct 4, 2011, at 10:21 AM, "(.-=Kiwi=-.)" <heffeque_at_[hidden]> wrote:
> >>
> >>> We are constructing a set of computers with Open MPI and there's a small problem with mixing public and private IPs.
> >>>
> >>> We aren't sure about what's causing the problem or how to solve it.
> >>>
> >>> The files are shared via NFS. We have a couple of computers with both private and public IPs, and we want them to send MPI work to some machines that have only public IPs.
> >>>
> >>> I'm going to try to describe with example IPs.
> >>>
> >>> Computer 1 sees itself as eth0: 172...2 but has a public IP assigned: 210...2
> >>> Computer 2 sees itself as eth0: 172...3 but has a public IP assigned: 210...3
> >>> Computers outside the subnet directly have public IPs assigned: 210...100+
> >>>
> >>> The computers outside see Computer 1 and 2 only with 210... they can't see the 172... internal IPs.
> >>>
> >>> If an outside computer launches mpirun to Computer 1, it works ok.
> >>> If Computer 1 tries to launch mpirun to Computer 2 (with 172...) it also works ok (not with 210... because they don't know that that's their public IP, but that's not an issue).
> >>>
> >>> The problem comes when Computer 1 or 2 try to launch mpirun to outside computers.
> >>>
> >>> We tried to check out what was happening and installed wireshark on an outside computer, and it seems that the ssh part works ok (the ssh exchange between 210...2 and 210...101 is ok), but after that the outside computer tries to send a TCP SYN packet to 172...2 instead of 210...2, and the same happens for the rest of the packets.
> >>>
> >>> Is there a way to solve this problem?
> >>>
> >>> I've read this ( http://www.open-mpi.org/community/lists/users/2009/11/11184.php ) but I'm not really sure what he's doing there.
> >>>
> >>> We have the option of plugging Computer 1 and Computer 2 directly into the switch that the outside computers are on, but we'd rather keep them on the private network. If there's no other way, though, I guess we can.
> >>>
> >>> Can it be done without having to change the network topology?
> >>>
> >>> Thanks in advance.
> >>> _______________________________________________
> >>> users mailing list
> >>> users_at_[hidden]
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >
>
>