Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Private and public IP mixing.
From: .-=Kiwi=-. (heffeque_at_[hidden])
Date: 2011-10-10 12:14:20


I'm confused... my IPs right now are:

Computer 1 (192.168.31.2 internal / 210.1.1.39 external)
Computer 2 (192.168.31.3 internal / 210.1.1.40 external)
Computer 3 (210.1.1.137)

I want Computer 1 to launch mpirun and Computer 3 to do the task.

I tried both these commands first on Computer 1 and then also on Computer 3:

ompi_info --mca btl_tcp_if_include "210.0.0.0/8" --mca oob_tcp_if_include "210.0.0.0/8"
(didn't work; Computer 3 tries to answer to 192.168.31.2 instead of 210.1.1.39)

ompi_info --mca btl_tcp_if_include "210.1.1.0/8" --mca oob_tcp_if_include "210.1.1.0/8"
(the same, still answering to the wrong IP)

What am I doing wrong?
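
As a quick sanity check on the CIDR values above, here is a minimal Python sketch that tests which of the listed addresses fall inside each range. It uses only the standard `ipaddress` module; the helper names `matches` and `selected` are made up for this illustration. It checks the address arithmetic only and assumes nothing about how Open MPI parses these parameters; in particular, treating "210.1.1.0/8" as equivalent to "210.0.0.0/8" is Python's non-strict behaviour, not something confirmed for Open MPI.

```python
import ipaddress

# Addresses copied from the message above.
hosts = {
    "Computer 1 internal": "192.168.31.2",
    "Computer 1 external": "210.1.1.39",
    "Computer 2 internal": "192.168.31.3",
    "Computer 2 external": "210.1.1.40",
    "Computer 3": "210.1.1.137",
}


def matches(cidr, address):
    """Return True if address lies inside the network written as cidr."""
    # strict=False masks off the host bits, so "210.1.1.0/8" is read
    # as the network 210.0.0.0/8 instead of raising ValueError.
    network = ipaddress.ip_network(cidr, strict=False)
    return ipaddress.ip_address(address) in network


def selected(cidr):
    """Return the labels of all hosts whose address falls inside cidr."""
    return [label for label, addr in hosts.items() if matches(cidr, addr)]


if __name__ == "__main__":
    for cidr in ("210.0.0.0/8", "210.1.1.0/8", "210.1.1.0/24"):
        print(cidr, "->", selected(cidr))
```

Under these assumptions both ranges tried above cover the three public addresses and exclude the 192.168.31.x ones, so the ranges themselves look reasonable. Note that the suggestion quoted below was to add the options to the mpirun command line.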

---

On Wed, Oct 5, 2011 at 8:08 PM, George Bosilca <bosilca_at_[hidden]> wrote:
> The real solution is to evict the private addresses from both levels (MPI
> and ORTE). However, based on the ordering of the interfaces, I guess you
> cannot do it by name (eth0 has a private address on one side but a public
> one on the other).
>
> No panic! There is support for this.
>
> Look at the output of "ompi_info --param btl tcp" attached below:
>
> >  MCA btl: parameter "btl_tcp_if_include" (current value: <none>, data
> >           source: default value)
> >           Comma-delimited list of devices or CIDR notation of networks
> >           to use for MPI communication (e.g., "eth0,eth1" or
> >           "192.168.0.0/16,10.1.4.0/24").  Mutually exclusive with
> >           btl_tcp_if_exclude.
> >  MCA btl: parameter "btl_tcp_if_exclude" (current value: <lo,sppp>, data
> >           source: default value)
> >           Comma-delimited list of devices or CIDR notation of networks
> >           to NOT use for MPI communication -- all devices not matching
> >           these specifications will be used (e.g., "eth0,eth1" or
> >           "192.168.0.0/16,10.1.4.0/24").  Mutually exclusive with
> >           btl_tcp_if_include.
>
> You can use the [btl|oob]_tcp_if_[include|exclude] either with names or
> with IP ranges. Add the following to your mpirun:
>
> --mca btl_tcp_if_include "210.0.0.0/8" --mca oob_tcp_if_include "210.0.0.0/8"
>
> and everything should work in all cases.
>
>  george.
>
> On Oct 5, 2011, at 12:13 , Jeff Squyres wrote:
>
> > Check out this FAQ entry:
> >
> >    http://www.open-mpi.org/faq/?category=tcp#tcp-selection
> >
> > Note that there are btl_tcp_if_include / btl_tcp_if_exclude: these
> > control MPI-level communications.  There's also oob_tcp_if_include /
> > oob_tcp_if_exclude (that take the same kinds of values as
> > btl_tcp_if_include/exclude) that control OMPI's run-time environment
> > communications.
> >
> >
> > On Oct 5, 2011, at 12:01 PM, (.-=Kiwi=-.) wrote:
> >
> >> "OMPI always tries to use the lowest numbered address first - just a
> >> natural ordering."
> >>
> >> That doesn't seem to be the reason. We changed the private IPs to 212...
> >> (a higher number than the public 210... IPs) and still MPI tries to go to
> >> 212 afterwards.
> >>
> >> We're reading the oob_tcp and btl_tcp parameters but we're not sure how
> >> to do it.
> >>
> >> "But if hello world doesn't even run, then try running with "mpirun
> >> --mca oob_tcp_if_include <the interface(s) you want to use> ...", per
> >> Ralph's suggestion.  If *that* doesn't work, also add "--mca
> >> btl_tcp_if_include ..." as well."
> >>
> >> We tried doing from Computer 1:
> >>
> >> orterun -mca oob_tcp_debug 1 -np 1 -host 212...3 ifconfig
> >>
> >> and everything was ok
> >>
> >> We tried doing from Computer 1:
> >>
> >> orterun -mca oob_tcp_debug 1 -np 1 -host 210...101 ifconfig
> >>
> >> and it says:
> >>
> >> There are no allocated resources for the application
> >>  ifconfig
> >> that match the requested mapping:
> >>
> >>
> >> Verify that you have mapped the allocated resources properly using the
> >> --host or --hostfile specification.
> >>
> >> --------------------------------------------------------------------------
> >> --------------------------------------------------------------------------
> >> A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
> >> launch so we are aborting. [...]
> >>
> >> Any other ideas?
> >>
> >>
> >> On Wed, Oct 5, 2011 at 1:54 AM, Ralph Castain <rhc.openmpi_at_[hidden]> wrote:
> >> OMPI always tries to use the lowest numbered address first - just a
> >> natural ordering. You need to tell it to use just the public ones for this
> >> topology. Use the oob_tcp and btl_tcp parameters to do this. See "ompi_info
> >> --param oob tcp" and "ompi_info --param btl tcp" for the exact syntax.
> >>
> >>
> >> Sent from my iPad
> >>
> >> On Oct 4, 2011, at 10:21 AM, "(.-=Kiwi=-.)" <heffeque_at_[hidden]> wrote:
> >>
> >>> We are constructing a set of computers with Open MPI and there's a
> >>> small problem with mixing public and private IPs.
> >>>
> >>> We aren't sure about what's causing the problem or how to solve it.
> >>>
> >>> The files are shared over NFS. A couple of our computers have both
> >>> private and public IPs, and we want them to send MPI work to some
> >>> machines that have public IPs.
> >>>
> >>> I'm going to try to describe with example IPs.
> >>>
> >>> Computer 1 sees itself as eth0: 172...2 but has a public IP assigned: 210...2
> >>> Computer 2 sees itself as eth0: 172...3 but has a public IP assigned: 210...3
> >>> Computers outside the subnet directly have public IPs assigned: 210...100+
> >>>
> >>> The computers outside see Computer 1 and 2 only as 210...; they can't
> >>> see the 172... internal IPs.
> >>>
> >>> If an outside computer launches mpirun to Computer 1, it works ok.
> >>> If Computer 1 tries to launch mpirun to Computer 2 (with 172...) it
> >>> also works ok (not with 210... because they don't know that that's their
> >>> public IP, but that's not an issue).
> >>>
> >>> The problem comes when Computer 1 or 2 tries to launch mpirun to outside
> >>> computers.
> >>>
> >>> We tried to check out what was happening and installed wireshark on an
> >>> outside computer and it seems that the ssh part works ok (the ssh talk
> >>> between 210...2 and 210...101 is ok), but after that the outside computer
> >>> tries to send a TCP SYN packet to 172...2 instead of 210...2, and the
> >>> rest of the packets onward go the same way.
> >>>
> >>> Is there a way to solve this problem?
> >>>
> >>> I've read this (
> >>> http://www.open-mpi.org/community/lists/users/2009/11/11184.php ) but I'm
> >>> not really sure what he's doing there.
> >>>
> >>> We have the option of plugging Computer 1 and Computer 2 directly into
> >>> the switch that the outside computers are on, but we'd prefer the
> >>> computers to stay on the private network. If there's no other way, I
> >>> guess we can.
> >>>
> >>> Can it be done without having to change the network topology?
> >>>
> >>> Thanks in advance.
> >>> _______________________________________________
> >>> users mailing list
> >>> users_at_[hidden]
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >
>
>