Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Private and public IP mixing.
From: .-=Kiwi=-. (heffeque_at_[hidden])
Date: 2011-10-05 12:24:44


The thing is that there's just one interface: eth0.

Computer 1 thinks its address is 212..., but it actually appears as 210... when
accessed from outside. There's no other interface to choose from, just eth0,
the one that thinks it's a 212... address.

Or maybe I'm just not understanding correctly.

---

On Wed, Oct 5, 2011 at 6:13 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> Check out this FAQ entry:
>
>    http://www.open-mpi.org/faq/?category=tcp#tcp-selection
>
> Note that there are btl_tcp_if_include / btl_tcp_if_exclude: these control
> MPI-level communications.  There's also oob_tcp_if_include /
> oob_tcp_if_exclude (that take the same kinds of values as
> btl_tcp_if_include/exclude) that control OMPI's run-time environment
> communications.
>
>
> On Oct 5, 2011, at 12:01 PM, (.-=Kiwi=-.) wrote:
>
> > "OMPI always tries to use the lowest numbered address first - just a
> > natural ordering."
> >
> > That doesn't seem to be the reason. We changed the private IPs to 212...
> > (a higher number than the public 210... IPs) and still MPI tries to go to
> > 212 afterwards.
> >
> > We're reading about the oob_tcp and btl_tcp parameters but we're not sure
> > how to use them.
> >
> > "But if hello world doesn't even run, then try running with "mpirun --mca
> > oob_tcp_if_include <the interface(s) you want to use> ...", per Ralph's
> > suggestion.  If *that* doesn't work, also add "--mca btl_tcp_if_include ..."
> > as well."
> >
> > We tried doing from Computer 1:
> >
> > orterun -mca oob_tcp_debug 1 -np 1 -host 212...3 ifconfig
> >
> > and everything was ok
> >
> > We tried doing from Computer 1:
> >
> > orterun -mca oob_tcp_debug 1 -np 1 -host 210...101 ifconfig
> >
> > and it says:
> >
> > There are no allocated resources for the application
> >   ifconfig
> > that match the requested mapping:
> >
> >
> > Verify that you have mapped the allocated resources properly using the
> > --host or --hostfile specification.
> >
> > --------------------------------------------------------------------------
> >
> > --------------------------------------------------------------------------
> > A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
> > launch so we are aborting. [...]
> >
> > Any other ideas?
> >
> >
> > On Wed, Oct 5, 2011 at 1:54 AM, Ralph Castain <rhc.openmpi_at_[hidden]> wrote:
> > OMPI always tries to use the lowest numbered address first - just a
> > natural ordering. You need to tell it to use just the public ones for this
> > topology. Use the oob_tcp and btl_tcp parameters to do this. See "ompi_info
> > --param oob tcp" and "ompi_info --param btl tcp" for the exact syntax.
> >
> >
> > Sent from my iPad
> >
> > On Oct 4, 2011, at 10:21 AM, "(.-=Kiwi=-.)" <heffeque_at_[hidden]> wrote:
> >
> >> We are constructing a set of computers with Open MPI and there's a small
> >> problem with mixing public and private IPs.
> >>
> >> We aren't sure about what's causing the problem or how to solve it.
> >>
> >> The files are shared via NFS, and we have a couple of computers with both
> >> private and public IPs that we want to send MPI work to some machines
> >> that have public IPs.
> >>
> >> I'm going to try to describe with example IPs.
> >>
> >> Computer 1 sees itself as eth0:  172...2  but has a public IP assigned:  210...2
> >> Computer 2 sees itself as eth0:  172...3  but has a public IP assigned:  210...3
> >> Computers outside the subnet directly have public IPs assigned:  210...100+
> >>
> >> The computers outside see Computer 1 and 2 only with 210... they can't
> >> see the 172... internal IPs.
> >>
> >> If an outside computer launches mpirun to Computer 1, it works ok.
> >> If Computer 1 tries to launch mpirun to Computer 2 (with 172...) it also
> >> works ok (not with 210... because they don't know that that's their public
> >> IP, but that's not an issue).
> >>
> >> The problem comes when Computer 1 or 2 tries to launch mpirun to outside
> >> computers.
> >>
> >> We tried to check what was happening and installed wireshark on an
> >> outside computer. It seems that the ssh part works ok (the ssh talk
> >> between 210...2 and 210...101 is ok), but after that the outside computer
> >> tries to send a TCP SYN packet to 172...2 instead of 210...2, and all the
> >> packets after that go to the same address.
> >>
> >> Is there a way to solve this problem?
> >>
> >> I've read this (
> >> http://www.open-mpi.org/community/lists/users/2009/11/11184.php ) but I'm
> >> not really sure what he's doing there.
> >>
> >> We have the option of plugging Computer 1 and Computer 2 directly into
> >> the switch that the outside computers are on, but we'd prefer the computers
> >> to stay on the private network. If there's no other way, I guess we can.
> >>
> >> Can it be done without having to change the network topology?
> >>
> >> Thanks in advance.
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
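
For reference, the interface-selection parameters named in the thread
(oob_tcp_if_include for the run-time environment, btl_tcp_if_include for
MPI-level traffic) can be combined on one command line. The following is only
a sketch of the syntax: it prints the command instead of running it, and the
host name "node-outside" and program "./hello_world" are hypothetical
placeholders, not values from the thread. Since the machines discussed here
have only eth0, this illustrates how the parameters are passed rather than a
confirmed fix for the public/private address problem.

```shell
#!/bin/sh
# Sketch only: build and print (not execute) an mpirun command line that
# restricts both OMPI's run-time (oob) and MPI-level (btl) TCP traffic to a
# single interface. All three values below are hypothetical placeholders.
IFACE="eth0"
HOST="node-outside"
PROGRAM="./hello_world"

build_cmd() {
    # $1 = interface name, $2 = target host, $3 = program to launch
    printf 'mpirun --mca oob_tcp_if_include %s --mca btl_tcp_if_include %s -np 1 -host %s %s\n' \
        "$1" "$1" "$2" "$3"
}

build_cmd "$IFACE" "$HOST" "$PROGRAM"
```

The exact values each parameter accepts on a given installation can be checked
with "ompi_info --param oob tcp" and "ompi_info --param btl tcp", as suggested
earlier in the thread.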