On Dec 4, 2013, at 7:25 AM, "Meredith, Karl" <karl.meredith_at_[hidden]> wrote:
> Before turning off my firewall, I have these rules
> $ sudo ipfw list
> 05000 allow ip from any to any via lo*
This is an interesting rule. Perhaps you can try:
mpirun --mca oob_tcp_if_include lo0 ...
Which would force OMPI to use the loopback interface for TCP connections (it's normally excluded, because it's not viable for off-node communications). This would only be useful for single-node runs, of course.
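As a rough sketch of what a complete single-node invocation might look like (the process count and ./my_app are placeholders I made up, not anything from your setup; also note that the loopback interface is named lo0 on OS X / BSD but lo on Linux):

```shell
# Pick the loopback interface name for this OS.
# Assumption: "lo" on Linux, "lo0" elsewhere (OS X / BSD).
case "$(uname -s)" in
  Linux) LOOP_IF=lo ;;
  *)     LOOP_IF=lo0 ;;
esac

# Build the single-node command line; -np 2 and ./my_app are placeholders.
MPIRUN_CMD="mpirun --mca oob_tcp_if_include $LOOP_IF -np 2 ./my_app"

# Print the command so it can be inspected before running it.
echo "$MPIRUN_CMD"
```

Once the printed command looks right for your application, run it as-is from the same shell.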
> Our local IT expert believes that this problem is related to this bug from way back in openmpi 1.2.3, but it seems like the patch was never implemented:
No, I don't believe that's the issue. Here's why:
- OMPI currently ignores loopback interfaces by default. This is done because the norm is to have multi-server runs, and loopback interfaces are not useful for such runs. Put differently: OMPI defaults to using external IP interfaces.
- However, all your external IP interfaces are firewalled. So when OMPI tries to make a loopback connection on the external IP interfaces, it's blocked. Kaboom. That also explains why it works when you disable the firewall.
- That bug report you cited (good research, BTW!) was filed because we had a problem parsing the oob_tcp_if_include MCA parameter way back in the 1.2.x series. The user was trying to explicitly tell OMPI "use the lo0 interface" (i.e., override the default of *not* using the lo0 interface), and the bug prevented that from working. That bug has long since been fixed: you can override OMPI's default of not using lo0. You should then be able to run without disabling your firewall (that's what the mpirun syntax I cited above is doing).
- As noted above, using lo0 for multi-server runs is a bad idea; it won't work (OMPI may get confused and think that it can use 127.0.0.0/8 to contact multiple servers, because by the netmask, it hypothetically can). But you can do it for runs limited to your local laptop with no problem.
- The real solution, as Ralph implied, is to stop using external IP interfaces for single-server control messages (we talked about this off-list). Let me explain this statement a bit... OMPI has 2 main channels for communication: a) control messages and b) MPI traffic. MPI traffic is already smart enough to use shared memory for single-server MPI traffic and some form of network for off-server MPI traffic. The control message plane doesn't currently make that distinction -- it uses IP interfaces for *all* traffic (and defaults to not using loopback interfaces), regardless of destination. So the real solution is to make the control message plane a little smarter: put a named unix domain socket in the filesystem on the local server and let local control messages use that (instead of external IP addresses). FWIW, this is what LAM/MPI used to do (LAM/MPI was one of Open MPI's predecessors); we just never adopted that into Open MPI.
This feature may take a little time to implement, and may or may not make it into the v1.7.x series. But you should be able to use the oob_tcp_if_include MCA param in the meantime (see the FAQ for different ways to set MCA params; you can stick it in an environment variable or file instead of manually including it on the mpirun command line all the time, if that's more convenient).
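For reference, here's a minimal sketch of the two alternatives to the command line. It assumes the standard OMPI_MCA_<name> environment variable convention and the default per-user file location, $HOME/.openmpi/mca-params.conf; check the FAQ if your install uses a different location.

```shell
# Option 1: environment variable. Open MPI reads MCA params from
# variables named OMPI_MCA_<param_name>.
export OMPI_MCA_oob_tcp_if_include=lo0

# Option 2: per-user MCA parameter file, one "name = value" per line.
PARAM_FILE="$HOME/.openmpi/mca-params.conf"
mkdir -p "$HOME/.openmpi"
touch "$PARAM_FILE"

# Append the setting only if it is not already there, so re-running
# this snippet does not pile up duplicate lines.
if ! grep -q '^oob_tcp_if_include' "$PARAM_FILE"; then
  echo "oob_tcp_if_include = lo0" >> "$PARAM_FILE"
fi
```

Either option alone is enough; after that, a plain mpirun for a single-node run should pick up the setting without the --mca argument. Remember to remove it again before attempting multi-server runs.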